Xavier Conort is a visionary data scientist with more than 25 years of data experience. He began his career as an actuary in the insurance industry before transitioning to data science. He's a top-ranked Kaggle competitor and was the Chief Data Scientist at DataRobot before co-founding FeatureByte.
FeatureByte is on a mission to scale enterprise AI by radically simplifying and industrializing AI data. The feature engineering and management platform empowers data scientists to create and share state-of-the-art features and production-ready data pipelines in minutes instead of weeks or months.
You began your career as an actuary in the insurance industry before transitioning to data science. What caused this shift?
A defining moment was winning the GE Flight Quest, a competition organized by GE with a $250K prize pool, where participants had to predict delays of US domestic flights. I owe part of that success to a valuable insurance practice: two-stage modeling. This approach helps control bias in features that lack sufficient representation in the available training data. Together with other wins on Kaggle, this achievement convinced me that my actuarial background gave me a competitive advantage in the field of data science.
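To make the idea concrete, here is a minimal sketch of one common form of two-stage modeling; the data, model choices, and names are hypothetical illustrations, not the actual Flight Quest pipeline. A flexible first-stage model is fit on well-represented features, and a regularized second stage then learns a correction from the sparsely represented ones, keeping their influence in check.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

# Illustrative data: dense, well-represented features and
# sparse, thinly represented ones (all names are hypothetical).
rng = np.random.default_rng(0)
X_dense = rng.normal(size=(1000, 5))
X_sparse = rng.normal(size=(1000, 2))
y = X_dense[:, 0] + 0.1 * X_sparse[:, 0] + rng.normal(scale=0.5, size=1000)

# Stage 1: flexible model on the well-represented features.
# Out-of-fold predictions avoid leaking training labels into stage 2.
stage1 = GradientBoostingRegressor(random_state=0)
oof_pred = cross_val_predict(stage1, X_dense, y, cv=5)

# Stage 2: heavily regularized model corrects the stage-1 residuals
# using the sparse features, which keeps their bias under control.
stage2 = Ridge(alpha=10.0)
stage2.fit(X_sparse, y - oof_pred)

# Final prediction = stage-1 prediction + stage-2 correction.
stage1.fit(X_dense, y)
final_pred = stage1.predict(X_dense) + stage2.predict(X_sparse)
```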
During my Kaggle journey, I also had the privilege of connecting with other enthusiastic data scientists, including Jeremy Achin and Tom De Godoy, who would later become the founders of DataRobot. We shared a common background in insurance and had achieved notable successes on Kaggle. When they eventually launched DataRobot, a company specializing in AutoML, they invited me to join them as Chief Data Scientist. Their vision of combining the best practices from the insurance industry with the power of machine learning excited me, presenting an opportunity to create something innovative and impactful.
At DataRobot, you were instrumental in building their data science roadmap. What type of data challenges did you face?
The most significant challenge we faced was the varying quality of data provided as input to our AutoML solution. This issue often resulted in either time-consuming collaboration between our team and clients or disappointing results in production if not addressed appropriately. The quality issues stemmed from multiple sources that required our attention.
One of the primary challenges arose from the widespread use of business intelligence tools for data preparation and management. While these tools are invaluable for generating insights, they lack the capabilities required to ensure point-in-time correctness for machine learning data preparation. As a result, leaks in training data could occur, leading to overfitting and inaccurate model performance.
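To make the point-in-time issue concrete, here is a small, hypothetical pandas sketch: the leaky version aggregates a customer's entire history, including events that happen after the label's observation time, while the correct version only aggregates events strictly before each observation's cutoff.

```python
import pandas as pd

# Hypothetical event log and labeled observations.
events = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "event_time": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-10", "2024-01-15"]),
    "amount": [100.0, 50.0, 75.0, 200.0],
})
observations = pd.DataFrame({
    "customer_id": [1, 2],
    "cutoff_time": pd.to_datetime(["2024-02-01", "2024-02-01"]),
})

# Leaky: aggregates the full history, including events after the
# cutoff, so the model effectively sees information from the future.
leaky = events.groupby("customer_id")["amount"].sum().rename("total_amount")

# Point-in-time correct: join first, then keep only events strictly
# before each observation's cutoff before aggregating.
joined = observations.merge(events, on="customer_id")
correct = (
    joined[joined["event_time"] < joined["cutoff_time"]]
    .groupby(["customer_id", "cutoff_time"])["amount"]
    .sum()
    .rename("total_amount_before_cutoff")
)

print(leaky)    # customer 1's total includes the 2024-02-10 event (a leak)
print(correct)  # customer 1's total correctly excludes it
```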
Miscommunication between data scientists and data engineers was another challenge that affected the accuracy of models in production. Inconsistencies between the training and production phases, arising from misalignment between these two teams, could degrade model performance in a real-world environment.
What were some of the key takeaways from this experience?
My experience at DataRobot highlighted the importance of data preparation in machine learning. By addressing the challenges of generating model training data, such as point-in-time correctness, expertise gaps, domain knowledge, tool limitations, and scalability, we can enhance the accuracy and reliability of machine learning models. I came to the conclusion that streamlining the data preparation process and incorporating modern technologies would be instrumental in unlocking the full potential of AI and delivering on its promises.
We also heard from your Co-Founder Razi Raziuddin about the genesis story behind FeatureByte. Could we get your version of events?
When I discussed my observations and insights with my Co-Founder Razi Raziuddin, we realized that we shared a common understanding of the challenges in data preparation for machine learning. During our discussions, I shared with Razi my insights into the recent advancements in the MLOps community. I had observed the emergence of feature stores and feature platforms that AI-first tech companies put in place to reduce the latency of feature serving, encourage feature reuse, and simplify feature materialization into training data while ensuring training-serving consistency. However, it was evident to us that there was still a gap in meeting the needs of data scientists. Razi shared with me his insights into how the modern data stack has revolutionized BI and analytics, but is not being fully leveraged for AI.
It became apparent to both Razi and me that we had the opportunity to make a significant impact by radically simplifying the feature engineering process and providing data scientists and ML engineers with the right tools and user experience for seamless feature experimentation and feature serving.
What were some of your biggest challenges in making the transition from data scientist to entrepreneur?
Transitioning from data scientist to entrepreneur required me to shift from a technical perspective to a broader, business-oriented mindset. While I had a strong foundation in understanding pain points, creating a roadmap, executing plans, building a team, and managing budgets, I found that crafting the right messaging that truly resonated with our audience was one of my biggest obstacles.
As a data scientist, my primary focus had always been on analyzing and interpreting data to derive valuable insights. However, as an entrepreneur, I needed to redirect my thinking toward the market, customers, and the overall business.
Fortunately, I was able to overcome this challenge by leveraging the experience of someone like my Co-Founder Razi.
We heard from Razi about why feature engineering is so difficult. In your view, what makes it so challenging?
Feature engineering has two major challenges:
- Transforming existing columns: This involves converting data into a suitable format for machine learning algorithms. Techniques like one-hot encoding, feature scaling, and advanced methods such as text and image transformations are used. Creating new features from existing ones, like interaction features, can greatly enhance model performance. Popular libraries like scikit-learn and Hugging Face provide extensive support for this type of feature engineering. AutoML solutions aim to simplify the process too.
- Extracting new columns from historical data: Historical data is crucial in problem domains such as recommendation systems, marketing, fraud detection, insurance pricing, credit scoring, demand forecasting, and sensor data processing. Extracting informative columns from this data is difficult. Examples include time since the last event, aggregations over recent events, and embeddings from sequences of events. This type of feature engineering requires domain expertise, experimentation, strong coding and data engineering skills, and deep data science knowledge. Factors like time leakage, handling large datasets, and efficient code execution also need consideration (a toy sketch of both types follows below).
Overall, feature engineering requires expertise, experimentation, and the construction of complex ad hoc data pipelines in the absence of tools specifically designed for it.
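As a minimal illustration of both categories (the column names and data here are hypothetical), the first part of the sketch below transforms existing columns with scikit-learn, and the second derives a "time since last event" column from an event history with pandas.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# --- Type 1: transforming existing columns ---
df = pd.DataFrame({
    "channel": ["web", "store", "web"],
    "basket_value": [120.0, 35.0, 60.0],
})
transformer = ColumnTransformer([
    ("onehot", OneHotEncoder(), ["channel"]),       # categorical -> indicator columns
    ("scale", StandardScaler(), ["basket_value"]),  # numeric -> standardized
])
X = transformer.fit_transform(df)

# --- Type 2: extracting new columns from historical data ---
events = pd.DataFrame({
    "customer_id": [1, 1, 2, 1],
    "event_time": pd.to_datetime(
        ["2024-01-01", "2024-01-10", "2024-01-05", "2024-02-01"]),
}).sort_values(["customer_id", "event_time"])

# Time since the customer's previous event: a classic historical feature.
events["time_since_last_event"] = (
    events.groupby("customer_id")["event_time"].diff()
)
print(events)
```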
Could you share how FeatureByte empowers data science professionals while simplifying feature pipelines?
FeatureByte empowers data science professionals by simplifying the whole feature engineering process. With an intuitive Python SDK, it enables quick feature creation and extraction from XLarge Event and Item Tables. Computation is efficiently handled by leveraging the scalability of data platforms such as Snowflake, Databricks, and Spark. Notebooks facilitate experimentation, while feature sharing and reuse save time. Auditing ensures feature accuracy, and immediate deployment eliminates pipeline management headaches.
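To give a flavor of the workflow, here is a sketch modeled on FeatureByte's public quick-start tutorials; the catalog, table, column, and method names are taken from those tutorials and may differ across SDK versions, so treat it as illustrative rather than definitive.

```python
import featurebyte as fb

# Activate a catalog and get a view over an event table
# (names borrowed from FeatureByte's tutorial data; assumptions,
# not a verified snippet against any particular SDK version).
catalog = fb.Catalog.activate("grocery")
invoice_view = catalog.get_table("GROCERYINVOICE").get_view()

# Declare windowed aggregation features; the platform handles
# point-in-time correctness and pushes computation down to the
# underlying data platform (Snowflake, Databricks, Spark).
features = invoice_view.groupby("GroceryCustomerGuid").aggregate_over(
    value_column="Amount",
    method=fb.AggFunc.SUM,
    windows=["7d", "28d"],
    feature_names=["CustomerSpend_7d", "CustomerSpend_28d"],
)
features.save()
```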
In addition to these capabilities offered by our open-source library, our enterprise solution provides a comprehensive framework for managing and organizing AI operations at scale, including governance workflows and a user interface for the feature catalog.
What's your vision for the future of FeatureByte?
Our ultimate vision for FeatureByte is to revolutionize the field of data science and machine learning by empowering users to unleash their full creative potential and extract unprecedented value from their data assets.
We're particularly excited about the rapid progress in Generative AI and transformers, which opens up a world of possibilities for our users. Moreover, we're dedicated to democratizing feature engineering. Generative AI has the potential to lower the barrier to entry for creative feature engineering, making it accessible to a wider audience.
In summary, our vision for the future of FeatureByte revolves around continuous innovation, harnessing the power of Generative AI, and democratizing feature engineering. We aim to be the go-to platform that enables data professionals to transform raw data into actionable input for machine learning, driving breakthroughs and advancements across industries.
Do you have any advice for aspiring AI entrepreneurs?
Define your space, stay focused and welcome novelty.
By defining the space that you want to own, you can differentiate yourself and establish a strong presence in that area. Research the market, understand the needs and pain points of potential customers, and strive to offer a unique solution that addresses those challenges effectively.
Define your long-term vision and set clear short-term goals that align with that vision. Focus on building a strong foundation and delivering value in your chosen space.
Finally, while it is important to stay focused, don't shy away from embracing novelty and exploring new ideas within your defined space. The AI field is constantly evolving, and innovative approaches can open up new opportunities.