I’m excited to share that I’ve led Infinity AI’s seed round on behalf of Matrix. The full announcement is up on Infinity’s blog; here, I wanted to share my take on what’s special about Infinity and why I decided to invest.
Machine learning gobbles data. But where does that data come from? Traditionally, the path to modeling the real world with machine learning has been to capture data streams from the real world and then meticulously label them before training a model on that data. But real-world data is limited: tricky to collect, finicky to label, and often too sparse in the end to train a model effectively. Machine learning engineers end up stuck in front of their screens: to improve model performance, they need to go back to square one and capture, label, and ingest edge-case data from scratch. Big companies have processes for this, but those processes don’t happen instantly. At smaller companies, machine learning engineers are often on their own; it’s not uncommon for an engineer to personally label a dataset scrounged from the public web.
What machine learning engineers need is unlimited data: abundant, instant, perfectly labeled, and tuned to their domain. That’s the promise of synthetic data, which is realistic data developed in simulation and programmatically generated with pixel-perfect labels to close the gap at hand. And that’s what Infinity AI delivers: synthetic data as a service, built by machine learning engineers for machine learning engineers and designed to help develop better ML models faster.
Infinity AI was born out of a personal need: the founders were stuck in front of their screens, trying to improve the performance of a pose estimation model, and decided to see what lift they could get from adding targeted synthetic data into their training set. The results surprised even them and led to the insight that synthetic data was no longer a fringe topic but was relevant and practical today. The challenge is that to develop data in simulation, you first need to build a simulation. In the case of pose estimation, that meant generating videos of 3D avatars in realistic scenes. Crafting parameterized, programmatically generated scenes inside a video game engine requires a very different skill set from training models in the first place, and that gap convinced Infinity’s founders that there was an opportunity for a startup designed to deliver that data as a service.
Today, Infinity’s datasets and data generators span fitness, robotics, and smart facilities. Their data marketplace is the world’s largest open-source collection of synthetic datasets (1M free frames, with more added every month), and they’re working with over a dozen of the most forward-thinking customers in ML, including Tempo Fitness, Voxel Safety, SwRI, and multiple Fortune 500 companies. Infinity’s customers have generated over 5M synthetic data frames using their API in just 6 months. Most importantly, those customers have shipped production models in half the time and for a tenth of the cost!
On a personal note, out of the hundreds of startups I’ve met since joining Matrix as an early-stage investor last year, Infinity AI has been the one that most completely blew my mind. The power of synthetic data has long been understood in research contexts, but advances over the last few years have made it relevant in practical applications. I learn so much every day about the state of the art from Infinity’s founding team, led by Lina Colucci and Sidney Primas. They’re hiring technical artists and backend machine learning engineers to prove out the promise of synthetic data. If you’re excited by this, they’d love to hear from you: you can reach them at firstname.lastname@example.org. And if you’re building in applied AI, I’d love to hear from you too; my inbox is always open at email@example.com.