It’s a consensus view that AI is bound to transform every industry and drive significant improvements in productivity. Enterprise AI spending is growing rapidly and, according to IDC, will cross $500B by 2024, driven by recent advances in natural language processing (NLP) and computer vision (CV).
But while there is agreement on the true north for AI, the path to AI adoption in enterprises has several obstacles. One of the major bottlenecks is the cost and time needed to label and manage the “training datasets” that modern AI approaches learn from. As ML models become more sophisticated, they require massive labeled training datasets. Creating and curating these datasets is often expensive for organizations because it takes huge numbers of person-hours of manual labeling: 80 percent of AI development time is spent gathering, organizing, and manually labeling the data used to train machine learning models. The cost and time of hand-labeling also make it hard for development teams to build, iterate on, adapt, or audit applications in a systematic and privacy-compliant manner. The training data bottleneck has often made AI application development impractical, and as a result 87 percent of data science projects never make it into production.
Enter Snorkel AI. Started at the Stanford AI Lab in 2015, Snorkel addresses the problem of labeling and managing massive training datasets by introducing a new data programming paradigm in which subject matter experts, data scientists, and machine learning engineers use code or a push-button UI to programmatically label data en masse. Snorkel’s technology has already seen rapid adoption by large organizations like Google, YouTube, Facebook, LinkedIn, Microsoft, Intel, IBM, DARPA, and more. Encouraged by the ROI these early partners got from the Snorkel technology, but realizing how much custom build and development work was still needed each time to deploy it, the core team behind the project (Alex Ratner, Chris Ré, Paroma Varma, Braden Hancock, and Henry Ehrenberg) founded Snorkel AI in 2019. There, they launched Snorkel Flow, the first AI platform that enables enterprises to build, iterate on, deploy, and adapt AI applications powered by Snorkel’s programmatic labeling technology. Today, using Snorkel Flow, two of the top three US banks, several government agencies, and some of the world’s largest enterprises in telecommunications, biotech, and insurance have developed AI applications 10–100x faster and deployed use cases that were previously impractical to solve.
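To make the idea of programmatic labeling concrete, here is a minimal sketch in plain Python. It is an illustration only and does not use Snorkel's API or Snorkel Flow: the spam-filtering task, the function names, and the heuristics are all hypothetical. Each "labeling function" encodes one rule of thumb and either votes for a label or abstains, and the votes are combined with a simple majority. Snorkel's actual approach goes further by modeling how accurate each labeling function is, which this sketch does not attempt.

```python
from collections import Counter

# Hypothetical label values for an illustrative spam-filtering task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Vote SPAM if the message contains a URL, otherwise abstain."""
    return SPAM if "http://" in text or "https://" in text else ABSTAIN


def lf_mentions_prize(text):
    """Vote SPAM if the message mentions a prize, otherwise abstain."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_greeting(text):
    """Vote HAM for short messages that open with a greeting or thanks."""
    words = text.lower().split()
    is_greeting = bool(words) and words[0] in {"hi", "hello", "thanks"}
    return HAM if is_greeting and len(words) <= 5 else ABSTAIN


# Illustrative set of heuristics; a real project would have many more.
LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_greeting]


def majority_vote(text, lfs=LABELING_FUNCTIONS):
    """Combine labeling-function votes; abstain on ties or when no one votes."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN  # conflicting heuristics with equal support
    return counts[0][0]


def label_dataset(texts):
    """Label every example in one pass instead of one at a time by hand."""
    return [majority_vote(t) for t in texts]
```

The point of the sketch is the workflow rather than the specific rules: when a heuristic turns out to be wrong, you edit one function and relabel the whole dataset, instead of revisiting examples by hand.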
At Lightspeed, we got to know the Snorkel team when Snorkel was still just a research project at Stanford University, and we invested in the seed round several years later when they started the company. After tracking the company’s progress, seeing the scale of impact early customers were realizing with Snorkel Flow, and observing the massive market interest in the product, we are excited to announce that Lightspeed recently led the Series B in Snorkel AI. My partner Ravi Mhatre and I are honored to join the board.
In addition to the financing, the company today announced the launch of Snorkel Application Studio, a visual builder with templated solutions for common AI use cases that further accelerates the model-building process. It also provides enterprise-grade features such as collaboration, auditability, and data privacy, making Snorkel the essential platform for any enterprise that is serious about taking advantage of AI.
Lightspeed is excited to deepen our partnership with Snorkel AI, and we look forward to continuing to work closely with Alex and the team, as well as our friends from Greylock, as Snorkel enters its next phase of growth.
— Raviraj Jain and Ravi Mhatre