Scaling AI: Why Lightspeed is co-leading Foundry’s Series A round

In the world of AI, the GPU is king. Foundry makes this scarce computing resource more widely available, at lower cost and with greater reliability.

Pictured left to right: Michael Mignano (Partner, Lightspeed) and Jared Quincy Davis (Founder & CEO, Foundry) at Lightspeed’s Generative NYC event last fall.

The generative AI revolution has barely gotten underway, yet it’s already changed how the world does business. But one thing that largely hasn’t changed is how enterprises provision the computing resources needed to enable gen AI.

Training the large language models (LLMs) that form the foundation of gen AI is a computationally intensive process, requiring banks of powerful graphics processing units (GPUs). Because of ongoing chip shortages, however, enterprises are forced to reserve GPU capacity from hyperscale cloud providers for years at a time, sometimes at a cost of hundreds of millions of dollars. For technologically advanced companies, the cost of computing power is now their largest operating expense, more than human resources.

It doesn’t have to be this way. Instead of paying exorbitant amounts for scarce high-end compute resources and then using them poorly, why not fully leverage the 10 million+ GPUs that are already in the field, most of which are severely underutilized?

That’s a founding principle of Foundry, a startup launched by veterans of Google DeepMind, Stanford, and other top machine learning research centers around the world. It’s a game-changing idea arriving at precisely the right moment in our current AI trajectory, when most organizations are still mapping out their gen AI strategies. And it’s poised to slash compute costs radically and immediately.

It’s an idea we found both extremely compelling and incredibly timely, two key reasons why Lightspeed is proud to be co-leading Foundry’s Series A round along with our friends at Sequoia.

A radically novel approach to AI cloud computing

AWS was not built for the AI revolution (it’s called Amazon Web Services, after all). AI compute is fundamentally distinct from prior classes of work done in the cloud. While rebuilding cloud infrastructure software end-to-end with AI workloads in mind is a simple concept, achieving this at scale is incredibly complex.

An element of Foundry’s core technology works like a massive air-traffic control layer, orchestrating workloads between GPUs of varying capacities depending on the amount of processing power required by each task. This allows much more efficient asset utilization (not every machine learning task requires the powerful Nvidia H100 chips) and enables organizations to scale their consumption as needed. That in part makes it possible for Foundry to deliver GPU and AI accelerator compute services for less than half of what its customers would otherwise pay hyperscale cloud providers or even budget GPU resellers.
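To make the air-traffic control analogy concrete, here is a minimal sketch of one way such matching could work: a best-fit scheduler that places each task on the smallest GPU that can still handle it, so the largest chips stay free for the jobs that truly need them. This is an illustration only, not Foundry’s actual scheduler. The class names, GPU tiers, task names, and capacity numbers are all hypothetical, and capacity is expressed in abstract units rather than real hardware specs.

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    name: str
    capacity: int                    # abstract compute units (hypothetical scale)
    free: int = field(init=False)    # capacity not yet claimed by a task

    def __post_init__(self):
        self.free = self.capacity


@dataclass
class Task:
    name: str
    demand: int                      # compute units the task needs


def schedule(tasks, gpus):
    """Best-fit placement: each task goes to the GPU with the least
    free capacity that can still fit it. Tasks that fit nowhere map to None."""
    placement = {}
    # Place the most demanding tasks first so they are not crowded out.
    for task in sorted(tasks, key=lambda t: t.demand, reverse=True):
        candidates = [g for g in gpus if g.free >= task.demand]
        if not candidates:
            placement[task.name] = None
            continue
        best = min(candidates, key=lambda g: g.free)
        best.free -= task.demand
        placement[task.name] = best.name
    return placement


# Hypothetical fleet and workload, for illustration only.
gpus = [Gpu("h100-class", 100), Gpu("mid-tier", 40), Gpu("small", 16)]
tasks = [
    Task("llm-training", 90),
    Task("fine-tune", 30),
    Task("batch-inference", 12),
]
placement = schedule(tasks, gpus)
print(placement)
```

In this toy run, only the training job lands on the top-tier chip, while the lighter jobs are routed to cheaper hardware. A production system would also have to weigh factors this sketch ignores, such as memory, interconnect, price, and node failures.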

Other key elements of Foundry’s IP allow it to manage communications between disparate GPU nodes in a low-latency way, while also ensuring the highest levels of security for sensitive data. Foundry has also been built to address the availability, resiliency, and node failure challenges endemic to practitioners pushing the limits of scale and running applications in production. Foundry is democratizing and eclipsing the capabilities previously exclusive to top-tier industry labs like DeepMind and OpenAI, which have invested for years to build their core internal infrastructure.

A team of elite AI scientists

Foundry’s founding team ranks among the best in the industry. Very few people spend their time thinking deeply about datacenter infrastructure and economics, congestion control, and cloud resource packing and scheduling – the core elements that undergird the public cloud; even fewer of those people are top-tier experts in both large-scale deep learning and market design.

Before Jared Quincy Davis founded Foundry, he was a research scientist on Google DeepMind’s 50-person Core Deep Learning Team, where he worked on the ML-theoretic and distributed systems challenges core to scaling deep learning and rendering it useful for complex applied problems like data center design and packing, industrial HVAC, robotic navigation and control, and more. A recipient of top awards in ML/STEM (Open Phil AI Fellowship, Hertz) and a pioneer of ground-breaking methods for scalable deep learning, Jared is one of the most brilliant minds in all of AI right now, while also possessing outlier business acumen.

He’s joined by a stellar team of former AI research scientists and infrastructure engineers from the University of California at Berkeley, Stanford, Meta, OpenAI, and beyond.

Angels and advisors to the team include Jeff Dean (Chief Scientist at Google DeepMind), Ion Stoica (Executive Chairman at Anyscale, CS professor at UC Berkeley), and Matei Zaharia (Databricks CTO, CS professor at UC Berkeley).

Even in the rarefied world of Silicon Valley, it is exceptional to find this much engineering and AI talent in one early-stage startup.

An enormous market opportunity

The need for more price-performant GPU power will only continue to rise as AI scales up and the compute demands of model training are dwarfed by the needs of AI actively making predictions and inferences in near real time. Beyond lowering costs to scale AI, Foundry’s model also offers a distinct advantage over classic cloud providers: AI-powered edge devices can simply seek out the nearest GPU among Foundry’s highly distributed infrastructure, reducing latency significantly compared to existing centralized providers.

Make no mistake: Foundry is a long-term play that will ultimately take a significant bite out of the extremely fast-growing $270 billion cloud computing business dominated by the Big Three.

We’re honored to join Foundry on its mission to make AI more accessible and affordable to organizations around the world.

Authors

Raviraj Jain

Ravi Mhatre