05/15/2026

Enterprise

Our Investment in Luel: The Marketplace for Multimodal AI Training Data

Luel Co-founders Inigo Lenderking and William Namgyal.

Today, Lightspeed is announcing that we’re leading Luel’s $31.2M financing to support their goal of powering frontier AI with human intelligence.

Co-founders William Namgyal and Inigo Lenderking are lifelong friends who met as competitive Fortnite duo partners, winning their first cash cup together. Years later they became roommates at Berkeley, joined Y Combinator, and dropped out to start Luel after spending two years thinking about the data space. We at Lightspeed are thrilled to support Luel on the next phase of their journey alongside General Catalyst, SV Angel, and other luminaries from the AI and data space.

Luel is a two-sided marketplace for rights-cleared, multimodal AI training data. The model only works if you can mobilize a global contributor network and run quality assurance at speed, and the founders had been running coordinated, low-latency teams against time pressure for years before they thought of it as a startup.


The Data Wall Is Real

Every frontier AI model needs vast amounts of training data, and for most of the last decade, the answer was simple: scrape more of the internet. That approach built nearly every major language model in existence. It has hit its ceiling, and everyone is looking for the right way forward.

The next generation of models — voice agents, humanoid robotics systems, video generation models — need data the public web doesn’t have. Over the last few years, models have exhausted much of the data available online, and as AI moves to the frontier, the bottleneck shifts to massive, net-new, human-generated data. German speech recorded inside a patient consultation. Egocentric video of a craftsman carving a gemstone. Urdu spoken in the acoustic conditions of a Pakistani street. Not the cinematic version, not the text description — the unstructured reality the models are expected to mimic.

That data has to be cleared for rights and delivered at a quality level scraped content can’t match. The existing infrastructure for supplying it wasn’t built to scale.

That’s the gap Luel is seeking to close, and why we’re backing them at seed.


Enter Luel

Here’s the structural mismatch we kept coming back to. The dominant data-labeling players today are built for expert annotation and RLHF workflows. Their contributor networks skew toward PhD-level specialists completing complex tasks at around $85/hour. We believe that’s the wrong profile for collecting high-volume, everyday multimodal data across dozens of languages and geographies.

A newer class of providers takes a different approach: hired teams producing custom datasets to spec. But each tends to be modality-specific — audio-only, or robotics-video-only — and every new dataset requires rebuilding the production line from scratch. None of them were designed for the long tail of niche data needs every major lab is now running into.

A Model Built for the New Demand

Luel’s model is built differently in a few ways that compound. Luel’s customers — generative AI labs, robotics companies, speech research teams — spec the data they need. Luel mobilizes a global contributor network of 500,000+ people across 96 countries to collect it, runs every submission through a proprietary QA pipeline, and delivers audit-ready datasets. Projects move from kickoff to delivery in weeks rather than months. Fast QA means fast contributor payouts, which keeps contributors active, which is what makes the next project faster than the last.

For embodied AI in particular — robotics, wearables, agents that act in the physical world — Luel captures more than video. Every clip ships with the physics layer: sensor streams aligned to each frame, device pose, hand-object interaction. Without that, an egocentric clip is a video. With it, it’s a recording of a body moving through the world. That’s what those models need to learn from.

Every clip also ships with its own consent and provenance records. Compliance is built in as a critical feature. We believe this is essential when buyers need to clear enterprise legal review before they can use the data.

Stack those advantages, and the economic shape changes. Most of this market is a one-off services business; Luel’s completed datasets can be re-licensed to other customers, building a catalog over time. The unit economics are different from the ground up.

The Team

We first met William and Inigo earlier this year and were immediately struck: young, high-clockspeed founders, relentless in executing against an inflecting market.

William knows what happens when training data is wrong because he’s been on the other side of it. At ezML, the computer vision startup he founded before Luel, he was a customer of the exact pipeline he’s now building: sourcing multimodal data and debugging model failures that turned out to be upstream data quality issues. His LLM security research at Northeastern’s PEACH Lab added a second angle: how model behavior degrades when the pipeline fails at the source.

Inigo was a machine learning researcher focused on human behavioral modeling. That work translates directly into the contributor incentive systems and QA infrastructure at the center of Luel’s platform. Building a contributor network isn’t only a logistics problem; it’s also a behavioral one, and Inigo had a framework for it before Luel existed.

Why We’re Backing Them

The data wall the labs are hitting is real. The voice agents, robotics systems, and video models being built today will be trained on data that has to be collected and licensed, not scraped and borrowed. The company that builds the infrastructure to do that at scale, across modalities and geographies, with a QA pipeline that can keep pace with demand, is — in our view — one of the more durable positions in the AI stack.

We’re glad to be backing William and Inigo as they build it.

If you’re an AI lab, robotics company, or speech research team in need of training data — or a contributor looking to participate — visit luel.ai.

Lightspeed Possibility grows the deeper you go. Serving bold builders of the future.