At the latest #GenNYC event, Lightspeed chats with the founders of Pika and ElevenLabs about the future of AI audio and video and what it's like to compete against incumbents.
The past 18 months or so have been an absolute rollercoaster for generative AI. New breakthroughs nearly every day, tons of startups launching, lots of funding happening–it’s been equal parts exciting and exhausting.
In that time, we’ve hosted more than a dozen meetups, from San Francisco and Los Angeles to Paris and New York, where established and aspiring founders in the gen AI space could gather, connect, and learn from each other. We also launched a podcast called Generative Now (because, of course, we did).
On a rainy evening this summer, we convened again in New York, where I had the privilege of hosting leaders from two of the most exciting startups in the market today: Demi Guo, Co-Founder and CEO of Pika, which makes video creation accessible to everyone, and Mati Staniszewski, Co-Founder and CEO of ElevenLabs, a generative voice platform making audio content universally accessible in any language and in any voice.
In the short time they’ve been around, Pika and ElevenLabs have shipped products that have delighted millions of users, trained some of the most exciting models in their categories, and raised more than $100 million each in venture capital.
I chatted with Demi and Mati about what it takes to build a great AI model, how they manage to hold their own against the 1000-pound gorillas in the gen AI space, and the best ways to raise capital to scale their companies.
Here are some of the highlights of our conversation, edited for clarity and brevity. You can find the full recording here.
What’s the secret to building a really great AI model?
Demi: It really starts with the quality of our research engineering team. Our founding team members came from top AI industry labs like Google DeepMind and Facebook AI and schools like Stanford and MIT. But it’s not just about technology skills–art is just as important as science in building a model. That’s why a lot of our team members have backgrounds in film, art, and music, bringing a creative lens to the model-building process.
Mati: I agree 100 percent with what Demi said. It’s crucial to assemble an incredible team of researchers, and it’s also important to focus that team on a very specific set of goals. In our case, that’s audio. There’s always the temptation to expand into video or text, but we’re trying to stay true to the things we do best and explore different use cases for audio, such as more efficient ways to produce audiobooks or handle short-form media.
How can small startups like Pika and ElevenLabs compete against massive, heavily funded incumbents?
Demi: We’re primarily interested in building a great model. Our advantage is that we can focus on the quality of the experience and offer more user controls without having to balance all the other priorities and demands a company like Google or OpenAI might have. We’re also better at building smaller, more efficient models (lower costs and faster production), which gives us an advantage in pricing.
Mati: The great advantage we have over incumbents is the quality of our models. We’re betting we can build the best audio AI tool in the world. But beyond just an API, we’re also providing a platform with a range of audio tools and workflows that customers can derive value from. There are a number of challenges, but the key ones are to keep innovating at the research level, to keep building the right set of workflows (and not too many), and to find smart ways to distribute our work to the world, so people know it delivers value and is worth paying more for. We are keen to prove that value to a lot of customers, which is why we offer a free three-month trial that makes it easy for them to try it out.
What are some of the new use cases you’re exploring for gen AI?
Demi: Pika’s fundamental purpose is to make AI video creation more accessible. I think we’ve done a good job of that. Now we’re hoping to enable the creation of videos that are basically just beautiful. We’re currently incubating a new platform and exploring new use cases for it, likely more consumer-oriented than professional, but that’s all I can reveal for now. Stay tuned.
Mati: We’ve actually collaborated with Pika to combine their video with our audio, which leads to a much more immersive and appealing experience. How incredible would it be to listen to an audiobook but have your favorite characters and voices appear in a video clip? I think it would be amazing. We’re also working with enterprise customers across the customer service, language learning, and healthcare spaces. In healthcare, for example, we help automate some of the calls doctors and nurses frequently don’t have time to make, like reminding patients to take their medicine or checking in on how they’re feeling.
You’ve both been very successful raising investment capital. Got any tips on how others can do it too?
Demi: We don’t do fundraising for the sake of fundraising. We think about the next milestone we need to hit and how much we need to raise in order to reach that milestone and move the company forward. Showing steady progress is the best way to get the attention of investors.
Mati: I think founders should spend as little time on fundraising as possible. But when they do it, they should do it all at once, over a period of a few weeks. That’s not only a more efficient use of time, but it might also allow you to have investors work with you at the same time, which gives you a more apples-to-apples comparison of the value they bring and the potential investment terms. And, of course, don’t try to raise more than you think you’ll need over the next couple of years. I don’t think it’ll make you lazy, but it’s not worth giving away equity. Save that for employees and other people on the journey.