AI has been the centerpiece of conversation around technology as we’ve entered a new year. There’s lots of buzz and hype around the space, and so we decided to sit down with a founder who’s been at it for a while to get his take on where the industry is headed.
Dylan Fox is the founder and CEO of a company called AssemblyAI. His company focuses specifically on audio, helping companies leverage models for speech recognition, detection, summarization, and more. I was impressed to learn that consumer apps like Spotify have already turned to its API to bolster their AI capabilities. The company has shown a lot of momentum – in the past year it raised two funding rounds, its Series A and Series B, for a total of $63.1M.
In the interview, I dug into how companies are using his API today, how he sees AI impacting our economy, and where AssemblyAI is headed next.
Gary Drenik: Tell me about your background and the journey that led you to starting AssemblyAI.
Dylan Fox: I’m a self-taught software engineer and have always loved learning about new technologies. In one of my past jobs, I had the opportunity to focus on Machine Learning and AI technologies and immediately saw how transformational this technology was going to become.
As I became more and more interested in AI research, I noticed how much work was being done in the field of speech recognition and how quickly the research was improving. However, all of the companies offering speech recognition as a service were insanely antiquated, hard to buy from, and running outdated AI tech. It was a combination of these factors that inspired me to think, “What if you could build a Twilio-style API company that made it much easier for developers to access state-of-the-art AI models for various tasks, starting with speech recognition?” And it was from there that the idea for AssemblyAI grew.
At AssemblyAI, we build on the latest AI breakthroughs to offer state-of-the-art, production-ready AI models for developers, startups, and global enterprises — all through a simple API. Right now, we’re focused on AI models for understanding speech, but in the future we plan to release AI models for other modalities and tasks.
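To make the “simple API” concrete, here is a minimal sketch of submitting a hosted audio file for transcription. The `https://api.assemblyai.com/v2/transcript` endpoint and the `audio_url` field reflect AssemblyAI's public API as I understand it; the audio URL and API key below are placeholders.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"  # AssemblyAI transcript endpoint

def build_transcript_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST request asking the API to transcribe a hosted audio file."""
    payload = json.dumps({"audio_url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )

# Placeholder values – substitute a real audio URL and your own API key.
req = build_transcript_request("https://example.com/meeting.mp3", "YOUR_API_KEY")
```

Sending the request returns a transcript job ID, which the client then polls until the transcription completes – the asynchronous pattern typical of long-running speech-to-text jobs.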
Drenik: There’s been a lot of talk about generative AI over the past few months. Can you talk about how this new buzzy technology will impact everyday consumers specifically and when you expect it to happen?
Fox: Generative AI technology has actually been around for several years. GPT-2, the predecessor to GPT-3 and ChatGPT from OpenAI, was released in 2019, and a number of research labs have been working on AI models for generating art for years as well.
In 2022, however, these models took huge leaps forward, as most of us saw with programs like ChatGPT, Midjourney, and Stable Diffusion. These technologies are still in their infancy, but they already have the potential to transform a number of businesses and industries.
For example, companies like Jasper.ai are leveraging generative AI to make marketers 10x more efficient at copywriting. RunwayML is leveraging generative AI to produce incredible AI-powered tools for video creation and editing.
Consumers can expect to see AI-powered experiences pop up in most apps they interact with over the next year. A recent Prosper Insights & Analytics survey found that 37.3% of Gen-Z actually prefer chatting with AI over a real person for services, so the demand is already there on the consumer side. We see these tailwinds early at AssemblyAI, where signups to our API have increased 5x year-over-year. Droves of developers are coming to us to get access to AI models that can understand audio and speech, and they’re building a number of exciting new applications around the models we make available.
Chart: Communicate with AI Chat Program (Prosper Insights & Analytics)
Drenik: I know AssemblyAI works with some large consumer apps like Spotify. Can you talk about how you work with consumer products and what capabilities within their products your API makes possible?
Fox: We’re seeing customers add powerful features to their apps for use cases including search, content recommendation, and content moderation. For example, some of our customers are leveraging our API to index petabytes of audio data – everything from virtual meeting recordings to podcasts – and make it searchable within their apps. We’re also seeing customers detect the topics spoken in audio files and use that metadata to power more relevant content recommendations and advertising targeting.
On the content moderation front, we’re helping customers automatically moderate troves of user-generated audio and video content with our AI models that can detect spoken hate speech, violence, and a number of other sensitive topics.
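The topic-detection and moderation use cases described above map to optional flags on the same transcript request. The parameter names `content_safety` (flagging spoken hate speech, violence, and other sensitive topics) and `iab_categories` (topic tagging) are drawn from AssemblyAI's documented options and should be treated as illustrative; the URL and key are placeholders.

```python
import json
import urllib.request

API_URL = "https://api.assemblyai.com/v2/transcript"  # AssemblyAI transcript endpoint

def build_moderation_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Request a transcript with moderation and topic-detection results attached."""
    payload = json.dumps({
        "audio_url": audio_url,
        "content_safety": True,   # flag sensitive spoken content for moderation
        "iab_categories": True,   # tag topics for recommendations / ad targeting
    }).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={"authorization": api_key, "content-type": "application/json"},
        method="POST",
    )

# Placeholder values – substitute a real audio URL and your own API key.
req = build_moderation_request("https://example.com/episode1.mp3", "YOUR_API_KEY")
```

The completed transcript would then carry per-segment safety labels and topic categories alongside the text, which is what lets a product team moderate or recommend content without building their own models.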
There’s a lot of innovation going on right now and, in general, we are helping product teams across different industries rapidly introduce new AI capabilities into their apps through our API.
According to Prosper Insights & Analytics, 71.7% of Gen-Z use Spotify today, so over time our product will increasingly work behind the scenes in ways that touch everyday consumers without them even realizing it.
Chart: How Often Do You Use Spotify? (Prosper Insights & Analytics)
Drenik: There’s a lot of economic uncertainty right now, and AI has the potential to fuel this further by eliminating jobs. As the founder of a company in this space, what’s your take on AI’s potential impact on our labor market?
Fox: This new wave of generative AI models will likely automate away a number of jobs and industries, but it has the potential to increase overall productivity, efficiency, and creativity — as well as create new job opportunities. This is really the same story for every major technological breakthrough. Some jobs and industries go away, but new ones will be born.
I think in the near term, we’ll begin to see a number of creative industries and professionals begin to work alongside AI in order to increase their overall productivity and creativity.
For example, it’s unlikely that artists will be replaced overnight by AI models. Instead, artists will begin to leverage AI models to generate new concepts and ideas much faster. Marketers, likewise, will begin to leverage AI models to iterate on blog posts and press releases much faster.
These AI models are incredible, but they do require some human supervision still.
Drenik: What’s next for AssemblyAI? What areas do you want to focus on in terms of growing the business and expanding the product capabilities?
Fox: In a few weeks, we’ll be releasing our largest and most accurate speech recognition model to date.
This new model has been trained on millions of hours of audio data and will be the largest commercial speech recognition model trained to date. Built on the latest research into Transformers and data scaling – leveraging insights from breakthroughs such as DeepMind’s Chinchilla paper – it achieves human-level transcription accuracy on most datasets and surpasses the current state of the art on virtually all benchmarks.
We’re excited to get this model into the hands of developers and businesses, so they can continue to build more powerful AI-powered features around audio. We also have a number of new AI models for speech understanding, including an exciting model for few-shot topic classification, launching early this year. Stay tuned!
Drenik: Thanks for taking the time to discuss AI technology innovations and the unique role AssemblyAI is playing in this space. I look forward to seeing how the company continues to grow and support its customers over time.