Matt Hocking, Co-Founder & CEO of WellSaid Labs – Interview Series

Matt Hocking is the co-founder and CEO of WellSaid Labs, a leading enterprise-grade AI voice generator. He has more than 15 years of experience leading teams and delivering technology solutions at scale.

Your background is fairly entrepreneurial; how did you initially become involved in AI?

I suppose I’ve always considered myself pretty entrepreneurial. I started my first business out of college and, with a background in product design, have found myself gravitating toward helping people with early-stage ideas. Throughout my career, I’ve been lucky enough to work with numerous startups that have gone on to have some pretty incredible runs. During those experiences, I had first-hand exposure to plenty of great founders, which in turn inspired me to pursue my own ideas as a founder. AI was relatively new to me when I joined AI2; however, that experience provided me with an opportunity to apply my product and startup lens to some truly amazing research and imagine how these new advancements were going to be able to help a lot of people in the coming years. My goal since the beginning has been to build real businesses for real people, and I believe AI has the potential to create a lot of exciting opportunities and efficiencies in our future if applied thoughtfully.

Could you share the story of how the idea for WellSaid Labs was conceived when you were an entrepreneur in residence at The Allen Institute for AI?

I joined The Allen Institute for Artificial Intelligence (AI2) as an Entrepreneur in Residence in 2018. Arguably the most innovative incubator in the world, AI2 houses some of the brightest minds in AI, who turn solutions from the edge of what’s possible today into tangible products that solve problems across the globe. My background in design and technology nurtured a long-time interest in the creative fields, and with the AI boom we’re all witnessing today, I wanted to explore a way to connect the two. I was introduced to Michael Petrochuk (WellSaid Labs co-founder and CTO) while developing an interactive healthcare app that guided the patient through various sensitive scenarios. During the process of developing the content for the experience, my team worked with voice talent to pre-record thousands of lines of voiceover for the avatar. When I was exposed to some of the breakthroughs Michael had achieved during his research, we both quickly saw how human-parity text-to-speech (TTS) could transform not only the product I was working on but also numerous other applications and industries. Technology and tooling had struggled to keep up with the needs of producers creating with voice as a medium. We saw a path to putting this technology in the hands of all creators, allowing voice to be an integral part of all stories.

WellSaid Labs is one of the few companies that provides voice actors with an avenue into the AI voiceover space. Why did you believe it was important to integrate real voices into the product?

Our answer to that is two-pronged: first, we wanted to create solutions that complemented professional voice actors’ capabilities, expanding opportunities for voice work. And second, we strive for the highest level of human quality in our products. Our voice actors are long-term collaborative partners and receive compensation and revenue share for both their voice data and the subsequent content produced with it. Every voice actor we hire to create an AI voice avatar based on the likeness of their voice is paid based on how much their voice is used on our platform. We encourage talent to partner with us; fair compensation for their contributions is incredibly important to us.

To offer the highest level of human-quality products on the market, we have to be rigorous about where we get our data. This process gives us more control over quality, as we train our deep learning models to speak both at human parity and in specific, contextually relevant styles. We don’t just create a voice that recites the provided input. Our models offer a variety of voice styles that perform what’s on the page. Whether users are creating voiceover with an avatar from our library or with a custom-built voice for their brand, we use real voice data to ensure a seamless process and an easy-to-use platform. If our customers had to manipulate and edit our voices in post-production, the process of getting the desired output would be clunky and long. Our voices take the context of the written content and provide a contextually accurate reading. We provide voices for all kinds of use cases – whether it’s reading the news, making an audio ad, or automating call center support – so partnering with professional voice talent specific to each use case provides us with both the context and high-quality voice data.

We regularly update and add new styles and accents to our avatar library to ensure that we represent the voices of our customers. In WellSaid Labs’ Studio, customers and brands can audition different voices based on region, style, and use case, allowing for a more seamless, unified production of audio content personalized to the maker’s needs. Once an initial recording is sampled, users can cue specific words, spellings, and pronunciations to ensure the AI consistently speaks to their needs.
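To make the pronunciation-cueing idea concrete, here is a minimal, hypothetical sketch; it is not WellSaid’s actual API, and the cue table, function names, and `synthesize` hand-off are all invented for illustration. The idea is simply a respelling layer that normalizes tricky terms before text reaches a TTS model:

```python
# Hypothetical pronunciation-cue layer: maps tricky terms to phonetic
# respellings before the text is sent to a TTS model. All names here
# are illustrative, not WellSaid's actual API.
import re

# Maker-supplied cues: word -> preferred respelling
PRONUNCIATION_CUES = {
    "Nginx": "engine-ex",
    "SQL": "sequel",
    "WellSaid": "well-said",
}

def apply_cues(text: str, cues: dict[str, str]) -> str:
    """Replace each cued word with its respelling (whole words only)."""
    for word, respelling in cues.items():
        text = re.sub(rf"\b{re.escape(word)}\b", respelling, text)
    return text

script = "WellSaid reads SQL docs and Nginx configs aloud."
normalized = apply_cues(script, PRONUNCIATION_CUES)
print(normalized)
# -> "well-said reads sequel docs and engine-ex configs aloud."
# The normalized text would then be passed to the synthesis step.
```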

WellSaid Labs is staking its claim as the first ethical AI voice platform. Why are AI ethics important to you?

As AI adoption increases and becomes more mainstream, fears of harmful use cases and bad actors are at the center of every conversation – and these concerns are unfortunately validated by real-world occurrences. AI voice is no exception; nearly every day, a new report of a celebrity, public figure, or politician being deepfaked for advertisements or political purposes makes news headlines. Though formal federal regulation of this technology is still evolving, detecting and combating malicious actors and uses of synthetic voice will become increasingly difficult as the technology continues to advance.

Coming from AI2, where AI ethics is a core principle, Michael and I had these conversations on day one. Developing AI speech technology comes with significant responsibilities regarding consent, privacy, and overall safety. We know that we, as developers, must build our technology safely, address ethical concerns, and lay the groundwork for the future development of synthetic voices. We recognize the potential for AI speech technology to be misused and embrace our responsibility to reduce the potential misuse of our product. We wanted to lay this foundation from day one rather than move fast and make mistakes along the way. That wouldn’t be doing right by our enterprise customers and voice actors, who count on us to build a high-quality, trustworthy product.

We fully support the call for legislation in this field; however, we will not wait for federal regulations to be enacted. We have always prioritized, and will continue to prioritize, practices that support privacy, security, transparency, and accountability.

We strictly abide by our company’s ethical code of intent, which is based on building responsible innovation into every decision we make. This is in the best interest of our global customers – enterprise brands.

How do you develop an ethical AI voice platform?

WellSaid Labs has been committed to ethical innovation from the start. We centralize trust and transparency through the use of in-house data models, explicit consent requirements, our content moderation program, and our commitment to brand protection. At WellSaid, we lean on the principles of Responsible AI to shape our decisions and designs, and those principles extend to the use of our voices. Our code of ethics represents these principles as Accountability, Transparency, Privacy and Security, and Fairness.

Accountability: We maintain strict standards for appropriate content, prohibiting the use of our voices for content that is harmful, hateful, fraudulent, or intended to incite violence. Our Trust & Safety team upholds these standards with a rigorous content moderation program, blocking and removing users who attempt to violate our Terms of Service.

Transparency: We require explicit consent before building a synthetic voice with someone’s voice data. Users are not able to upload voice data from politicians, celebrities, or anyone else to create a clone of their voice unless we have that person’s explicit, written consent.

Privacy and Security: We protect the identities of our voice actors by using stock images and aliases to represent the synthetic voices. We also encourage them to exercise caution about how and with whom they share their association with WellSaid Labs or other synthetic voice companies, to reduce the risk of their voice being misused.

Fairness: We compensate all voice actors who provide voice data for our platform, and we provide them with ongoing revenue share for the use of the synthetic voice we build with their data.

Along with these principles, we also strictly respect intellectual property. We don’t claim ownership over the content provided by our users or voice actors. We prioritize integrity, fairness, and transparency in everything we do, ensuring that our synthetic speech technology is used responsibly and ethically. We actively seek partnerships with voices from diverse backgrounds and experiences to ensure that we offer a voice for everyone.

Our commitment to responsible innovation and to developing AI voice technology with ethics in mind sets us apart from others in the space who are seeking to capitalize on a new, unregulated industry through any means. Our early investments in ethics, safety, and privacy establish trust and loyalty among our voice actors and customers, who increasingly seek ethically made products and services from the companies at the forefront of innovation.

WellSaid Labs has created its own in-house AI model that enabled its AI voices to achieve human parity, and it has done this by bringing in the imperfections humans bring to conversations. What is it about these imperfections that makes the AI better, and how are these imperfections implemented?

WellSaid Labs isn’t just another TTS generator. Where early TTS technology was unable to recognize human speech qualities like pitch, tone, and dialect that convey the context and emotion behind the words, WellSaid voices have achieved human parity, bringing uniquely human imperfections to AI-generated speech.

Our primary measure of voice quality is, and has always been, human naturalness. This guiding belief has shaped our technology at every stage, from the script libraries we’ve built to the instructions we give talent and, more recently, how we iterate on our core TTS algorithms.

We train on authentic human vocalizations. Our voice talent reads their scripts authentically and engagingly when they record for us. Speech perfection, on the other hand, is a mechanical concept that leads to robotically flawless, unnatural output. When professional voice talent performs, their rate of speech fluctuates. Their loudness moves with the content they’re reading. Their vocal pitch may rise in a passage requiring an excited read and fall again in a more somber line. These dynamic variations make up an engaging human vocal performance.
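A toy sketch can make the contrast concrete. The following is purely illustrative – it has nothing to do with WellSaid’s actual models, and the baseline numbers, parameter names, and `excitement` knob are all invented – but it shows the difference between holding rate, loudness, and pitch fixed (the “perfect,” robotic setting) and letting them vary per phrase:

```python
# Toy illustration (not WellSaid's model): natural speech varies rate,
# loudness, and pitch per phrase, while robotic TTS holds them constant.
# We sample small per-phrase deviations around an invented baseline.
import random

BASELINE = {"rate_wpm": 160, "loudness_db": -16.0, "pitch_hz": 120.0}

def prosody_for(phrase: str, excitement: float) -> dict:
    """Return per-phrase prosody: baseline plus human-like variation.

    `excitement` in [0, 1] pushes pitch and rate up for lively passages
    and lets them fall for more somber ones.
    """
    jitter = lambda scale: random.uniform(-scale, scale)
    return {
        "rate_wpm": BASELINE["rate_wpm"] * (1 + 0.15 * excitement + jitter(0.05)),
        "loudness_db": BASELINE["loudness_db"] + 2.0 * excitement + jitter(1.0),
        "pitch_hz": BASELINE["pitch_hz"] * (1 + 0.2 * excitement + jitter(0.03)),
    }

for phrase, mood in [("We won the championship!", 0.9),
                     ("Then the storm rolled in.", 0.1)]:
    print(phrase, prosody_for(phrase, mood))
```

A real system learns these contours from recorded performances rather than sampling them at random; the point is only that the “imperfections” are structured variation, not noise to be removed.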

By building AI processes that work in coordination with the dynamic performances of our professional talent, we have built a truly natural TTS platform. We developed the first long-form TTS system with predictive controls throughout the entire creative process. Our phonetic library holds a diverse collection of audio data, allowing users to incorporate specific vocal cues, like pronunciation guidance or controllability, into the model during the production phase. In a single platform, WellSaid users can record, edit, and stylize their voiceover without needing to import external data.

Could you discuss some of the challenges behind building a text-to-speech (TTS) AI company?

The development of AI voice technology has created an entirely new set of obstacles for both its producers and consumers. One of the main challenges is not getting caught up in the noise and hype that floods the AI sector. As it is a new, buzzy technology, many organizations try to cash in on short-term AI voiceover developments. We want to offer a voice for everyone, guided by central ethical principles and authenticity. This adherence to authenticity can delay the development and deployment of our technologies but solidifies the safety and security of WellSaid voices and their data.

Another challenge of developing our TTS platform was developing specific consent guidelines to ensure that organizations or individual actors won’t misuse our technology. To meet this challenge, we seek out collaborative, long-term partnerships and stay fully involved in voiceover development to increase accountability, transparency, and user security. We actively seek partnerships with voice talent from various backgrounds, organizations, and experiences to ensure that WellSaid Labs’ library of voices reflects its creators and audiences. These processes are designed to be intentional and detail-oriented to ensure our technology is being used as safely and ethically as possible, which can slow the development and launch timeline.

What is your vision for the future of generative AI voices?

For the longest time, AI speech technology was not of high enough quality to enable companies to create meaningful content at scale. Now that audio production no longer requires expensive equipment and hardware, all written content can be produced and published in an audio format to create engaging, multi-modal experiences.

Today, AI voices can produce human-like audio and capture the nuance required to make digital storytelling more accessible and natural. The future of generative AI voice will be all-encompassing audible experiences that touch every aspect of our lives. As technology continues to advance, we’ll see increasingly natural and expressive synthetic voices blur the line between human and machine-generated speech – opening new doors for business, communications, accessibility, and how we interact with the world around us.

Businesses will find enhanced personalization in AI voice interfaces and use them to make interactions with virtual assistants more immersive and user-friendly. These enhancements are happening already, from intelligent call center agents to fast-food drive-thrus. Content creation – including advertising, product marketing, news narration, podcasts, audiobooks, and other multimedia – will see increased efficiency through tools that develop engaging content, ultimately increasing lift and revenue for organizations, especially now that multilingual models can expand a company’s reach from a single point of origin to a global presence. Production teams will find great benefit in synthetic voices tailor-made to the brand’s needs or customized to the listener.

Before the introduction of AI, TTS technology lacked the crucial human emotion, intonation, and pronunciation abilities required to tell a full story at scale and with ease. Now, AI-powered TTS offers more immersive and accessible experiences, including real-time speech capabilities and interactive conversational agents.

Achieving human-like speech capabilities has been a journey, but now that it’s attainable, we’re witnessing the full scope of AI voice creating real business value for organizations.
