Our CEO, Alexei Dunayev, recently sat down with the team at Propel(x) to talk about how TranscribeMe’s powerful speech technology is helping to move the transcription world forward. Check out highlights from the interview below, and read the rest on the Propel(x) blog.

Propel(x): In your own words, tell us what TranscribeMe does.

TranscribeMe!: TranscribeMe converts voice to text with very high accuracy at a very low cost. Our customers are typically enterprise users such as academic institutions, media companies, legal and medical organizations, and contact centers. These customers have large quantities of audio and video content that they need converted into text.

We do this by operating a platform that combines artificial intelligence speech recognition algorithms with real human, crowd-sourced transcribers. And we’ve got a very large pool of crowd workers, over 300,000 in total. These are people who can transcribe audio to text and correct the output of speech recognition, ultimately resulting in the perfect quality that goes to our customers.

We’ve been able to use the output generated by the human workers to train the artificial intelligence algorithms that power our system. In other words, we can continuously improve the quality of speech recognition technology by giving and verifying examples. And that’s the deep learning type application in our model. We do a lot of machine learning, and this is done by having humans who provide output and training for the computers.

Propel(x): Could you tell me a little bit about the size of your company? How many people do you have on your management team?

TranscribeMe!: Overall in the company we’ve got about 25 people. There are five people on the management team, so we’re quite a small firm. We’re a startup in every sense of the word.

Propel(x): Can you explain to me how this technology is a leap vis-a-vis the current state of the industry?

TranscribeMe!: It’s interesting to think about it, because most others working on artificial intelligence typically focus on the algorithm component rather than the training component. The training component, as well as the models that are generated by using human-verified data, provides the largest boost in the output quality. And so, we’re very different in the sense that we’ve actually built our entire worker infrastructure for our crowd engagement. We can have hundreds of thousands of people contribute, which works towards training artificial intelligence.

Propel(x): So, 10 years down the line, TranscribeMe is a huge success. How has society been impacted?

TranscribeMe!: Very positively, I hope. Our goal is to help computers understand the human voice, not just very short commands. This will allow people to interact in the most natural way possible with computers. This is a very ambitious task that I hope impacts society positively. It certainly opens up a lot of avenues for technologies that are yet to come.

It’s important for computers to be able to understand natural human voice, so that we don’t have to modulate what we say, so that we can speak with an accent, and so that we can truly express ourselves in a manner that’s apparent even to an infant. Voice expression is a fundamental part of human expression, and having computers understand human voice is very important.

