SpeakerText, founded by Matt Mireles in 2008, was a startup that developed technology to transcribe online videos using a hybrid approach of speech recognition software and crowdsourced human labor, eventually evolving into the Humanoid workforce management platform before being acquired by CloudFactory in October 2012.
The company's core offering was an interactive video transcription service that made video content searchable, accessible, and SEO-friendly. Combining proprietary speech recognition, natural language processing, and crowdsourced human labor, SpeakerText produced accurate transcripts that integrated with video platforms including YouTube, Brightcove, Ooyala, and JW Player [1][2]. Its signature feature was the SpeakerBar plugin, which appeared beneath transcribed videos and let viewers click any word in the transcript to jump directly to that moment in the video [3].
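The SpeakerBar behavior described above comes down to a lookup between transcript words and playback timestamps. The following is a minimal sketch under assumed data, not SpeakerText's actual implementation. It assumes each transcript word carries a start time in seconds, and the names `TimedWord`, `seek_time_for_word`, `word_at_time`, and `find_word` are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """Hypothetical record: one transcript word and its start time in seconds."""
    text: str
    start: float


def seek_time_for_word(words: list[TimedWord], index: int) -> float:
    """Playback position to jump to when the word at `index` is clicked."""
    return words[index].start


def word_at_time(words: list[TimedWord], t: float) -> int:
    """Index of the word being spoken at time t (for highlighting).

    Assumes `words` is sorted by start time; times before the first word map to 0.
    """
    starts = [w.start for w in words]
    return max(bisect_right(starts, t) - 1, 0)


def find_word(words: list[TimedWord], query: str) -> list[float]:
    """Start times of every case-insensitive match, ignoring trailing punctuation."""
    q = query.lower()
    return [w.start for w in words if w.text.lower().strip(".,!?") == q]


transcript = [
    TimedWord("Welcome", 0.0), TimedWord("to", 0.6), TimedWord("the", 0.8),
    TimedWord("demo.", 1.0), TimedWord("This", 2.4), TimedWord("demo", 2.7),
    TimedWord("shows", 3.1), TimedWord("search.", 3.5),
]

print(seek_time_for_word(transcript, 5))  # 2.7
print(word_at_time(transcript, 3.2))      # 6
print(find_word(transcript, "demo"))      # [1.0, 2.7]
```

The same word-to-timestamp index supports both click-to-seek and in-video search, which is why word-level timing is the key data such a plugin would need.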
Initially, the service relied heavily on Amazon Mechanical Turk for crowdsourced transcription, but quality problems made this costly: for every dollar spent on Mechanical Turk, SpeakerText reportedly spent two more fixing errors. Pricing started at $19.99 per month plus per-minute transcription fees, and targeted clients ranging from individual bloggers to major media companies.
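The cost figures above reduce to simple arithmetic. This sketch assumes the reported 2:1 fix-up ratio applies uniformly, so each Mechanical Turk dollar costs about three dollars in total. It also uses a hypothetical per-minute rate, because the actual rate is not given. The function names are illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_FEE = Decimal("19.99")                 # monthly base price from the text
FIX_DOLLARS_PER_TURK_DOLLAR = Decimal("2")  # "two more" dollars of fixes per Turk dollar


def monthly_bill(minutes: int, per_minute_rate: Decimal) -> Decimal:
    """Base fee plus per-minute fees; per_minute_rate is a hypothetical input."""
    total = BASE_FEE + per_minute_rate * minutes
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def effective_labor_cost(turk_spend: Decimal) -> Decimal:
    """Turk spend plus the reported error-fixing overhead (1 + 2 = 3x)."""
    return turk_spend * (1 + FIX_DOLLARS_PER_TURK_DOLLAR)


print(monthly_bill(60, Decimal("1.00")))     # 79.99
print(effective_labor_cost(Decimal("100")))  # 300
```

Under this assumption, only a third of the labor budget went to the crowd work itself, which illustrates why the ratio was a problem.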
To address these quality-control problems, Mireles and his team developed Humanoid, a workforce management platform that changed how SpeakerText operated. Its reputation algorithm optimized task assignment among workers by weighing factors such as completion rates, work history, and time of day [1][2]. Functioning essentially as a "robot boss," Humanoid could oversee work and reassign tasks as needed to improve efficiency and quality.
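A reputation-based assignment loop of the kind described can be sketched as a weighted score over each worker's record. This is a hypothetical illustration, not Humanoid's actual algorithm. The weights, the 30-minute timeout, and the use of review approvals as a proxy for work history are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical worker record; field names are illustrative."""
    name: str
    assigned: int   # tasks ever assigned
    completed: int  # tasks finished
    approved: int   # finished tasks that passed review (proxy for work history)
    active_hours: set[int] = field(default_factory=set)  # hours (0-23) usually online


# Hypothetical weights; Humanoid's real factors and weights are not documented here.
W_COMPLETION, W_QUALITY, W_AVAILABILITY = 0.4, 0.4, 0.2


def reputation(w: Worker, hour: int) -> float:
    """Weighted score from completion rate, review history, and time-of-day fit."""
    completion_rate = w.completed / w.assigned if w.assigned else 0.0
    quality = w.approved / w.completed if w.completed else 0.0
    available = 1.0 if hour in w.active_hours else 0.0
    return (W_COMPLETION * completion_rate
            + W_QUALITY * quality
            + W_AVAILABILITY * available)


def assign(workers: list[Worker], hour: int,
           exclude: frozenset[str] = frozenset()) -> Worker | None:
    """Give the task to the highest-scoring worker not in `exclude`."""
    candidates = [w for w in workers if w.name not in exclude]
    return max(candidates, key=lambda w: reputation(w, hour), default=None)


def reassign_if_stalled(workers: list[Worker], hour: int, current: Worker,
                        elapsed_min: float, timeout_min: float = 30.0) -> Worker:
    """'Robot boss' step: move an overdue task to the next-best worker."""
    if elapsed_min <= timeout_min:
        return current
    return assign(workers, hour, exclude=frozenset({current.name})) or current


workers = [
    Worker("ana", assigned=100, completed=90, approved=85, active_hours={9, 10, 11}),
    Worker("ben", assigned=50, completed=50, approved=40, active_hours={21, 22}),
    Worker("chi", assigned=40, completed=20, approved=20, active_hours={9, 10}),
]
print(assign(workers, hour=10).name)  # ana
print(assign(workers, hour=22).name)  # ben
print(reassign_if_stalled(workers, 10, workers[0], elapsed_min=45).name)  # chi
```

Because the time-of-day term changes the ranking, the same pool produces different assignments at 10:00 and 22:00. This is one simple way a score could reflect the factors the text lists.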
As Humanoid matured, it became the cornerstone of SpeakerText's operations, and the original transcription service eventually became just one of several products under the Humanoid umbrella. This strategic pivot shifted the company's focus toward building scalable, machine-learning-driven solutions for distributed workforces, and positioned Humanoid as the core technology that would ultimately attract acquisition interest [3][4].
SpeakerText secured $600,000 in seed funding in February 2011, led by prominent angel investor Mitch Kapor, known for his work promoting equity in tech startups [1][2]. The company later attracted additional investment from Google Ventures, which reportedly contributed "a few hundred thousand dollars" to a subsequent funding round in 2012 [3]. Its investors also included the angel investors Roy Rodenstein, Lukas Biewald, Adam Schwartz, and Georges Harik [4].
Despite its promising technology and investor confidence, SpeakerText faced significant financial challenges throughout its existence. According to founder Matt Mireles, the company lost money for 18 consecutive months [5], which highlights how difficult it was to build a sustainable business model around its hybrid human-AI transcription service. These financial struggles likely contributed to the company's decision to sell to CloudFactory [6][7].
In October 2012, CloudFactory acquired both SpeakerText and its Humanoid platform for an undisclosed amount [1][2]. The acquisition allowed CloudFactory, a distributed workforce company focused on business automation, to integrate SpeakerText's transcription technology into its service offerings. Following the acquisition, CloudFactory launched its CloudFactory 2.0 platform in March 2013, which incorporated the acquired technologies to improve the accuracy, speed, and scalability of its services [2].
The acquisition aligned with CloudFactory's mission of connecting people in developing countries to digital work opportunities. SpeakerText's technology became a key component of CloudFactory's suite of services, which included PaperText and ImageData, and helped the company serve clients such as ESPN and Ooyala [1][3]. This integration supported CloudFactory's ambitious goal of creating one million "cloudworker" jobs in developing economies, with the transcription technology providing significant opportunities for its growing workforce in Kenya [3].
SpeakerText's approach of combining human labor with machine learning algorithms anticipated ideas that later became common at AI companies. Its hybrid transcription model can be seen as a precursor to platforms like Descript, which uses AI for automated transcription while keeping human-in-the-loop options for quality assurance. The Humanoid platform, with its algorithmic task assignment and quality-control mechanisms, was an early example of the kind of AI-assisted workflow system that later became widespread.
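A common way to build the human-in-the-loop pattern mentioned above is confidence-threshold routing. Machine-transcribed segments that the recognizer is unsure about go to people, and the rest are accepted automatically. The source does not say SpeakerText used exactly this scheme. The `Segment` type, the 0.85 threshold, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical ASR output: a text segment and the recognizer's confidence in [0, 1]."""
    text: str
    confidence: float


def route(segments: list[Segment], threshold: float = 0.85) -> tuple[list[int], list[int]]:
    """Split segment indices into auto-accepted and sent-to-human-review."""
    accepted, review = [], []
    for i, seg in enumerate(segments):
        (accepted if seg.confidence >= threshold else review).append(i)
    return accepted, review


def merge(segments: list[Segment], corrections: dict[int, str]) -> str:
    """Rebuild the transcript, preferring human corrections where they exist."""
    return " ".join(corrections.get(i, seg.text) for i, seg in enumerate(segments))


segments = [
    Segment("welcome to the show", 0.97),
    Segment("the quarterly numbers or in", 0.62),  # likely misrecognition
    Segment("thanks for joining", 0.91),
]
accepted, review = route(segments)
print(accepted, review)  # [0, 2] [1]
print(merge(segments, {1: "the quarterly numbers are in"}))
# welcome to the show the quarterly numbers are in thanks for joining
```

Raising the threshold sends more work to people and improves accuracy at a higher cost. That is the same trade-off the Mechanical Turk cost figures above describe.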
The company's legacy extends beyond transcription to the broader field of human-AI collaboration. By building systems that managed a human workforce through algorithmic decision-making, SpeakerText anticipated companies like Scale AI, which similarly uses distributed human workforces for AI training and data labeling. Structured workflows that combine machine efficiency with human judgment have since become fundamental to modern machine learning operations, particularly where high-quality training data and validation are required. The methods SpeakerText developed for managing distributed workforces continue to influence how AI companies combine human expertise with algorithmic efficiency [1][2].