Spoken Word Search Analyzes Audio Content
Searching for the Right Words
Innumerable audio files available on the Web make searching based on keywords or metadata tricky. But spoken-word search technology is making it easier for enterprises to find the right audio content online.
The popularity of podcasts and online audio is growing by leaps and bounds. In fact, 7.7 million people will be listening to podcasts weekly by 2010, versus an estimated 1.5 million listeners in 2006, according to Bridge Ratings, a provider of radio-audience trend information. With an increase in the quantity of available audio content, however, comes the question of how to find relevant content within audio files.
To address this, a few companies have developed a technology that applies spoken-word search techniques to audio content, as well as to the audio portion of video files. TVEyes' Podscope search engine, for instance, ferrets out audio content on the Web, then uses speech-recognition algorithms on that content, generating an index that can be searched by consumers and business users alike.
Although spoken-word search is bound to make inroads into the consumer space first, it holds potential for the enterprise as well, much as instant messaging, blogging and wikis moved from consumer use into business use. The three big search vendors, Google, MSN and Yahoo, have yet to take any outwardly visible action on spoken-word technology, but AOL has partnered with TVEyes and launched a beta of TVEyes' search engine on its site this summer. We expect momentum around spoken-word search to continue to build, whether based on TVEyes' technology or a competing one.
One big potential enterprise benefit of spoken-word search is that the content creators--those users producing the podcasts--could skip creating metadata and manually transcribing audio files, the conventional approaches for companies that need text-searchable audio. So the technology represents a significant advancement for a niche enterprise need. It should also become a standard Web search option fairly quickly: TVEyes expects this to happen within 18 to 24 months, and we concur.
Search Techniques
Searching multimedia content today is primarily done with the equivalent of 1970s technology. The search requires keywords or metadata, or it relies on extrapolating information from a Web page. If a topic or phrase is mentioned in a podcast but doesn't appear on an associated Web page or within metadata, a standard search on that topic will not produce the podcast as a result.
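To make the limitation concrete, here is a minimal Python sketch of a metadata-only search. The catalog, its field names and the metadata_search function are hypothetical, invented for illustration; they do not describe any real search engine.

```python
# Hypothetical catalog: "title" and "tags" are the only text a
# conventional search can see; "spoken_text" stands in for what is
# actually said in the audio and is invisible to that search.
catalog = [
    {
        "title": "Weekly Tech Roundup",
        "tags": ["news", "gadgets"],
        "spoken_text": "today we discuss spoken word search for podcasts",
    },
    {
        "title": "Search Engine Basics",
        "tags": ["search", "web"],
        "spoken_text": "how crawlers read web pages",
    },
]


def metadata_search(entries, term):
    """Return titles whose title or tags contain the term (case-insensitive)."""
    term = term.lower()
    hits = []
    for entry in entries:
        visible = " ".join([entry["title"], *entry["tags"]]).lower()
        if term in visible:
            hits.append(entry["title"])
    return hits


# Only the second entry is found, although the first one also talks about search.
print(metadata_search(catalog, "search"))
```

The first podcast discusses search at length, but because the word never appears in its title or tags, a metadata search cannot return it.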
With the emerging spoken-word search, the audio portion of a multimedia file is "listened to" by the search engine. An index--not necessarily a word-for-word transcript--is built by converting the spoken words to text using one or more voice-recognition algorithms. TVEyes uses at least eight engines; the algorithms can look at vocal inflection and signatures to guess an unclear phrase. The resulting index is text-searchable data. Unlike conventional voice-recognition software for PCs and telephone systems, these spoken-word search engines don't attempt to learn an individual's speech patterns; instead, they require an extensive library of words, phrases and accents. Background noise and music are ignored, though overlap between these sounds and spoken words will reduce accuracy.
Video files can be searched with an engine such as Podscope, but only their audio portion. True video search--that is, using face- and body-recognition techniques--is still an experimental technology and is not ready for widespread consumer use. There are also audio search engines that can find music based on a few notes or locate a snippet of audio within a collection of sound files, but these do not perform voice recognition.
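As a rough illustration of how several recognizers might feed one searchable index, consider the Python sketch below. The article does not say how TVEyes combines its engines' output; the per-word majority vote, the function names and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def majority_vote(hypotheses):
    """Pick the most common word at each position across engine outputs.

    Assumption: every engine returns the same number of words, which
    real recognizers do not guarantee.
    """
    voted = []
    for words in zip(*hypotheses):
        voted.append(Counter(words).most_common(1)[0][0])
    return voted


def build_index(files):
    """Map each voted word to the set of file names in which it occurs."""
    index = defaultdict(set)
    for name, hypotheses in files.items():
        for word in majority_vote(hypotheses):
            index[word.lower()].add(name)
    return index


# Hypothetical output of three engines for two audio files.
files = {
    "episode1.mp3": [
        ["spoken", "word", "search"],
        ["spoken", "ward", "search"],   # one engine mishears "word"
        ["spoken", "word", "surge"],    # another mishears "search"
    ],
    "episode2.mp3": [
        ["network", "search", "tools"],
        ["network", "search", "tools"],
        ["net", "search", "tools"],
    ],
}

index = build_index(files)
print(sorted(index["search"]))  # files in which "search" was recognized
```

The result is a word-to-file lookup rather than a transcript: the misheard words are outvoted and never reach the index.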
Enterprise Value
For consumers and enterprise users, spoken-word search promises not just to find an audio file containing relevant content, but also to pinpoint the location of the relevant content within the audio file using a time index. This means that, after finding the correct audio file to listen to, a user could skip ahead to the appropriate portion of the audio file.
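A time index of this kind can be sketched in a few lines of Python. The data layout, the function names and the sample offsets below are hypothetical; the sketch only shows the idea of storing a playback offset alongside each recognized word.

```python
from collections import defaultdict


def build_time_index(recognized):
    """Map each word to a list of (file name, offset in seconds) pairs."""
    index = defaultdict(list)
    for name, words in recognized.items():
        for offset, word in words:
            index[word.lower()].append((name, offset))
    return index


def find(index, term):
    """Return (file, offset) pairs for a term, sorted by file name, then offset."""
    return sorted(index.get(term.lower(), []))


def as_timestamp(seconds):
    """Format an offset in seconds as minutes:seconds for display."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


# Hypothetical recognizer output: (offset in seconds, word) per file.
recognized = {
    "keynote.mp3": [(0.0, "welcome"), (754.5, "podcast"), (1310.2, "search")],
    "panel.mp3": [(12.0, "search"), (98.4, "indexing")],
}

index = build_time_index(recognized)
for name, offset in find(index, "search"):
    print(name, as_timestamp(offset))
```

A player could use the returned offset to start playback at the point where the term is spoken instead of at the beginning of the file.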
Besides the added functionality that accurate search of multimedia content brings to both consumers and enterprise users, spoken-word search could ease the burden placed on enterprise users creating multimedia content. Without spoken-word search, creating the necessary search data is no small challenge for content creators, who rely on metadata or text transcripts to make the content searchable.
For businesses, there is another considerable benefit of spoken-word search for external-facing content: As the use of spoken-word search increases, potential customers will be more likely to find podcasts and videocasts--which should decrease a business's dependence on sheer popularity or marketing efforts to draw users to a site.
An enterprise considering spoken-word search would not need to modify existing audio files. However, the technology isn't perfect. Audio content is more easily indexed if it has a clear, attentive speaker with little background noise and no background music. Higher-quality recordings and encodings produce better search results. TVEyes claims to have an average accuracy rate of 80 percent. Higher accuracy can be achieved through improvements in speech-recognition engines, but progress in that field takes years.
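To get a feel for what an 80 percent figure could mean for phrase searches, the short Python calculation below may help. It rests on two assumptions the article does not confirm: that the accuracy rate applies to each word individually, and that recognition errors are independent of one another.

```python
def phrase_hit_probability(word_accuracy, n_words):
    """Chance that every word of an n-word phrase is recognized correctly.

    Assumes a per-word accuracy rate and independent errors; both are
    simplifications, not claims about any particular engine.
    """
    return word_accuracy ** n_words


for n in (1, 2, 3, 5):
    print(n, round(phrase_hit_probability(0.80, n), 3))
```

Under those assumptions, a three-word phrase would be recognized in full only about half the time, which is one reason recording quality matters so much.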
Michael J. DeMaria is an associate technology editor based at Network Computing's Syracuse University Real-World Labs®. Write to him at mdemaria@nwc.com.