Voice recognition is an important part of developing advanced applications. Together with speech synthesis, it lets you convert speech to text and text to speech, two features that significantly improve human-computer interaction. Voice recognition is widely used in personal assistant applications and in software for the visually impaired.
Why Use Voice Recognition in PHP?
Voice recognition (or speech recognition) technology is not easy to develop. Building it in-house would be very costly, and buying proprietary speech-to-text libraries would make the application very expensive for end users. When developing in PHP, however, you have many open-source solutions available, and voice recognition is no exception: there is an open-source class that can be used in voice applications. That way your application costs less, and you can even offer it for free.
The second advantage of PHP is that you can easily create an application that uses text-to-speech or speech-to-text technology in a Web environment. Deploying a PHP application with a Web interface is also cheaper, since the voice recognition class is free, the database server is free, and Web hosting is inexpensive.
How Does It Work?
Web applications that need to convert text to speech or speech to text are built with voice markup languages. Commonly used ones include:
- SSML (Speech Synthesis Markup Language): used for speech synthesis, i.e. generating artificial speech from input text
- SRGS (Speech Recognition Grammar Specification): describes the words and phrases the recognizer should expect as voice input
- CCXML (Call Control eXtensible Markup Language): used for controlling calls in telephone applications
- VoiceXML (Voice Extensible Markup Language): used for interactive voice dialogs that combine synthesized speech, recorded audio, and spoken or keypad input
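To make the first item concrete, here is a minimal sketch of an SSML 1.0 document generated from PHP with the standard DOM extension rather than PHP voice. The function name `build_ssml()` and the 500 ms pause between sentences are illustrative assumptions; the `speak`, `s`, and `break` elements and the namespace come from the W3C SSML 1.0 recommendation.

```php
<?php
declare(strict_types=1);

const SSML_NS = 'http://www.w3.org/2001/10/synthesis';

/**
 * Hypothetical helper: wrap plain sentences in a minimal SSML 1.0 document.
 * Each sentence becomes an <s> element, separated by a short <break/>.
 */
function build_ssml(array $sentences, string $lang = 'en-US'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    // Root <speak> element in the SSML namespace, with version and language.
    $speak = $doc->createElementNS(SSML_NS, 'speak');
    $speak->setAttribute('version', '1.0');
    $speak->setAttributeNS('http://www.w3.org/XML/1998/namespace', 'xml:lang', $lang);
    $doc->appendChild($speak);

    $first = true;
    foreach ($sentences as $sentence) {
        if (!$first) {
            // Short pause between consecutive sentences (assumed 500 ms).
            $break = $doc->createElementNS(SSML_NS, 'break');
            $break->setAttribute('time', '500ms');
            $speak->appendChild($break);
        }
        $first = false;

        // createTextNode() escapes any XML special characters in the text.
        $s = $doc->createElementNS(SSML_NS, 's');
        $s->appendChild($doc->createTextNode($sentence));
        $speak->appendChild($s);
    }

    return $doc->saveXML();
}

echo build_ssml(['Hello World!', 'Welcome to PHP voice.']);
```

A speech synthesizer that understands SSML would read the two sentences with a brief pause between them.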
To interpret files written in one of these languages, you need a voice browser. A voice browser is similar to a Web browser; the only difference is that it interprets voice tags instead of HTML tags. A voice browser usually provides an interface to telephony hardware connected to the PSTN (public switched telephone network) or a PBX (private branch exchange), which applications running over a telephone line need, such as customer-support systems or emergency notifications. As for the voice browser software itself, there is an open-source VoiceXML interpreter called OpenVXI. OpenVXI is available for both Windows and Linux, and it requires a few libraries to be installed before OpenVXI itself.
Installation requirements and other documentation can be found on the OpenVXI website.
The class I will use in this article is PHP voice (formerly called PHP VXML). It is actually a package of four helper classes for creating voice applications in PHP:
- SSML 1.0
- SRGS 1.0
- CCXML 1.0
- VoiceXML 2.0
Each class is a helper class for one markup language and allows you to create markup language files dynamically.
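As a rough illustration of generating one of these files dynamically, the following sketch turns a PHP array of expected phrases into an SRGS 1.0 XML grammar using the DOM extension. It does not use or reproduce the PHP voice SRGS class; `build_srgs_grammar()` and the default rule id `choice` are hypothetical.

```php
<?php
declare(strict_types=1);

const SRGS_NS = 'http://www.w3.org/2001/06/grammar';

/**
 * Hypothetical helper: build an SRGS 1.0 XML grammar with one public rule
 * that matches exactly one of the given phrases.
 */
function build_srgs_grammar(array $phrases, string $ruleId = 'choice', string $lang = 'en-US'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    // <grammar> root: voice mode, root rule, and language of the phrases.
    $grammar = $doc->createElementNS(SRGS_NS, 'grammar');
    $grammar->setAttribute('version', '1.0');
    $grammar->setAttribute('mode', 'voice');
    $grammar->setAttribute('root', $ruleId);
    $grammar->setAttributeNS('http://www.w3.org/XML/1998/namespace', 'xml:lang', $lang);
    $doc->appendChild($grammar);

    // <rule id="..." scope="public"><one-of><item>...</item>...</one-of></rule>
    $rule = $doc->createElementNS(SRGS_NS, 'rule');
    $rule->setAttribute('id', $ruleId);
    $rule->setAttribute('scope', 'public');
    $grammar->appendChild($rule);

    $oneOf = $doc->createElementNS(SRGS_NS, 'one-of');
    $rule->appendChild($oneOf);

    // Each expected phrase becomes one alternative the recognizer may match.
    foreach ($phrases as $phrase) {
        $item = $doc->createElementNS(SRGS_NS, 'item');
        $item->appendChild($doc->createTextNode($phrase));
        $oneOf->appendChild($item);
    }

    return $doc->saveXML();
}

echo build_srgs_grammar(['Welcome', 'Sport', 'Music', 'Quit']);
```

Keeping the phrases in a PHP array means the grammar can be rebuilt from a database or configuration on every request, which is the main benefit of generating markup instead of writing static files.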
Creating Your First Text-to-Speech Application
Let’s create a “Hello world” application that converts text to voice. Look at the following code first:
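```php
<?php
// Load the PHP voice VoiceXML helper class
require_once("../vxml.class.php");

$app = new gonx_vxml();

// Open the <vxml> element; only xml:lang ("en") and the version ("2.0") are set
$app->start_vxml('', '', 'en', '', '', '2.0');

// "message" module: convert a single text message to speech
$app->load("message", array("", "Hello World !", "", "hello.wav"));

// Close the <vxml> element and echo the generated VoiceXML
$app->end_vxml();
$app->generate();
?>
```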
First, we include and instantiate the PHP voice class. Speech synthesis is described in VoiceXML, so we use the VoiceXML helper class, gonx_vxml. start_vxml() and end_vxml() add the opening and closing VoiceXML tags. start_vxml() takes the following arguments, in this order:
- application
- xml:base
- xml:lang
- xmlns
- xmlns:xsi
- xsi:schemaLocation
- VoiceXML version
The last call, generate(), echoes the VoiceXML code. The most important call is load(): its first argument is the module name (here message, since we want to convert a single text message to speech), and its second argument is the array of parameters for that module.
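The sketch below builds, with the DOM extension, a document similar in spirit to the one the Hello World script produces: a `<vxml>` root carrying the version and `xml:lang`, and a single form whose block speaks the message. It is not PHP voice's implementation; `hello_vxml()` is a hypothetical name, an empty language argument is simply omitted, and the `hello.wav` parameter is left out because its exact role in the library is not described here.

```php
<?php
declare(strict_types=1);

const VXML_NS = 'http://www.w3.org/2001/vxml';

/**
 * Hypothetical stand-in for start_vxml() / load("message") / end_vxml():
 * builds a VoiceXML document whose single form speaks $text.
 */
function hello_vxml(string $text, string $lang = 'en', string $version = '2.0'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    // Opening <vxml> tag: namespace and version are always written,
    // xml:lang only when a language was given.
    $vxml = $doc->createElementNS(VXML_NS, 'vxml');
    $vxml->setAttribute('version', $version);
    if ($lang !== '') {
        $vxml->setAttributeNS('http://www.w3.org/XML/1998/namespace', 'xml:lang', $lang);
    }
    $doc->appendChild($vxml);

    // <form><block><prompt>text</prompt></block></form>: the browser
    // executes the block and synthesizes the prompt text.
    $form = $doc->createElementNS(VXML_NS, 'form');
    $block = $doc->createElementNS(VXML_NS, 'block');
    $prompt = $doc->createElementNS(VXML_NS, 'prompt');
    $prompt->appendChild($doc->createTextNode($text));
    $block->appendChild($prompt);
    $form->appendChild($block);
    $vxml->appendChild($form);

    // Counterpart of generate(): return the markup so the caller can echo it.
    return $doc->saveXML();
}

echo hello_vxml('Hello World !');
```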
VoiceXML can also be used to create telephone menus and receive user input:
```php
<?php
require_once("../vxml.class.php");

$app = new gonx_vxml();
$app->start_vxml("", "", "", "", "", "2.0");

// Exit cleanly if the caller hangs up
$app->start_catch("connection.disconnect");
$app->start_exit_c();
$app->end_catch();

// Create choice menu: spoken phrase => target form
$menu_items = array(
    "Welcome" => "#welcome",
    "Sport"   => "#sport",
    "Music"   => "#music",
    "Quit"    => "#quit",
);
$app->load("menu", $menu_items);

// One message per menu target; the third element is the page to return to afterwards
$app->load("message", array("welcome", "Welcome to PHP voice sample application", $_SERVER['PHP_SELF']));
$app->load("message", array("sport", "This is the sport section of PHP Voice", $_SERVER['PHP_SELF']));
$app->load("message", array("music", "This is the music section of PHP Voice", $_SERVER['PHP_SELF']));
$app->load("message", array("quit", "Thank you for your visit, bye."));

$app->end_vxml();
$app->generate();
?>
```
Note that this example doesn't read the digits that users press; it listens to what they say instead. The voice browser turns each menu choice into a speech grammar (VoiceXML 2.0 grammars follow the Speech Recognition Grammar Specification) and matches the caller's speech against it. Each key of the menu array is an expected input (the word or phrase the user will say), and each value names the dialog that responds to that input. The hash sign (#) in front of the name marks it as a reference to a form in the same VoiceXML document.
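The mapping from the menu array to VoiceXML can be sketched by hand. The hypothetical `build_menu()` below mirrors the `<menu>` part of the generated code: a prompt that enumerates the choices, then one `<choice>` per array entry whose text is the phrase to recognize and whose `next` attribute is the array value. It is a DOM-based illustration, not the PHP voice source.

```php
<?php
declare(strict_types=1);

const VXML_NS = 'http://www.w3.org/2001/vxml';

/**
 * Hypothetical sketch of a "menu" generator: each key is a phrase the caller
 * may say, each value is the URI (here a "#form-id" fragment) to jump to.
 */
function build_menu(array $items): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;

    $vxml = $doc->createElementNS(VXML_NS, 'vxml');
    $vxml->setAttribute('version', '2.0');
    $doc->appendChild($vxml);

    $menu = $doc->createElementNS(VXML_NS, 'menu');
    $vxml->appendChild($menu);

    // <prompt>Say one of: <enumerate/></prompt> reads the choices aloud.
    $prompt = $doc->createElementNS(VXML_NS, 'prompt');
    $prompt->appendChild($doc->createTextNode('Say one of: '));
    $prompt->appendChild($doc->createElementNS(VXML_NS, 'enumerate'));
    $menu->appendChild($prompt);

    // One <choice> per entry: the choice text is what the recognizer
    // listens for, and "next" is where the browser goes on a match.
    foreach ($items as $phrase => $target) {
        $choice = $doc->createElementNS(VXML_NS, 'choice');
        $choice->setAttribute('next', $target);
        // Cast because PHP turns numeric-string keys into integers.
        $choice->appendChild($doc->createTextNode((string) $phrase));
        $menu->appendChild($choice);
    }

    return $doc->saveXML();
}

echo build_menu([
    'Welcome' => '#welcome',
    'Sport'   => '#sport',
    'Music'   => '#music',
    'Quit'    => '#quit',
]);
```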
After the menu is created, every message must also be added to the VoiceXML document. The PHP code above generates the following VoiceXML:
```xml
<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">
  <catch event="connection.disconnect">
    <exit/>
  </catch>
  <menu>
    <prompt>
      Say one of: <enumerate/>
    </prompt>
    <choice next="#welcome">Welcome</choice>
    <choice next="#sport">Sport</choice>
    <choice next="#music">Music</choice>
    <choice next="#quit">Quit</choice>
  </menu>
  <form id="welcome">
    <block>
      Welcome to PHP voice sample application
      <goto next="/projects/phpvoice/samples/menu_mod.php"/>
    </block>
  </form>
  <form id="sport">
    <block>
      This is the sport section of PHP Voice
      <goto next="/projects/phpvoice/samples/menu_mod.php"/>
    </block>
  </form>
  <form id="music">
    <block>
      This is the music section of PHP Voice
      <goto next="/projects/phpvoice/samples/menu_mod.php"/>
    </block>
  </form>
  <form id="quit">
    <block>
      Thank you for your visit, bye.
    </block>
  </form>
</vxml>
```
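Because this menu only listens for speech, callers who prefer the phone keypad cannot use it. VoiceXML 2.0 lets a `<menu>` accept digits too: with `dtmf="true"`, the first nine choices that lack their own `dtmf` attribute are assigned the keys 1 through 9. The sketch below adds that attribute by post-processing the generated markup; `enable_menu_dtmf()` is a hypothetical helper, and capturing `generate()` output with output buffering is an assumption based only on the fact that it echoes the code.

```php
<?php
declare(strict_types=1);

const VXML_NS = 'http://www.w3.org/2001/vxml';

/**
 * Hypothetical helper: set dtmf="true" on every <menu> in a VoiceXML
 * document so callers can press 1-9 as well as speak a choice.
 */
function enable_menu_dtmf(string $vxml): string
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false;
    if (!$doc->loadXML($vxml)) {
        throw new InvalidArgumentException('Input is not well-formed XML.');
    }

    // Only attributes change, so modifying nodes while iterating is safe.
    foreach ($doc->getElementsByTagNameNS(VXML_NS, 'menu') as $menu) {
        $menu->setAttribute('dtmf', 'true');
    }

    return $doc->saveXML();
}

// With PHP voice, the generated code could be captured before printing
// (assumes $app is set up as in the menu example):
//   ob_start();
//   $app->generate();
//   echo enable_menu_dtmf(ob_get_clean());

$sample = '<vxml xmlns="http://www.w3.org/2001/vxml" version="2.0">'
    . '<menu><choice next="#sport">Sport</choice></menu></vxml>';
echo enable_menu_dtmf($sample);
```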
This article was an introduction to voice recognition in PHP. Its goal was to show how to build inexpensive voice applications for the Web with PHP and to encourage developers to try voice recognition together with open-source voice browsers. Now it's your turn to start using voice recognition and speech synthesis in PHP.
Jason Gilmore is founder of the publishing, training, and consulting firm WJGilmore.com. He is the author of several popular books, including “Easy PHP Websites with the Zend Framework”, “Easy PayPal with PHP”, and “Beginning PHP and MySQL, Fourth Edition”. Follow him on Twitter at @wjgilmore.