The SALT (Speech Application Language Tags) specification pairs speech with HTML, XHTML, and XML. It is backed by a 70-member consortium that includes industry heavyweights Microsoft, Cisco, and Intel. The group submitted SALT 1.0 to the W3C’s Voice Browser working group, which will consider it as part of the next VoiceXML standard; SALT itself is not a W3C standard. This article touches on the SALT/VoiceXML relationship again near the end.
Because SALT is implemented at the HTML, XHTML, and XML level, it shields developers from some of the complexity of working with individual vendor speech APIs. Both users and developers benefit from SALT’s ability to add speech capability for a multitude of devices—traditional PCs as well as telephones, PDAs, and other devices.
SALT goes beyond the “speech” part of its name, though. The specification includes “multimodal” access, which means the user can reach the application through a variety of input methods. Speech is the obvious one, but SALT also supports telephone keypads and mixed-mode input, in which a user can switch between typing or pointing and speaking. For output, a SALT application can likewise provide multiple types, including traditional text and graphics, synthesized speech, and audio. The application should detect the proper output type for the user’s client device and respond accordingly.
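As a rough illustration of that last point, the following Python sketch picks input and output modes from a description of the client device. It is not SALT markup and does not use any SALT API; the `ClientDevice` class, its fields, and the two example devices are hypothetical names invented here to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClientDevice:
    """Hypothetical description of what a client device can do."""
    name: str
    has_display: bool
    has_audio_out: bool
    has_microphone: bool
    has_keypad: bool               # telephone-style keypad
    has_keyboard_or_pointer: bool  # keyboard, mouse, or stylus


def input_modes(device):
    """Return the input modes this device can offer, in a fixed order."""
    modes = []
    if device.has_microphone:
        modes.append("speech")
    if device.has_keypad:
        modes.append("keypad")
    if device.has_keyboard_or_pointer:
        modes.append("typing/pointing")
    return modes


def output_modes(device):
    """Return the output types the application could send to this device."""
    modes = []
    if device.has_display:
        modes.extend(["text", "graphics"])
    if device.has_audio_out:
        modes.extend(["synthesized speech", "audio"])
    return modes


# Two illustrative devices (assumed capabilities, not taken from the spec).
TELEPHONE = ClientDevice("telephone", False, True, True, True, False)
DESKTOP_PC = ClientDevice("desktop PC", True, True, True, False, True)

for device in (TELEPHONE, DESKTOP_PC):
    print(device.name, input_modes(device), output_modes(device))
```

In this sketch the telephone ends up with speech and keypad input and audio-only output, while the PC can mix typing, pointing, and speech and receive both visual and spoken output.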
How a SALT application is processed depends on the type of client platform. For example, when a user accesses a SALT-enabled Web page from a PC, the PC provides the processing horsepower to turn the speech input into input for the server. Likewise, the PC renders any text-to-speech audio output. On the other hand, SALT access from a device with no processing power of its own, such as a telephone, depends on a SALT-enabled server that the user calls into. In that scenario, the server processes the user’s speech into commands and renders the audio output as well.
With Microsoft backing it, SALT is of course a part of the .NET Framework for Microsoft developers. Microsoft has a speech SDK for .NET developers (still in beta at this time) that uses SALT. Developers should be familiar with ASP.NET before attempting to work with SALT.
In a sense, SALT is a competitor to the VoiceXML standard for speech applications. However, SALT does incorporate some VoiceXML and the related W3C standards SRGS (Speech Recognition Grammar Specification) and SSML (Speech Synthesis Markup Language), so the two are not completely separate items.
SALT will probably appeal more to Microsoft developers and to other developers coming to speech from a Web development background. Experienced Web developers will understand the SALT development model because it uses the event model most Web applications are built on.
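To make the event-driven style concrete, here is a minimal Python sketch in which a page registers handlers and then reacts to whichever input event arrives, spoken or typed. It only illustrates the general pattern; the `Page` class, the event names `recognized` and `typed`, and the handler functions are hypothetical and are not part of SALT or of any speech SDK.

```python
from collections import defaultdict


class Page:
    """Hypothetical stand-in for a Web page with form fields and event handlers."""

    def __init__(self):
        self.fields = {}
        self._handlers = defaultdict(list)

    def on(self, event_name, handler):
        """Register a handler to run when the named event fires."""
        self._handlers[event_name].append(handler)

    def fire(self, event_name, payload):
        """Invoke every handler registered for the named event."""
        for handler in self._handlers[event_name]:
            handler(self, payload)


def fill_city_from_speech(page, result):
    # Assumed shape of a recognition result: a dict with a "text" entry.
    page.fields["city"] = result["text"].title()


def fill_city_from_typing(page, text):
    page.fields["city"] = text.strip().title()


page = Page()
page.on("recognized", fill_city_from_speech)
page.on("typed", fill_city_from_typing)

# The same field can be filled by either mode, in whatever order events arrive.
page.fire("recognized", {"text": "seattle"})
spoken = page.fields["city"]
page.fire("typed", "  boston ")
typed = page.fields["city"]
print(spoken, typed)
```

The point of the pattern is that the page does not dictate the order of interaction: it simply responds to events, which is how most Web applications already handle clicks and keystrokes.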
VoiceXML could be more appealing to developers with a background in traditional telephony or IVR (Interactive Voice Response) applications. It could also be more appealing where the application has a very strict flow definition, like the ones traditionally served well by an IVR application. VoiceXML is a larger standard because it is a complete standalone markup specification, whereas SALT depends more on existing functionality handled by other Web application specifications. VoiceXML also has the advantage of being a more widely supported standard, with more than 250 companies involved in the VoiceXML Forum. VoiceXML is a more mature standard as well, now in a ratified version 2.0 with the original specification dating back five years to 1999. However, SALT supporters argue that VoiceXML’s age can work in SALT’s favor: VoiceXML has roots in an earlier Web era, whereas SALT is based on a more modern Web development architecture.
For further reading, Hitesh Seth has a series of five SALT articles on Developer.com, starting with an introduction, “SALT: By Example,” which quickly dives into some SALT code.
The Speech Application Language Tags (SALT) 1.0 specification enables multimodal and telephony-enabled access to information, applications, and Web services from PCs, telephones, tablet PCs, and wireless personal digital assistants (PDAs). The Speech Application Language Tags extend existing mark-up languages such as HTML, XHTML, and XML. Multimodal access will enable users to interact with an application in a variety of ways: they will be able to input data using speech, a keyboard, keypad, mouse and/or stylus, and produce data as synthesized speech, audio, plain text, motion video, and/or graphics. Each of these modes will be able to be used independently or concurrently.
The SALT Forum brings together a diverse group of companies sharing a common interest in developing and promoting speech technologies for multimodal and telephony applications. Founded in 2001 and representing over 70 technology leaders, the SALT Forum seeks to establish and promote a royalty-free standard that provides spoken access to many forms of content through a wide variety of devices.
In pursuit of these goals, Version 1.0 of the SALT specification was developed by Forum members and contributed to the World Wide Web Consortium (W3C). Membership in the SALT Forum is open to all.
The founders on SALT:
“The ability to add speech to graphical applications will contribute to further growth in the rapidly expanding VoIP market and will enable delivery of innovative and interoperable value-added communications and services over a single packet network.”
— Alistair Woodman, Director of Marketing, Voice Technology Center
“Multimodal user experience is the natural way of interaction for end users. The ability of end users to choose their preferred mode of interaction at any point during a transaction will drive the take rate and the usage of new voice and data services. Comverse is committed to evolving its platforms to enable the development of multimodal revenue generating applications and services for 2.5 and 3G networks, including our messaging, personal communications and entertainment products along with the Spark Alliance Program. An open standard supported by major industry leaders will significantly increase the penetration and delivery of enhanced voice and data communications services.”
— Zeev Bregman, CEO
“Intel endorses the delivery of open, standards-based speech tags to foster another wave of innovation on the Web and within communications networks. The addition of speech to text and graphics as a human interface increases computing device ease of use and benefits business and consumers. Call control expertise will enable developers to offer basic and enhanced calling features within their applications. Intel will deliver industry-leading performance for applications using Speech Application Language Tags when run on Intel platform architectures designed for desktop, mobile and tablet PCs, cell phones, PDAs and other wireless devices as well as a range of communication and Web servers.”
— Howard Bubb, Vice President and General Manager, Telecommunications and Embedded Group
“As part of our commitment to provide leadership in enabling speech access to Web applications and services, Microsoft will offer Web developer tools to implement SALT, including providing SALT extensions for Visual Studio® .NET and ASP.NET technologies, as well as extensions for Microsoft Internet Explorer and Pocket Internet Explorer Web browser software. We’re delighted to join with these major industry players to create a critical mass behind an open, independent infrastructure that will make natural-speech access to applications and the Web a reality.”
— Kai-Fu Lee, Vice President, Natural Interactive Services Division
“Philips technology encompasses all that SALT will unite: a convenient speech user interface for all kinds of devices providing input along with feedback via speech or on display. Thus SALT will make it even easier to use applications with Philips technology inside. Now it will be possible for you to ask your car navigation system for directions, as well as tell it to display the Web page with the hotels along the way from the Yellow Pages — all with just your voice.”
— Frank Caris, President, Philips Speech Processing North America
“ScanSoft has invested in multimodal technologies, through both its networked and embedded speech-recognition and text-to-speech, and sees this founding initiative as critical to the success of multimodal applications in our industry. We have seen strong early-stage demand for multimodal applications and believe a standards initiative is key for widespread adoption.”
— Steve Chambers, general manager