The following update is proposed to the Megaco/H.248 requirements document (APC-1695). I am posting it to this group for information. If anyone has any comments on this updated section, please send me an email or speak to me in RedBank next week.

Nancy
--------------------------------------------------------------------------
Nancy M. Greene
Internet & Service Provider Networks, Nortel Networks
T:514-271-7221 (internal:ESN853-1077) E:ngreene@nortelnetworks.com
----------
From: Greene, Nancy-M [CAR:5N10:EXCH]
Sent: Saturday, October 16, 1999 4:15 PM
To: 'megaco@standards.nortelnetworks.com'
Subject: reqs-07: updated 11.2.9 Audio Resource Function
Below is proposed revised text for 11.2.9. Audio Resource Function (formerly called IVR Unit). Michael Ramalho and David Cromwell have worked on the wording, and have agreed on requirements for the speaker verification/identification and auditory feature extraction/recognition sections.
The subsections are: 11.2.9.1 Play Audio, 11.2.9.2 DTMF Collect, 11.2.9.3 Record Audio, 11.2.9.4 Speech Recognition, 11.2.9.5 Speaker Verification/Identification, 11.2.9.6 Auditory Feature Extraction/Recognition, and 11.2.9.7 Audio Conferencing.
Summary of changes:
- the term Audio Server was removed and replaced by the more general term Audio Enabled Gateway
- digit collection was qualified to be DTMF digit collection
- in Section 11.2.9.1, parts d and e on sequences and sets were reworked to eliminate the implementation-specific language
- in Section 11.2.9.1, parts f, g, and h were removed
- sections were reordered, putting Audio Conferencing at the end
- requirements were added to sections 11.2.9.5 and 11.2.9.6
If there are no comments by Tuesday or so, I will issue reqs-08 with this updated section in it. Sometime very soon I expect Tom to issue WG last call on this document.
Thanks to Michael and David for their work on this.
Nancy
Revised 11.2.9:

11.2.9. Audio Resource Function
An Audio Resource Function (ARF) consists of one or more functional modules which can be deployed on a stand-alone media gateway server (IVR, Intelligent Peripheral, speech/speaker recognition unit, etc.) or a traditional media gateway. Such a media gateway is known as an Audio Enabled Gateway (AEG) if it performs tasks defined in one or more of the following ARF functional modules:
Play Audio, DTMF Collect, Record Audio, Speech Recognition, Speaker Verification/Identification, Auditory Feature Extraction/Recognition, or Audio Conferencing.
Additional ARF functional modules that support human-to-machine communications through the use of telephony tones (e.g., DTMF) or auditory means (e.g., speech) may be appended to the AEG definition in future versions of these requirements.
Generic scripting packages for any module must support all the requirements for that module. Any package extension for a given module must include, by inheritance or explicit reference, the requirements for that given module.
The protocol requirements for each of the ARF modules are provided in the following subsections.
11.2.9.1. Play Audio Module
a. Be able to provide the following basic operation:
- request an ARF MG to play an announcement.
b. Be able to specify these play characteristics:
- Play volume
- Play speed
- Play iterations
- Interval between play iterations
- Play duration
c. Permit the specification of voice variables such as DN, number, date, time, etc. The protocol must allow specification of both the value (e.g., 234-3456) and the type (e.g., Directory Number).
d. Using the terminology that a segment is a unit of playable speech, or is an abstraction that is resolvable to a unit of playable speech, permit specification of the following segment types:
- A provisioned recording.
- A block of text to be converted to speech.
- A block of text to be displayed on a device.
- A length of silence qualified by duration.
- An algorithmically generated tone.
- A voice variable, specified by type and value. Given a variable type and value, the IVR/ARF unit would dynamically assemble the phrases required for its playback.
- An abstraction that represents a sequence of audio segments. Nesting of these abstractions must also be permitted.
An example of this abstraction is a sequence of audio segments, the first of which is a recording of the words "The number you have dialed," followed by a Directory Number variable, followed by a recording of the words "is no longer in service."
- An abstraction that represents a set of audio segments and which is resolved to a single segment by a qualifier. Nesting of these abstractions must be permitted.
For example take a set of audio segments recorded in different languages all of which express the semantic concept "The number you have dialed is no longer in service." The set is resolved by a language qualifier. If the qualifier is "French," the set resolves to the French version of this announcement.
In the case of a nested abstraction consisting of a set qualified by language at one level and a set qualified by gender at another level, it would be possible to specify that an announcement be played in French and spoken by a female voice.
e. Provide two different methods of audio specification:
- Direct specification of the audio components to be played by specifying the sequence of segments in the command itself.
- Indirect specification of the audio components to be played by reference to a single identifier that resolves to a provisioned sequence of audio segments.
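As an illustration only, the segment abstractions in item d and the two specification methods in item e could be modelled as follows. This is a minimal sketch, not protocol syntax: the class names, the `PROVISIONED` table, the identifier "ann-not-in-service", and the recording names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Recording:
    name: str  # hypothetical identifier of a provisioned recording


@dataclass
class Variable:
    type: str   # e.g. "DN" for a Directory Number
    value: str  # e.g. "234-3456"


@dataclass
class SegmentSequence:
    parts: List["Segment"]  # played in order; parts may be nested abstractions


@dataclass
class SegmentSet:
    qualifier: str                 # e.g. "language" or "gender"
    choices: Dict[str, "Segment"]  # qualifier value -> segment


Segment = Union[Recording, Variable, SegmentSequence, SegmentSet]


def resolve(segment: Segment, qualifiers: Dict[str, str]) -> List[Segment]:
    """Flatten nested sequences and sets into an ordered list of leaf segments."""
    if isinstance(segment, SegmentSequence):
        flat: List[Segment] = []
        for part in segment.parts:
            flat.extend(resolve(part, qualifiers))
        return flat
    if isinstance(segment, SegmentSet):
        # Raises KeyError if the qualifier or the requested choice is missing.
        return resolve(segment.choices[qualifiers[segment.qualifier]], qualifiers)
    return [segment]


def not_in_service(lang: str, voice: str) -> SegmentSequence:
    """The example sequence from item d; the DN is fixed here for illustration."""
    return SegmentSequence([
        Recording(f"number-you-have-dialed.{lang}.{voice}"),
        Variable("DN", "234-3456"),
        Recording(f"is-no-longer-in-service.{lang}.{voice}"),
    ])


# Hypothetical provisioning table used for indirect specification:
# a set qualified by language, nested over a set qualified by gender.
PROVISIONED: Dict[str, Segment] = {
    "ann-not-in-service": SegmentSet("language", {
        lang: SegmentSet("gender", {
            voice: not_in_service(lang, voice) for voice in ("female", "male")
        })
        for lang in ("English", "French")
    }),
}


def play_list(spec: Union[str, Segment], qualifiers: Dict[str, str]) -> List[Segment]:
    """A string is an indirect reference; anything else is a direct specification."""
    segment = PROVISIONED[spec] if isinstance(spec, str) else spec
    return resolve(segment, qualifiers)


french_female = {"language": "French", "gender": "female"}
indirect = play_list("ann-not-in-service", french_female)
direct = play_list(not_in_service("French", "female"), french_female)
```

In this sketch both calls resolve to the same three leaf segments: a recording, the Directory Number variable, and a second recording.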
11.2.9.2. DTMF Collect Module
The DTMF Collect Module must support all of the requirements in the Play Module in addition to the following requirements:
a. Be able to provide the following basic operation:
- request an AEG to play an announcement, which may optionally be terminated by DTMF, and then collect DTMF.
b. Be able to specify these event collection characteristics:
- The number of attempts to give the user to enter a valid DTMF pattern.
c. With respect to digit timers, allow the specification of:
- Time allowed to enter the first digit.
- Time allowed for user to enter each digit subsequent to the first digit.
- Time allowed for user to enter a digit once the maximum expected number of digits has been entered.
d. To be able to allow multiple prompt operations for DTMF digit collection, voice recording (if supported), and/or speech recognition analysis (if supported) provide the following types of prompts:
- Initial Prompt
- Reprompt
- Error prompt
- Failure announcement
- Success announcement.
e. To allow digit pattern matching, allow the specification of:
- maximum number of digits to collect.
- minimum number of digits to collect.
- a digit pattern using a regular expression.
f. To allow digit buffer control, allow the specification of:
- Ability to clear digit buffer prior to playing initial prompt (default is not to clear buffer).
- Default clearing of buffer following playing of un-interruptible announcement segment.
- Default clearing of buffer before playing a re-prompt in response to previous invalid input.
g. Provide a method to specify DTMF interruptibility on a per audio segment basis.
h. Allow the specification of definable key sequences for digit collection to:
- Discard collected digits in progress, replay the prompt, and resume DTMF digit collection.
- Discard collected digits in progress and resume DTMF digit collection.
- Terminate the current operation and return the terminating key sequence to the MGC.
i. Provide a way to ask the ARF MG to support the following definable keys for DTMF digit collection and recording. These keys would then be able to be acted upon by the ARF MG:
- A key to terminate playing of an announcement in progress.
- A set of one or more keys that can be accepted as the first digit to be collected.
- A key that signals the end of user input. The key may or may not be returned to the MGC along with the input already collected.
- Keys to stop playing the current announcement and resume playing at the beginning of the first segment of the announcement, last segment of the announcement, previous segment of the announcement, next segment of the announcement, or the current announcement segment.
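To make the digit timers in item c and the pattern parameters in item e concrete, the following minimal sketch shows one way an AEG might hold them. The class and field names are hypothetical, and the use of Python `re` syntax for the digit pattern is an assumption; the requirements do not name a regular expression dialect.

```python
import re
from dataclasses import dataclass


@dataclass
class DigitCollectSpec:
    min_digits: int
    max_digits: int
    pattern: str          # regular expression (Python `re` syntax assumed)
    first_digit_s: float  # time allowed to enter the first digit
    inter_digit_s: float  # time allowed for each subsequent digit
    extra_digit_s: float  # time allowed once max_digits has been entered

    def timeout_for(self, collected: str) -> float:
        """Pick the timer that applies while waiting for the next digit."""
        if not collected:
            return self.first_digit_s
        if len(collected) >= self.max_digits:
            return self.extra_digit_s
        return self.inter_digit_s

    def is_valid(self, collected: str) -> bool:
        """Check the length limits, then match the whole string against the pattern."""
        if not (self.min_digits <= len(collected) <= self.max_digits):
            return False
        return re.fullmatch(self.pattern, collected) is not None


# Example: 7 to 10 digits, the first of which is 2 through 9.
spec = DigitCollectSpec(7, 10, r"[2-9][0-9]+", 5.0, 3.0, 1.0)
```

With this example, `spec.is_valid("2343456")` is true, while "1343456" fails the pattern and "234" fails the minimum length.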
11.2.9.3. Record Audio Module
The Record Module must support all of the requirements in the Play Module in addition to the following requirements:
a. Be able to provide the following basic operation:
- request an AEG to play an announcement and then record voice.
b. Be able to specify these event collection characteristics:
- The number of attempts to give the user to make a recording.
c. With respect to recording timers, allow the specification of:
- Time to wait for the user to initially speak.
- The amount of silence necessary following the last speech segment for the recording to be considered complete.
- The maximum allowable length of the recording (not including pre- and post-speech silence).
d. To be able to allow multiple prompt operations for DTMF digit collection (if supported), voice recording, speech recognition analysis (if supported), and/or speech verification/identification (if supported), provide the following types of prompts:
- Initial Prompt
- Reprompt
- Error prompt
- Failure announcement
- Success announcement.
e. Allow the specification of definable key sequences for digit recording or speech recognition analysis (if supported) to:
- Discard recording in progress, replay the prompt, and resume recording.
- Discard recording in progress and resume recording.
- Terminate the current operation and return the terminating key sequence to the MGC.
f. Provide a way to ask the ARF MG to support the following definable keys for recording. These keys would then be able to be acted upon by the ARF MG:
- A key to terminate playing of an announcement in progress.
- A key that signals the end of user input. The key may or may not be returned to the MGC along with the input already collected.
- Keys to stop playing the current announcement and resume playing at the beginning of the first segment of the announcement, last segment of the announcement, previous segment of the announcement, next segment of the announcement, or the current announcement segment.
g. While audio prompts are usually provisioned in IVR/ARF MGs, support changing the provisioned prompts in a voice session rather than a data session. In particular, with respect to audio management:
- A method to replace provisioned audio with audio recorded during a call. The newly recorded audio must be accessible using the identifier of the audio it replaces.
- A method to revert from replaced audio to the original provisioned audio.
- A method to take audio recorded during a call and store it such that it is accessible to the current call only through its own newly created unique identifier.
- A method to take audio recorded during a call and store it such that it is accessible to any subsequent call through its own newly created identifier.
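The four audio management methods in item g can be pictured with a small in-memory store. This is a sketch under stated assumptions only: the `AudioStore` class, its method names, and the "call-N" and "rec-N" identifier formats are hypothetical and are not defined by these requirements.

```python
import itertools
from typing import Dict, Optional


class AudioStore:
    def __init__(self, provisioned: Dict[str, bytes]) -> None:
        self._provisioned = dict(provisioned)
        self._replaced: Dict[str, bytes] = {}             # overrides of provisioned audio
        self._persistent: Dict[str, bytes] = {}           # visible to subsequent calls
        self._per_call: Dict[str, Dict[str, bytes]] = {}  # visible to one call only
        self._counter = itertools.count(1)

    def replace(self, audio_id: str, recording: bytes) -> None:
        """Replace provisioned audio; it stays reachable under the same identifier."""
        if audio_id not in self._provisioned:
            raise KeyError(audio_id)
        self._replaced[audio_id] = recording

    def revert(self, audio_id: str) -> None:
        """Go back to the original provisioned audio."""
        self._replaced.pop(audio_id, None)

    def store_for_call(self, call_id: str, recording: bytes) -> str:
        """Store audio reachable only from the current call, under a new identifier."""
        new_id = f"call-{next(self._counter)}"
        self._per_call.setdefault(call_id, {})[new_id] = recording
        return new_id

    def store_persistent(self, recording: bytes) -> str:
        """Store audio reachable from any subsequent call, under a new identifier."""
        new_id = f"rec-{next(self._counter)}"
        self._persistent[new_id] = recording
        return new_id

    def end_call(self, call_id: str) -> None:
        """Drop the audio that was scoped to this call."""
        self._per_call.pop(call_id, None)

    def lookup(self, audio_id: str, call_id: Optional[str] = None) -> bytes:
        """Resolve an identifier; raises KeyError if it is not visible to the caller."""
        scoped = self._per_call.get(call_id, {}) if call_id is not None else {}
        for table in (scoped, self._replaced, self._persistent, self._provisioned):
            if audio_id in table:
                return table[audio_id]
        raise KeyError(audio_id)


store = AudioStore({"greeting": b"original"})
store.replace("greeting", b"recorded-in-call")
after_replace = store.lookup("greeting")
store.revert("greeting")
after_revert = store.lookup("greeting")
temp_id = store.store_for_call("call-A", b"scratch")
kept_id = store.store_persistent(b"keep")
```

Here `temp_id` resolves only when the lookup is made on behalf of "call-A", whereas `kept_id` resolves for any call.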
11.2.9.4. Speech Recognition Module
The speech recognition module can be used for a number of speech recognition applications, such as:
Limited Vocabulary Isolated Speech Recognition (e.g., "yes", "no", the number "four"),
Limited Vocabulary Continuous Speech Feature Recognition (e.g., the utterance "four hundred twenty-three dollars"), and/or
Continuous Speech Recognition (e.g., unconstrained speech recognition tasks).
The Speech Recognition Module must support all of the requirements in the Play Module in addition to the following requirements:
a. Be able to provide the following basic operation:
- request an AEG to play an announcement and then perform speech recognition analysis.
b. Be able to specify these event collection characteristics:
- The number of attempts to give the user to perform the speech recognition task.
c. With respect to speech recognition analysis timers, allow the specification of:
- Time to wait for the user to initially speak.
- The amount of silence necessary following the last speech segment for the speech recognition analysis segment to be considered complete.
- The maximum allowable length of the speech recognition analysis (not including pre- and post-speech silence).
d. To be able to allow multiple prompt operations for DTMF digit collection (if supported), voice recording (if supported), and/or speech recognition analysis and then to provide the following types of prompts:
- Initial Prompt
- Reprompt
- Error prompt
- Failure announcement
- Success announcement.
e. Allow the specification of definable key sequences for digit recording (if supported) or speech recognition analysis to:
- Discard analysis in progress, replay the prompt, and resume analysis.
- Discard analysis in progress and resume analysis.
- Terminate the current operation and return the terminating key sequence to the MGC.
f. Provide a way to ask the ARF MG to support the following definable keys for speech recognition analysis. These keys would then be able to be
acted upon by the ARF MG:
- A key to terminate playing of an announcement in progress.
- A key that signals the end of user input. The key may or may not be returned to the MGC along with the input already collected.
- Keys to stop playing the current announcement and resume playing at the beginning of the first segment of the announcement, last segment of the announcement, previous segment of the announcement, next segment of the announcement, or the current announcement segment.
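The three analysis timers in item c (and the equivalent recording timers in 11.2.9.3) interact in a way that a short sketch may make clearer. It assumes, hypothetically, that the AEG already has a per-frame speech/silence decision; the function name, the frame-based input, and the outcome labels are illustrative and not part of the requirements.

```python
from typing import Iterable, Tuple


def endpoint(frames: Iterable[bool], frame_ms: int, initial_wait_ms: int,
             end_silence_ms: int, max_length_ms: int) -> Tuple[str, int]:
    """Apply the three timers to a stream of frames (True means speech detected).

    Returns (outcome, speech_ms), where speech_ms runs from the first through
    the last speech frame, so pre- and post-speech silence is not counted.
    """
    started = False
    pre_silence = 0   # silence before the user first speaks
    speech_span = 0   # first speech frame through last speech frame
    trailing = 0      # silence since the last speech frame
    for is_speech in frames:
        if is_speech:
            started = True
            speech_span += trailing + frame_ms  # pauses inside speech count
            trailing = 0
            if speech_span >= max_length_ms:
                return ("max-length", speech_span)
        elif not started:
            pre_silence += frame_ms
            if pre_silence >= initial_wait_ms:
                return ("no-speech", 0)
        else:
            trailing += frame_ms
            if trailing >= end_silence_ms:
                return ("complete", speech_span)
    return ("input-ended", speech_span)


# 0.5 s silence, 1.0 s speech, 0.3 s pause, 0.2 s speech, then silence.
frames = [False] * 5 + [True] * 10 + [False] * 3 + [True] * 2 + [False] * 10
result = endpoint(frames, frame_ms=100, initial_wait_ms=1000,
                  end_silence_ms=500, max_length_ms=60000)
```

In this example the 0.3 s pause is shorter than the end-of-speech silence timer, so it is treated as part of the utterance rather than as its end.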
11.2.9.5. Speaker Verification/Identification Module
The speech verification/identification module returns parameters that indicate either the likelihood of the speaker to be the person that they claim to be (verification task) or the likelihood of the speaker being one of the persons contained in a set of previously characterized speakers (identification task).
The Speaker Verification/Identification Module must support all of the requirements in the Play Module in addition to the following requirements:
a. Be able to download parameters, such as speaker templates (verification
task) or sets of potential speaker templates (identification task), either
prior to the session or in mid-session.
b. Be able to download application specific software to the ARF either prior to the session or in mid-session.
c. Be able to return parameters indicating either the likelihood of the speaker to be the person that they claim to be (verification task) or the likelihood of the speaker being one of the persons contained in a set of previously characterized speakers (identification task).
d. Be able to provide the following basic operation:
- request an AEG to play an announcement and then perform speech verification/identification analysis.
e. Be able to specify these event collection characteristics:
- The number of attempts to give the user to perform the speech verification/identification task.
f. With respect to speech verification/identification analysis timers, allow the specification of:
- Time to wait for the user to initially speak.
- The amount of silence necessary following the last speech segment for the speech verification/identification analysis segment to be considered complete.
- The maximum allowable length of the speech verification/identification analysis (not including pre- and post-speech silence).
g. To be able to allow multiple prompt operations for DTMF digit collection (if supported), voice recording (if supported), speech recognition analysis (if supported), and/or speech verification/identification, provide the following types of prompts:
- Initial Prompt
- Reprompt
- Error prompt
- Failure announcement
- Success announcement.
h. Allow the specification of definable key sequences for digit recording (if supported) or speech recognition (if supported) in the speech verification/identification analysis to:
- Discard speech verification/identification analysis in progress, replay the prompt, and resume analysis.
- Discard speech verification/identification analysis in progress and resume analysis.
- Terminate the current operation and return the terminating key sequence to the MGC.
i. Provide a way to ask the ARF MG to support the following definable keys for speech verification/identification analysis. These keys would then be able to be acted upon by the ARF MG:
- A key to terminate playing of an announcement in progress.
- A key that signals the end of user input. The key may or may not be returned to the MGC along with the input already collected.
- Keys to stop playing the current announcement and resume playing at the beginning of the first segment of the announcement, last segment of the announcement, previous segment of the announcement, next segment of the announcement, or the current announcement segment.
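The difference between the verification and identification tasks lies mainly in what is downloaded and what is returned. The sketch below illustrates this with cosine similarity over small feature vectors, which is purely a stand-in chosen for brevity: it is not a likelihood, it is not a method named in these requirements, and the function names and template format are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity, used here only as a placeholder score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(sample: List[float], claimed_template: List[float]) -> float:
    """Verification: one downloaded template, one score returned."""
    return similarity(sample, claimed_template)


def identify(sample: List[float],
             templates: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Identification: a set of templates, scores returned best match first."""
    scores = [(name, similarity(sample, t)) for name, t in templates.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical templates downloaded before or during the session.
templates = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
sample = [0.9, 0.1]
claim_score = verify(sample, templates["alice"])
ranking = identify(sample, templates)
```

In both cases the MGC receives scores rather than a yes/no decision, which matches the wording of item c; how the scores are thresholded is left to the application.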
11.2.9.6. Auditory Feature Extraction/Recognition Module
The auditory feature extraction/recognition module is engineered to continuously monitor the auditory stream for the appearance of particular auditory signals or speech utterances of interest and to report these events (and optionally a
signal feature representation of these events) to network servers or MGCs.
The Auditory Feature Extraction/Recognition Module must support the following requirements:
a. Be able to download application specific software to the ARF either prior to the session or in mid-session.
b. Be able to download parameters, such as a representation of the auditory feature to extract/recognize, either prior to the session or in mid-session.
c. Be able to return parameters indicating the auditory event found or a representation of the feature found (i.e., auditory feature).
11.2.9.7. Audio Conferencing Module
The protocol must support:
a. a mechanism to create audio-only and multimedia multi-point conferences in the MG.
b. audio mixing; mixing multiple audio streams into a new composite audio stream
c. audio switching; selection of incoming audio stream to be sent out to all conference participants.
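The distinction between audio mixing (item b) and audio switching (item c) can be shown with a few lines operating on 16-bit PCM samples. This is a minimal sketch: the function names are hypothetical, and choosing the loudest stream is only one assumed selection rule, since the requirement does not say how the incoming stream is selected.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767  # 16-bit signed sample range


def mix(streams: Dict[str, List[int]]) -> List[int]:
    """Audio mixing: sum the streams sample by sample and clip to 16 bits."""
    length = min(len(samples) for samples in streams.values())
    mixed = []
    for i in range(length):
        total = sum(samples[i] for samples in streams.values())
        mixed.append(max(PCM_MIN, min(PCM_MAX, total)))
    return mixed


def switch(streams: Dict[str, List[int]]) -> str:
    """Audio switching: pick one incoming stream (here, the loudest on average)."""
    def loudness(name: str) -> float:
        samples = streams[name]
        return sum(abs(s) for s in samples) / max(len(samples), 1)
    return max(streams, key=loudness)


# Both functions expect at least one stream and raise ValueError otherwise.
streams = {"a": [1000, -2000, 30000], "b": [500, 500, 30000]}
composite = mix(streams)   # a new stream built from all participants
selected = switch(streams)  # the single participant whose stream is sent out
```

Mixing produces a new composite stream, while switching forwards one existing stream unchanged to all participants.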