The following update is proposed to the Megaco/H.248 requirements document
(APC-1695). I am posting it to this group for information.
If anyone has any comments on this updated section, please send me an email
or speak to me in RedBank next week.
Nancy
--------------------------------------------------------------------------
Nancy M. Greene
Internet & Service Provider Networks, Nortel Networks
T:514-271-7221 (internal:ESN853-1077) E:ngreene@nortelnetworks.com
> ----------
> From: Greene, Nancy-M [CAR:5N10:EXCH]
> Sent: Saturday, October 16, 1999 4:15 PM
> To: 'megaco(a)standards.nortelnetworks.com'
> Subject: reqs-07: updated 11.2.9 Audio Resource Function
>
> Below is proposed revised text for 11.2.9. Audio Resource Function
> (formerly called IVR Unit). Michael Ramalho and David Cromwell have worked
> on the wording, and have agreed on requirements for the speaker
> verification/identification and auditory feature extraction/recognition
> sections.
>
> 11.2.9.1 Play Audio,
> 11.2.9.2 DTMF Collect,
> 11.2.9.3 Record Audio,
> 11.2.9.4 Speech Recognition,
> 11.2.9.5 Speaker Verification/Identification,
> 11.2.9.6 Auditory Feature Extraction/Recognition, and
> 11.2.9.7 Audio Conferencing.
>
> Summary of changes:
> - the term Audio Server was removed and replaced by the more general term
> Audio Enabled Gateway
> - digit collection was qualified to be DTMF digit collection
> - In Section 11.2.9.1, parts d and e on sequences and sets reworked to
> eliminate the implementation specific language
> - In Section 11.2.9.1 parts f, g, h removed
> - reordering of sections, putting Audio Conferencing at the end
> - requirements added to sections 11.2.9.5 and 11.2.9.6
>
> If there are no comments by Tuesday or so, I will issue reqs-08 with this
> updated section in it. Sometime very soon I expect Tom to issue WG last
> call on this document.
>
> Thanks to Michael and David for their work on this.
>
> Nancy
>
> revised 11.2.9:
> 11.2.9. Audio Resource Function
>
> An Audio Resource Function (ARF) consists of one or more functional
> modules which can be deployed on a stand-alone media gateway server (IVR,
> Intelligent Peripheral, speech/speaker recognition unit, etc.) or a
> traditional media gateway. Such a media gateway is known as an Audio
> Enabled Gateway (AEG) if it performs tasks defined in one or more of the
> following ARF functional modules:
>
> Play Audio,
> DTMF Collect,
> Record Audio,
> Speech Recognition,
> Speaker Verification/Identification,
> Auditory Feature Extraction/Recognition, or
> Audio Conferencing.
>
>
>
> Additional ARF function modules that support human to machine
> communications through the use of telephony tones (e.g., DTMF) or
> auditory means (e.g., speech) may be appended to the AEG definition in
> future versions of these requirements.
>
> Generic scripting packages for any module must support all the
> requirements for that module. Any package extension for a given module
> must include, by inheritance or explicit reference, the requirements for
> that given module.
>
> The protocol requirements for each of the ARF modules are provided in the
> following subsections.
>
>
> 11.2.9.1. Play Audio Module
>
> a. Be able to provide the following basic operation:
>
> - request an ARF MG to play an announcement.
>
> b. Be able to specify these play characteristics:
>
> - Play volume
>
> - Play speed
>
> - Play iterations
>
> - Interval between play iterations
>
> - Play duration
>
> c. Permit the specification of voice variables such as DN, number, date,
> time, etc. The protocol must allow specification of both the value
> (e.g., 234-3456) as well as the type (e.g., Directory Number).
>
> d. Using the terminology that a segment is a unit of playable speech, or
> is an abstraction that is resolvable to a unit of playable speech,
> permit specification of the following segment types:
>
> - A provisioned recording.
>
> - A block of text to be converted to speech.
>
> - A block of text to be displayed on a device.
>
> - A length of silence qualified by duration.
>
> - An algorithmically generated tone.
>
> - A voice variable, specified by type and value. Given a variable
> type and value, the IVR/ARF unit would dynamically assemble the
> phrases required for its playback.
>
> - An abstraction that represents a sequence of audio segments. Nesting
> of these abstractions must also be permitted.
>
> An example of this abstraction is a sequence of audio segments, the
> first of which is a recording of the words "The number you have dialed,"
> followed by a Directory Number variable, followed by a recording of the
> words "is no longer in service."
>
> - An abstraction that represents a set of audio segments and which is
> resolved to a single segment by a qualifier. Nesting of these
> abstractions must be permitted.
>
> For example, take a set of audio segments recorded in different
> languages, all of which express the semantic concept "The number you
> have dialed is no longer in service." The set is resolved by a language
> qualifier. If the qualifier is "French," the set resolves to the French
> version of this announcement.
>
> In the case of a nested abstraction consisting of a set qualified by
> language at one level and a set qualified by gender at another level,
> it would be possible to specify that an announcement be played in
> French and spoken by a female voice.
>
> e. Provide two different methods of audio specification:
>
> - Direct specification of the audio components to be played by
> specifying the sequence of segments in the command itself.
>
> - Indirect specification of the audio components to be played by
> reference to a single identifier that resolves to a provisioned
> sequence of audio segments.
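
As an illustration only, the sequence and set abstractions in (d) could be modelled as below. This is a minimal Python sketch under my own assumptions, not protocol syntax: the class names, the `resolve` function and the audio identifiers are all hypothetical, and only the recording and variable leaf types are shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Recording:
    audio_id: str  # hypothetical identifier of a provisioned recording


@dataclass
class Variable:
    var_type: str  # e.g. "DN"
    value: str     # e.g. "234-3456"


@dataclass
class Sequence:
    parts: List["Segment"]  # played in order; parts may themselves nest


@dataclass
class SegmentSet:
    qualifier: str                 # e.g. "language" or "gender"
    choices: Dict[str, "Segment"]  # qualifier value -> segment


Segment = Union[Recording, Variable, Sequence, SegmentSet]


def resolve(segment: Segment, qualifiers: Dict[str, str]) -> List[Segment]:
    """Flatten a segment into the playable leaves selected by the qualifiers."""
    if isinstance(segment, Sequence):
        leaves: List[Segment] = []
        for part in segment.parts:
            leaves.extend(resolve(part, qualifiers))
        return leaves
    if isinstance(segment, SegmentSet):
        chosen = segment.choices[qualifiers[segment.qualifier]]
        return resolve(chosen, qualifiers)
    return [segment]  # a Recording or Variable is already playable


def not_in_service(prefix: str) -> Sequence:
    """The example sequence from the text, with made-up recording names."""
    return Sequence([
        Recording(prefix + "-number-you-have-dialed"),
        Variable("DN", "234-3456"),
        Recording(prefix + "-is-no-longer-in-service"),
    ])


# A set qualified by language, with a nested set qualified by gender.
announcement = SegmentSet("language", {
    "English": not_in_service("en"),
    "French": SegmentSet("gender", {
        "female": not_in_service("fr-f"),
        "male": not_in_service("fr-m"),
    }),
})

plan = resolve(announcement, {"language": "French", "gender": "female"})
```

Here `plan` holds the three leaves of the French, female-voice sequence. Indirect specification, as in (e), would amount to looking up `announcement` by a single provisioned identifier before resolving it.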
>
>
> 11.2.9.2. DTMF Collect Module
>
> The DTMF Collect Module must support all of the requirements in the Play
> Module in addition to the following requirements:
>
> a. Be able to provide the following basic operation:
>
>
> - request an AEG to play an announcement, which may optionally be
> terminated by DTMF, and then collect DTMF
>
> b. Be able to specify these event collection characteristics:
>
> - The number of attempts to give the user to enter a valid DTMF
> pattern.
>
> c. With respect to digit timers, allow the specification of:
>
> - Time allowed to enter the first digit.
>
> - Time allowed for user to enter each digit subsequent to the first
> digit.
>
> - Time allowed for user to enter a digit once the maximum expected
> number of digits has been entered.
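
To make the three digit timers in (c) concrete, here is a small Python sketch of how a collector might pick the timer that governs the wait for the next digit. The names `DigitTimers` and `timeout_for_next_digit` are my own, and the rule that the third timer applies once the count reaches the maximum is my reading of the text, not something the requirements spell out.

```python
from dataclasses import dataclass


@dataclass
class DigitTimers:
    first_digit_ms: int  # time allowed to enter the first digit
    inter_digit_ms: int  # time allowed for each subsequent digit
    extra_digit_ms: int  # time allowed once the maximum expected count is reached


def timeout_for_next_digit(timers: DigitTimers, collected: int, max_digits: int) -> int:
    """Return the timeout (ms) that applies while waiting for the next digit."""
    if collected == 0:
        return timers.first_digit_ms
    if collected >= max_digits:
        return timers.extra_digit_ms
    return timers.inter_digit_ms
```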
>
> d. To be able to allow multiple prompt operations for DTMF digit
> collection, voice recording (if supported), and/or speech recognition
> analysis (if supported), provide the following types of prompts:
>
> - Initial Prompt
>
> - Reprompt
>
> - Error prompt
>
> - Failure announcement
>
> - Success announcement.
>
> e. To allow digit pattern matching, allow the specification of:
>
> - maximum number of digits to collect.
>
> - minimum number of digits to collect.
>
> - a digit pattern using a regular expression.
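
A rough sketch of how the three parameters in (e) might combine when validating a collected digit string. The requirements do not name a regular expression dialect, so the use of Python's `re` syntax here is an assumption, as is the function name `digits_are_valid`.

```python
import re


def digits_are_valid(digits: str, pattern: str, min_digits: int, max_digits: int) -> bool:
    """Check the collected digits against the length bounds and the pattern."""
    if not (min_digits <= len(digits) <= max_digits):
        return False
    # fullmatch requires the whole digit string to match, not just a prefix.
    return re.fullmatch(pattern, digits) is not None
```

For instance, `digits_are_valid("5551234", r"[2-9]\d{6}", 7, 7)` accepts a seven-digit number whose first digit is 2 through 9.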
>
> f. To allow digit buffer control, allow the specification of:
>
> - Ability to clear digit buffer prior to playing initial prompt
> (default is not to clear buffer).
>
> - Default clearing of buffer following playing of un-interruptible
> announcement segment.
>
> - Default clearing of buffer before playing a re-prompt in response
> to previous invalid input.
>
> g. Provide a method to specify DTMF interruptibility on a per audio
> segment basis.
>
> h. Allow the specification of definable key sequences for digit
> collection to:
>
> - Discard collected digits in progress, replay the prompt, and resume
> DTMF digit collection.
>
> - Discard collected digits in progress and resume DTMF digit
> collection.
>
> - Terminate the current operation and return the terminating key
> sequence to the MGC.
>
> i. Provide a way to ask the ARF MG to support the following definable
> keys for DTMF digit collection and recording. These keys would then be
> able to be acted upon by the ARF MG:
>
> - A key to terminate playing of an announcement in progress.
>
> - A set of one or more keys that can be accepted as the first digit
> to be collected.
>
> - A key that signals the end of user input. The key may or may not
> be returned to the MGC along with the input already collected.
>
> - Keys to stop playing the current announcement and resume playing at
> the beginning of the first segment of the announcement, last segment
> of the announcement, previous segment of the announcement, next
> segment of the announcement, or the current announcement segment.
>
>
> 11.2.9.3. Record Audio Module
>
> The Record Module must support all of the requirements in the Play
> Module in addition to the following requirements:
>
> a. Be able to provide the following basic operation:
>
> request an AEG to play an announcement and then record voice.
>
> b. Be able to specify these event collection characteristics:
>
> - The number of attempts to give the user to make a recording.
>
> c. With respect to recording timers, allow the specification of:
>
> - Time to wait for the user to initially speak.
>
> - The amount of silence necessary following the last speech segment
> for the recording to be considered complete.
>
> - The maximum allowable length of the recording (not including pre-
> and post-speech silence).
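
One possible reading of how the three recording timers in (c) interact, sketched in Python. Everything here is an assumption for illustration: the audio is taken as a stream of equal-length frames already classified as speech or silence, the function name and outcome strings are invented, and silence between two speech segments is counted toward the recording length.

```python
from typing import Iterable, Tuple


def run_recording(frames: Iterable[bool], frame_ms: int, initial_wait_ms: int,
                  end_silence_ms: int, max_length_ms: int) -> Tuple[str, int]:
    """Return (outcome, recorded_ms); frames are True for speech, False for silence.

    recorded_ms excludes pre-speech silence and trailing post-speech silence.
    """
    waited = 0     # silence seen before the first speech frame
    length = 0     # first speech frame through the latest speech frame
    trailing = 0   # silence since the latest speech frame
    started = False
    for is_speech in frames:
        if not started:
            if is_speech:
                started = True
                length = frame_ms
            else:
                waited += frame_ms
                if waited >= initial_wait_ms:
                    return ("no_speech", 0)
            continue
        if is_speech:
            length += trailing + frame_ms  # interior silence counts as recording
            trailing = 0
        else:
            trailing += frame_ms
            if trailing >= end_silence_ms:
                return ("complete", length)
        if length >= max_length_ms:
            return ("max_length", length)
    return ("stream_ended", length)
```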
>
> d. To be able to allow multiple prompt operations for DTMF digit
> collection (if supported), voice recording, speech recognition analysis
> (if supported), and/or speech verification/identification (if
> supported), provide the following types of prompts:
>
> - Initial Prompt
>
> - Reprompt
>
> - Error prompt
>
> - Failure announcement
>
> - Success announcement.
>
> e. Allow the specification of definable key sequences for digit
> recording or speech recognition analysis (if supported) to:
>
> - Discard recording in progress, replay the prompt, and resume
> recording.
>
> - Discard recording in progress and resume recording.
>
> - Terminate the current operation and return the terminating key
> sequence to the MGC.
>
> f. Provide a way to ask the ARF MG to support the following definable
> keys for recording. These keys would then be able to be acted upon
> by the ARF MG:
>
> - A key to terminate playing of an announcement in progress.
>
> - A key that signals the end of user input. The key may or may not
> be returned to the MGC along with the input already collected.
>
> - Keys to stop playing the current announcement and resume playing at
> the beginning of the first segment of the announcement, last segment
> of the announcement, previous segment of the announcement, next
> segment of the announcement, or the current announcement segment.
>
> g. While audio prompts are usually provisioned in IVR/ARF MGs, support
> changing the provisioned prompts in a voice session rather than a
> data session. In particular, with respect to audio management:
>
> - A method to replace provisioned audio with audio recorded during a
> call. The newly recorded audio must be accessible using the
> identifier of the audio it replaces.
>
> - A method to revert from replaced audio to the original provisioned
> audio.
>
> - A method to take audio recorded during a call and store it such
> that it is accessible to the current call only through its own
> newly created unique identifier.
>
> - A method to take audio recorded during a call and store it such
> that it is accessible to any subsequent call through its own newly
> created identifier.
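
The four audio management methods in (g) can be pictured as operations on a small store. The Python sketch below is only an illustration under my own assumptions: the class `AudioStore`, its method names and the `rec-N` identifier scheme are hypothetical, and audio is represented as plain bytes.

```python
import itertools
from typing import Dict, Optional


class AudioStore:
    def __init__(self, provisioned: Dict[str, bytes]) -> None:
        self._provisioned = dict(provisioned)
        self._overrides: Dict[str, bytes] = {}            # replaced provisioned audio
        self._persistent: Dict[str, bytes] = {}           # visible to later calls
        self._per_call: Dict[str, Dict[str, bytes]] = {}  # visible to one call only
        self._ids = itertools.count(1)

    def replace(self, audio_id: str, recorded: bytes) -> None:
        """Replace provisioned audio; it stays reachable under the same identifier."""
        if audio_id not in self._provisioned:
            raise KeyError(audio_id)
        self._overrides[audio_id] = recorded

    def revert(self, audio_id: str) -> None:
        """Go back to the original provisioned audio."""
        self._overrides.pop(audio_id, None)

    def store_for_call(self, call_id: str, recorded: bytes) -> str:
        """Store audio under a new identifier visible to the current call only."""
        new_id = f"rec-{next(self._ids)}"
        self._per_call.setdefault(call_id, {})[new_id] = recorded
        return new_id

    def store_persistent(self, recorded: bytes) -> str:
        """Store audio under a new identifier visible to any subsequent call."""
        new_id = f"rec-{next(self._ids)}"
        self._persistent[new_id] = recorded
        return new_id

    def end_call(self, call_id: str) -> None:
        """Drop audio that was only accessible to this call."""
        self._per_call.pop(call_id, None)

    def lookup(self, audio_id: str, call_id: Optional[str] = None) -> bytes:
        if call_id is not None and audio_id in self._per_call.get(call_id, {}):
            return self._per_call[call_id][audio_id]
        if audio_id in self._overrides:
            return self._overrides[audio_id]
        if audio_id in self._persistent:
            return self._persistent[audio_id]
        return self._provisioned[audio_id]  # KeyError if unknown
```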
>
>
>
> 11.2.9.4. Speech Recognition Module
>
> The speech recognition module can be used for a number of speech
> recognition applications, such as:
>
> Limited Vocabulary Isolated Speech Recognition (e.g., "yes", "no",
> the number "four"),
>
> Limited Vocabulary Continuous Speech Feature Recognition (e.g., the
> utterance "four hundred twenty-three dollars"), and/or
>
> Continuous Speech Recognition (e.g., unconstrained speech
> recognition tasks).
>
> The Speech Recognition Module must support all of the requirements in the
> Play Module in addition to the following requirements:
>
> a. Be able to provide the following basic operation:
>
> request an AEG to play an announcement and then perform speech
> recognition analysis.
>
> b. Be able to specify these event collection characteristics:
>
> - The number of attempts to give the user to perform the speech
> recognition task.
>
> c. With respect to speech recognition analysis timers, allow the
> specification of:
>
> - Time to wait for the user to initially speak.
>
> - The amount of silence necessary following the last speech segment
> for the speech recognition analysis segment to be considered
> complete.
>
> - The maximum allowable length of the speech recognition analysis (not
> including pre- and post-speech silence).
>
> d. To be able to allow multiple prompt operations for DTMF digit
> collection (if supported), voice recording (if supported), and/or speech
> recognition analysis, provide the following types of prompts:
>
> - Initial Prompt
>
> - Reprompt
>
> - Error prompt
>
> - Failure announcement
>
> - Success announcement.
>
> e. Allow the specification of definable key sequences for digit
> recording (if supported) or speech recognition analysis to:
>
> - Discard analysis in progress, replay the prompt, and resume
> analysis.
>
> - Discard analysis in progress and resume analysis.
>
> - Terminate the current operation and return the terminating key
> sequence to the MGC.
>
> f. Provide a way to ask the ARF MG to support the following definable
> keys for speech recognition analysis. These keys would then be able to
> be acted upon by the ARF MG:
>
> - A key to terminate playing of an announcement in progress.
>
> - A key that signals the end of user input. The key may or may not
> be returned to the MGC along with the input already collected.
>
> - Keys to stop playing the current announcement and resume playing at
> the beginning of the first segment of the announcement, last segment
> of the announcement, previous segment of the announcement, next
> segment of the announcement, or the current announcement segment.
>
>
> 11.2.9.5. Speaker Verification/Identification Module
>
> The speaker verification/identification module returns parameters that
> indicate either the likelihood of the speaker to be the person that they
> claim to be (verification task) or the likelihood of the speaker being
> one of the persons contained in a set of previously characterized
> speakers (identification task).
>
> The Speaker Verification/Identification Module must support all of the
> requirements in the Play Module in addition to the following requirements:
>
> a. Be able to download parameters, such as speaker templates
> (verification task) or sets of potential speaker templates
> (identification task), either prior to the session or in mid-session.
>
> b. Be able to download application specific software to the ARF either
> prior to the session or in mid-session.
>
> c. Be able to return parameters indicating either the likelihood of the
> speaker to be the person that they claim to be (verification task) or the
> likelihood of the speaker being one of the persons contained in a set of
> previously characterized speakers (identification task).
>
> d. Be able to provide the following basic operation:
>
> request an AEG to play an announcement and then perform speech
> verification/identification analysis.
>
> e. Be able to specify these event collection characteristics:
>
> - The number of attempts to give the user to perform the speech
> verification/identification task.
>
>
> f. With respect to speech verification/identification analysis timers,
> allow the specification of:
>
> - Time to wait for the user to initially speak.
>
> - The amount of silence necessary following the last speech segment
> for the speech verification/identification analysis segment to be
> considered complete.
>
> - The maximum allowable length of the speech
> verification/identification analysis (not including pre- and
> post-speech silence).
>
> g. To be able to allow multiple prompt operations for DTMF digit
> collection (if supported), voice recording (if supported), speech
> recognition analysis (if supported), and/or speech
> verification/identification, provide the following types of prompts:
>
> - Initial Prompt
>
> - Reprompt
>
> - Error prompt
>
> - Failure announcement
>
> - Success announcement.
>
> h. Allow the specification of definable key sequences for digit
> recording (if supported) or speech recognition (if supported) in the
> speech verification/identification analysis to:
>
> - Discard speech verification/identification analysis in progress,
> replay the prompt, and resume analysis.
>
> - Discard speech verification/identification analysis in progress and
> resume analysis.
>
> - Terminate the current operation and return the terminating key
> sequence to the MGC.
>
> i. Provide a way to ask the ARF MG to support the following definable
> keys for speech verification/identification analysis. These keys would
> then be able to be acted upon by the ARF MG:
>
> - A key to terminate playing of an announcement in progress.
>
> - A key that signals the end of user input. The key may or may not
> be returned to the MGC along with the input already collected.
>
> - Keys to stop playing the current announcement and resume speech
> verification/identification at the beginning of the first segment of
> the announcement, last segment of the announcement, previous segment
> of the announcement, next segment of the announcement, or the current
> announcement segment.
>
>
> 11.2.9.6. Auditory Feature Extraction/Recognition Module
>
> The auditory feature extraction/recognition module is engineered to
> continuously monitor the auditory stream for the appearance of
> particular auditory signals or speech utterances of interest and to
> report these events (and optionally a signal feature representation of
> these events) to network servers or MGCs.
>
> The Auditory Feature Extraction/Recognition Module must support the
> following requirements:
>
> a. Be able to download application specific software to the ARF either
> prior to the session or in mid-session.
>
> b. Be able to download parameters, such as a representation of the
> auditory feature to extract/recognize, either prior to the session or in
> mid-session.
>
> c. Be able to return parameters indicating the auditory event found or a
> representation of the feature found (i.e., auditory feature).
>
>
> 11.2.9.7. Audio Conferencing Module
>
> The protocol must support:
>
> a. a mechanism to create multi-point conferences of audio only and
> multimedia conferences in the MG.
>
> b. audio mixing; mixing multiple audio streams into a new composite
> audio stream
>
> c. audio switching; selection of incoming audio stream to be sent out
> to all conference participants.
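
To illustrate the difference between (b) mixing and (c) switching, here is a minimal Python sketch over one frame of 16-bit PCM samples per participant. It is an assumption-laden example, not a statement of how an MG must behave: the function names are invented, and choosing the highest-energy stream is just one possible selection rule, since the requirement does not say how the stream is selected.

```python
from typing import Dict, List


def mix(streams: List[List[int]]) -> List[int]:
    """Mixing: sum aligned samples into one composite frame, clipped to int16."""
    return [max(-32768, min(32767, sum(samples))) for samples in zip(*streams)]


def switch(streams: Dict[str, List[int]]) -> str:
    """Switching: pick one incoming stream (here, the one with the most energy)."""
    return max(streams, key=lambda name: sum(s * s for s in streams[name]))
```

With mixing every participant contributes to the output frame, while with switching exactly one participant's frame is forwarded to everyone.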