FW: reqs-07: updated 11.2.9 Audio Resource Function

16 Oct 1999

The following update is proposed to the Megaco/H.248 requirements document
(APC-1695). I am posting it to this group for information.

If anyone has any comments on this updated section, please send me an email
or speak to me in RedBank next week.

Nancy
--------------------------------------------------------------------------
Nancy M. Greene
Internet & Service Provider Networks, Nortel Networks
T:514-271-7221 (internal:ESN853-1077) E:ngreene@nortelnetworks.com
...
----------
From:         Greene, Nancy-M [CAR:5N10:EXCH]
Sent:         Saturday, October 16, 1999 4:15 PM
To:   'megaco@standards.nortelnetworks.com'
Subject:      reqs-07: updated 11.2.9 Audio Resource Function
Below is the proposed revised text for 11.2.9, Audio Resource Function
(formerly called IVR Unit). Michael Ramalho and David Cromwell have worked
on the wording and have agreed on the requirements for the speaker
verification/identification and auditory feature extraction/recognition
sections. The section now comprises:
11.2.9.1 Play Audio,
11.2.9.2 DTMF Collect,
11.2.9.3 Record Audio,
11.2.9.4 Speech Recognition,
11.2.9.5 Speaker Verification/Identification,
11.2.9.6 Auditory Feature Extraction/Recognition, and
11.2.9.7 Audio Conferencing.
Summary of changes:
- the term Audio Server was removed and replaced by the more general term
  Audio Enabled Gateway
- digit collection was qualified to be DTMF digit collection
- in Section 11.2.9.1, parts d and e on sequences and sets were reworked to
  eliminate implementation-specific language
- in Section 11.2.9.1, parts f, g and h were removed
- sections were reordered, putting Audio Conferencing at the end
- requirements were added to Sections 11.2.9.5 and 11.2.9.6
If there are no comments by Tuesday or so, I will issue reqs-08 with this
updated section in it. Sometime very soon I expect Tom to issue WG last
call on this document.
Thanks to Michael and David for their work on this.
Nancy
revised 11.2.9:
11.2.9.  Audio Resource Function
An Audio Resource Function (ARF) consists of one or more functional
modules which can be deployed on a stand-alone media gateway server (an
IVR, Intelligent Peripheral, speech/speaker recognition unit, etc.) or on a
traditional media gateway.  Such a media gateway is known as an Audio
Enabled Gateway (AEG) if it performs tasks defined in one or more of the
following ARF functional modules:
     Play Audio,
     DTMF Collect,
     Record Audio,
     Speech Recognition,
     Speaker Verification/Identification,
     Auditory Feature Extraction/Recognition, or
     Audio Conferencing.
Additional ARF functional modules that support human-to-machine
communications through the use of telephony tones (e.g., DTMF) or auditory
means (e.g., speech) may be appended to the AEG definition in future
versions of these requirements.
Generic scripting packages for any module must support all the
requirements for that module.  Any package extension for a given module
must include, by inheritance or explicit reference, the requirements for
that module.
The protocol requirements for each of the ARF modules are provided in the
following subsections.
11.2.9.1.  Play Audio Module
a.   Be able to provide the following basic operation:
-    request an ARF MG to play an announcement.
b.   Be able to specify these play characteristics:
-    Play volume
-    Play speed
-    Play iterations
-    Interval between play iterations
-    Play duration
c.   Permit the specification of voice variables such as DN (Directory
     Number), number, date, time, etc.  The protocol must allow
     specification of both the value (e.g., 234-3456) and the type
     (e.g., Directory Number).
d.   Using the terminology that a segment is a unit of playable speech, or
     an abstraction that is resolvable to a unit of playable speech, permit
     specification of the following segment types:
-    A provisioned recording.
-    A block of text to be converted to speech.
-    A block of text to be displayed on a device.
-    A length of silence qualified by duration.
-    An algorithmically generated tone.
-    A voice variable, specified by type and value.  Given a variable
     type and value, the IVR/ARF unit would dynamically assemble the
     phrases required for its playback.
-    An abstraction that represents a sequence of audio segments.  Nesting
     of these abstractions must also be permitted.
     An example of this abstraction is a sequence of audio segments, the
     first of which is a recording of the words "The number you have
     dialed," followed by a Directory Number variable, followed by a
     recording of the words "is no longer in service."
-    An abstraction that represents a set of audio segments and which is
     resolved to a single segment by a qualifier.  Nesting of these
     abstractions must be permitted.
     For example, take a set of audio segments recorded in different
     languages, all of which express the semantic concept "The number you
     have dialed is no longer in service."  The set is resolved by a
     language qualifier: if the qualifier is "French," the set resolves to
     the French version of this announcement.
     In the case of a nested abstraction consisting of a set qualified by
     language at one level and a set qualified by gender at another level,
     it would be possible to specify that an announcement be played in
     French and spoken by a female voice.
e.   Provide two different methods of audio specification:
-    Direct specification of the audio components to be played by
     specifying the sequence of segments in the command itself.
-    Indirect specification of the audio components to be played by
     reference to a single identifier that resolves to a provisioned
     sequence of audio segments.
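To make the segment model in d. and the two specification methods in e.
concrete, the following Python sketch represents segments as nested data
and resolves them to a flat list of playable units.  All class names,
qualifier keys, recording identifiers and the provisioned table are
hypothetical illustrations, not part of any Megaco/H.248 package
definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Recording:
    ident: str                # hypothetical identifier of a provisioned recording


@dataclass
class Variable:
    vtype: str                # variable type, e.g. "DN" (Directory Number)
    value: str                # variable value, e.g. "234-3456"


@dataclass
class Silence:
    ms: int                   # length of silence in milliseconds


@dataclass
class Sequence:
    members: List["Segment"]  # played in order; members may themselves be nested


@dataclass
class QualifiedSet:
    qualifier: str            # e.g. "language" or "gender"
    choices: Dict[str, "Segment"] = field(default_factory=dict)


Segment = Union[Recording, Variable, Silence, Sequence, QualifiedSet]


def resolve(seg: Segment, qualifiers: Dict[str, str]) -> List[Segment]:
    """Flatten nested sequences and qualified sets into playable leaves."""
    if isinstance(seg, Sequence):
        out: List[Segment] = []
        for member in seg.members:
            out.extend(resolve(member, qualifiers))
        return out
    if isinstance(seg, QualifiedSet):
        # The qualifier value picks exactly one member of the set.
        return resolve(seg.choices[qualifiers[seg.qualifier]], qualifiers)
    return [seg]              # recording, variable or silence: already playable


def not_in_service_msg(prefix: str) -> Sequence:
    """'The number you have dialed,' <DN> 'is no longer in service.'"""
    return Sequence([Recording(prefix + "-dialed"), Variable("DN", "234-3456"),
                     Recording(prefix + "-not-in-service")])


# Nested sets: language selects at one level, gender at another.
NOT_IN_SERVICE = QualifiedSet("language", {
    "French": QualifiedSet("gender", {"female": not_in_service_msg("fr-f"),
                                      "male": not_in_service_msg("fr-m")}),
    "English": not_in_service_msg("en"),
})

# Hypothetical provisioned table used for indirect specification.
PROVISIONED: Dict[str, Segment] = {"not-in-service": NOT_IN_SERVICE}


def play_indirect(ident: str, qualifiers: Dict[str, str]) -> List[Segment]:
    """Indirect specification: one identifier names a provisioned sequence."""
    return resolve(PROVISIONED[ident], qualifiers)
```

In this sketch, direct specification corresponds to building a `Sequence`
in the command and passing it to `resolve()`, while indirect specification
corresponds to `play_indirect()` with a single identifier; qualifiers that
a branch does not use (e.g., gender for English) are simply ignored.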
11.2.9.2.  DTMF Collect Module
The DTMF Collect Module must support all of the requirements in the Play
Module in addition to the following requirements:
a.   Be able to provide the following basic operation:
-    Request an AEG to play an announcement, which may optionally be
     terminated by DTMF, and then collect DTMF digits.
b.   Be able to specify these event collection characteristics:
-    The number of attempts to give the user to enter a valid DTMF
     pattern.
c.   With respect to digit timers, allow the specification of:
-    Time allowed to enter the first digit.
-    Time allowed for user to enter each digit subsequent to the first
     digit.
-    Time allowed for user to enter a digit once the maximum expected
     number of digits has been entered.
d.   To allow multiple prompt operations for DTMF digit collection, voice
     recording (if supported), and/or speech recognition analysis (if
     supported), provide the following types of prompts:
-    Initial Prompt
-    Reprompt
-    Error prompt
-    Failure announcement
-    Success announcement.
e.   To allow digit pattern matching, allow the specification of:
-    maximum number of digits to collect.
-    minimum number of digits to collect.
-    a digit pattern using a regular expression.
f.   To allow digit buffer control, allow the specification of:
-    Clearing of the digit buffer prior to playing the initial prompt
     (the default is not to clear the buffer).
-    Default clearing of the buffer following the playing of an
     uninterruptible announcement segment.
-    Default clearing of the buffer before playing a reprompt in response
     to previous invalid input.
g.   Provide a method to specify DTMF interruptibility on a per audio
     segment basis.
h.   Allow the specification of definable key sequences for digit
     collection to:
-    Discard collected digits in progress, replay the prompt, and resume
     DTMF digit collection.
-    Discard collected digits in progress and resume DTMF digit
     collection.
-    Terminate the current operation and return the terminating key
     sequence to the MGC.
i.   Provide a way to ask the ARF MG to support the following definable
     keys for DTMF digit collection and recording, which the ARF MG would
     then act upon:
-    A key to terminate playing of an announcement in progress.
-    A set of one or more keys that can be accepted as the first digit
     to be collected.
-    A key that signals the end of user input.  The key may or may not
     be returned to the MGC along with the input already collected.
-    Keys to stop playing the current announcement and resume playing at
     the beginning of the first segment of the announcement, the last
     segment of the announcement, the previous segment of the
     announcement, the next segment of the announcement, or the current
     announcement segment.
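The digit timers in c., the pattern matching in e. and the end-of-input
key in i. interact roughly as sketched below.  The sketch assumes the AEG
sees key presses as (time, key) pairs measured from the end of the
prompt; the parameter names and default values are hypothetical, and the
reading of the third timer (waiting for an end key once the maximum number
of digits has been entered) is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CollectParams:
    # Hypothetical parameter names; all timers are in seconds.
    first_digit_timeout: float = 5.0   # time allowed to enter the first digit
    inter_digit_timeout: float = 3.0   # time allowed for each later digit
    extra_digit_timeout: float = 2.0   # assumed: wait for end key after max digits
    min_digits: int = 1
    max_digits: int = 4
    pattern: str = r"\d+"              # regular expression the digits must match
    end_key: str = "#"                 # key that signals the end of user input
    return_end_key: bool = False       # report the end key to the MGC?


def collect_dtmf(events: List[Tuple[float, str]],
                 p: CollectParams) -> Tuple[str, str]:
    """Apply the digit timers to (time, key) events, then check the digit
    count and pattern. Returns (status, reported_input) where status is
    "no_input", "invalid" or "success"."""
    digits = ""
    ended_by_key = False
    last = 0.0
    for t, key in events:
        if not digits:
            limit = p.first_digit_timeout
        elif len(digits) >= p.max_digits:
            limit = p.extra_digit_timeout
        else:
            limit = p.inter_digit_timeout
        if t - last > limit:
            break                      # the running timer expired first
        last = t
        if key == p.end_key:
            ended_by_key = True
            break
        if len(digits) >= p.max_digits:
            break                      # after max digits only the end key counts
        digits += key
    if not digits:
        return "no_input", ""
    reported = digits + (p.end_key if ended_by_key and p.return_end_key else "")
    if len(digits) < p.min_digits or not re.fullmatch(p.pattern, digits):
        return "invalid", reported
    return "success", reported
```

In a full play/collect operation, a "no_input" or "invalid" status would
consume one of the attempts from b. and trigger a reprompt or error prompt
from d.; that outer loop is omitted here.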
11.2.9.3.  Record Audio Module
The Record Audio Module must support all of the requirements in the Play
Module in addition to the following requirements:
a.   Be able to provide the following basic operation:
-    Request an AEG to play an announcement and then record voice.
b.   Be able to specify these event collection characteristics:
-    The number of attempts to give the user to make a recording.
c.   With respect to recording timers, allow the specification of:
-    Time to wait for the user to initially speak.
-    The amount of silence necessary following the last speech segment
     for the recording to be considered complete.
-    The maximum allowable length of the recording (not including pre-
     and post-speech silence).
d.   To allow multiple prompt operations for DTMF digit collection (if
     supported), voice recording, speech recognition analysis (if
     supported), and/or speaker verification/identification (if
     supported), provide the following types of prompts:
-    Initial Prompt
-    Reprompt
-    Error prompt
-    Failure announcement
-    Success announcement.
e.   Allow the specification of definable key sequences for voice
     recording or speech recognition analysis (if supported) to:
-    Discard recording in progress, replay the prompt, and resume
     recording.
-    Discard recording in progress and resume recording.
-    Terminate the current operation and return the terminating key
     sequence to the MGC.
f.   Provide a way to ask the ARF MG to support the following definable
     keys for recording, which the ARF MG would then act upon:
-    A key to terminate playing of an announcement in progress.
-    A key that signals the end of user input.  The key may or may not
     be returned to the MGC along with the input already collected.
-    Keys to stop playing the current announcement and resume playing at
     the beginning of the first segment of the announcement, the last
     segment of the announcement, the previous segment of the
     announcement, the next segment of the announcement, or the current
     announcement segment.
g.   While audio prompts are usually provisioned in IVR/ARF MGs, support
     changing the provisioned prompts in a voice session rather than a
     data session.  In particular, with respect to audio management:
-    A method to replace provisioned audio with audio recorded during a
     call.  The newly recorded audio must be accessible using the
     identifier of the audio it replaces.
-    A method to revert from replaced audio to the original provisioned
     audio.
-    A method to take audio recorded during a call and store it such
     that it is accessible to the current call only through its own
     newly created unique identifier.
-    A method to take audio recorded during a call and store it such
     that it is accessible to any subsequent call through its own newly
     created identifier.
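The four audio management methods in g. amount to a small identifier
scoping scheme.  The in-memory class below is a hypothetical sketch (the
class and method names, the bytes representation of audio and the
identifier format are all assumptions) showing how replaced, reverted,
call-scoped and persistent audio could be resolved.

```python
from typing import Dict, Optional


class AudioStore:
    """Hypothetical model of the audio management methods; audio is bytes."""

    def __init__(self, provisioned: Dict[str, bytes]):
        self._provisioned = dict(provisioned)   # original provisioned audio
        self._replaced: Dict[str, bytes] = {}   # replacements under the same id
        self._global: Dict[str, bytes] = {}     # new audio for any later call
        self._per_call: Dict[str, Dict[str, bytes]] = {}  # call id -> audio
        self._counter = 0

    def _new_id(self) -> str:
        self._counter += 1
        return f"rec-{self._counter}"           # assumed identifier format

    def replace(self, ident: str, audio: bytes) -> None:
        """The old identifier now plays the newly recorded audio."""
        if ident not in self._provisioned:
            raise KeyError(ident)
        self._replaced[ident] = audio

    def revert(self, ident: str) -> None:
        """Return to the original provisioned audio."""
        self._replaced.pop(ident, None)

    def store_for_call(self, call_id: str, audio: bytes) -> str:
        """Audio visible only to the current call, under a new unique id."""
        ident = self._new_id()
        self._per_call.setdefault(call_id, {})[ident] = audio
        return ident

    def store_global(self, audio: bytes) -> str:
        """Audio accessible to any subsequent call, under a new id."""
        ident = self._new_id()
        self._global[ident] = audio
        return ident

    def lookup(self, ident: str, call_id: Optional[str] = None) -> bytes:
        """Resolve an identifier; raises KeyError if it is not visible."""
        scoped = self._per_call.get(call_id, {}) if call_id is not None else {}
        if ident in scoped:
            return scoped[ident]
        if ident in self._replaced:
            return self._replaced[ident]
        if ident in self._provisioned:
            return self._provisioned[ident]
        return self._global[ident]

    def end_call(self, call_id: str) -> None:
        """Call-scoped recordings disappear when the call ends."""
        self._per_call.pop(call_id, None)
```

Keeping replacements separate from the provisioned originals is what makes
the revert method cheap in this sketch: reverting only drops the override.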
11.2.9.4.  Speech Recognition Module
The Speech Recognition Module can be used for a number of speech
recognition applications, such as:
-    Limited Vocabulary Isolated Speech Recognition (e.g., "yes", "no",
     the number "four"),
-    Limited Vocabulary Continuous Speech Feature Recognition (e.g., the
     utterance "four hundred twenty-three dollars"), and/or
-    Continuous Speech Recognition (e.g., unconstrained speech
     recognition tasks).
The Speech Recognition Module must support all of the requirements in the
Play Module in addition to the following requirements:
a.   Be able to provide the following basic operation:
-    Request an AEG to play an announcement and then perform speech
     recognition analysis.
b.   Be able to specify these event collection characteristics:
-    The number of attempts to give the user to complete the speech
     recognition task.
c.   With respect to speech recognition analysis timers, allow the
     specification of:
-    Time to wait for the user to initially speak.
-    The amount of silence necessary following the last speech segment
     for the speech recognition analysis segment to be considered
     complete.
-    The maximum allowable length of the speech recognition analysis (not
     including pre- and post-speech silence).
d.   To allow multiple prompt operations for DTMF digit collection (if
     supported), voice recording (if supported), and/or speech recognition
     analysis, provide the following types of prompts:
-    Initial Prompt
-    Reprompt
-    Error prompt
-    Failure announcement
-    Success announcement.
e.   Allow the specification of definable key sequences for voice
     recording (if supported) or speech recognition analysis to:
-    Discard the analysis in progress, replay the prompt, and resume
     analysis.
-    Discard the analysis in progress and resume analysis.
-    Terminate the current operation and return the terminating key
     sequence to the MGC.
f.   Provide a way to ask the ARF MG to support the following definable
     keys for speech recognition analysis, which the ARF MG would then act
     upon:
-    A key to terminate playing of an announcement in progress.
-    A key that signals the end of user input.  The key may or may not
     be returned to the MGC along with the input already collected.
-    Keys to stop playing the current announcement and resume playing at
     the beginning of the first segment of the announcement, the last
     segment of the announcement, the previous segment of the
     announcement, the next segment of the announcement, or the current
     announcement segment.
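The three analysis timers in c. (which mirror the recording timers of
11.2.9.3) can be illustrated with frame-based endpointing.  The sketch
assumes that a voice-activity detector already labels each fixed-length
frame as speech or non-speech; the detector itself, the timer names and
the default values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EndpointTimers:
    # Hypothetical timer names; all values in milliseconds.
    no_speech_ms: int = 3000         # time to wait for the user to start speaking
    trailing_silence_ms: int = 800   # silence after speech that ends the utterance
    max_speech_ms: int = 10000       # max length, excluding pre/post-speech silence


def endpoint(vad: List[bool], frame_ms: int, t: EndpointTimers
             ) -> Tuple[str, Optional[int], Optional[int]]:
    """Locate the speech region in per-frame voice-activity flags.

    Returns (status, first_frame, end_frame) with end_frame exclusive.
    status is "no_speech", "complete", "max_length", or "incomplete" when
    the input runs out before any timer fires.
    """
    start: Optional[int] = None
    silence_run = 0                  # consecutive non-speech frames since speech began
    for i, active in enumerate(vad):
        if start is None:
            if active:
                start = i            # leading silence is excluded from the result
            elif (i + 1) * frame_ms >= t.no_speech_ms:
                return "no_speech", None, None
            continue
        silence_run = 0 if active else silence_run + 1
        end = i + 1 - silence_run    # exclusive end, trailing silence dropped
        if silence_run * frame_ms >= t.trailing_silence_ms:
            return "complete", start, end
        if (end - start) * frame_ms >= t.max_speech_ms:
            return "max_length", start, end
    if start is None:
        return "incomplete", None, None
    return "incomplete", start, len(vad) - silence_run
```

Measuring the maximum length from the first speech frame to the last
speech frame is one way to honour "not including pre- and post-speech
silence"; short pauses inside the utterance still count toward it here.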
11.2.9.5.  Speaker Verification/Identification Module
The Speaker Verification/Identification Module returns parameters that
indicate either the likelihood of the speaker being the person that they
claim to be (verification task) or the likelihood of the speaker being one
of the persons contained in a set of previously characterized speakers
(identification task).
The Speaker Verification/Identification Module must support all of the
requirements in the Play Module in addition to the following requirements:
a. Be able to download parameters, such as speaker templates (verification
task) or sets of potential speaker templates (identification task), either
prior to the session or in mid-session.
b. Be able to download application-specific software to the ARF either
prior to the session or in mid-session.
c. Be able to return parameters indicating either the likelihood of the
speaker to be the person that they claim to be (verification task) or the
likelihood of the speaker being one of the persons contained in a set of
previously characterized speakers (identification task).
d.   Be able to provide the following basic operation:
-    Request an AEG to play an announcement and then perform speaker
     verification/identification analysis.
e.   Be able to specify these event collection characteristics:
-    The number of attempts to give the user to complete the speaker
     verification/identification task.
f.   With respect to speaker verification/identification analysis timers,
     allow the specification of:
-    Time to wait for the user to initially speak.
-    The amount of silence necessary following the last speech segment
     for the speaker verification/identification analysis segment to be
     considered complete.
-    The maximum allowable length of the speaker verification/identification
     analysis (not including pre- and post-speech silence).
g.   To allow multiple prompt operations for DTMF digit collection (if
     supported), voice recording (if supported), speech recognition
     analysis (if supported), and/or speaker verification/identification
     analysis, provide the following types of prompts:
-    Initial Prompt
-    Reprompt
-    Error prompt
-    Failure announcement
-    Success announcement.
h.   Allow the specification of definable key sequences for voice
     recording (if supported), speech recognition (if supported), or
     speaker verification/identification analysis to:
-    Discard the speaker verification/identification analysis in
     progress, replay the prompt, and resume analysis.
-    Discard the speaker verification/identification analysis in
     progress and resume analysis.
-    Terminate the current operation and return the terminating key
     sequence to the MGC.
i.   Provide a way to ask the ARF MG to support the following definable
     keys for speaker verification/identification analysis, which the ARF
     MG would then act upon:
-    A key to terminate playing of an announcement in progress.
-    A key that signals the end of user input.  The key may or may not
     be returned to the MGC along with the input already collected.
-    Keys to stop playing the current announcement and resume speaker
     verification/identification at the beginning of the first segment of
     the announcement, the last segment of the announcement, the previous
     segment of the announcement, the next segment of the announcement, or
     the current announcement segment.
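One common way to produce the likelihood parameters of requirement c. is
to score the caller's per-frame feature vectors against statistical
speaker templates.  The sketch below uses a single diagonal Gaussian per
speaker plus a background model; this modelling choice, the feature
representation and all names are assumptions for illustration only, and
the requirements do not prescribe any scoring method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Frames = Sequence[Sequence[float]]      # assumed: one feature vector per frame


@dataclass
class Template:
    """Hypothetical speaker template: one diagonal Gaussian (real systems
    typically use richer models such as Gaussian mixtures)."""
    mean: List[float]
    var: List[float]

    def frame_loglik(self, x: Sequence[float]) -> float:
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, self.mean, self.var))

    def avg_loglik(self, frames: Frames) -> float:
        return sum(self.frame_loglik(f) for f in frames) / len(frames)


def verify(frames: Frames, claimed: Template, background: Template) -> float:
    """Verification task: average log-likelihood ratio of the claimed
    speaker against a background model; positive values favour the claim."""
    return claimed.avg_loglik(frames) - background.avg_loglik(frames)


def identify(frames: Frames,
             speakers: Dict[str, Template]) -> List[Tuple[str, float]]:
    """Identification task: posterior probability of each enrolled speaker,
    assuming equal priors, ranked from most to least likely."""
    totals = {n: sum(t.frame_loglik(f) for f in frames)
              for n, t in speakers.items()}
    peak = max(totals.values())
    weights = {n: math.exp(s - peak) for n, s in totals.items()}  # stable softmax
    z = sum(weights.values())
    return sorted(((n, w / z) for n, w in weights.items()),
                  key=lambda pair: pair[1], reverse=True)
```

Downloading speaker templates before or during the session (requirement
a.) would, in this sketch, amount to supplying new `Template` objects.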
11.2.9.6.  Auditory Feature Extraction/Recognition Module
The Auditory Feature Extraction/Recognition Module is engineered to
continuously monitor the auditory stream for the appearance of particular
auditory signals or speech utterances of interest and to report these
events (and optionally a signal feature representation of these events) to
network servers or MGCs.
The Auditory Feature Extraction/Recognition Module must support the
following requirements:
a. Be able to download application-specific software to the ARF either
prior to the session or in mid-session.
b. Be able to download parameters, such as a representation of the
auditory feature to extract/recognize, either prior to the session or in
mid-session.
c. Be able to return parameters indicating the auditory event found or a
representation of the feature found (i.e., auditory feature).
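As one example of continuously monitoring for a particular auditory
signal, the sketch below scans a sampled stream block by block with the
Goertzel algorithm and reports when a single target tone starts and
stops.  The choice of a tone detector, the block length, the
normalization and the threshold are assumptions for illustration; the
requirements do not prescribe any feature or detection method.

```python
import math
from typing import List, Sequence, Tuple


def goertzel_power(block: Sequence[float], freq: float, fs: float) -> float:
    """Squared magnitude of one frequency component of a block (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def monitor_tone(samples: Sequence[float], freq: float, fs: float,
                 block: int = 200, threshold: float = 0.5
                 ) -> List[Tuple[str, float]]:
    """Report ("start" | "stop", time in seconds) whenever the share of block
    energy at `freq` crosses `threshold` (about 1.0 for a pure tone)."""
    events: List[Tuple[str, float]] = []
    present = False
    for start in range(0, len(samples) - block + 1, block):
        chunk = samples[start:start + block]
        energy = sum(x * x for x in chunk)
        ratio = 0.0 if energy == 0 else (
            2.0 * goertzel_power(chunk, freq, fs) / (block * energy))
        detected = ratio >= threshold
        if detected != present:             # report only state changes
            events.append(("start" if detected else "stop", start / fs))
            present = detected
    return events
```

Reporting only state changes, rather than every block, keeps the event
stream sent to the MGC or network server small.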
11.2.9.7.  Audio Conferencing Module
The protocol must support:
a.   a mechanism to create multi-point audio-only and multimedia
     conferences in the MG.
b.   audio mixing: mixing multiple audio streams into a new composite
     audio stream.
c.   audio switching: selection of an incoming audio stream to be sent out
     to all conference participants.
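Audio mixing (b.) and audio switching (c.) differ in what each
participant receives.  The sketch below works on one frame of linear
16-bit PCM samples per participant; the sample format, the "N-1" refinement
(omitting a listener's own voice from their mix) and the loudest-talker
rule used for switching are assumptions for illustration.

```python
from typing import Dict, List, Optional

PCM_MIN, PCM_MAX = -32768, 32767   # assumed linear 16-bit PCM samples


def mix(streams: Dict[str, List[int]],
        exclude: Optional[str] = None) -> List[int]:
    """Audio mixing: add the streams sample by sample and clip to 16 bits.
    Passing `exclude` builds an N-1 mix that omits one participant."""
    inputs = [s for name, s in streams.items() if name != exclude]
    length = min((len(s) for s in inputs), default=0)
    return [max(PCM_MIN, min(PCM_MAX, sum(s[i] for s in inputs)))
            for i in range(length)]


def switch(streams: Dict[str, List[int]]) -> str:
    """Audio switching: select the incoming stream with the most energy in
    this frame; that single stream is sent to all participants."""
    return max(streams, key=lambda name: sum(x * x for x in streams[name]))


def conference_outputs(streams: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Per-participant output of a mixing conference (N-1 for each)."""
    return {name: mix(streams, exclude=name) for name in streams}
```

Mixing costs one summation per participant but lets several people talk
at once; switching forwards a single stream unchanged, which is cheaper
but lets only one talker be heard at a time.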