FW: reqs-07: updated 11.2.9 Audio Resource Function

Nancy-M Greene ngreene at NORTELNETWORKS.COM
Sat Oct 16 16:18:35 EDT 1999


The following update is proposed to the Megaco/H.248 requirements document
(APC-1695). I am posting it to this group for information.

If anyone has any comments on this updated section, please send me an email
or speak to me in RedBank next week.

Nancy
--------------------------------------------------------------------------
Nancy M. Greene
Internet & Service Provider Networks, Nortel Networks
T:514-271-7221 (internal:ESN853-1077) E:ngreene at nortelnetworks.com

> ----------
> From:         Greene, Nancy-M [CAR:5N10:EXCH]
> Sent:         Saturday, October 16, 1999 4:15 PM
> To:   'megaco at standards.nortelnetworks.com'
> Subject:      reqs-07: updated 11.2.9 Audio Resource Function
>
> Below is proposed revised text for 11.2.9.  Audio Resource Function
> (formerly called IVR Unit). Michael Ramalho and David Cromwell have worked
> on the wording, and have agreed on requirements for the speaker
> verification/identification and auditory feature extraction/recognition
> sections.
>
> 11.2.9.1 Play Audio,
> 11.2.9.2 DTMF Collect,
> 11.2.9.3 Record Audio,
> 11.2.9.4 Speech Recognition,
> 11.2.9.5 Speaker Verification/Identification,
> 11.2.9.6 Auditory Feature Extraction/Recognition, and
> 11.2.9.7 Audio Conferencing.
>
> Summary of changes:
> - the term Audio Server was removed, and replaced by the more general term
> Audio Enabled Gateway
> - digit collection was qualified to be DTMF digit collection
> - In Section 11.2.9.1, parts d and e on sequences and sets reworked to
> eliminate the implementation specific language
> - In Section 11.2.9.1 parts f, g, h removed
> - reordering of sections, putting Audio Conferencing at the end
> - requirements added to sections 11.2.9.5 and 11.2.9.6
>
> If there are no comments by Tuesday or so, I will issue reqs-08 with this
> updated section in it. Sometime very soon I expect Tom to issue WG last
> call on this document.
>
> Thanks to Michael and David for their work on this.
>
> Nancy
>
> revised 11.2.9:
> 11.2.9.  Audio Resource Function
>
> An Audio Resource Function (ARF) consists of one or more functional
> modules which can be deployed on a stand-alone media gateway server (IVR,
> Intelligent Peripheral, speech/speaker recognition unit, etc.) or a
> traditional media gateway.  Such a media gateway is known as an Audio
> Enabled Gateway (AEG) if it performs tasks defined in one or more of the
> following ARF functional modules:
>
>            Play Audio,
>            DTMF Collect,
>            Record Audio,
>            Speech Recognition,
>            Speaker Verification/Identification,
>            Auditory Feature Extraction/Recognition, or
>            Audio Conferencing.
>
>
>
> Additional ARF function modules that support human to machine
> communications through the use of telephony tones (e.g., DTMF) or
> auditory means (e.g., speech) may be appended to the AEG definition in
> future versions of these requirements.
>
> Generic scripting packages for any module must support all the
> requirements for that module. Any package extension for a given module
> must include, by inheritance or explicit reference, the requirements for
> that given module.
>
> The protocol requirements for each of the ARF modules are provided in the
> following subsections.
>
>
> 11.2.9.1.  Play Audio Module
>
> a.   Be able to provide the following basic operation:
>
> -    request an ARF MG to play an announcement.
>
> b.   Be able to specify these play characteristics:
>
> -    Play volume
>
> -    Play speed
>
> -    Play iterations
>
> -    Interval between play iterations
>
> -    Play duration
>
> c.   Permit the specification of voice variables such as DN, number, date,
>      time, etc.  The protocol must allow specification of both the value
>      (e.g., 234-3456) and the type (e.g., Directory number).
>
> d.   Using the terminology that a segment is a unit of playable speech, or
>      is an abstraction that is resolvable to a unit of playable speech,
>      permit specification of the following segment types:
>
> -    A provisioned recording.
>
> -    A block of text to be converted to speech.
>
> -    A block of text to be displayed on a device.
>
> -    A length of silence qualified by duration.
>
> -    An algorithmically generated tone.
>
> -    A voice variable, specified by type and value.  Given a variable
>      type and value, the IVR/ARF unit would dynamically assemble the
>      phrases required for its playback.
>
> -    An abstraction that represents a sequence of audio segments.  Nesting
>      of these abstractions must also be permitted.
>
>      An example of this abstraction is a sequence of audio segments, the
>      first of which is a recording of the words "The number you have
>      dialed," followed by a Directory Number variable, followed by a
>      recording of the words "is no longer in service."
>
> -    An abstraction that represents a set of audio segments and which is
>      resolved to a single segment by a qualifier.  Nesting of these
>      abstractions must be permitted.
>
>      For example, take a set of audio segments recorded in different
>      languages, all of which express the semantic concept "The number you
>      have dialed is no longer in service."  The set is resolved by a
>      language qualifier.  If the qualifier is "French," the set resolves
>      to the French version of this announcement.
>
>      In the case of a nested abstraction consisting of a set qualified by
>      language at one level and a set qualified by gender at another
>      level, it would be possible to specify that an announcement be played
>      in French and spoken by a female voice.
>
> e.   Provide two different methods of audio specification:
>
> -    Direct specification of the audio components to be played by
>      specifying the sequence of segments in the command itself.
>
> -    Indirect specification of the audio components to be played by
>      reference to a single identifier that resolves to a provisioned
>      sequence of audio segments.
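
A minimal Python sketch of the segment model in parts d and e, for
illustration only. The class names, the recording identifiers and the
resolve() helper are assumptions and not part of the proposed text. It
shows a nested set (language, then gender) resolving to a sequence of
playable segments.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Recording:
    """A provisioned recording, named by a hypothetical identifier."""
    name: str


@dataclass
class Variable:
    """A voice variable, specified by type and value."""
    type: str
    value: str


@dataclass
class Sequence:
    """An ordered list of segments; items may be sequences or sets."""
    items: List["Segment"]


@dataclass
class SegmentSet:
    """Alternatives keyed by the value of one qualifier (e.g. language)."""
    qualifier: str
    choices: Dict[str, "Segment"]


Segment = Union[Recording, Variable, Sequence, SegmentSet]


def resolve(segment: Segment, qualifiers: Dict[str, str]) -> List[Segment]:
    """Flatten a segment into the playable leaves the qualifiers select."""
    if isinstance(segment, Sequence):
        leaves: List[Segment] = []
        for item in segment.items:
            leaves.extend(resolve(item, qualifiers))
        return leaves
    if isinstance(segment, SegmentSet):
        # Raises KeyError if the qualifier is missing or has no match.
        chosen = segment.choices[qualifiers[segment.qualifier]]
        return resolve(chosen, qualifiers)
    return [segment]


def not_in_service(language: str, gender: str) -> Sequence:
    """Build the example announcement for one language and voice."""
    prefix = f"{language}-{gender}"
    return Sequence([
        Recording(f"{prefix}-number-you-have-dialed"),
        Variable("Directory number", "234-3456"),
        Recording(f"{prefix}-no-longer-in-service"),
    ])


# A set qualified by language, nested over a set qualified by gender.
announcement = SegmentSet("language", {
    lang: SegmentSet("gender", {
        g: not_in_service(lang, g) for g in ("female", "male")
    })
    for lang in ("English", "French")
})

played = resolve(announcement, {"language": "French", "gender": "female"})
```

In this sketch, direct specification corresponds to sending the Sequence
itself in the command, and indirect specification to sending only an
identifier that the gateway maps to a provisioned Sequence.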
>
>
> 11.2.9.2.  DTMF Collect Module
>
> The DTMF Collect Module must support all of the requirements in the Play
> Module in addition to the following requirements:
>
> a.   Be able to provide the following basic operation:
>
>
> -    request an AEG to play an announcement, which may optionally be
>      terminated by DTMF, and then collect DTMF
>
> b.   Be able to specify these event collection characteristics:
>
> -    The number of attempts to give the user to enter a valid DTMF
>      pattern.
>
> c.   With respect to digit timers, allow the specification of:
>
> -    Time allowed to enter the first digit.
>
> -    Time allowed for user to enter each digit subsequent to the first
>      digit.
>
> -    Time allowed for user to enter a digit once the maximum expected
>      number of digits has been entered.
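
One way to read the three digit timers in part c, sketched in Python
under stated assumptions: the function and parameter names are
hypothetical, and the timer values in the usage lines are arbitrary.

```python
def digit_timeout(collected: int, max_digits: int,
                  first_digit_s: float, inter_digit_s: float,
                  extra_digit_s: float) -> float:
    """Return how many seconds to wait for the next digit."""
    if collected == 0:
        # Nothing entered yet: the first-digit timer applies.
        return first_digit_s
    if collected >= max_digits:
        # Maximum expected digits already entered: extra-digit timer.
        return extra_digit_s
    # Otherwise the timer for each digit after the first applies.
    return inter_digit_s


# Timeouts seen while collecting up to 4 digits.
timeouts = [digit_timeout(n, 4, 10.0, 4.0, 1.0) for n in range(5)]
```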
>
> d.   To be able to allow multiple prompt operations for DTMF digit
>      collection, voice recording (if supported), and/or speech recognition
>      analysis (if supported), provide the following types of prompts:
>
> -    Initial Prompt
>
> -    Reprompt
>
> -    Error prompt
>
> -    Failure announcement
>
> -    Success announcement.
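
The text names the prompt types but does not say when each one plays.
The Python sketch below shows one plausible ordering, purely as an
assumption: initial prompt on the first attempt, error prompt then
reprompt on later attempts, and a success or failure announcement at
the end. The function name and the callback signatures are hypothetical.

```python
from typing import Callable, List, Optional


def run_prompts(attempts: int,
                collect: Callable[[], Optional[str]],
                play: Callable[[str], None]) -> Optional[str]:
    """Drive one collection operation; return the input or None."""
    for attempt in range(attempts):
        play("initial" if attempt == 0 else "reprompt")
        result = collect()      # None stands for invalid or missing input
        if result is not None:
            play("success")
            return result
        if attempt < attempts - 1:
            play("error")       # only when another attempt remains
    play("failure")
    return None


# Usage: the first attempt fails, the second returns digits.
played: List[str] = []
inputs = iter([None, "1234"])
outcome = run_prompts(3, lambda: next(inputs), played.append)
```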
>
> e.   To allow digit pattern matching, allow the specification of:
>
> -    maximum number of digits to collect.
>
> -    minimum number of digits to collect.
>
> -    a digit pattern using a regular expression.
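
A short Python sketch of how the three parameters in part e could
combine. The requirement does not fix a regular expression syntax, so
the use of Python's re syntax, the function name and the example
pattern are all assumptions.

```python
import re


def is_valid_entry(digits: str, pattern: str,
                   min_digits: int, max_digits: int) -> bool:
    """Check the length bounds first, then the whole-string pattern."""
    if not min_digits <= len(digits) <= max_digits:
        return False
    return re.fullmatch(pattern, digits) is not None


# Hypothetical pattern: seven digits, the first in the range 2-9.
SEVEN_DIGIT = r"[2-9][0-9]{6}"

ok = is_valid_entry("2343456", SEVEN_DIGIT, 7, 7)
bad_first = is_valid_entry("1343456", SEVEN_DIGIT, 7, 7)
too_short = is_valid_entry("234345", SEVEN_DIGIT, 7, 7)
```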
>
> f.   To allow digit buffer control, allow the specification of:
>
> -    Ability to clear digit buffer prior to playing initial prompt
>      (default is not to clear buffer).
>
> -    Default clearing of buffer following playing of un-interruptible
>      announcement segment.
>
> -    Default clearing of buffer before playing a re-prompt in response
>      to previous invalid input.
>
> g.   Provide a method to specify DTMF interruptibility on a per audio
>      segment basis.
>
> h.   Allow the specification of definable key sequences for digit
>      collection to:
>
> -    Discard collected digits in progress, replay the prompt, and resume
>      DTMF digit collection.
>
> -    Discard collected digits in progress and resume DTMF digit
>      collection.
>
> -    Terminate the current operation and return the terminating key
>      sequence to the MGC.
>
> i.   Provide a way to ask the ARF MG to support the following definable
>      keys for DTMF digit collection and recording. These keys would then
>      be able to be acted upon by the ARF MG:
>
> -    A key to terminate playing of an announcement in progress.
>
> -    A set of one or more keys that can be accepted as the first digit
>      to be collected.
>
> -    A key that signals the end of user input.  The key may or may not
>      be returned to the MGC along with the input already collected.
>
> -    Keys to stop playing the current announcement and resume playing at
>      the beginning of the first segment of the announcement, last
>      segment of the announcement, previous segment of the announcement,
>      next segment of the announcement, or the current announcement
>      segment.
>
>
> 11.2.9.3.  Record Audio Module
>
> The Record Module must support all of the requirements in the Play
> Module in addition to the following requirements:
>
> a.   Be able to provide the following basic operation:
>
>      request an AEG to play an announcement and then record voice.
>
> b.   Be able to specify these event collection characteristics:
>
> -    The number of attempts to give the user to make a recording.
>
> c.   With respect to recording timers, allow the specification of:
>
> -    Time to wait for the user to initially speak.
>
> -    The amount of silence necessary following the last speech segment
>      for the recording to be considered complete.
>
> -    The maximum allowable length of the recording  (not including pre-
>      and post-speech silence).
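
A Python sketch of how the three recording timers in part c might
interact, assuming the audio has already been classified into
fixed-length speech/silence frames. The frame model, the function name
and the result labels are assumptions made for illustration.

```python
from typing import Iterable, Tuple


def run_recording(frames: Iterable[bool], frame_s: float,
                  initial_wait_s: float, end_silence_s: float,
                  max_length_s: float) -> Tuple[str, float]:
    """Return (reason, recorded seconds); True frames contain speech."""
    waited = 0.0     # silence before the user first speaks
    recorded = 0.0   # first speech to last speech, excluding end silence
    trailing = 0.0   # silence since the last speech frame
    started = False
    for is_speech in frames:
        if not started:
            if is_speech:
                started = True
                recorded = frame_s
            else:
                waited += frame_s
                if waited >= initial_wait_s:
                    return ("no-speech", 0.0)
            continue
        if is_speech:
            # A pause between speech segments counts toward the length.
            recorded += trailing + frame_s
            trailing = 0.0
            if recorded >= max_length_s:
                return ("max-length", recorded)
        else:
            trailing += frame_s
            if trailing >= end_silence_s:
                return ("complete", recorded)
    return ("stream-ended", recorded)


# 0.5 s frames: 1 s of silence, speech, a short pause, speech, silence.
frames = [False, False, True, True, False, True, False, False, False]
result = run_recording(frames, 0.5, 2.0, 1.0, 10.0)
```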
>
> d.   To be able to allow multiple prompt operations for DTMF digit
>      collection (if supported), voice recording, speech recognition
>      analysis (if supported), and/or speech verification/identification
>      (if supported), provide the following types of prompts:
>
> -    Initial Prompt
>
> -    Reprompt
>
> -    Error prompt
>
> -    Failure announcement
>
> -    Success announcement.
>
> e.   Allow the specification of definable key sequences for digit
>      recording or speech recognition analysis (if supported) to:
>
> -    Discard recording in progress, replay the prompt, and resume
>      recording.
>
> -    Discard recording in progress and resume recording.
>
> -    Terminate the current operation and return the terminating key
>      sequence to the MGC.
>
> f.   Provide a way to ask the ARF MG to support the following definable
>      keys for recording. These keys would then be able to be acted upon
>      by the ARF MG:
>
> -    A key to terminate playing of an announcement in progress.
>
> -    A key that signals the end of user input.  The key may or may not
>      be returned to the MGC along with the input already collected.
>
> -    Keys to stop playing the current announcement and resume playing at
>      the beginning of the first segment of the announcement, last
>      segment of the announcement, previous segment of the announcement,
>      next segment of the announcement, or the current announcement
>      segment.
>
> g.   While audio prompts are usually provisioned in IVR/ARF MGs, support
>      changing the provisioned prompts in a voice session rather than a
>      data session.  In particular, with respect to audio management:
>
> -    A method to replace provisioned audio with audio recorded during a
>      call. The newly recorded audio must be accessible using the
>      identifier of the audio it replaces.
>
> -    A method to revert from replaced audio to the original provisioned
>      audio.
>
> -    A method to take audio recorded during a call and store it such
>      that it is accessible to the current call only through its own
>      newly created unique identifier.
>
> -    A method to take audio recorded during a call and store it such
>      that it is accessible to any subsequent call through its own newly
>      created identifier.
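
The four audio management methods in part g can be pictured as
operations on a small store. The Python sketch below is an assumption
throughout: the class, the method names, the "rec-N" identifier format
and the lookup order are hypothetical, not taken from the proposed text.

```python
import itertools
from typing import Dict, Optional


class AudioStore:
    """Toy model of provisioned, replaced and call-recorded audio."""

    def __init__(self, provisioned: Dict[str, bytes]) -> None:
        self._provisioned = dict(provisioned)
        self._overrides: Dict[str, bytes] = {}
        self._persistent: Dict[str, bytes] = {}
        self._per_call: Dict[str, Dict[str, bytes]] = {}
        self._ids = itertools.count(1)

    def replace(self, audio_id: str, recording: bytes) -> None:
        """Replace provisioned audio; the identifier stays the same."""
        if audio_id not in self._provisioned:
            raise KeyError(audio_id)
        self._overrides[audio_id] = recording

    def revert(self, audio_id: str) -> None:
        """Go back to the original provisioned audio."""
        self._overrides.pop(audio_id, None)

    def store_for_call(self, call_id: str, recording: bytes) -> str:
        """Store audio reachable only from the call that recorded it."""
        new_id = f"rec-{next(self._ids)}"
        self._per_call.setdefault(call_id, {})[new_id] = recording
        return new_id

    def store_persistent(self, recording: bytes) -> str:
        """Store audio reachable from any subsequent call."""
        new_id = f"rec-{next(self._ids)}"
        self._persistent[new_id] = recording
        return new_id

    def end_call(self, call_id: str) -> None:
        """Drop the recordings that were scoped to this call."""
        self._per_call.pop(call_id, None)

    def lookup(self, audio_id: str, call_id: str) -> Optional[bytes]:
        """Resolve an identifier as seen from the given call."""
        tables = (self._per_call.get(call_id, {}), self._overrides,
                  self._persistent, self._provisioned)
        for table in tables:
            if audio_id in table:
                return table[audio_id]
        return None
```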
>
>
>
> 11.2.9.4.  Speech Recognition Module
>
> The speech recognition module can be used for a number of speech
> recognition applications, such as:
>
> Limited Vocabulary Isolated Speech Recognition (e.g., "yes", "no",
> the number "four"),
>
> Limited Vocabulary Continuous Speech Feature Recognition (e.g., the
> utterance "four hundred twenty-three dollars"), and/or
>
> Continuous Speech Recognition (e.g., unconstrained speech
> recognition tasks).
>
> The Speech Recognition Module must support all of the requirements in the
> Play Module in addition to the following requirements:
>
> a.   Be able to provide the following basic operation:
>
>      request an AEG to play an announcement and then perform speech
>      recognition analysis.
>
> b.   Be able to specify these event collection characteristics:
>
> -    The number of attempts to give the user to perform the speech
>      recognition task.
>
> c.   With respect to speech recognition analysis timers, allow the
>      specification of:
>
> -    Time to wait for the user to initially speak.
>
> -    The amount of silence necessary following the last speech segment
>      for the speech recognition analysis segment to be considered
>      complete.
>
> -    The maximum allowable length of the speech recognition analysis
>      (not including pre- and post-speech silence).
>
> d.   To be able to allow multiple prompt operations for DTMF digit
>      collection (if supported), voice recording (if supported), and/or
>      speech recognition analysis, provide the following types of prompts:
>
> -    Initial Prompt
>
> -    Reprompt
>
> -    Error prompt
>
> -    Failure announcement
>
> -    Success announcement.
>
> e.   Allow the specification of definable key sequences for digit
>      recording (if supported) or speech recognition analysis to:
>
> -    Discard analysis in progress, replay the prompt, and resume
>      analysis.
>
> -    Discard analysis in progress and resume analysis.
>
> -    Terminate the current operation and return the terminating key
>      sequence to the MGC.
>
> f.   Provide a way to ask the ARF MG to support the following definable
>      keys for speech recognition analysis. These keys would then be able
>      to be acted upon by the ARF MG:
>
> -    A key to terminate playing of an announcement in progress.
>
> -    A key that signals the end of user input.  The key may or may not
>      be returned to the MGC along with the input already collected.
>
> -    Keys to stop playing the current announcement and resume playing at
>      the beginning of the first segment of the announcement, last
>      segment of the announcement, previous segment of the announcement,
>      next segment of the announcement, or the current announcement
>      segment.
>
>
> 11.2.9.5.  Speaker Verification/Identification Module
>
> The speaker verification/identification module returns parameters that
> indicate either the likelihood of the speaker to be the person that they
> claim to be (verification task) or the likelihood of the speaker being one
> of the persons contained in a set of previously characterized speakers
> (identification task).
>
> The Speaker Verification/Identification Module must support all of the
> requirements in the Play Module in addition to the following requirements:
>
> a.   Be able to download parameters, such as speaker templates
>      (verification task) or sets of potential speaker templates
>      (identification task), either prior to the session or in mid-session.
>
> b.   Be able to download application specific software to the ARF either
>      prior to the session or in mid-session.
>
> c.   Be able to return parameters indicating either the likelihood of the
>      speaker to be the person that they claim to be (verification task) or
>      the likelihood of the speaker being one of the persons contained in a
>      set of previously characterized speakers (identification task).
>
> d.   Be able to provide the following basic operation:
>
>      request an AEG to play an announcement and then perform speech
>      verification/identification analysis.
>
> e.   Be able to specify these event collection characteristics:
>
> -    The number of attempts to give the user to perform the speech
>      verification/identification task.
>
> f.   With respect to speech verification/identification analysis timers,
>      allow the specification of:
>
> -    Time to wait for the user to initially speak.
>
> -    The amount of silence necessary following the last speech segment
>      for the speech verification/identification analysis segment to be
>      considered complete.
>
> -    The maximum allowable length of the speech
>      verification/identification analysis (not including pre- and
>      post-speech silence).
>
> g.   To be able to allow multiple prompt operations for DTMF digit
>      collection (if supported), voice recording (if supported), speech
>      recognition analysis (if supported), and/or speech
>      verification/identification, provide the following types of prompts:
>
> -    Initial Prompt
>
> -    Reprompt
>
> -    Error prompt
>
> -    Failure announcement
>
> -    Success announcement.
>
> h.   Allow the specification of definable key sequences for digit
>      recording (if supported) or speech recognition (if supported) in the
>      speech verification/identification analysis to:
>
> -    Discard speech verification/identification analysis in progress,
>      replay the prompt, and resume analysis.
>
> -    Discard speech verification/identification analysis in progress and
>      resume analysis.
>
> -    Terminate the current operation and return the terminating key
>      sequence to the MGC.
>
> i.   Provide a way to ask the ARF MG to support the following definable
>      keys for speech verification/identification analysis. These keys
>      would then be able to be acted upon by the ARF MG:
>
> -    A key to terminate playing of an announcement in progress.
>
> -    A key that signals the end of user input.  The key may or may not
>      be returned to the MGC along with the input already collected.
>
> -    Keys to stop playing the current announcement and resume speech
>      verification/identification at the beginning of the first segment of
>      the announcement, last segment of the announcement, previous segment
>      of the announcement, next segment of the announcement, or the
>      current announcement segment.
>
>
> 11.2.9.6.  Auditory Feature Extraction/Recognition Module
>
> The auditory feature extraction/recognition module is engineered to
> continuously monitor the auditory stream for the appearance of particular
> auditory signals or speech utterances of interest and to report these
> events (and optionally a signal feature representation of these events) to
> network servers or MGCs.
>
> The Auditory Feature Extraction/Recognition Module must support the
> following requirements:
>
> a.   Be able to download application specific software to the ARF either
>      prior to the session or in mid-session.
>
> b.   Be able to download parameters, such as a representation of the
>      auditory feature to extract/recognize, either prior to the session or
>      in mid-session.
>
> c.   Be able to return parameters indicating the auditory event found or a
>      representation of the feature found (i.e., auditory feature).
>
>
> 11.2.9.7.  Audio Conferencing Module
>
> The protocol must support:
>
> a.   a mechanism to create multi-point conferences of audio only and
>      multimedia conferences in the MG.
>
> b.   audio mixing; mixing multiple audio streams into a new composite
>      audio stream
>
> c.   audio switching; selection of incoming audio stream to be sent out
>      to all conference participants.
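
To make the difference between parts b and c concrete, here is a small
Python sketch on 16-bit PCM frames. It rests on assumptions: the
function names are hypothetical, mixing is shown as a clipped sample
sum, and the switching criterion (highest frame energy) is only one
possible choice, since the text does not say how the stream is selected.

```python
from typing import Dict, List

PCM_MIN, PCM_MAX = -32768, 32767


def mix(streams: List[List[int]]) -> List[int]:
    """Audio mixing: sum aligned samples and clip to the 16-bit range."""
    return [max(PCM_MIN, min(PCM_MAX, sum(samples)))
            for samples in zip(*streams)]


def switch(streams: Dict[str, List[int]]) -> str:
    """Audio switching: pick the stream with the most energy."""
    return max(streams, key=lambda name: sum(s * s for s in streams[name]))


mixed = mix([[1000, 30000], [2000, 30000]])
selected = switch({"a": [10, -10], "b": [500, -400]})
```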
> --------------------------------------------------------------------------
> Nancy M. Greene
> Internet & Service Provider Networks, Nortel Networks
> T:514-271-7221 (internal:ESN853-1077) E:ngreene at nortelnetworks.com
>


