The Case for Mono-media Contexts

Brian Rosen, FORE Systems

With comments from John Segers, Lucent Technologies



The current MEGACO/H.GCP model defines a context as carrying a single stream of data. The requirements for the protocol state that it must support multimedia. The question is how to do that.
 
 

The fundamental requirement for the protocol is to provide unambiguous instructions from the MGC to the MG to describe how all the connections are made. We have further agreed that single-stream, voice-only media gateways will predominate, and therefore support for multimedia should not complicate the mono-media case.

Well, multimedia support is a requirement. So some overhead is acceptable. I agree that the overhead should be minimal.
 
 

On the one hand, it is important to note that there are only two reasons espoused for the MG needing to understand relationships between separate streams in a conference:

  1. The necessity to time-synchronize streams (the lip-sync problem).
  2. Multiplex relationships may exist. These come in two circumstances, both of which may exist in a single implementation, such as a decomposed H.320 gateway (a toy sketch of both follows below):
     a. Multiple bearers may be multiplexed into a single data stream.
     b. Multiple media streams may be de-multiplexed from a single data stream.
There are no other requirements for MGs to know that multiple streams have any relationship to one another when handling multimedia. While it may appear attractive to promote the concept of a conference with multiple streams to a first-class construct in the protocol, the real problems are synchronization and multiplexing; if those are solved satisfactorily, then the requirements for multimedia are met.
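
To make the two multiplex relationships concrete, here is a toy sketch in Python. It is not real H.221 framing; the interleaving scheme and frame layout are invented purely for illustration.

    # Toy illustration only -- NOT real H.221 framing.

    def mux_bearers(bearers):
        """Relationship (a): combine several bearer byte streams into
        one data stream by simple byte interleaving."""
        return bytes(b for frame in zip(*bearers) for b in frame)

    def demux_media(stream, layout):
        """Relationship (b): split one data stream into per-medium
        streams according to a fixed per-frame byte layout."""
        frame_len = sum(layout.values())
        media = {name: bytearray() for name in layout}
        for off in range(0, len(stream), frame_len):
            pos = off
            for name, width in layout.items():
                media[name] += stream[pos:pos + width]
                pos += width
        return media

    combined = mux_bearers([b"AAAA", b"BBBB"])             # two bearers in
    print(demux_media(combined, {"audio": 1, "video": 1}))  # two media out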

Yes, but it would certainly be convenient to, for instance, delete an entire multimedia session with one command (delete context).
 
 

On the other hand, for arbitrary gateways there may be more complex relationships between multiple streams that are not covered by the problems multimedia streams present. If we create protocol mechanisms for linking streams that only solve the simple multimedia problem, we will have to create other mechanisms to solve the more general case. An example could be a complex audio-only conference bridge. A more difficult problem is an MCU, which may have many video, audio and data streams coming in, and several going out.

What is your point? Are you implying that multimedia contexts cannot handle complex things, and that an approach based on single-medium contexts can? I’d like to see a concrete example of something that cannot be modeled by means of a multimedia context and that can be modeled by means of single-medium contexts.
 
 

The simplicity of one Context = one stream (albeit the stream may be combined and transformed in the middle) is very attractive.

The same holds in the multimedia context approach (use a MediaID to identify media streams that belong together). Maybe the most frequently used mode, but is it optimal?
 
 
The limitation of the current model is that, with a single stream per Context, Terminations have a single "port". We therefore cannot model more complex functions such as a multiplexer. These devices, whether realized physically or instantiated in software on a DSP, have multiple ports, and not all ports have the same function. To build more complex gateways, we must have a way to model multiple-port devices and to specify the connections on their ports.

Agreed! In the multimedia context approach you can do that by having what I call MediaIDs to identify what you call ports.
 
 

The simplest enhancement to the model to support multiple media as well as more complex functions is to expand the concept of "Termination" to have multiple Ports. A Port on a Termination is in a Context, as before. Thus there are multiple Contexts to support more complex functionality. This is very natural – the MG must create independent streams, and the connections for each of these streams must be specified. One stream = one Context. The current rule that a Termination can only be in one Context at any time is simply modified so that a Port of a Termination can be in one Context at any time.
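
As a minimal sketch of this model (the names Termination, Port and Context come from the text; the Python shapes and helper names are our own assumptions):

    class Port:
        def __init__(self, name):
            self.name = name          # e.g. "Bearer/0", "Video"
            self.context = None       # a Port is in at most one Context

    class Termination:
        def __init__(self, term_id, port_names):
            self.term_id = term_id
            self.ports = {n: Port(n) for n in port_names}

    class Context:
        def __init__(self, context_id):
            self.context_id = context_id
            self.members = []

        def add(self, port):
            # the modified rule: a Port of a Termination can be in
            # one Context at any time
            assert port.context is None
            port.context = self
            self.members.append(port)

    h320 = Termination("H320/1", ["Bearer/0", "Video", "Audio", "T120"])
    c1 = Context("C1")
    c1.add(h320.ports["Bearer/0"])    # one stream = one Context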

I disagree with the statement that this is the simplest enhancement. What about a Termination now: is it still part of one context? Why is having multiple contexts for one session simple? What complex functionality does it support that is not supported by having multimedia contexts?
 
 

We need a way to identify the Ports of a Termination. We have proposed that they be components of the TerminationID, but many have objected to making names semantically significant.

And rightfully so! Why do you want to drag these long names along in every command you apply to a Termination?

The next simplest alternative is to create a PortID which would optionally follow a TerminationID in the commands. In many cases, Terminations could have a variable number of Ports. For example, the bearers in a multiplexer could range from 2 to an arbitrarily large number (an entire T1, for example). Therefore, we can endow PortIDs with the same semantics as TerminationIDs, allow the names to be hierarchical, and allow them to be under-specified.
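
A sketch of how an MG might resolve such under-specified names, with "Any" standing for "pick the next free value" (the allocation strategy and the flat counters are assumptions; a real MG would scope Bearer numbering per parent instance):

    import itertools

    _counters = {}

    def resolve(name):
        """Replace each 'Any' index in a hierarchical name with the
        next unused number at that level, e.g. 'H320/Any.Bearer/Any'
        -> 'H320/0.Bearer/0'."""
        out = []
        for part in name.split("."):
            base, _, idx = part.partition("/")
            if idx == "Any":
                counter = _counters.setdefault(base, itertools.count())
                idx = str(next(counter))
            out.append(f"{base}/{idx}" if idx else base)
        return ".".join(out)

    print(resolve("H320/Any.Bearer/Any"))   # -> H320/0.Bearer/0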
 
 

Let us see how this would work with an H.320 gateway. We create a Termination Class H320. It has four Ports: Bearer, Video, Audio and T120. If an H.320 gateway implemented a fixed number of H.320 connections, each could be a named Termination in the gateway. A DSP implementation may support a variable number of H320 gateway instantiations, depending on bit rates, codecs, etc. In such cases, H320 could be defined as ephemeral. This case is used in the following example:
 

[Figure lost: the original diagram of the H.320 gateway example did not survive reformatting as HTML.]
This looks rather complicated to me: there are terminations that are part of a context (such as the DS0/x), there are ports that are part of a context (such as the Bearer/x, Video, …), and there is a termination that isn’t part of anything (H.320).
 
 

A context is created for the first DS0 in the mux:

Request: Context Any
Add DS0/0(properties of DS0), H320/Any.Bearer/Any(properties of bearer)

Reply: Context C1 Add DS0/0, H320/1.Bearer/0
 
 

Not shown are the signals and events (additional phone-number exchanges, etc.) required to create the calls for the rest of the bearers.
 
 

The second bearer call arrives, and another context is created for it:

Request: Context Any
Add DS0/1(properties of DS0), H320/1.Bearer/Any(properties of bearer)

Reply: Context C2 Add DS0/1, H320/1.Bearer/1

Additional bearers are created in the same manner.
 
 

The RTP connections are established:

Request: Context Any
Add H320/1.Video(properties of video),RTP/Any(properties of RTP)

Reply: Context C3 Add H320/1.Video, RTP/0
 
 

Request: Context Any
Add H320/1.Audio(properties of audio),RTP/Any(properties of RTP)

Reply: Context C4 Add H320/1.Audio, RTP/1
 
 

Request: Context Any
Add H320/1.T120(properties of t120),RTP/Any(properties of RTP)

Reply: Context C5 Add H320/1.T120, RTP/2
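
Pulling the five exchanges together, here is a sketch of the MG state that results (context and member names are copied from the replies above; the dictionary shape is illustration only):

    mg_contexts = {}

    def add(context_id, *members):
        """Record the members the MG placed in each Context."""
        mg_contexts.setdefault(context_id, []).extend(members)

    add("C1", "DS0/0", "H320/1.Bearer/0")   # first bearer
    add("C2", "DS0/1", "H320/1.Bearer/1")   # second bearer
    add("C3", "H320/1.Video", "RTP/0")      # video
    add("C4", "H320/1.Audio", "RTP/1")      # audio
    add("C5", "H320/1.T120", "RTP/2")       # T.120 data

    for cid, members in mg_contexts.items():
        print(cid, "->", ", ".join(members))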
 
 

Clearly there are more contexts, and one Termination has become a collection of Ports. This does result in a slightly less efficient protocol for an H320 gateway compared to multimedia context proposals. However, we believe the model remains simple.

No gain over the multimedia context approach. Well, the simplicity is a matter of taste, I guess. Personally, I find the fact that some terminations occur in contexts and others do not, and the fact that a context may contain both ports and terminations, complicated. Consider the call flow for a simple audio call in the multimedia context approach: it is no more complicated than it was in the old mono-media model. The only extension is the presence of a MediaID.
 
 
You haven’t talked about synchronization between contexts yet. Or between streams, or whatever you need. This adds complexity to the single-medium context approach.
 
 

The semantics of the current model specify that the TerminationDescriptors describe what the media flow looks like (e.g., G.711, H.261). Therefore, there is no need to describe the kind of media that is present in the context (notwithstanding that, if SDP is used for the TerminationDescriptors, such a media identifier exists).
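
For reference, the SDP media identifier in question is the "m=" line, which names the medium even though the payload type already implies the codec (payload type 0 is G.711 mu-law); the port number here is illustrative:

    m=audio 49170 RTP/AVP 0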
 
 

The proposed mechanism also permits arbitrarily complex Terminations to be created, with whatever Port specialization is needed and whatever relationships between flows are necessary.

Please explain how to express relationships between flows then.

While not absolutely required, it may be interesting to allow multiple levels of PortIDs. Consider a multimedia MCU. The Termination Class could describe an "MCU/2.Participant/13.Video" Port so that each participant's streams could be identified and manipulated (via properties) independently of other participants. Each participant could have a different video, audio and data stream if the MCU offered per-participant choices.
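
A sketch of how such multi-level PortIDs could be taken apart for per-participant manipulation (the parsing conventions are assumptions, not part of the proposal):

    def parse_port_id(port_id):
        """Split a hierarchical PortID such as
        'MCU/2.Participant/13.Video' into (class, index) pairs."""
        components = []
        for part in port_id.split("."):
            base, _, idx = part.partition("/")
            components.append((base, int(idx) if idx else None))
        return components

    print(parse_port_id("MCU/2.Participant/13.Video"))
    # -> [('MCU', 2), ('Participant', 13), ('Video', None)]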

The same can be done using multimedia contexts, if you introduce a UserID or something like that. In fact, this has already been proposed on the megaco list by Paul Sijben (maybe two months ago, when the discussion on multimedia support in H.gcp/megaco started).
 
 

Comparing this proposal to the latest Multimedia context proposal:
 
 

Multi-media Context: Terminations are not identified with bearer channels; they reference the bearer channels on which media is sent and/or received.

Mono-media Context: The bearer is a Termination.

Yes, but not every Termination is a bearer (the H.320 Termination is not a bearer, for instance).
 
 

Multi-media Context: Terminations are always ephemeral. Temporary Terminations must be created to have Events, Signals and Properties of bearers outside of a Context.

Yes, but that is low overhead: if you just do audio-only calls, it is done at power-up only. After that, the ephemeral Terminations are moved from the null Context to real Contexts and vice versa.

Mono-media Context: Physical Terminations are permanent. Synthesized Terminations are ephemeral. Physical Terminations can have Events, Signals and Properties outside of a Context.

Technically speaking: in the null Context.
 
 

Multi-media Context: Contexts are multiple media. Connections between contexts are implied by the media parameter and bearer specifier.

This seems to indicate a misunderstanding of the multimedia context approach: there is no need to have connections between contexts when you have multimedia contexts.

Mono-media Context: Contexts are single media. Connections are explicit in the Context.
 
 

Multi-media Context: "media" parameter specifies Termination type, function and/or flow type depending on circumstances (media="h221", media="video"). No, there are MUX descriptors (and there is a type H.221 MUX descriptor), there are Media descriptors (and there is no need for a type, codec implies media type).

Mono-media Context: Termination type is permanently assigned to the Termination and auditable; function is designated by PortID; flow type is identified by TerminationDescriptors.
 
 

Multi-media Context: Arbitrary TerminationID is assigned by MG when added to Context. Name has no structure.

Mono-media Context: TerminationID is provisioned or built in to the MG, possibly hierarchical. Full TerminationIDs for ephemeral Terminations are assigned when the Termination is instantiated.

I regard this more complex naming scheme as a disadvantage of the single-medium context approach.
 
 

Multi-media Context: Terminations have multiple bearers and multiple streams. No other differentiation of flows/ports is specified.

No, this is in fact not true. There is a clear need to identify media streams, and using a MediaID achieves this. If you use H.245 for opening logical channels, the MGC could reuse the LogicalChannelNumber as the MediaID.
 
 

Mono-media Context: Terminations have multiple Ports, which could be any flow that must be distinguished from other flows.
 
 

Multi-media Context: Type of a Termination may change mid-call; no movement is required.

Actually, I've come to the conclusion that termination type is not a useful concept. Terminations are specified by a bunch of descriptors (bearer descriptor, mux descriptor, modem descriptor, media descriptor, signal descriptor, events descriptor, …). The media descriptor may change during a call. Two examples: 1) video being added to an existing audio call (H.320); 2) changing from audio to fax during a call. Other descriptors may change too, although I see no use for changing the bearer type from DS0 to RTP.

Mono-media Context: Termination types are fixed; Terminations may be moved to different contexts.
 
 

In conclusion, we argue that a simple addition of PortID to the current syntax achieves the goals of supporting multimedia as well as more complex gateway functionality, at the cost of modest additional complexity in a multimedia setup. Compared to the multimedia context proposals, the PortID proposal is simpler and covers more cases.

I’m sorry, but I don’t share this conclusion (see my comments to the comparison).

Another argument for having multimedia contexts is the possibility of applications where media streams are transformed from one media type to another, for instance text-to-speech or speech-to-text. This can be modeled easily with multimedia contexts; I do not see how to do it using single-medium contexts.