Over the past couple of days there has been some interesting discussion on the mailing list with respect to multimedia support in H.gcp/MEGACO, starting with Tom Taylor's note. Unfortunately I have been off-line and have not been able to respond until now.
Tom indicated that he would not pursue an approach in which a multimedia session is made up of one single context. The discussion that followed did not explore this option either. I would like to take the opportunity to look into this, and compare this approach to the proposals that have been posted earlier.
Let me first state the main reason why I favor an approach in which there is just one context for a multimedia, multi-party session: simplicity of the model. In my opinion having multiple contexts for mux/demux, audio, video, data adds too much detail to the connection model. It starts to resemble the first MDCP draft where there were separate entities in the model for echo canceling, bridging, transcoding, etc. I think we can all agree that there was too much detail there. It is much better to set properties of terminations (or edge points as we used to call them in MDCP).
So the connection model should be as abstract and flexible as possible, keeping in mind that the 95% use case is point-to-point voice. To me this means that a model for a multimedia call does not need separate entities for the mux/demux and for the audio, video and data flows that it sources and sinks. To me it also means that there is no need to split the conference bridge up over different contexts. It seems better to have one context describing the bridging functionality, and leave it to MG implementers to decide which hardware/software units to use in the box.
Now let me get to the ingredients of the model that I propose for multimedia support. There are only ephemeral terminations. Every termination has a list of bearer channels it uses. A termination describes the (de)multiplexing rule used in case media are multiplexed on the bearer(s). Examples are H.221 terminations, H.223 terminations, and terminations that use layered H.263 video encoding with the multiple layers going to/coming from different UDP ports. As Tom noted in his message, this can be seen as a shorthand notation for the separate context that indicates the (de)multiplexing. My point is that we don't need that context if the termination itself can describe the (de)multiplexing. And with only one context for the multimedia session, there is no need to repeat the list of bearers in multiple contexts. The figure below illustrates the model:
  +----------+          ---          +----------+
  |          |  audio  /   \  audio  | audio on |
--| H.221    |--------|     |--------| RTP      |--
  | termina- |  video | b f |        |          |
--| tion     |--------| r u |        +----------+
  |          |  data  | i n |
  |          |--------| d c |        +----------+
  +----------+        | g t |  video | video on |
                      | e i |--------| RTP      |--
  +----------+        |   o |        |          |
  |          |  audio |   n |        +----------+
  | H.223    |--------|     |
--| termina- |  video |     |        +----------+
  | tion     |--------|     |  data  | data on  |
  |          |  data  |     |--------| T.120    |--
  |          |--------|     |        |          |
  +----------+         \   /         +----------+
                        ---
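To make this concrete, here is a minimal sketch of the model in Python. The class and field names are mine, purely for illustration; nothing here is proposed protocol syntax.

  # Minimal sketch of the proposed connection model (illustrative names only).
  from dataclasses import dataclass, field
  from typing import Dict, List

  @dataclass
  class Termination:
      """An ephemeral termination; the MG assigns its name."""
      name: str                 # simple MG-assigned name, e.g. "t1"
      bearers: List[str]        # bearer channels this termination uses
      mux: str = "none"         # (de)multiplexing rule, e.g. "H.221", "H.223"

  @dataclass
  class Context:
      """One context describes the whole multimedia, multi-party session."""
      name: str                 # simple MG-assigned name, e.g. "c1"
      terminations: List[Termination] = field(default_factory=list)
      properties: Dict[str, str] = field(default_factory=dict)  # bridging rules etc.

  # The session in the figure: one context holding all terminations.
  session = Context(name="c1")
  session.terminations.append(Termination("t1", bearers=["b1", "b2"], mux="H.221"))
  session.terminations.append(Termination("t2", bearers=["b3"], mux="H.223"))
  session.terminations.append(Termination("t3", bearers=["rtp-audio"], mux="none"))
  session.terminations.append(Termination("t4", bearers=["rtp-video"], mux="none"))
  session.terminations.append(Termination("t5", bearers=["t120-data"], mux="none"))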
The bridging functionality is described by means of properties of the context. An identification of the streams that are linked to one user would make it easy to specify, for example, that all users should receive the video stream of the last active speaker.
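Continuing the sketch above, the bridging rules could then be expressed as context properties along these lines (the property names and values are invented for illustration):

  # Hypothetical context properties expressing the bridging rules.
  session.properties["audio.bridge"] = "mix-minus-self"        # each user hears everyone but himself
  session.properties["video.bridge"] = "last-active-speaker"   # everyone sees the current speaker

  # Linking streams to users makes "video of the last active speaker"
  # easy to express: the MG knows which streams belong to which user.
  session.properties["user.t1"] = "user-A"
  session.properties["user.t2"] = "user-B"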
The impact on protocol syntax is small. Manipulating a session amounts to adding/deleting/modifying terminations as before; no extra commands are needed. The bridging properties do have to be set, so we need a mechanism to set context properties. The other approaches need such a mechanism as well.
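A rough sketch of what this implies on the MGC side, continuing the Python sketch (the helper names are hypothetical):

  # The existing repertoire: manipulate terminations within a context.
  def add_termination(ctx: Context, t: Termination) -> None:
      ctx.terminations.append(t)

  def subtract_termination(ctx: Context, name: str) -> None:
      ctx.terminations = [t for t in ctx.terminations if t.name != name]

  # The one addition: a way to set properties on the context itself.
  def set_context_properties(ctx: Context, props: Dict[str, str]) -> None:
      ctx.properties.update(props)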
Another advantage of this approach is the use of ephemeral terminations. The MG can assign simple names to these (in the same way it assigns names to contexts). Thus there is no longer any need to include long hierarchical names in every command that references a termination. (I see no good reason for having termination names contain any information about the type of transport and/or media used in the termination. This information is present in the termination properties already.)
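To illustrate with invented names: a command could then reference simply "t1" rather than a hierarchical name along the lines of "mg1/trunk7/h221/video", since the transport and media information is already available as termination properties.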
Final remark on this approach: I think that this is actually what Tom alluded to as Paul Sijben's approach (I didn't check with Paul, I have to admit).
Now for a couple of comments on the earlier mails. It seems overly complex to me to have multiple ephemeral instances of mux contexts/terminations around, as Brian Rosen suggests. The approach I outlined above has only one ephemeral termination for the media multiplexed on a transport, which looks much simpler. It is as flexible as the idea Brian presented, but it does not need the semantics that all ephemeral instances of a mux are created when the first one is, nor does it need multiple actions, because there are no multiple contexts to deal with.
Fernando Cuervo suggested having the terminations imply the (de)multiplexing. This requires that every termination can describe which bits from which packet/frame are to be sent out, because a media stream may be split up over multiple bearers. So I feel it would be much better to make the mux explicit. An advantage of having the (de)multiplexing in one termination/context is that it is immediately clear which bearers are used for the aggregated stream. What I like about Fernando's proposal is the fact that there is a context type. What may even be better (I haven't thought it through, though) is to have a limited number of context profiles: one for voice-only with all streams going to everyone but the sender, one for video+audio with audio to everyone but the sender and the speaker's video to everyone, etc.
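Continuing the Python sketch, such profiles might boil down to no more than canned sets of context properties (the profile names and rules below are just a first guess):

  # Hypothetical context profiles: canned sets of bridging properties.
  CONTEXT_PROFILES: Dict[str, Dict[str, str]] = {
      "voice-only": {
          "audio.bridge": "mix-minus-self",        # all audio to everyone but the sender
      },
      "video-conference": {
          "audio.bridge": "mix-minus-self",        # audio to everyone but the sender
          "video.bridge": "last-active-speaker",   # speaker's video to everyone
      },
  }

  def apply_profile(ctx: Context, profile: str) -> None:
      set_context_properties(ctx, CONTEXT_PROFILES[profile])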
Kindest regards,
John Segers
--
John Segers                   email: jsegers@lucent.com
Lucent Technologies           Room HE 344
Dept. Forward Looking Work    phone: +31 35 687 4724
P.O. Box 18, 1270 AA Huizen   fax:   +31 35 687 5954