Brian,

I'll attempt to show an apples-to-apples comparison of your H.320 model with two DS0s and three IP streams versus John's model with the same. I will do this via the attached PowerPoint slide to keep myself from screwing with text formatting all afternoon. The points I have are illustrated on the slide, but I will summarize below.

Brian's Model:
1) Five separate Contexts
2) One new Context type (i.e., MUX; to hold the DS0 and MUX Terminations; could be a Context property as opposed to a new type?)
3) Five new Termination types (MUX, plus H.320 audio, video, data and DS0)
4) Linkage between Contexts is required and is provided via a new Termination property called the "mux ephemeral instance identifier"
5) DS0 Terminations exist as separate entities representing physical terminations
6) Multiplex is described with a new Context type and new Termination types, as well as Termination properties required to link Contexts

John's Model:
1) One Context
2) One new Context type (i.e., multimedia; could be a Context property as opposed to a new type?)
3) One new Termination type (H.320)
4) No linkage between Contexts is required
5) DS0 Termination replaced by a logical Termination that refers to the physical DS0
6) Multiplex is described with a new Termination type which must refer to the physical DS0s it uses

I hope I have not misrepresented either your or John's ideas. I think John's looks quite a bit simpler, but would encourage you guys to maybe put together a little white paper to describe each of your models and to argue your points. This e-mail thread has gotten too hard to follow. May I suggest putting out a Word document with illustrative diagrams? A call flow describing the setup of the H.320 call would also be nice to include, so we can see the effects on the model at various points in the setup.

Rex

Brian Rosen wrote:
John
I don't think you have achieved what you stated your goals were by proposing that there is only one context, and the MG sorts out what to do by the properties of the terminations.
For one thing, the picture you are advocating is:
+------------------------------------+
| Context C1                         |
|                                    |
| DS0/0 -----\       /---- RTP/101   |
|             \     /                |
| DS0/1 -------- o ------- RTP/102   |
|             /  |  \                |
| H.221/0 ---/   |   \---- RTP/103   |
|                |                   |
|             H.223/0                |
+------------------------------------+
Not a pretty picture, even if all the terminations have enough properties to allow the gateway to figure out what is intended. You are really into the "chocolate chip cookie model" where you just throw all kinds of different termination types into one context and let them sort themselves out.
You also don't like having the MUX visible, but you do want the difference between H.221 and H.223 visible. I don't get it; the MGC doesn't care about the differences, and the MG doesn't need to keep them separate with respect to the MGC. You might as well just have more properties in the Context to deal with the H.221 and H.223 stuff, and not have H.221 and H.223 ephemerals at all. At least by making the mux visible, we deal with the issue that the DS0s are on one "side" of the process with the RTP on the other. I could have drawn the picture above:

+------------------------------------+
| Context C1                         |
|                                    |
| DS0/0 -----\       /---- RTP/101   |
|             \     /                |
| RTP/102 ------ o ------- H.223/0   |
|             /  |  \                |
| H.221/0 ---/   |   \---- DS0/1     |
|                |                   |
|             RTP/103                |
+------------------------------------+

Since there is no natural way to sort out the relationships between the external terminations, they are all in one big hairball. An even worse picture.
Making lots of properties of terminations is always possible. It's not going to make the number of operations or the size or number of messages much different. All of the proposals end up having similar-sized messages to get the job done, primarily because the operations are based on the external terminations. The differences are in the number of Actions, the number of commands per Action, and the number of properties per command, but the product isn't much different for any of the cases. You have smaller numbers of terminations and contexts, but larger numbers of properties. Message sizes should be about the same. There also aren't any great differences I can see in what the MG has to do in order to figure out how to accomplish what it is asked.
One large problem is how you specify which RTP flow is which. Since all you have to help you is properties, we would have to add properties to RTP that would specify which media was being asked for on that RTP stream. When you add a new context type, you may have to extend the RTP termination class; very undesirable. This of course extends to all the "packet/cell" termination types -- each of them would have to be extended with the new media types.
Finally, you are making a complex case "simple"; H.320 gateways are not the focus of the effort, but are in scope. Support for such uses should be possible, not necessarily optimal. By making contexts multimedia, you complicate the simple cases of single media gateways; not a good thing.
So far, we have not needed any parameters to Action other than ContextID. If we really need them, we should add them. I don't want to go there unless we have a good reason. Making H.320 gateways simpler is not a good reason IMO.
I guess the bottom line is that the current model of one media per context, no context parameters, works for all the really primary cases (Access Gateways, Tandems, residential gateways, NASs, etc.), and it extends, perhaps less elegantly than you would like, to corner cases like decomposed H.320 gateways. We have to have a really good reason to add more stuff to the model.
While I like my model "best", there is a case to be made for ContextType - I think it's more like "combination rule". The audio rule is "mixing bridge". That clearly does not work for any kind of other media, and doesn't cover all the cases of audio (although it covers 95+% cases!). If there is any change to be made, I think it would be that there is a single, optional parameter to Action which is the combination rule. The default is mixing bridge. That still leaves us with how to handle terminations that are different "types" going into the rule. Properties on terminations is one way, implied naming conventions is another, extra termination classes is another.
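As an illustration only, a single optional combination-rule parameter on an Action could be sketched in Python as follows. The class name, field names and the "speaker-select" rule are assumptions made up for this sketch, not protocol syntax; only the ContextID parameter and the mixing-bridge default come from the text above.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical spelling; the text only says the default rule is "mixing bridge".
DEFAULT_RULE = "mixing-bridge"


@dataclass(frozen=True)
class Action:
    """Illustrative Action: a ContextID plus one optional combination rule."""
    context_id: str
    combination_rule: Optional[str] = None  # None means the parameter was omitted

    def effective_rule(self) -> str:
        # An omitted rule falls back to the default, so plain audio
        # cases never have to mention the parameter at all.
        return self.combination_rule or DEFAULT_RULE


audio = Action(context_id="C1")
video = Action(context_id="C2", combination_rule="speaker-select")  # assumed rule name
print(audio.effective_rule())
print(video.effective_rule())
```

With a default like this, existing single-media exchanges would stay unchanged, and only the non-audio contexts would carry the extra parameter.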
Similarly, there must be a way to relate multiple streams in a conference. Single context is one way, ConferenceID, either as a parameter to an Action or a property of a termination is another, and implied naming conventions is a third.
We can back our way into two parameters on the Action, or we can use one mechanism for stream ID and another for CallID, or we can use one mechanism for both. Implied semantics in TerminationIDs serves both purposes, so I favor that.
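To make "implied semantics in TerminationIDs" concrete, here is a small Python sketch. The name layout call/media/transport/index is invented purely for illustration; nothing in this thread fixes a naming convention, and the helper names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class ParsedTermination(NamedTuple):
    call_id: str    # relates all streams of one conference
    media: str      # stream identity: audio, video or data
    transport: str  # e.g. RTP or DS0
    index: str


def parse_termination_id(termination_id: str) -> ParsedTermination:
    """Split a hypothetical 'call/media/transport/index' TerminationID."""
    parts = termination_id.split("/")
    if len(parts) != 4:
        raise ValueError(f"unexpected TerminationID layout: {termination_id!r}")
    return ParsedTermination(*parts)


def group_by_call(termination_ids: List[str]) -> Dict[str, List[ParsedTermination]]:
    """Group terminations by the call ID implied in their names."""
    calls: Dict[str, List[ParsedTermination]] = {}
    for tid in termination_ids:
        parsed = parse_termination_id(tid)
        calls.setdefault(parsed.call_id, []).append(parsed)
    return calls


ids = ["call7/audio/RTP/101", "call7/video/RTP/102", "call9/audio/RTP/201"]
for call_id, members in group_by_call(ids).items():
    print(call_id, [m.media for m in members])
```

In this sketch one name answers both questions (which call, which stream) without any extra Action parameter.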
Brian
-----Original Message-----
From: John Segers [mailto:jsegers@LUCENT.COM]
Sent: Wednesday, April 28, 1999 1:44 PM
To: ITU-SG16@mailbag.cps.intel.com
Subject: Multimedia contexts for H.320 and H.324 support in megaco/H.gcp
Over the past couple of days there has been some interesting discussion on the mailing list with respect to multimedia support in H.gcp/MEGACO, starting with Tom Taylor's note. Unfortunately I have been off-line and have not been able to respond until now.
Tom indicated that he would not pursue an approach in which a multimedia session is made up of one single context. The discussion that followed did not explore this option either. I would like to take the opportunity to look into this, and compare this approach to the proposals that have been posted earlier.
Let me first state the main reason why I favor an approach in which there is just one context for a multimedia, multi-party session: simplicity of the model. In my opinion having multiple contexts for mux/demux, audio, video, data adds too much detail to the connection model. It starts to resemble the first MDCP draft where there were separate entities in the model for echo canceling, bridging, transcoding, etc. I think we can all agree that there was too much detail there. It is much better to set properties of terminations (or edge points as we used to call them in MDCP).
So the connection model should be as abstract and flexible as possible, keeping in mind that the 95% use case is point to point voice. To me this means that a model for a multimedia call does not need separate entities in the model for mux/demux, the audio, video and data flows that are sourced/sinked by the mux/demux. To me it also means that there is no need to have the conference bridge split up over different contexts. It seems better to have one context describing the bridging functionality, and leave it to MG implementers to decide which hardware/software units to use in the box.
Now let me get to the ingredients of the model that I propose for multimedia support. There are only ephemeral terminations. Every termination has a list of bearer channels it uses. A termination describes the (de)multiplexing rule used in case media are multiplexed on the bearer(s). Examples are H.221 terminations, H.223 terminations, and terminations that use layered H.263 video encoding with the multiple layers going to/coming from different UDP ports. As Tom noted in his message, this can be seen as a shorthand notation for the separate context that indicates the (de)multiplexing. My point is that we don't need the context if we can do with the termination describing the (de)multiplexing. And having only one context for the multimedia session, there is no need to repeat the list of bearers in multiple contexts.
  +----------+          ---          +----------+
  |          | audio   /   \  audio  | audio on |
--| H.221    |--------|     |--------| RTP      |--
  | termina- | video  | b f |        |          |
--| tion     |--------| r u |        +----------+
  |          | data   | i n |
  |          |--------| d c |        +----------+
  +----------+        | g t | video  | video on |
                      | e i |--------| RTP      |--
  +----------+        |   o |        |          |
  |          | audio  |   n |        +----------+
  | H.223    |--------|     |
--| termina- | video  |     |        +----------+
  | tion     |--------|     | data   | data on  |
  |          | data   |     |--------| T.120    |--
  |          |--------|     |        |          |
  +----------+         \   /         +----------+
                        ---
The bridging functionality is described by means of properties of the context. An identification of the streams that are linked to one user would be helpful to be able to easily specify that all users should receive the video stream of the last active speaker.
The impact on protocol syntax is small. Manipulating a session amounts to adding/deleting/modifying terminations as before, no extra commands are needed. There is a need to set the bridging properties, so we need a mechanism to set context properties. In the other approaches there is a need to do so as well.
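The single-context model described above can be sketched as a small data structure. This is only an illustration in Python: the class names, the field names, the termination names T1..T3 and the "last-active-speaker" property value are assumptions for the sketch, not proposed protocol elements.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Termination:
    """Ephemeral termination: a (de)multiplexing rule plus the bearers it uses."""
    name: str
    mux_rule: str       # e.g. "H.221", "H.223", or "none" for a single medium
    bearers: List[str]  # bearer channels, e.g. DS0s or RTP ports
    media: List[str]    # media carried: audio, video, data


@dataclass
class Context:
    """One context for the whole multimedia session."""
    context_id: str
    properties: Dict[str, str] = field(default_factory=dict)  # bridging rules
    terminations: Dict[str, Termination] = field(default_factory=dict)

    def add(self, termination: Termination) -> None:
        self.terminations[termination.name] = termination

    def subtract(self, name: str) -> None:
        del self.terminations[name]

    def bearers_in_use(self) -> List[str]:
        # Each bearer is listed once, on the termination that uses it.
        return [b for t in self.terminations.values() for b in t.bearers]


ctx = Context("C1", properties={"video": "last-active-speaker"})  # assumed key/value
ctx.add(Termination("T1", "H.221", ["DS0/0", "DS0/1"], ["audio", "video", "data"]))
ctx.add(Termination("T2", "none", ["RTP/101"], ["audio"]))
ctx.add(Termination("T3", "none", ["RTP/102"], ["video"]))
print(ctx.bearers_in_use())
```

Manipulating the session is then just adding, modifying or removing entries in one context, plus setting the context properties that describe the bridging.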
Another advantage of this approach is the use of ephemeral terminations. The MG can assign simple names to these (in the same way it assigns names to contexts). Thus there is no need any more to have the long hierarchical names included in all commands that reference terminations. (I see no good reason for having termination names containing any information about the type of transport and/or media used in the termination. This information is present in the termination properties already.)
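A minimal Python sketch of MG-assigned names, assuming a simple counter-based scheme (the allocator class and the "T" prefix are hypothetical). The point it shows is that the name itself is opaque, while transport and media live in the properties.

```python
import itertools
from typing import Dict


class NameAllocator:
    """Hands out short, opaque termination names, as an MG might."""

    def __init__(self, prefix: str = "T") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)

    def allocate(self) -> str:
        return f"{self._prefix}{next(self._counter)}"


alloc = NameAllocator()
properties: Dict[str, Dict[str, str]] = {}
for transport, media in [("DS0", "H.221 mux"), ("RTP", "audio"), ("RTP", "video")]:
    name = alloc.allocate()
    # Transport and media are recorded as properties, not encoded in the name.
    properties[name] = {"transport": transport, "media": media}
print(properties)
```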
Final remark on this approach: I think that this is actually what Tom alluded to as Paul Sijben's approach (I didn't check with Paul, I have to admit).
Now for a couple of comments on the earlier mails. It seems overly complex to me to have multiple ephemeral instances of mux contexts/terminations around as Brian Rosen suggests. The approach I outlined above only has one ephemeral termination for the media multiplexed for transport, which looks much simpler. It is as flexible as the idea Brian presented and does not need the semantics that all ephemeral instances of a mux are created when the first one is, nor do you need multiple actions because you don't have to deal with multiple contexts.
Fernando Cuervo's suggestion of having terminations imply the (de)multiplexing requires that it is possible to describe in every termination which bits from which packet/frame are to be sent out, because a media stream may be split up over multiple bearers. So I feel it would be much better to have the mux explicit. An advantage of having the (de)multiplexing in one termination/context is that it is immediately clear which bearers are used for the aggregated stream. What I like about Fernando's proposal is the fact that there is a context type. What may even be better (I haven't thought it through, though) is to have a limited number of context profiles: one for voice-only with all streams going to everyone but the sender, video+audio with audio to everyone but the sender, speaker's video to everyone, etc.
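The two context profiles mentioned above could be expressed as a small distribution rule. The following Python sketch is illustrative only: the function name, the profile strings and the active_speaker argument are assumptions, and "speaker's video to everyone" is read here as everyone except the speaker.

```python
from typing import List, Optional


def recipients(profile: str, media: str, sender: str,
               participants: List[str],
               active_speaker: Optional[str] = None) -> List[str]:
    """Return who receives `media` sent by `sender` under a context profile."""
    others = [p for p in participants if p != sender]
    if media == "audio" and profile in ("voice-only", "video+audio"):
        # Both profiles send audio to everyone but the sender.
        return others
    if media == "video" and profile == "video+audio":
        # Only the active speaker's video is distributed.
        return others if sender == active_speaker else []
    return []  # medium not covered by this profile


users = ["A", "B", "C"]
print(recipients("voice-only", "audio", "A", users))
print(recipients("video+audio", "video", "B", users, active_speaker="B"))
print(recipients("video+audio", "video", "C", users, active_speaker="B"))
```

A fixed set of profiles like this would let the MGC pick a bridging behavior by name instead of spelling out per-stream rules.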
Kindest regards,
John Segers
--
John Segers                        email: jsegers@lucent.com
Lucent Technologies                Room HE 344
Dept. Forward Looking Work         phone: +31 35 687 4724
P.O. Box 18, 1270 AA Huizen        fax: +31 35 687 5954