Tom,
I agree with your analysis of the requirements. We need to support multimedia conferences, yet we should not lose sight of the fact that H.320 is only a very small fraction of the initial market, and we should not unduly increase the complexity of an audio-only system. I also agree that we have to support multicast, and that we have to provide for fancy control of (at least some) conferences.
The architecture we had in mind, as Brian explained, is to have one context per medium. This brings in the notion of a "context type" -- an audio bridge, a video bridge and a T.120 bridge all behave somewhat differently. In fact, the particular realm of video bridging lends itself to variations such as "single image", "mosaic" or "insets", which could be defined as context-level parameters. Indeed, a common practice on the MBONE is to carry several video streams on the same RTP association, using the SSRC as the stream identifier, and letting individual recipients select which video is displayed and which ones are iconized.
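To make the SSRC-based selection concrete, here is a minimal sketch in Python -- purely illustrative, the RTP header layout is the standard one but everything else (names, buffering policy) is made up -- of a receiver that sorts the packets of a shared video session by SSRC and lets the recipient pick which stream is displayed full size and which ones are iconized:

    import struct
    from collections import defaultdict

    def rtp_ssrc(packet):
        # The SSRC occupies bytes 8..11 of the fixed RTP header.
        return struct.unpack("!I", packet[8:12])[0]

    class VideoSelector:
        # Per-recipient selection among several video streams carried on
        # the same RTP association, keyed by SSRC (illustrative only).
        def __init__(self):
            self.displayed = None              # SSRC chosen for full display
            self.streams = defaultdict(list)   # SSRC -> buffered packets

        def select(self, ssrc):
            self.displayed = ssrc

        def receive(self, packet):
            ssrc = rtp_ssrc(packet)
            self.streams[ssrc].append(packet)
            return "display" if ssrc == self.displayed else "iconize"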
Apart from that, conference control does not appear to be a real problem with the current spec. The mode parameter allows the MGC to specify the terminations that "contribute" to the mix (allowed to send) and the ones that only receive information. In a video bridge, the contributing terminations are sent in the mosaic, or multiplexed as RTP streams. In a T.120 bridge, the "master" termination is authorized to contribute while the others remain passive. Synchronization of multiple contexts can be performed by the MGC: event reporting allows it to detect the last speaker, or video activity, or T.120 activity, and to issue MODIFY commands that manage the conference accordingly. If the event/command chain is deemed too slow, then we will have to resort to a local script acting as a surrogate for the MGC.
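As a rough illustration of that event/command loop -- the command name and mode values below are placeholders, not the actual spec syntax, and the floor-control policy is invented -- the MGC's reaction to a reported activity event could look like this:

    class Context:
        # Minimal stand-in for a per-medium context.
        def __init__(self, terminations):
            self.terminations = terminations

    def modify(context, termination, mode):
        # Stand-in for the protocol's MODIFY command.
        print("MODIFY", termination, "mode=" + mode)

    def on_activity_event(context, active_termination):
        # On a reported event (last speaker, video activity, T.120 activity),
        # let the active termination contribute and keep the others passive.
        for term in context.terminations:
            mode = "send_receive" if term == active_termination else "receive_only"
            modify(context, term, mode)

    audio = Context(["t1", "t2", "t3"])
    on_activity_event(audio, "t2")    # t2 becomes the contributor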
But there is more.
Support for H.320 requires, in particular, support for H.221 multiplexing of several streams onto a single carrier. That support is somewhat similar to the support of several DS0s inside a single DS1, or of several B channels, although there are some obvious differences -- the multiplexing parameters vary on a call-by-call basis, or even within a call. We have to express the notion that a specific termination, for the duration of a call, is "exploded" into a set of ephemeral terminations, each of which is then connected to a media-specific context. This requires an additional mechanism, something like a 'demultiplexing context'.
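Here is one way to picture that "explosion", purely as a sketch -- the names and the descriptor format are invented, not a proposal for actual syntax:

    import itertools

    _ids = itertools.count(1)

    class EphemeralTermination:
        # One component stream extracted from the H.221 multiplex; it can
        # then be added to the matching media-specific context.
        def __init__(self, parent, medium):
            self.name = "%s/eph%d" % (parent, next(_ids))
            self.medium = medium               # "audio", "video" or "data"

    def explode(physical_termination, h221_media):
        # 'h221_media' lists the streams found in the multiplex; in reality
        # these parameters vary call by call, and even within a call.
        return [EphemeralTermination(physical_termination, m) for m in h221_media]

    ephemerals = explode("ds1/7", ["audio", "video", "data"])
    # Each ephemeral termination would then be added to the audio bridge,
    # the video bridge and the T.120 bridge respectively.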
Demultiplexing has a symmetric counterpart: inverse multiplexing. This happens when an H.320 conference spans several B channels; it can also happen for data calls.
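The inverse case could be pictured symmetrically, again with invented names: several B-channel terminations are bonded into one logical termination whose aggregate stream carries the call:

    class BondedTermination:
        # Several B-channel terminations bonded into one logical termination.
        def __init__(self, b_channels):
            self.b_channels = list(b_channels)
            self.bandwidth_kbps = 64 * len(self.b_channels)

    call = BondedTermination(["b1", "b2"])     # a 2 x 64 kbit/s H.320 call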
In order to at least obtain the "warm and fuzzy feeling" that Fernando mentions, we have, at a minimum, to sketch out our solution for inverse multiplexing and demultiplexing, and perhaps to propose one way to handle H.320.
--
Christian Huitema
Please note my new address: huitema@research.telcordia.com
http://www.telcordia.com/