Tim,
I am definitely giving consideration to using a modified Annex E/H.323.
Taking a broader view, I think that a connectionless approach is the right solution. I am considering how either Annex E or SCTP might satisfy the requirements.
I am personally not at all in favor of the status inquiry/status message exchange. I have been working out the details of what I would like to see, and I have these goals:
1) Keep the transport and application layers as separate as possible.
2) Allow any entity to detect "next hop" failures and re-route.
3) Only require that the two endpoints in the call maintain the call state and bring each other "up to speed" once the connection has been re-established.
4) Introduce a minimum amount of complexity; complexity spells failure as far as I am concerned.
I do not want any entity in the middle of the network to have to maintain any state, though there will be messages queued for re-transmission at the transport layer on those entities (something I don't think people would object to).
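As a rough illustration of goal 2 and of the transport-layer queue described above, a minimal Python sketch might look like the following. The names Relay, next_hops, send, and max_tries are hypothetical and come from neither Annex E nor SCTP; the send callback stands in for whatever acknowledged transport is eventually chosen, and the retry limit of 3 is an arbitrary assumption.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class Relay:
    """Hypothetical intermediate signaling entity.

    It keeps no call state, only messages that the next hop has not
    yet acknowledged at the transport layer.
    """
    next_hops: List[str]                 # preferred hop first, alternates after
    send: Callable[[str, bytes], bool]   # assumed: returns True once the hop acks
    max_tries: int = 3                   # assumed retry limit before giving up on a hop
    pending: Deque[bytes] = field(default_factory=deque)

    def forward(self, msg: bytes) -> Optional[str]:
        """Queue msg, then try to deliver everything that is queued."""
        self.pending.append(msg)
        return self.flush()

    def flush(self) -> Optional[str]:
        """Return the hop that accepted the queue, or None if all hops failed."""
        for hop in list(self.next_hops):
            if self._drain(hop):
                # Re-route: remember the working hop as the preferred one.
                self.next_hops.remove(hop)
                self.next_hops.insert(0, hop)
                return hop
        return None

    def _drain(self, hop: str) -> bool:
        """Send queued messages in order; stop at the first one the hop never acks."""
        while self.pending:
            msg = self.pending[0]
            if not any(self.send(hop, msg) for _ in range(self.max_tries)):
                return False             # treat the hop as failed, keep msg queued
            self.pending.popleft()
        return True
```

If every hop fails, the messages simply stay in the queue and a later call to flush() can retry them, which is the only per-message state this sketch keeps on the intermediate entity.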
I believe I have the beginnings of a solution worked out that will allow the call to be recovered even if one or more network entities fail. For this to work, of course, all entities in the call signaling path would have to support the robustness procedure.
However, my solution would also allow intermediate nodes to act as the terminating point of the "robustness chain" when part of the signaling path supports robustness and part does not. For example, if NetMeeting were used and a "robust" routed GK handled the call, the GK could handle the robustness procedure on behalf of NetMeeting. Unfortunately, trying to support non-robust entities is problematic. For example, if a non-robust entity sends a message and there is a failed signaling point, it may take some time to re-establish the link and transmit the message. During that time, the non-robust endpoint may assume the call has failed; certainly, timers on some messages may expire and the endpoint may terminate the call. For this reason, I would say that support for non-robust entities would be "best effort".
With that said, I think we can work out a solution that fully supports the re-establishment of multipoint, audio/video conferences that have one or multiple points of failure without introducing significant complication to the endpoints. Annex E may be a very good base upon which to build that solution and I am examining it closely. That is not to say that I have dismissed SCTP, though.
Best Regards, Paul
----- Original Message -----
From: "Tim Chen" scc@TRILLIUM.COM
To: ITU-SG16@mailbag.cps.intel.com
Sent: Wednesday, April 05, 2000 11:50 PM
Subject: [Robustness] Modifying Annex E to support Robustness?
Hi,
We have a couple of thoughts regarding H.323 robustness:
- As specified in TD87 from the Geneva meeting, "status inquiry" and "status" messages are used to resync call states. This may lead to a tremendous amount of load at the gatekeeper. A gatekeeper-routed call requires 2 status inquiries and 2 status messages, plus the messages needed to re-establish the TCP connections. For a gatekeeper handling several thousand calls, this is a lot of messages.
- Besides considering SCTP, have we considered modifying the Annex E procedures to achieve robustness at the network interface? If a NIC on a gatekeeper fails, endpoints trying to contact the gatekeeper could use a recovery address on another NIC to resend their messages. Is there any reason why this cannot be done or is undesirable? Such an approach avoids invoking the procedures mentioned in TD87 to re-synchronize call states and to re-establish TCP connections. Since we are proposing a new set of procedures for an H.323 entity to support robustness, backward compatibility doesn't seem like a problem.
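To put a rough number on the load described in the first point, here is a small Python sketch. The figure of 5000 calls is only a stand-in for "several thousand", and the assumption of two TCP connections per gatekeeper-routed call (one call signaling channel on each side of the gatekeeper) is illustrative rather than taken from TD87. The function name resync_messages is hypothetical.

```python
TCP_HANDSHAKE_SEGMENTS = 3  # SYN, SYN-ACK, ACK


def resync_messages(calls: int, tcp_connections_per_call: int = 2) -> int:
    """Estimate messages seen by a gatekeeper while resynchronizing `calls` calls."""
    status_inquiries = 2  # per gatekeeper-routed call, as stated above
    status_replies = 2    # per gatekeeper-routed call, as stated above
    per_call = (status_inquiries + status_replies
                + tcp_connections_per_call * TCP_HANDSHAKE_SEGMENTS)
    return calls * per_call


print(resync_messages(5000))  # 5000 * (4 + 6) = 50000
```

Under these assumptions, a single failure costs on the order of tens of thousands of messages, all arriving at roughly the same time.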
Thanks.
Regards, Tim
Tim Chen Trillium Digital Systems, Inc. Phone: +1 310 442 9222 Fax: +1 310 442 1162 email: tim@trillium.com http://www.trillium.com