[Robustness] Some Issues
I got no one on the call today - perhaps because of the late notice. Let's try addressing some issues by email.
Sec 8.4 discusses H.245 Connection Reestablishment. I think there are a few small issues.
1. While multiple calls can be signaled over a single CallSig connection, I do not believe that this is permitted on H.245 Call Control Signaling channels. So we do not have to be concerned with multiple calls when H.245 fails, right?
2. Assume the CallSig channel failed and we have a separate H.245 channel which also failed (tunneled H.245 does NOT have this issue). We say: send endSessionCommand on the H.245 channel and then drop the connection. Then reestablish CallSig and send Facility with our h245Address. The other end reestablishes the H.245 channel to us.
issue 2a. If the endSessionCommand makes it through (we detected a failure, but the channel wasn't completely dead?), won't the other end believe that the call is over, hang up, send us ReleaseComplete, etc., so that we won't be able to recover? Before these procedures, loss of H.245 ended the call. Do we need a flag in endSessionCommand to indicate that we are "replacing" the channel rather than ending the call? Or can we omit the endSessionCommand and just drop the connection that we believe is dead? The Facility message will tell the other end to re-establish, dropping its end if necessary.
issue 2b. How do the two entities know that the new H.245 channel is a replacement, and so NOT perform the initialization procedures (master/slave determination, capability exchange)? The end sending Facility knows, and could remember this when the channel is established TO it from the other end. But how does the end receiving Facility distinguish this from the case where a new channel is needed (transition from fastStart or tunneling)? By the fact that it has or had a distinct H.245 channel for that call already? Is this good enough, or do we need a flag in Facility, or a Reason, to indicate "RE-establishment"?
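To make issues 2a and 2b concrete, here is a minimal Python sketch of the two decisions. All names here (end_session, the "replacing" flag, had_separate_h245) are hypothetical illustrations, not fields from the current draft:

```python
# Hypothetical sketch for issues 2a/2b; field and function names are
# illustrative only, not taken from H.323/H.245.

def end_session(replacing_channel):
    """Issue 2a: a flag distinguishing 'replacing the H.245 channel' from
    'ending the call', so that a stray endSessionCommand which survives the
    failure does not make the peer hang up and send ReleaseComplete."""
    return {"command": "endSessionCommand", "replacing": replacing_channel}

def classify_incoming_h245(call):
    """Issue 2b: the receiving end treats the new connection as a
    replacement only if this call already has (or had) a distinct H.245
    channel; otherwise it is a first establishment (e.g. a transition from
    fastStart or tunneling) and the normal master/slave determination and
    capability exchange must run."""
    return "replacement" if call.get("had_separate_h245") else "new"
```

If the "has or had a distinct H.245 channel" test turns out to be ambiguous in some corner case, that would argue for the explicit flag or Reason in Facility instead.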
3. Assume that CallSig has NOT failed but the separate H.245 channel has failed. This case is not clearly covered by the current wording in the paragraph, but I think the issues are the same. We either send endSessionCommand before dropping the "dead" connection or not, as we decide in issue 2a, and then use Facility over the unfailed CallSig channel to re-establish. I will change the wording to clearly cover this case as well.
Section 6.4.1. We currently specify that if a CallSig connection fails, we re-establish to the BACKUP transport address. There is some chance that the failure does NOT prevent a new connection to the original entity, e.g., failure of a socket listener process or a temporary outage. Recovering the call is easier in the original entity than in the backup, since the call state may still be locally available. It would be nice to have a way to try the original entity before trying the backup, but we do not want to wait for the timeout of a TCP connection attempt. Should we consider, or at least permit (make optional), a mechanism that first probes the original entity with something like ping, to test that the network and operating system are alive, and if successful attempts re-establishing TCP to the original entity BEFORE trying the backup?
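The probe-first policy might look something like the following sketch. The probe and connect steps are pluggable placeholders (a real probe could be an ICMP ping, which is exactly what avoids waiting out a full TCP connection timeout against a dead host); none of this is from the draft text:

```python
# Illustrative sketch of "probe the original entity before the backup".
# probe(addr) -> bool is a cheap liveness check (e.g. ICMP ping);
# connect(addr) -> connection-or-None attempts the actual TCP connection.

def reestablish_callsig(original, backup, probe, connect):
    """Try the original entity first, but only if the probe says it is
    worth attempting; otherwise, or on connection failure, fall back to
    the backup transport address."""
    if probe(original):
        conn = connect(original)
        if conn is not None:
            return ("original", conn)   # call state likely still available
    conn = connect(backup)
    if conn is not None:
        return ("backup", conn)
    return None                         # both attempts failed
```

The point of the probe is that it fails fast: if the host or OS is down, we skip straight to the backup instead of burning a TCP connect timeout.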
Section 6.4.3, closing old TCP connections. My notes imply that someone on a previous call may have had a problem with this section, but I have lost who it was and what the problem was. If someone has a problem with it, please repeat the issue.
Section 6.3, Editor's note at the end. This describes a case that we discussed in a previous call. The goal was to reduce the number of KeepAlive messages for multiple calls between the same two entities. This is simple for the case where the calls are signaled on a single multiplexed TCP connection (and we discuss that). The issue is to use a single KeepAlive even when there is more than one TCP connection in use. The problem is to identify all the calls that are truly between the same two entities. Analysis of special cases led us to conclude that this could NOT be done without adding an additional globallyUniqueId to label the calls that were "clustered", i.e., related to one KeepAlive exchange. The issue was whether a solution requiring an additional id field is worth the benefit.
Another solution that I do not believe we considered is for the KeepAlive message to add a field that carries some id of the other channels it applies to, but in this case too it is not clear how to designate the channels. TransportAddress is not sufficiently unique, nor is the IP address of the establishing end. Since multiple calls may share the channel, one cannot use callId or CRV. So we still have a problem requiring some new globallyUniqueId (note that it must be globally unique, since the connection may well be between two zones).
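For discussion, a minimal sketch of what the globallyUniqueId approach would amount to: each cluster of related calls carries one globally unique label, and a single KeepAlive exchange refreshes all of them. The class and field names are hypothetical (a uuid4 stands in for whatever globally unique id we would actually standardize):

```python
import uuid

class KeepAliveCluster:
    """Illustrative sketch: label related calls (possibly spread over
    several TCP connections) with one globallyUniqueId so a single
    KeepAlive exchange covers all of them. The id must be globally
    unique since the connection may be between two zones."""

    def __init__(self):
        self.cluster_id = str(uuid.uuid4())  # stand-in for globallyUniqueId
        self.calls = set()

    def add_call(self, call_id):
        self.calls.add(call_id)

    def keepalive_message(self):
        # one message refreshes liveness for every call in the cluster
        return {"clusterId": self.cluster_id, "calls": sorted(self.calls)}
```

The cost this illustrates is exactly the open issue: both entities must create, exchange, and maintain this extra id for the lifetime of the cluster.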
I'd like to hear from folks who think this is worth solving even if it requires maintenance of a new id, or who have any new solution that might avoid it.
--
Terry L Anderson                    mailto:tla@lucent.com
Tel: 908.582.7013   Fax: 908.582.6729
Pager: 800.759.8352 pin 1704572     1704572@skytel.com
Lucent Technologies / Voice Over IP Access Networks / Applications Grp
Rm 2B-121, 600 Mountain Av, Murray Hill, NJ 07974
http://its.lucent.com/~tla (Lucent internal)   http://www.gti.net/tla