[Robustness] Some Issues
I got no one on the call today - perhaps because of the late notice. Let's try addressing some issues by email.
Sec 8.4 discusses H.245 Connection Reestablishment. I think there are a few small issues.
1. While multiple calls can be signaled over a single CallSig connection, I do not believe that this is permitted on H.245 Call Control Signaling channels. So we do not have to be concerned with multiple calls when H.245 fails, right?
2. Assume the CallSig channel failed and we have a separate H.245 channel which also failed (tunneled H.245 does NOT have this issue). We say: send endSessionCommand on the H.245 channel and then drop the connection. Then reestablish CallSig and send Facility with our h245Address. The other end reestablishes the H.245 channel to us.
issue 2a. If the endSessionCommand makes it through (we detected a failure, but the channel wasn't completely dead?), won't the other end believe that the call is over, hang up, send us ReleaseComplete, etc., so that we won't be able to recover? Before these procedures, loss of H.245 ended the call. Do we need a flag in endSessionCommand to indicate that we are "replacing" the channel rather than ending the call? Or can we omit the endSessionCommand and just drop the connection that we believe is dead? The Facility message will tell the other end to re-establish, dropping its end if necessary.
issue 2b. How do the two entities know that the new H.245 channel is a replacement, and so NOT perform the initialization procedures (master/slave determination, capability exchange)? The end sending Facility knows, and could remember this when the channel is established TO it from the other end. But how does the end receiving Facility distinguish this from the case where a new channel is needed (transition from fastStart or tunneling)? By the fact that it has or had a distinct H.245 channel for that call already? Is this good enough, or do we need a flag in Facility, or a Reason, to indicate "RE-establishment"?
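To make issues 2a and 2b concrete, here is a minimal Python sketch of the two decisions. All names here (end_session, the "replacing" flag, had_separate_h245) are hypothetical illustrations, not fields from the current draft:

```python
# Hypothetical sketch for issues 2a/2b; field and function names are
# illustrative only, not taken from H.323/H.245.

def end_session(replacing_channel):
    """Issue 2a: a flag distinguishing 'replacing the H.245 channel' from
    'ending the call', so that a stray endSessionCommand which survives the
    failure does not make the peer hang up and send ReleaseComplete."""
    return {"command": "endSessionCommand", "replacing": replacing_channel}

def classify_incoming_h245(call):
    """Issue 2b: the receiving end treats the new connection as a
    replacement only if this call already has (or had) a distinct H.245
    channel; otherwise it is a first establishment (e.g. a transition from
    fastStart or tunneling) and the normal master/slave determination and
    capability exchange must run."""
    return "replacement" if call.get("had_separate_h245") else "new"
```

If the "has or had a distinct H.245 channel" test turns out to be ambiguous in some corner case, that would argue for the explicit flag or Reason in Facility instead.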
3. Assume that CallSig has NOT failed but the separate H.245 channel has failed. This case is not clearly covered by the current wording in the paragraph, but I think the issues are the same. We either send endSessionCommand before dropping the "dead" connection or not, as we decide in issue 2a, and then use Facility over the unfailed CallSig channel to re-establish. I will change the wording to clearly cover this case as well.
Section 6.4.1. We currently specify that if a CallSig connection fails, we re-establish to the BACKUP transport address. There is some chance that the failure does NOT prevent a new connection to the original entity, e.g., failure of a socket listener process or a temporary outage. Recovering the call is easier in the original entity than in the backup, since the call state may still be locally available. It would be nice to have a way to try the original entity before trying the backup, but we do not want to wait for the timeout of a TCP connection attempt. Should we consider, or at least permit (make optional), a mechanism that first probes the original entity with something like ping, to test that the network and operating system are alive, and if successful attempts re-establishing TCP to the original entity BEFORE trying the backup?
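The probe-first policy might look something like the following sketch. The probe and connect steps are pluggable placeholders (a real probe could be an ICMP ping, which is exactly what avoids waiting out a full TCP connection timeout against a dead host); none of this is from the draft text:

```python
# Illustrative sketch of "probe the original entity before the backup".
# probe(addr) -> bool is a cheap liveness check (e.g. ICMP ping);
# connect(addr) -> connection-or-None attempts the actual TCP connection.

def reestablish_callsig(original, backup, probe, connect):
    """Try the original entity first, but only if the probe says it is
    worth attempting; otherwise, or on connection failure, fall back to
    the backup transport address."""
    if probe(original):
        conn = connect(original)
        if conn is not None:
            return ("original", conn)   # call state likely still available
    conn = connect(backup)
    if conn is not None:
        return ("backup", conn)
    return None                         # both attempts failed
```

The point of the probe is that it fails fast: if the host or OS is down, we skip straight to the backup instead of burning a TCP connect timeout.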
Section 6.4.3, closing old TCP connections. My notes imply that someone on a previous call may have had a problem with this section, but I have lost who it was and what the problem was. If someone has a problem with it, please repeat the issue.
Section 6.3, Editor's note at the end. This describes a case that we discussed in a previous call. The goal was to reduce the number of KeepAlive messages for multiple calls between the same two entities. This is simple for the case where the calls are signaled on a single multiplexed TCP connection (and we discuss that). The issue is to use a single KeepAlive even when there is more than one TCP connection in use. The problem is to identify all the calls that are truly between the same two entities. Analysis of special cases led us to conclude that this could NOT be done without adding an additional globallyUniqueId to label the calls that were "clustered", i.e., related to one KeepAlive exchange. The issue was whether a solution requiring an additional id field is worth the benefit.
Another solution that I do not believe we considered is for the KeepAlive message to add a field that carries some id of the other channels it applies to, but in this case too it is not clear how to designate the channels. TransportAddress is not sufficiently unique, nor is the IP address of the establishing end. Since multiple calls may share the channel, one cannot use callId or CRV. So we still have a problem requiring some new globallyUniqueId (note that it must be globally unique, since the connection may well be between two zones).
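For discussion, a minimal sketch of what the globallyUniqueId approach would amount to: each cluster of related calls carries one globally unique label, and a single KeepAlive exchange refreshes all of them. The class and field names are hypothetical (a uuid4 stands in for whatever globally unique id we would actually standardize):

```python
import uuid

class KeepAliveCluster:
    """Illustrative sketch: label related calls (possibly spread over
    several TCP connections) with one globallyUniqueId so a single
    KeepAlive exchange covers all of them. The id must be globally
    unique since the connection may be between two zones."""

    def __init__(self):
        self.cluster_id = str(uuid.uuid4())  # stand-in for globallyUniqueId
        self.calls = set()

    def add_call(self, call_id):
        self.calls.add(call_id)

    def keepalive_message(self):
        # one message refreshes liveness for every call in the cluster
        return {"clusterId": self.cluster_id, "calls": sorted(self.calls)}
```

The cost this illustrates is exactly the open issue: both entities must create, exchange, and maintain this extra id for the lifetime of the cluster.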
I'd like to hear from folks who think this is worth solving even if it requires maintenance of a new id, or who have any new solution that might avoid it.
--
Terry L Anderson                    mailto:tla@lucent.com
Tel: 908.582.7013   Fax: 908.582.6729
Pager: 800.759.8352 pin 1704572     1704572@skytel.com
Lucent Technologies / Voice Over IP Access Networks / Applications Grp
Rm 2B-121, 600 Mountain Av, Murray Hill, NJ 07974
http://its.lucent.com/~tla (Lucent internal)   http://www.gti.net/tla