Hi all,
This follows up on our discussion in the last teleconference, where we wanted answers to the following questions:
a) Do we need "call-state synchronisation" between the two legs of a call when an intermediate GK fails? Is it OK to assume that things will sort themselves out without any significant loss?
b) Are there any cases where the absence of "call-synchronisation" procedures can lead to hung resources or to the release of an otherwise stable call?
c) Are there any other issues (e.g., degraded service) that we need to take care of in the absence of "call-synchronisation"?
d) If the answer to b) is yes, do we need an ACK at the H.323 layer when the H.323 layer runs over SCTP/DDP?
After the teleconference, we studied the problem further and discussed it with Randy and Qiaobing. Here is a summary:
a) While SCTP/DDP provides fault tolerance at the transport layer, it cannot handle the case where a GK fails after a message has been ACKed by the GK's SCTP layer. Consider the following case:
                            (CRASH)         RELCOMPLETE
    EP2 <--------------       GK       <------------ EP1
    (SCTP/DDP)            (SCTP/DDP)            (SCTP/DDP)
                                   -----SCTP-ACK--->
                          (GK NODE FAILS)
When EP1 sends a RELCOMPLETE message to the GK, the GK's SCTP/DDP layer returns an SCTP-ACK to EP1. If the GK node fails after this step, the RELCOMPLETE message is lost before it is forwarded to EP2. The SCTP/DDP layer cannot detect such failures, so it is up to the H.323 protocol layer to recover from them (if required).
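The failure window can be illustrated with a toy Python model. It assumes, for illustration only, that the GK's transport layer returns its ACK as soon as a message is stored in the node's memory and that the H.323 layer forwards it later; `GatekeeperNode` and its methods are hypothetical names, not part of any SCTP, DDP or H.323 API.

```python
from collections import deque


class GatekeeperNode:
    """Hypothetical GK: the transport layer ACKs on receipt; the H.323 layer forwards later."""

    def __init__(self):
        self.pending = deque()  # ACKed by the transport, not yet handled by the H.323 layer
        self.forwarded = []     # messages the H.323 layer actually passed on toward EP2
        self.alive = True

    def deliver(self, msg):
        """Transport-layer receive: queue the message and return an SCTP-style ACK."""
        if not self.alive:
            return None  # a failed node sends no ACK, so the sender would retransmit
        self.pending.append(msg)
        return "SCTP-ACK"

    def process_one(self):
        """H.323 layer: take the oldest queued message and forward it."""
        if self.alive and self.pending:
            self.forwarded.append(self.pending.popleft())

    def crash(self):
        """Node failure: everything held in memory, including the queue, is lost."""
        self.alive = False
        self.pending.clear()


gk = GatekeeperNode()
gk.deliver("FACILITY")
gk.process_one()                 # normal path: ACKed, then forwarded
ack = gk.deliver("RELCOMPLETE")  # EP1 receives a transport-level ACK...
gk.crash()                       # ...but the GK fails before the H.323 layer runs
print(ack, gk.forwarded)         # SCTP-ACK ['FACILITY']
```

EP1 holds an ACK for the RELCOMPLETE and so has no reason to retransmit, while EP2 never sees it; nothing below the H.323 layer distinguishes this from a successful delivery.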
So we agreed that messages at the GK can be lost even with SCTP/DDP. Note that this has a further implication: in a typical GK implementation, the H.323 layer will probably exchange messages with the SCTP/DDP layer through a "queue", and the SCTP/DDP layer puts into this queue every message for which it has already sent an SCTP-ACK. Therefore, when the H.323 layer fails, we lose all the messages that were in the queue, and these messages may belong to multiple calls.
In other words, a failure of the H.323 layer is not trivial: it does not mean the loss of just the one message belonging to the particular call being processed at the time of the failure, but the loss of every message in the queue at that moment, and those messages may belong to multiple calls.
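To show why the queue matters, the following sketch groups a hypothetical snapshot of the GK's input queue by call. The call identifiers and the message mix are invented for illustration; the point is only that a single crash can leave several calls whose two legs now disagree about call state.

```python
from collections import defaultdict

# Hypothetical snapshot of the GK's H.323 input queue at the instant of failure.
# Every entry has already been SCTP-ACKed to whoever sent it.
queue_at_crash = [
    ("call-17", "RELCOMPLETE"),
    ("call-42", "TerminalCapabilitySet"),
    ("call-08", "FACILITY"),
    ("call-42", "CommunicationModeCommand"),
]


def calls_affected(lost_messages):
    """Group lost messages by call: each key is a call whose legs may now be out of sync."""
    by_call = defaultdict(list)
    for call_id, msg in lost_messages:
        by_call[call_id].append(msg)
    return dict(by_call)


affected = calls_affected(queue_at_crash)
print(len(queue_at_crash), "messages lost across", len(affected), "calls")
# 4 messages lost across 3 calls
```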
With this in mind, we wanted to identify the impact of lost messages at the GK and at the endpoints. I am enclosing a table of some of the messages that can be lost and what their loss might translate to. Please note that this table is not exhaustive. At present, the H.323 specifications do not say what action should be taken if, for a particular command or indication, the terminal does not respond as expected; I guess the assumption is that the message is delivered reliably. <<CallStateSync.doc>>
We would like to discuss the issues listed in the table with the group to learn how current implementations behave when these messages are lost. Depending on the general consensus, we can then decide whether an ACK should be introduced.
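If the group does decide to introduce an ACK at the H.323 layer, one possible shape is a sender that keeps each message until the peer's H.323 layer acknowledges it and retransmits on timeout. The sketch below assumes a per-message sequence number and a fixed timeout; `H323AckSender` and both of those fields are hypothetical and are not taken from H.225.0 or H.245. Where a retransmission should be sent (e.g., to a GK that has taken over) is left out.

```python
from dataclasses import dataclass, field


@dataclass
class H323AckSender:
    """Hypothetical sender keeping each message until an H.323-level ACK arrives."""

    timeout: float
    next_seq: int = 0
    outstanding: dict = field(default_factory=dict)  # seq -> (message, time last sent)

    def send(self, msg, now):
        """Record the message; the returned sequence number travels with it."""
        seq = self.next_seq
        self.next_seq += 1
        self.outstanding[seq] = (msg, now)
        return seq

    def on_ack(self, seq):
        """The peer's H.323 layer (not just its transport) has processed `seq`."""
        self.outstanding.pop(seq, None)

    def due_for_retransmit(self, now):
        """Return (seq, msg) pairs whose ACK is overdue and restart their timers."""
        due = []
        for seq, (msg, sent) in list(self.outstanding.items()):
            if now - sent >= self.timeout:
                due.append((seq, msg))
                self.outstanding[seq] = (msg, now)
        return due


sender = H323AckSender(timeout=4.0)
rel = sender.send("RELCOMPLETE", now=0.0)
tcs = sender.send("TerminalCapabilitySet", now=1.0)
sender.on_ack(tcs)                      # only the TCS was acknowledged
due = sender.due_for_retransmit(now=5.0)
print(due)                              # [(0, 'RELCOMPLETE')]
```

The difference from the SCTP-ACK case is that an entry is removed only once the receiving H.323 layer has processed the message, so a crash between transport receipt and processing leaves the message outstanding rather than silently lost.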
Regards Archana
Archana, et al,
In addition to these messages, there are other messages whose loss would cause problems, including:
- communication mode command
- terminal capability set (including TCS=0)
- end session command
Certainly there are others, and if those commands are lost from a queue, as you suggest, the call may become very unstable.
In addition to the H.245 commands above, any H.225.0 signaling may be an issue, too. I believe people have stated that the H.450.x series protocols have robustness designed into them, but I have not looked for issues myself. For example, if a particular ReturnResult message is lost, what state would things be in?
Other services are built upon H.225.0 as well, including Annex K and Annex L. In addition, we are now relying on the H.225.0 layer to tunnel other protocols (ISUP and QSIG), as described in Annexes M.1 and M.2. There is really no way to define what state the call might be in if any of those messages are lost.
Paul
----- Original Message ----- From: "Archana Nehru" archie@TRILLIUM.COM To: ITU-SG16@mailbag.cps.intel.com Sent: Wednesday, May 03, 2000 8:41 PM Subject: H.323 Robustness
This discussion holds only if we do NOT use the DDP bulletin board (BB) for message integrity. I know that most have rejected this approach, but let me explore using the BB for it.
Here is what I have in mind. We use one BB for call state and a second BB for message-passing state. Looking only at the message-state BB, here is the call flow (this is hard to show in ASCII):

    GW1---msg--->GK1
                 GK1----------------msg------>GK2
                 GK1--msg cpy-->BB
                 GK1<--DDP ack--BB
    GW1<-DDP ack-GK1
                 GK1<-----DDP ack-------------GK2
                 GK1--msg clear->BB
                 GK1<--DDP ack--BB

The extra msgs are:
1. a copy of the original msg to the BB, with its Ack
2. a msg clear message to the BB, with its Ack
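A minimal sketch of the message-state BB idea, assuming the BB itself survives the loss of GK1 and that a backup GK can read it. `BulletinBoard`, `RelayGK` and `backup_takeover` are hypothetical names, not a DDP API; the sketch only keeps the two properties the flow relies on: GK1 ACKs GW1 only after the BB holds a copy, and it clears that copy only after GK2 has ACKed.

```python
class BulletinBoard:
    """Hypothetical message-state BB; assumed to survive the loss of GK1."""

    def __init__(self):
        self.in_flight = {}  # msg_id -> message accepted by GK1 but not yet ACKed by GK2

    def copy(self, msg_id, msg):
        self.in_flight[msg_id] = msg
        return "DDP-ACK"

    def clear(self, msg_id):
        self.in_flight.pop(msg_id, None)
        return "DDP-ACK"


class RelayGK:
    """Hypothetical GK1 relaying messages from GW1 toward GK2."""

    def __init__(self, bb, send_to_gk2):
        self.bb = bb
        self.send_to_gk2 = send_to_gk2

    def accept(self, msg_id, msg):
        """Copy the message to the BB first; ack GW1 only if the BB confirmed the copy."""
        if self.bb.copy(msg_id, msg) != "DDP-ACK":
            return None
        return "DDP-ACK"

    def forward(self, msg_id, msg):
        """Send to GK2; clear the BB entry only once GK2 has ACKed."""
        if self.send_to_gk2(msg) == "DDP-ACK":
            self.bb.clear(msg_id)


def backup_takeover(bb, send_to_gk2):
    """A backup GK replays every message GK1 copied to the BB but never cleared."""
    for msg_id, msg in list(bb.in_flight.items()):
        if send_to_gk2(msg) == "DDP-ACK":
            bb.clear(msg_id)


delivered = []


def gk2_receive(msg):
    delivered.append(msg)
    return "DDP-ACK"


bb = BulletinBoard()
gk1 = RelayGK(bb, gk2_receive)
gk1.accept("m1", "FACILITY")
gk1.forward("m1", "FACILITY")          # normal path: copied, forwarded, cleared
ack = gk1.accept("m2", "RELCOMPLETE")  # GW1 gets its ack, then GK1 fails
backup_takeover(bb, gk2_receive)
print(delivered, bb.in_flight)         # ['FACILITY', 'RELCOMPLETE'] {}
```

One consequence of replaying uncleared entries: if GK1 fails after GK2's ACK but before the clear, GK2 receives the message twice, so the receiver would need to discard duplicates (for instance by message identifier).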
The alternative with end-to-end ack is:

    GW1---msg--->GK1
    GW1<--ack----GK1
                 GK1-----msg--->GK2
                 GK1<----ack----GK2
                                GK2---msg-->etc
                                GK2<--EoEack--
                                GK2--EoEack-ack->
                 GK1<---EoEack--GK2
                 GK1-EoEack-ack->GK2
    GW1<-EoEack--GK1
    GW1-EoEack-ack>GK1

I am showing both the DDP ack (to be consistent with the first example) and the end-to-end ack. The number of messages (not counting those between GK2 and GW2, which are shown in the second example only to illustrate when the EoEack comes) is exactly the same: 8 messages among GW1, GK1 and GK2, counting the BB exchanges in the first case. The only difference is that the message copy from GK1 to the BB is larger than the extra acks, whereas the GW (in most cases the more limited resource) must handle more messages in the second case.
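The message counts can be checked mechanically. The sketch below encodes both flows as (sender, receiver) pairs, reading the DDP ack to GK1 in the first flow as coming from GK2, counting the BB exchanges in the first flow, and excluding the hops between GK2 and the far side ("etc"), as described above.

```python
# Each hop is (sender, receiver); "etc" stands for the leg from GK2 toward GW2.
bb_flow = [
    ("GW1", "GK1"),  # msg
    ("GK1", "GK2"),  # msg
    ("GK1", "BB"),   # msg copy
    ("BB", "GK1"),   # DDP ack
    ("GK1", "GW1"),  # DDP ack
    ("GK2", "GK1"),  # DDP ack
    ("GK1", "BB"),   # msg clear
    ("BB", "GK1"),   # DDP ack
]

eoe_flow = [
    ("GW1", "GK1"),  # msg
    ("GK1", "GW1"),  # ack
    ("GK1", "GK2"),  # msg
    ("GK2", "GK1"),  # ack
    ("GK2", "etc"),  # msg onward (not counted)
    ("etc", "GK2"),  # EoEack (not counted)
    ("GK2", "etc"),  # EoEack-ack (not counted)
    ("GK2", "GK1"),  # EoEack
    ("GK1", "GK2"),  # EoEack-ack
    ("GK1", "GW1"),  # EoEack
    ("GW1", "GK1"),  # EoEack-ack
]


def count_among(flow, nodes):
    """Count hops whose sender and receiver are both in `nodes`."""
    return sum(1 for src, dst in flow if src in nodes and dst in nodes)


def count_at(flow, node):
    """Count messages that `node` sends or receives."""
    return sum(1 for src, dst in flow if node in (src, dst))


print(count_among(bb_flow, {"GW1", "GK1", "GK2", "BB"}))   # 8
print(count_among(eoe_flow, {"GW1", "GK1", "GK2"}))        # 8
print(count_at(bb_flow, "GW1"), count_at(eoe_flow, "GW1"))  # 2 4
```

The totals match, but GW1 handles 2 messages in the BB flow and 4 in the end-to-end flow, which is the load difference at the GW.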
-- ------------------------------------------------------------ Terry L Anderson mailto:tla@lucent.com Tel:908.582.7013 Fax:908.582.6729 Pager:800.759.8352 pin 1704572 1704572@skytel.com Lucent Technologies/ Voice Over IP Access Networks/ Applications Grp Rm 2B-121, 600 Mountain Av, Murray Hill, NJ 07974 http://its.lucent.com/~tla (Lucent internal) http://www.gti.net/tla
Participants (3): Archana Nehru, Paul E. Jones, Terry L Anderson