[Robustness] DDP Checkpointing
Randy Stewart -
You described a "checkpointing" mechanism for DDP during last week's robustness call, but it is not in the current DDP draft. However, you have evidently thought about this some and stated (in private mail) that you had built such a mechanism on top of DDP using the SEND_TO_ALL mechanism. How would you envision something like this being done to synch state for a pool of H.323 entities? Would this be an application-level library that could be called from H.323 software at appropriate checkpoint times? Could you describe this checkpointing idea in some more detail?
It would seem preferable either to break the state information into a number of subsets (to avoid resending large amounts each time) or to send only differences. Since DDP gives reliable delivery to the other members of the pool, perhaps only changes need be sent. Did your trials solve this problem? Perhaps each call would be a separate subset.
One issue is how to achieve reliable end-to-end delivery. SCTP guarantees delivery to the other end of a connection, but the H.225 and H.245 protocols may pass through intermediate nodes (e.g., routing GKs). If a node fails after acknowledging a message but before sending it on, end-to-end delivery fails and no live element knows. Either we need far-end acknowledgement, or backup elements must accept delivery responsibility once acknowledgement has been sent by their later-failed peer. DDP would guarantee delivery, but the recipient would have to checkpoint the message before acknowledging it, to guarantee that a peer taking over would know there is an outstanding message it must deliver and receive acknowledgement for. This mechanism would require a checkpoint after receiving a message but before acknowledging it, and again after receiving acknowledgement from the next neighbor. Does this seem reasonable? Unless the amount of data sent in such checkpointing is kept very small, this would seem to add too much network activity to be acceptable.
-- ------------------------------------------------------------ Terry L Anderson mailto:tla@lucent.com Tel:908.582.7013 Fax:908.582.6729 Pager:800.759.8352 pin 1704572 1704572@skytel.com Lucent Technologies/ Voice Over IP Access Networks/ Applications Grp Rm 2B-121, 600 Mountain Av, Murray Hill, NJ 07974 http://its.lucent.com/~tla (Lucent internal) http://www.gti.net/tla
Terry:
I am also copying the sigtran list; we should be releasing a new version of the draft with NO technical changes shortly... it will just have a new "Motivation" section to help folks understand why we built DDP and where it fits... Hopefully this will bring about discussion of this on the sigtran list.
Sigtraners: if you are at all interested in DDP, read on; if not, you can skip this email... R
I have contacted the Application Area ADs (per the suggestions at the last IETF) and they have not yet responded...
Let me detail some of the state-sharing ideas we have tried in our testbed. Sorry all that I missed the conf call today... allergy season had me under the weather :-0
Now, as you state, DDP itself does NOT provide any state-sharing mechanism per se. There is a mechanism called SEND_TO_ALL, which sends to every member of a given named group (except myself, if I am registered under that same name, which you would be in a state-sharing situation). This will probably need to be extended with one additional type, SEND_TO_BACKUPS. This would go along with the new load-balancing algorithm that we need to add so that fail-overs go to the guys designated as backups.
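Just to give a rough idea of the shape of these calls, here is a sketch; the names and signatures below are illustrative only, NOT the actual DDP API:

    #include <cstddef>

    // Illustrative stand-ins only -- these are NOT the real DDP calls, just a
    // sketch of the shape of the primitives described above.
    struct DdpEndpoint;   // an endpoint registered under a pool name, e.g. "terry"

    // Deliver 'len' bytes to every member of the named group.  A sender that
    // is itself registered under that name is excluded, which is exactly what
    // you want when sharing state among peers.
    int ddp_send_to_all(DdpEndpoint* ep, const char* group_name,
                        const void* data, std::size_t len);

    // Proposed extension: deliver only to the members designated as backups
    // for this sender under the new load-balancing algorithm.
    int ddp_send_to_backups(DdpEndpoint* ep, const char* group_name,
                            const void* data, std::size_t len);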
In themselves these are very primitive mechanisms for state sharing and don't scale too well. However, what we did was create a kind of "distributed shared memory" on top of them.
Our mechanism is built on two foundations:

o The application wishing to share state includes a library; in our case it is an object, since we are coding in C++, but it could just as well be a C library.
o A daemon that runs on every machine that wishes to share state. Note there is only ONE daemon per physical machine.
Now all the daemons register under a specific name; for my example I will call the shared memory "terry". So all of our "terry" daemons register, and each allocates a bunch of shared memory on its local O/S.
When an application process starts up on the local machine, it initializes its object (or library) by giving it the name of the shared memory daemon, i.e. "terry". This allows the library/object to find the shared memory and communicate with the daemon (via the object; the application is never directly aware of the communication).
To the local application the object/library appears to be much like a hash table. It has the following primitives:
allocate()
push()
lookup()
remove()
allocate, as you might imagine, allocates a piece of shared memory from the shared memory pool via a specific key. The key can be anything the app wants... just like a hash table... let's say we use a call-reference-number. As a result of the allocate, the application gets a pointer to a piece of local memory. It puts some state in this block, and when it decides to checkpoint it does a push()... The push is rather flexible, allowing it to push the whole thing OR just a specific range of bytes of that piece of memory.
When the push happens, notifications go out between the library/object and the daemon. The daemon then makes sure that ALL other daemons get a copy of this memory.
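Just to make the shape of this concrete, the object looks roughly like the following. This is only a C++ sketch; the class and method names are illustrative, not the exact interface from our testbed:

    #include <cstddef>
    #include <cstdint>
    #include <string>

    // Illustrative sketch of the library/object the application links in.
    class SharedState {
    public:
        // Attach to the local daemon registered under the given name, e.g. "terry".
        explicit SharedState(const std::string& daemon_name);

        // Allocate a block of shared memory under an application-chosen key
        // (say, a call-reference-number).  Returns a pointer to local memory
        // that the application fills in with its recovery state.
        void* allocate(std::uint32_t key, std::size_t size);

        // Checkpoint: ask the daemon to replicate the whole block, or just
        // [offset, offset + len) of it, to every peer daemon.
        void push(std::uint32_t key);
        void push(std::uint32_t key, std::size_t offset, std::size_t len);

        // On fail-over, look up a key this server has never seen locally.
        // Returns nullptr if no peer ever checkpointed state under that key.
        void* lookup(std::uint32_t key);

        // Free the block once the call is torn down.
        void remove(std::uint32_t key);
    };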
Now in theory (in our model at least) the call stays attached to the guy that allocated the memory, unless he fails. As such we stress to our developers that they should ONLY put the minimum needed to reconstruct state into this shared memory block and NOT use it as their call control block. They could/can violate this principle, but the idea here is to have a recovery mechanism, not a place to put your call state :-)
If my local server fails, DDP is optioned by its clients to "fail over" to some other call server. When this occurs, our friend the peer call server sees a call-reference-number that it knows nothing about. So what does it do? It does a lookup() call on that key. If it finds nothing... we just lost the call... OR, hopefully, we find the state block, reconstruct the call state and thus the current call object, and continue the call...
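In code the fail-over path looks roughly like this (again only a sketch, built on the illustrative SharedState class above; the reconstruction helper is hypothetical and application-specific):

    // Fail-over handling on the peer call server (illustrative only).
    void on_unknown_call(SharedState& state, std::uint32_t call_ref)
    {
        void* block = state.lookup(call_ref);
        if (block == nullptr) {
            // No peer ever checkpointed this key -- the call is simply lost.
            return;
        }
        // Otherwise rebuild the call object from the minimal recovery state
        // the failed server pushed, then continue handling the call:
        //   Call* call = Call::reconstruct(block);   // hypothetical helper
    }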
Now notice there is NO synchronization between the shared memory daemons in my example. This was not deemed worth the overhead, since we use the "lazy binding" principle that you stay with the first call server that serves your call unless he fails... Other server models are possible that would do 2-phase-commit type approaches to the shared memory if it were required...
So this is how we share state in DDP. DDP only provides the framework for building this and the tools for the server/client to transparently fail over with minimal work...
Now let me see if I can answer anything I missed in your email....
Terry L Anderson wrote:
> Randy Stewart -
> You described a "checkpointing" mechanism for DDP during last week's robustness call, but it is not in the current DDP draft. However, you have evidently thought about this some and stated (in private mail) that you had built such a mechanism on top of DDP using the SEND_TO_ALL mechanism. How would you envision something like this being done to synch state for a pool of H.323 entities? Would this be an application-level library that could be called from H.323 software at appropriate checkpoint times? Could you describe this checkpointing idea in some more detail?
I think the above covered this... let me know if I am unclear... I still have a sinus headache, so I may be a bit vague...
> It would seem preferable either to break the state information into a number of subsets (to avoid resending large amounts each time) or to send only differences. Since DDP gives reliable delivery to the other members of the pool, perhaps only changes need be sent. Did your trials solve this problem? Perhaps each call would be a separate subset.
Well, in effect we only do the push() described above at the point we consider a call stable. We did discuss checkpointing calls in their setup stages but decided that we did not need to... it is possible, since our push() mechanism lets us push selected pieces, but we currently only push the whole thing once the calls are stable...
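If you did want to send only the differences, the partial push form would let you do something like this (a sketch only; the recovery-state layout and field names are made up, and it reuses the illustrative SharedState class from above):

    #include <cstddef>   // offsetof
    #include <cstdint>

    // Made-up recovery-state layout, purely for illustration.
    struct CallRecoveryState {
        std::uint32_t remote_ip;
        std::uint16_t remote_port;
        std::uint8_t  call_phase;   // e.g. setting-up / stable / releasing
        // ... only what is needed to reconstruct the call ...
    };

    void checkpoint_phase_change(SharedState& state, std::uint32_t call_ref,
                                 CallRecoveryState* block,
                                 std::uint8_t new_phase)
    {
        block->call_phase = new_phase;
        // Push just the changed bytes rather than the whole block.
        state.push(call_ref, offsetof(CallRecoveryState, call_phase),
                   sizeof block->call_phase);
    }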
> One issue is how to achieve reliable end-to-end delivery. SCTP guarantees delivery to the other end of a connection, but the H.225 and H.245 protocols may pass through intermediate nodes (e.g., routing GKs). If a node fails after acknowledging a message but before sending it on, end-to-end delivery fails and no live element knows. Either we need far-end acknowledgement, or backup elements must accept delivery responsibility once acknowledgement has been sent by their later-failed peer. DDP would guarantee delivery, but the recipient would have to checkpoint the message before acknowledging it, to guarantee that a peer taking over would know there is an outstanding message it must deliver and receive acknowledgement for. This mechanism would require a checkpoint after receiving a message but before acknowledging it, and again after receiving acknowledgement from the next neighbor. Does this seem reasonable? Unless the amount of data sent in such checkpointing is kept very small, this would seem to add too much network activity to be acceptable.
I think this is a reasonable method, depending on how much data is in each checkpoint. The key is to keep the checkpointed data small and also to limit the range of the shared memory distribution...
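In other words, a relaying node's handling would look roughly like this (a sketch only; Message, send_ack_upstream() and forward_downstream() are made-up placeholders, and SharedState is the illustrative class from earlier in this mail):

    #include <cstddef>
    #include <cstdint>
    #include <cstring>   // std::memcpy

    struct Message {                          // placeholder message type
        const void* data() const;
        std::size_t size() const;
    };
    void send_ack_upstream(const Message&);   // placeholder: ack the previous hop
    void forward_downstream(const Message&);  // placeholder: send toward the far end

    void relay_message(SharedState& state, std::uint32_t msg_key,
                       const Message& msg)
    {
        // 1. Checkpoint the outstanding message BEFORE acknowledging it, so a
        //    backup taking over knows it still has to be delivered.
        void* blk = state.allocate(msg_key, msg.size());
        std::memcpy(blk, msg.data(), msg.size());
        state.push(msg_key);

        // 2. Only now acknowledge receipt to the previous hop.
        send_ack_upstream(msg);

        // 3. Forward toward the far end.
        forward_downstream(msg);
    }

    void on_downstream_ack(SharedState& state, std::uint32_t msg_key)
    {
        // 4. The next neighbor has acknowledged: the second checkpoint clears
        //    the outstanding entry so a backup will not re-deliver the message.
        state.remove(msg_key);
    }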
If I have missed anything let me know...
-- Randall R. Stewart Member Technical Staff Network Architecture and Technology (NAT) 847-632-7438 fax:847-632-6733