Terry:
I am also copying the sigtran list. We should be releasing a new version of the draft with NO technical changes shortly... it will have just a new "Motivation" section to help folks understand why we built DDP and where it fits... Hopefully this will bring about some discussion of it on the sigtran list.
Sigtraners: if you are at all interested in DDP, read on; if not, you can skip this email... R
I have contacted the Application Area ADs (per the suggestions at the last IETF) and they have not yet responded...
Let me detail some of the state-sharing ideas we have implemented in our testbed. Sorry, all, that I missed the conf call today... allergy season had me under the weather :-0
Now, as you state, DDP itself does NOT provide any state sharing mechanism per se. There is a mechanism called SEND_TO_ALL, which sends to every member of a given named group (except myself, if I am registered under that same name, which you would be in a state-sharing situation). This will probably need to be extended with one additional type, SEND_TO_BACKUPS. That would go along with the new load balancing algorithm we need to add so that failures go to the guys designated as backups.
By themselves these are very primitive mechanisms for state sharing and don't scale too well. So what we did is build a form of "distributed shared memory" on top of them.
Our mechanism is built on two foundations: o The application wishing to share state includes a library; in our case it is an object, since we are coding in C++, but it could just as well be a C library.
o A daemon that runs on every machine that wishes to share state. Note there is only ONE daemon per physical machine.
Now all the daemons register under a specific name; for my example I will call the shared memory "terry". So all of our "terry" daemons register, and each allocates a chunk of shared memory on its local OS.
When an application process starts up on the local machine, it initializes its object (or library) by giving it the name of the shared memory daemon, i.e. "terry". This allows the library/object to find the shared memory and communicate with the daemon (via the object; the application is never directly aware of the communications).
To the local application the object/library appears to be much like a hash table. It has the following primitives:
allocate() push() lookup() remove()
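
To make the shape of this a bit more concrete, here is a rough C++ sketch of what the object's interface looks like in spirit. The names (SharedStatePool and so on) are made up for this email, not lifted from our actual code or from the draft:

// Hypothetical sketch of the library/object interface described above.
// None of these names come from the DDP draft; they only illustrate the
// hash-table-like primitives.
#include <cstddef>
#include <string>

class SharedStatePool {
public:
    // Attach to the local daemon registered under the given DDP name
    // (e.g. "terry").
    explicit SharedStatePool(const std::string& daemon_name);

    // Reserve a block of the shared pool under an application-chosen key
    // (e.g. a call-reference-number); returns a pointer to local memory.
    void* allocate(const std::string& key, std::size_t size);

    // Checkpoint the whole block, or just a byte range of it, to the
    // other daemons in the pool.
    void push(const std::string& key);
    void push(const std::string& key, std::size_t offset, std::size_t length);

    // Find a block checkpointed by some other server, or nullptr if the
    // key is unknown.
    void* lookup(const std::string& key);

    // Release the block once the call is finished.
    void remove(const std::string& key);
};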
allocate(), as you might imagine, allocates a piece of shared memory from the shared memory pool under a specific key. The key can be anything the app wants.. just like a hash table... let's say we use a call-reference-number. As a result of the allocate, the application gets a pointer to a piece of local memory. It puts some state in this block, and when it decides to checkpoint it does a push().. The push is rather flexible, allowing it to push the whole thing OR just a specific range of bytes within that piece of memory.
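
A small usage sketch, again with invented names (CallState here is just a stand-in for whatever minimal recovery data the application decides to checkpoint):

// Minimal usage sketch of allocate()/push(), reusing the hypothetical
// SharedStatePool from the sketch above.
#include <cstddef>
#include <cstring>
#include <string>

struct CallState {
    char called_party[32];
    char calling_party[32];
    int  state;               // e.g. an enum value meaning "stable"
};

void checkpoint_call(SharedStatePool& pool, const std::string& call_ref) {
    // Get a block of shared memory keyed by the call-reference-number.
    CallState* cs = static_cast<CallState*>(
        pool.allocate(call_ref, sizeof(CallState)));

    // Fill in just enough to reconstruct the call later.
    std::strncpy(cs->called_party, "dialed digits here", sizeof(cs->called_party));
    cs->state = /* stable */ 1;

    // Push the whole block, or only the bytes that changed.
    pool.push(call_ref);                                            // everything
    pool.push(call_ref, offsetof(CallState, state), sizeof(int));   // one field
}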
When the push happens, notifications go from the library/object to the daemon. The daemon then makes sure that ALL other daemons get a copy of this memory.
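
Very roughly, the daemon side looks something like this; ddp_send_to_all() is just a placeholder for however the daemon invokes DDP's SEND_TO_ALL service for its named group, not an API from the draft:

// Rough sketch of what the daemon might do when notified of a push.
#include <cstddef>
#include <string>
#include <vector>

struct PushNotice {
    std::string key;            // application key, e.g. call-reference-number
    std::size_t offset;         // start of the changed bytes
    std::vector<char> bytes;    // the changed bytes themselves
};

// Placeholder for the DDP service primitive; takes the group name the
// daemons registered under and an opaque message.
void ddp_send_to_all(const std::string& group, const PushNotice& msg);

void on_local_push(const PushNotice& notice) {
    // Apply the change to the daemon's local copy of the pool here, then
    // fan it out so every other "terry" daemon gets the same bytes.
    ddp_send_to_all("terry", notice);
}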
Now in theory (in our model at least) the call stays attached to the guy that allocated the memory, unless he fails. As such we stress to our developers that they should ONLY put the minimum needed to reconstruct state into this shared memory block and NOT use it as their call control block. They could/can violate this principle, but the idea here is to have a recovery mechanism, not a place to put your call state :-)
If my local server fails, DDP is optioned by its clients to "fail-over" to some other call server. When this occurs, our friend the peer call server sees a call-reference-number that it knows nothing about. So what does it do? It does a lookup() call on that key. If it finds nothing.. we just lost the call.. OR, hopefully, it finds the state block, reconstructs the call state and thus the current call object, and continues the call...
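
In code terms the fail-over path is roughly this (reusing the made-up SharedStatePool and CallState from the sketches above; Call and reconstruct_call() are equally invented stand-ins):

// Sketch of the fail-over path: a peer call server sees an unknown
// call-reference-number after DDP redirects the client to it.
#include <string>

struct Call;                                     // the live call object
Call* reconstruct_call(const CallState* state);  // rebuild from the checkpoint

Call* take_over_call(SharedStatePool& pool, const std::string& call_ref) {
    const CallState* cs =
        static_cast<const CallState*>(pool.lookup(call_ref));
    if (cs == nullptr) {
        // Nothing was checkpointed for this key: the call is lost.
        return nullptr;
    }
    // Otherwise rebuild the call object from the shared block and carry on.
    return reconstruct_call(cs);
}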
Now notice there is NO synchronization between the shared memory daemons in my example. This was not deemed worth the overhead, since we use the "lazy binding" principle that you stay with the first call server that serves you on your call unless he fails.. Other server models are possible that would do 2-phase-commit type approaches to the shared memory if that were required...
So this is how we share state on top of DDP. DDP only provides the framework for building this, plus the tools for the server/client to transparently fail over with minimal work...
Now let me see if I can answer anything I missed in your email....
Terry L Anderson wrote:
Randy Stewart -
You described a "checkpointing" mechanism for DDP during last week's robustness call but it is not in the current DDP draft. However, you have evidently thought about this some and stated (in private mail) that you had built such a mechanism on top of DDP using the SEND_TO_ALL mechanism. How would you envision something like this being done to synch state for a pool of H.323 entities? Would this be an application-level library that could be called from H.323 software at appropriate checkpoint times? Could you describe this checkpointing idea in some more detail?
I think the above covered this.. let me know if I am unclear.. still have a sinus headache so I may be a bit vague...
It would seem preferable to either break up state information into a number of subsets (to avoid resending large amounts each time) or to only send differences. Since DDP gives reliable delivery to the other members of the pool, perhaps only changes need be sent. Did your trials solve this problem? Perhaps each call would be a separate subset.
Well, in effect we only do the push() above at the point we consider it a stable call. We did discuss checkpointing calls in the setup stages but decided that we did not need to.. it is possible, since our push() mechanism lets us push selected pieces, but we currently only push the whole thing once the call is stable...
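
So the policy boils down to something like this (again reusing the invented names from the sketches above; nothing here is our actual code):

// Checkpoint only when a call reaches its stable state, rather than on
// every setup-stage transition.
#include <string>

void on_call_state_change(SharedStatePool& pool,
                          const std::string& call_ref,
                          CallState* cs, int new_state) {
    cs->state = new_state;
    if (new_state == /* stable */ 1) {
        pool.push(call_ref);   // whole block, once, at call-stable time
    }
    // Setup-stage changes could be pushed piecemeal with the
    // offset/length form of push(), but we chose not to bother.
}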
One issue is how to solve reliable end-to-end delivery. SCTP would guarantee delivery to the other end of the connection, but the H.225 and H.245 protocols may pass through intermediate nodes (e.g., routing GKs). If a node fails after acknowledging a message but before sending it on, end-to-end delivery fails and no live element knows. Either we need far-end acknowledgement, or backup elements must accept delivery responsibility once acknowledgement has been sent by their later-failed peer. DDP would guarantee delivery, but the recipient would have to checkpoint the message before acknowledging it, to guarantee that a peer taking over would know there was an outstanding message it must deliver and receive acknowledgement for. This mechanism would require a checkpoint after receipt of a message but before sending its acknowledgement, and again after receiving acknowledgement from the next neighbor. Does this seem reasonable? Unless the amount of data sent in such checkpointing is kept very small, this would seem to add too much network activity to be acceptable.
I think this is a reasonable method, depending on how much data is in the checkpoint. The key is to keep the checkpointed data small, and also to keep the range of the shared memory distribution small..
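
If I were to sketch your relay idea with the same made-up primitives, it would look something like this; Message, send_ack(), forward_downstream() and RelayRecord are all invented for illustration, and the point is that the checkpointed record stays tiny:

// Relay checkpointing as described above: record the message before
// acknowledging it upstream, and record again once the next hop has
// acknowledged, so a backup taking over knows what is still owed.
#include <string>

struct Message;                        // an H.225/H.245 message in transit
void send_ack(const Message& m);       // acknowledge to the upstream node
void forward_downstream(const Message& m);

struct RelayRecord {
    bool delivered_downstream;         // false until the next hop acks
};

void on_message_from_upstream(SharedStatePool& pool,
                              const std::string& msg_key, const Message& m) {
    // First checkpoint: do it BEFORE acking, so a peer that takes over
    // still knows it owes this message downstream.
    RelayRecord* r = static_cast<RelayRecord*>(
        pool.allocate(msg_key, sizeof(RelayRecord)));
    r->delivered_downstream = false;
    pool.push(msg_key);

    send_ack(m);               // only after the checkpoint is out
    forward_downstream(m);
}

void on_ack_from_downstream(SharedStatePool& pool, const std::string& msg_key) {
    // Second checkpoint: the downstream obligation is discharged.
    RelayRecord* r = static_cast<RelayRecord*>(pool.lookup(msg_key));
    if (r != nullptr) {
        r->delivered_downstream = true;
        pool.push(msg_key);
        pool.remove(msg_key);  // nothing left to recover for this message
    }
}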
--
Terry L Anderson mailto:tla@lucent.com Tel:908.582.7013 Fax:908.582.6729 Pager:800.759.8352 pin 1704572 1704572@skytel.com Lucent Technologies/ Voice Over IP Access Networks/ Applications Grp Rm 2B-121, 600 Mountain Av, Murray Hill, NJ 07974 http://its.lucent.com/~tla (Lucent internal) http://www.gti.net/tla
If I have missed anything let me know...
-- Randall R. Stewart Member Technical Staff Network Architecture and Technology (NAT) 847-632-7438 fax:847-632-6733