See my comments below... (also see Randy's answer; mine is more from an
architectural point of view)
(Part 1: my mail server doesn't like long messages for some reason :-)
-Qiaobing
Terry L Anderson wrote:
>
> Randy Stewart -
>
> You described a "checkpointing" mechanism for DDP during last week's
> robustness call but it is not in the current DDP draft. However, you
> have evidently thought about this some and stated (in private mail) that
> you had built such a mechanism on top of DDP using the SEND_TO_ALL
> mechanism. How would you envision something like this being done to
> synch state for a pool of H.323 entities? Would this be an application
> level library that could be called from H.323 software at appropriate
> checkpoint times? Could you describe this checkpointing idea in some
> more detail?
>
We consider DDP to cover only session-level fault-tolerant data
transfer, and have therefore left checkpointing out of the DDP spec. We
believe that implementors should have full freedom to choose the
most suitable checkpointing mechanism for their specific applications.
Examples of checkpointing/replication mechanisms include:
1. Networked application servers on multiple hosts using one of the
following replication techniques:
- IP multicast (cheap but unreliable)
- DDP SEND_TO_ALL, a.k.a. groupcast (reliable, not very efficient in
large groups)
- some Reliable Multicast protocol (still being worked on by the IETF
rmt WG; not sure whether they are considering the "many-to-many" case
though)
- etc.
2. Multiple servers on the same host, using local shared memory IPC
(Effective and efficient, but won't survive a host crash, and has
scaling problems too)
3. Servers on duplicated hardware cards on the same platform (e.g., a
cPCI box) with hardware-assisted reflected memory
(Very fast, not very scalable, a little expensive, won't survive a
platform crash.)
4. Shared network disk array
(Cheap, but not real-time; the disk array may become a single point
of failure.)
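
To illustrate why checkpointing can stay outside the session layer,
here is a minimal sketch (an illustration only, not part of DDP or of
any existing library) of an application-level checkpointer with a
pluggable transport. The names Checkpointer, Standby and
loopback_transport are hypothetical. The transport is just a callable
taking one blob; an in-process loopback stands in for it here, and a
real deployment would substitute whichever of the mechanisms above it
prefers.

```python
import json
from typing import Callable, Dict, List

# Assumption: a transport is any callable that delivers one checkpoint
# blob to the peers (multicast, groupcast, shared memory, ...).
Transport = Callable[[bytes], None]


class Checkpointer:
    """Hypothetical helper the application calls at its checkpoint times."""

    def __init__(self, transport: Transport):
        self.transport = transport
        self.sequence = 0

    def checkpoint(self, state: Dict) -> int:
        # Number each checkpoint so receivers can discard stale ones.
        self.sequence += 1
        blob = json.dumps({"seq": self.sequence, "state": state}).encode("utf-8")
        self.transport(blob)
        return self.sequence


class Standby:
    """Hypothetical standby server that keeps the latest checkpoint."""

    def __init__(self):
        self.sequence = 0
        self.state: Dict = {}

    def receive(self, blob: bytes) -> None:
        msg = json.loads(blob.decode("utf-8"))
        # Ignore duplicates and checkpoints older than the one we hold.
        if msg["seq"] > self.sequence:
            self.sequence = msg["seq"]
            self.state = msg["state"]


def loopback_transport(peers: List[Standby]) -> Transport:
    """In-process stand-in for a real network or IPC transport."""

    def send(blob: bytes) -> None:
        for peer in peers:
            peer.receive(blob)

    return send
```

The application only ever calls checkpoint(); swapping the replication
technique means swapping the transport callable, nothing else.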
Internally at Motorola, we have prototyped a hybrid of 1 and 2 and
created a networked (virtual) shared memory using IP multicast. It
works great :-)
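
For readers who want something concrete, below is a rough sketch of
the general idea of a multicast-replicated memory. It is not the
prototype code, just an illustration under stated assumptions: the
group address, port, wire format and the ReplicatedMemory class are
all made up for this example, and each key is assumed to have a single
writer. Since IP multicast is unreliable, every write carries a
per-key version number so that duplicated or reordered datagrams are
harmless; recovery of lost datagrams is deliberately left out.

```python
import json
import socket
import struct

GROUP = "239.1.2.3"  # hypothetical administratively scoped group
PORT = 50000         # hypothetical port


def encode_update(version, key, value):
    """Pack one write: 8-byte big-endian version, then JSON [key, value]."""
    return struct.pack("!Q", version) + json.dumps([key, value]).encode("utf-8")


def decode_update(datagram):
    """Inverse of encode_update; returns (version, key, value)."""
    (version,) = struct.unpack("!Q", datagram[:8])
    key, value = json.loads(datagram[8:].decode("utf-8"))
    return version, key, value


class ReplicatedMemory:
    """Local copy of the shared state, one version counter per key."""

    def __init__(self):
        self.data = {}
        self.versions = {}

    def write(self, key, value):
        """Apply a local write and return the datagram to multicast."""
        version = self.versions.get(key, 0) + 1
        self.apply(version, key, value)
        return encode_update(version, key, value)

    def apply(self, version, key, value):
        """Apply an update unless it is a duplicate or older than ours."""
        if version <= self.versions.get(key, 0):
            return False
        self.data[key] = value
        self.versions[key] = version
        return True


def open_sender(ttl=1):
    """UDP socket for sending to the group; TTL 1 stays on the local subnet."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, ttl)
    return sock


def open_receiver(group=GROUP, port=PORT):
    """UDP socket bound to the port and joined to the multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    return sock
```

A writer would do something like
open_sender().sendto(mem.write(key, value), (GROUP, PORT)), and each
replica would loop on mem.apply(*decode_update(sock.recv(65535))).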