Re: [Robustness] DDP Checkpointing
See my comments below... (also see Randy's answer, mine is more from an architectural point of view)
(Part 1: my mail server doesn't like long messages for some reason :-)
-Qiaobing
Terry L Anderson wrote:
Randy Stewart -
You described a "checkpointing" mechanism for DDP during last week's robustness call, but it is not in the current DDP draft. However, you have evidently given this some thought and stated (in private mail) that you had built such a mechanism on top of DDP using the SEND_TO_ALL mechanism. How would you envision something like this being done to sync state for a pool of H.323 entities? Would this be an application-level library that could be called from H.323 software at appropriate checkpoint times? Could you describe this checkpointing idea in some more detail?
We consider DDP to cover only session-level fault-tolerant data transfer, and have therefore left checkpointing out of the DDP spec. We believe that implementors should have full freedom to choose the most suitable checkpointing mechanism for their specific applications. Examples of checkpointing/replication mechanisms include:
1. Networked application servers on multiple hosts using one of the following replication techniques:
   - IP multicast (cheap but unreliable)
   - DDP SEND_TO_ALL, a.k.a. groupcast (reliable, not very efficient in large groups)
   - some Reliable Multicast protocol (still being worked on by the IETF rmt WG; not sure whether they are considering the "many-to-many" case though)
   - etc.
2. Multiple servers on the same host, using local shared-memory IPC (Effective and efficient, but won't survive a host crash; has scaling problems too.)
3. Servers on duplicated hardware cards on the same platform (e.g., a cPCI box) with hardware-assisted reflected memory (Very fast, not very scalable, a little expensive, won't survive a platform crash.)
4. Shared network disk array (Cheap, but not real-time; the disk array may become a single point of failure.)
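To make option 1 with IP multicast a little more concrete, here is a rough Python sketch of how a server might push full-state checkpoints to its peers. Because IP multicast is unreliable, each datagram carries a sequence number so a receiver can discard stale copies and count the checkpoints it missed. The group address, port, record layout, and all names (`encode_checkpoint`, `Replica`, `send_checkpoint`) are assumptions made up for illustration; none of this comes from the DDP draft.

```python
import json
import socket
import struct

# Hypothetical multicast group and port, chosen only for this sketch.
GROUP = "239.1.2.3"
PORT = 5007


def encode_checkpoint(seq, state):
    """Pack a sequence number plus a JSON-encoded state dict into one datagram."""
    body = json.dumps(state, sort_keys=True).encode("utf-8")
    return struct.pack("!Q", seq) + body


def decode_checkpoint(datagram):
    """Inverse of encode_checkpoint: return (seq, state)."""
    (seq,) = struct.unpack("!Q", datagram[:8])
    return seq, json.loads(datagram[8:].decode("utf-8"))


class Replica:
    """Peer-side copy of the state, updated from received checkpoints."""

    def __init__(self):
        self.seq = 0      # sequence number of the last applied checkpoint
        self.state = {}
        self.missed = 0   # checkpoints known to have been skipped

    def apply(self, datagram):
        seq, state = decode_checkpoint(datagram)
        if seq <= self.seq:
            return False  # duplicate or out-of-date datagram, ignore it
        self.missed += seq - self.seq - 1
        self.seq = seq
        self.state = state  # each checkpoint carries the full state
        return True


def send_checkpoint(seq, state, group=GROUP, port=PORT, ttl=1):
    """Send one checkpoint to the multicast group (best effort, no retries)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    try:
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL,
                        struct.pack("b", ttl))
        sock.sendto(encode_checkpoint(seq, state), (group, port))
    finally:
        sock.close()
```

Sending the whole state each time keeps a lost datagram harmless (the next checkpoint repairs it), at the cost of bandwidth. A design that sends only deltas would need some way to resynchronize after a gap.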
Internally at Motorola, we have prototyped a hybrid of 1 and 2 and created a networked (virtual) shared memory using IP multicast. It works great :-)
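The general shape of such a hybrid can be sketched as follows. This is not the prototype itself, only an illustration under stated assumptions: a single writer, a local region represented by any writable buffer, and a `transmit` callable standing in for the multicast send. The class name `MirroredRegion` and the header layout are hypothetical.

```python
import struct

# Hypothetical update header: version, offset, payload length.
HEADER = struct.Struct("!QII")


class MirroredRegion:
    """A local memory region whose writes are also published to remote peers."""

    def __init__(self, buf, transmit):
        self.buf = buf            # any writable buffer (bytearray, memoryview)
        self.transmit = transmit  # callable taking one bytes datagram
        self.version = 0          # counts writes applied to this region

    def write(self, offset, data):
        """Write locally, then publish the same update for remote copies."""
        if offset < 0 or offset + len(data) > len(self.buf):
            raise ValueError("write outside region")
        self.buf[offset:offset + len(data)] = data
        self.version += 1
        self.transmit(HEADER.pack(self.version, offset, len(data)) + data)

    def apply_remote(self, datagram):
        """Apply an update received from the writer; reject stale or bad ones."""
        version, offset, length = HEADER.unpack_from(datagram)
        data = datagram[HEADER.size:HEADER.size + length]
        if len(data) != length or offset + length > len(self.buf):
            return False  # truncated or out-of-range update
        if version <= self.version:
            return False  # duplicate or older update
        self.buf[offset:offset + length] = data
        self.version = version
        return True
```

On a real host the buffer could be the `buf` of a `multiprocessing.shared_memory.SharedMemory` segment, so that co-located servers see writes immediately while remote hosts receive them over the network. Since the updates here are deltas, a lost datagram leaves a remote copy out of date until some resynchronization step runs, which this sketch does not attempt.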
Qiaobing Xie