MIT 6.824 Lecture Notes
- Lecture 2: RPC and Threads Notes
- Notes From Google I/O 2012 - Go Concurrency Patterns
- Lecture 3: GFS
- Lecture 4: Primary-Backup Replication
- Lecture 5: Fault Tolerance: Raft (1)
Lecture 2: RPC and Threads Notes
Lecture 2 Notes for MIT 6.824 Distributed Systems course.
Race Conditions
Use Go's -race flag (go run -race, go test -race) to detect race conditions at runtime.
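A minimal sketch of a data race the detector will report: two goroutines update the same counter without synchronization (the filename is illustrative).

```go
// race.go (illustrative): two goroutines increment the same counter
// without synchronization; `go run -race race.go` reports the race.
package main

import "fmt"

func main() {
	counter := 0
	done := make(chan bool)
	go func() {
		for i := 0; i < 1000; i++ {
			counter++ // unsynchronized write
		}
		done <- true
	}()
	for i := 0; i < 1000; i++ {
		counter++ // concurrent unsynchronized write from main
	}
	<-done
	fmt.Println(counter) // result varies from run to run
}
```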
Two ways of coordination:
- Channels
- Useful when no data sharing between threads
- Locks and Condition Vars
- Shared variables
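A sketch of the same counter coordinated both ways (illustrative code, not from the lecture): a mutex protects a shared variable, while the channel version avoids sharing by giving the counter to a single owner goroutine.

```go
package main

import (
	"fmt"
	"sync"
)

// Locks: protect the shared variable directly.
func withMutex() int {
	var mu sync.Mutex
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

// Channels: no shared variable; workers send increments to a single
// owner goroutine that holds the counter.
func withChannel() int {
	incs := make(chan int)
	result := make(chan int)
	go func() {
		counter := 0
		for d := range incs {
			counter += d
		}
		result <- counter
	}()
	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				incs <- 1
			}
		}()
	}
	wg.Wait()
	close(incs)
	return <-result
}

func main() {
	fmt.Println(withMutex(), withChannel())
}
```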
When a function (e.g. main) returns while threads it started are still running, the garbage collector does not release the function's local variables, since those threads still hold references to them (through closures).
Use a sync.WaitGroup to wait for the spawned threads to return:
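A small sketch of the pattern:

```go
// Sketch: sync.WaitGroup lets the parent wait until every goroutine
// it launched has called Done before it returns itself.
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1) // one outstanding goroutine per iteration
		go func(n int) {
			defer wg.Done() // mark this goroutine as finished
			fmt.Println("worker", n)
		}(i)
	}
	wg.Wait() // block until every Done has been called
}
```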
RPC semantics under failure
- at least once: client automatically retries until it gets a response
- at most once: server detects duplicate requests and makes sure it doesn't execute them twice (this is what Go's RPC package provides)
- exactly once: hard!
Failures make RPC different from a local procedure call.
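A hypothetical sketch of the at-most-once idea (not Go's actual RPC package; names are made up): the server caches replies by request ID, so a retransmitted request is answered from the cache instead of being executed again.

```go
package main

import (
	"fmt"
	"sync"
)

// Server is a made-up at-most-once server: replies are remembered by
// request ID so duplicates are not re-executed.
type Server struct {
	mu   sync.Mutex
	seen map[int64]string  // requestID -> cached reply
	data map[string]string // the server's state
}

func (s *Server) Get(requestID int64, key string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if reply, ok := s.seen[requestID]; ok {
		return reply // duplicate: reply again, but do not execute again
	}
	reply := s.data[key] // execute the operation exactly once
	s.seen[requestID] = reply
	return reply
}

func main() {
	s := &Server{seen: map[int64]string{}, data: map[string]string{"k": "v"}}
	fmt.Println(s.Get(1, "k")) // first delivery executes
	fmt.Println(s.Get(1, "k")) // retransmission returns the cached reply
}
```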
Notes From Google I/O 2012 - Go Concurrency Patterns
Goroutines
It’s very cheap. It’s practical to have thousands, even hundreds of thousands of goroutines.
It’s not a thread.
There might be only one thread in a program with thousands of goroutines.
They are multiplexed dynamically onto threads as needed to keep all the goroutines running.
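A tiny sketch of that cheapness: the program below starts 100,000 goroutines that each just sleep briefly, and still finishes quickly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 100000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			time.Sleep(10 * time.Millisecond)
		}()
	}
	wg.Wait()
	fmt.Println("all goroutines finished")
}
```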
Channels
Sending and receiving both block, so a channel operation is also a synchronization point. Buffering removes that synchronization.
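A small sketch of that difference:

```go
// Sketch: an unbuffered send blocks until a receiver is ready, so the
// send/receive pair synchronizes the two goroutines. A buffered send
// does not block while the buffer has room.
package main

import "fmt"

func main() {
	unbuffered := make(chan string)
	go func() { unbuffered <- "ping" }() // blocks until main receives
	fmt.Println(<-unbuffered)

	buffered := make(chan string, 1)
	buffered <- "pong" // does not block: the buffer has room
	fmt.Println(<-buffered)
}
```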
Patterns
- Generator (function that returns a channel): channels as a handle on a service; we can create more instances of the service (see the sketch after this list).
- Multiplexing: use a fan-in function to combine two or more channels.
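A sketch of both patterns in the spirit of the talk's examples (the names below are illustrative):

```go
package main

import (
	"fmt"
	"time"
)

// counter is a generator: a function that returns a receive-only
// channel, which acts as a handle on the service.
func counter(name string) <-chan string {
	c := make(chan string)
	go func() {
		for i := 0; ; i++ {
			c <- fmt.Sprintf("%s %d", name, i)
			time.Sleep(100 * time.Millisecond)
		}
	}()
	return c
}

// fanIn multiplexes two input channels onto one output channel.
func fanIn(a, b <-chan string) <-chan string {
	c := make(chan string)
	go func() {
		for {
			c <- <-a
		}
	}()
	go func() {
		for {
			c <- <-b
		}
	}()
	return c
}

func main() {
	c := fanIn(counter("joe"), counter("ann"))
	for i := 0; i < 6; i++ {
		fmt.Println(<-c)
	}
}
```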
Select
- All channels are evaluated.
- Selection blocks until one communication can proceed, which then does.
- If multiple can proceed, select chooses pseudo-randomly.
- A default clause, if present, executes immediately if no channel is ready.
Fan-in with select:
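A sketch of the same fan-in written with a single goroutine and select; it is meant to drop into the generator sketch above in place of fanIn.

```go
// fanInSelect merges two channels using one goroutine and a select.
func fanInSelect(a, b <-chan string) <-chan string {
	c := make(chan string)
	go func() {
		for {
			select {
			case s := <-a:
				c <- s
			case s := <-b:
				c <- s
			}
		}
	}()
	return c
}
```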
To avoid waiting too long, select on a timeout (time.After):
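A sketch with an illustrative slow generator (names are made up):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// slow stands in for a generator whose messages may take too long.
func slow() <-chan string {
	c := make(chan string)
	go func() {
		for i := 0; ; i++ {
			time.Sleep(time.Duration(rand.Intn(600)) * time.Millisecond)
			c <- fmt.Sprintf("message %d", i)
		}
	}()
	return c
}

func main() {
	c := slow()
	for {
		select {
		case s := <-c:
			fmt.Println(s)
		case <-time.After(300 * time.Millisecond): // per-message timeout
			fmt.Println("too slow, giving up")
			return
		}
	}
}
```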
Quit channel on main: main tells the goroutine to stop by sending on a quit channel.
Two-way communication on the quit channel is useful when the goroutine has cleanup to do: main waits for a message from the goroutine on the quit channel before returning.
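A sketch of the two-way version (names are illustrative): main signals the worker to stop, the worker does its cleanup and then replies on the same channel, so main only returns after cleanup has finished.

```go
package main

import (
	"fmt"
	"time"
)

func worker(quit chan string) {
	for i := 0; ; i++ {
		select {
		case <-quit:
			fmt.Println("worker: cleaning up")
			quit <- "done" // tell main that cleanup has finished
			return
		default:
			fmt.Println("worker: step", i)
			time.Sleep(50 * time.Millisecond)
		}
	}
}

func main() {
	quit := make(chan string)
	go worker(quit)
	time.Sleep(200 * time.Millisecond) // let the worker run for a while
	quit <- "stop"      // ask the worker to quit...
	fmt.Println(<-quit) // ...and wait for its cleanup acknowledgment
}
```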
Q: How do we avoid discarding results from slow servers?
A: Replicate the servers. Send requests to multiple replicas, and use the first response.
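A sketch of the first-response idea; Replica and fakeReplica are hypothetical stand-ins for a call to one server, not names from the talk.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

type Replica func(query string) string

// first sends the same query to every replica and returns whichever
// answer arrives first.
func first(query string, replicas ...Replica) string {
	// Buffered so the losing replicas' goroutines can still send and exit.
	c := make(chan string, len(replicas))
	for _, r := range replicas {
		go func(r Replica) { c <- r(query) }(r)
	}
	return <-c // the first answer wins; later ones are discarded
}

func fakeReplica(name string) Replica {
	return func(query string) string {
		time.Sleep(time.Duration(rand.Intn(100)) * time.Millisecond)
		return name + " answers " + query
	}
}

func main() {
	fmt.Println(first("query", fakeReplica("replica1"), fakeReplica("replica2")))
}
```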
Lecture 3: GFS
Storage is hard. Why?
- High performance -> Shard data across servers
- Many servers -> constant faults: if each machine crashes about once a year, 1000 machines mean roughly 3 failures a day
- Fault tolerance -> Replication
- Replication -> Inconsistencies
- Strong consistency -> Lower performance
Ideal Consistency: Behave as a single machine. This is hard because of concurrency and failures.
GFS
Non-standard for that time:
- One master (Why have single point of failure?)
- Can have inconsistencies.
GFS has to be:
- Big: Large data set
- Fast: Automatic sharding
- Global: All apps see the same file system
- Fault tolerant: Automatic
Design
The master writes changes to its log before responding to clients. That way, if the master fails in between, the client does not see strange results.
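A hypothetical sketch of that rule (names and types are made up; this is not GFS code): append the record to the operation log, flush it to disk, and only then update in-memory metadata and reply.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// Master is an illustrative stand-in for the metadata master.
type Master struct {
	mu      sync.Mutex
	logFile *os.File
	meta    map[string]string // stand-in for chunk/namespace metadata
}

func (m *Master) ApplyChange(key, value string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	record := key + "=" + value + "\n"
	if _, err := m.logFile.WriteString(record); err != nil {
		return err
	}
	if err := m.logFile.Sync(); err != nil { // make the log record durable first
		return err
	}
	m.meta[key] = value // then update in-memory state
	return nil          // only now does the master reply to the client
}

func main() {
	f, err := os.CreateTemp("", "master-log")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	m := &Master{logFile: f, meta: map[string]string{}}
	if err := m.ApplyChange("chunk42", "serverA"); err != nil {
		panic(err)
	}
	fmt.Println("logged and applied:", m.meta)
}
```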
Lecture 4: Primary-Backup Replication
Failures
Fail-Stop: Failure stops the computer, does not produce weird results
Replication won't solve logic bugs, configuration errors, or malicious attacks.
What can be handled: disasters such as earthquakes. If the primary and backup are in different places and one of them is unaffected, replication can help.
Challenges
- Has the primary really failed? Could be a network partition. Want to avoid split-brain with two primaries.
- Keeping primary-backup in sync, dealing with non-determinism
- Fail-over: what to do if a failure happens in the middle of an operation? With multiple backups, which one should take over?
Two Approaches
- State Transfer: transfer the state changes. If an operation generates a lot of state, this can be expensive.
- Replicated State Machine (RSM): transfer the operations instead (see the sketch after this list).
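A minimal sketch of the RSM idea with a made-up key-value state machine: if both replicas start from the same state and apply the same deterministic operations in the same order, they end up in the same state.

```go
package main

import "fmt"

type Op struct {
	Key   string
	Value string
}

type KV map[string]string

// Apply is deterministic: the same op on the same state gives the same result.
func (kv KV) Apply(op Op) { kv[op.Key] = op.Value }

func main() {
	ops := []Op{{"x", "1"}, {"y", "2"}, {"x", "3"}}
	primary, backup := KV{}, KV{}
	for _, op := range ops {
		primary.Apply(op) // primary executes the operation...
		backup.Apply(op)  // ...and the backup replays the same operation
	}
	fmt.Println(primary, backup) // identical state
}
```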
Level of Operations to Replicate
- Application level: applications must be modified.
- Machine level: the replicated operations are ordinary machine instructions, so the application and OS do not have to be modified; replication is transparent. Can be done with special hardware, but also with virtual machines.
VM FT
Fault-Tolerant Virtual Machines (2010)
Any event or external interrupt destined for the VM is first captured by the hypervisor. In the paper, the hypervisor not only delivers it to the primary virtual machine but also sends it over a logging channel to the backup.
Lecture 5: Fault Tolerance: Raft (1)
With 2f+1 servers, a majority is f+1, so the system keeps working as long as at least f+1 servers are running, i.e. it tolerates f failures (e.g. 5 servers tolerate 2 failures).
Why logs: Order, retransmission, persistence, space for tentative operations.
Consensus algorithms allow a collection of machines to work as a coherent group that can survive the failures of some of its members.
Replicated State Machines: Replicated state machines are typically implemented using a replicated log. Each server stores a log containing a series of commands, which its state machine executes in order. Keeping the replicated log consistent is the job of the consensus algorithm.
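A sketch of a log entry and an apply loop, in the spirit of the 6.824 labs (field names are illustrative, not taken from the notes):

```go
package main

import "fmt"

// LogEntry is one slot of the replicated log.
type LogEntry struct {
	Term    int         // term in which the entry was created
	Command interface{} // the command for the state machine
}

// applyCommitted applies every newly committed entry in log order.
// The log keeps a placeholder at index 0, matching the paper's 1-based indexing.
func applyCommitted(log []LogEntry, lastApplied, commitIndex int, apply func(interface{})) int {
	for lastApplied < commitIndex {
		lastApplied++
		apply(log[lastApplied].Command)
	}
	return lastApplied
}

func main() {
	log := []LogEntry{{}, {Term: 1, Command: "x=1"}, {Term: 1, Command: "y=2"}}
	lastApplied := applyCommitted(log, 0, 2, func(cmd interface{}) {
		fmt.Println("apply", cmd)
	})
	fmt.Println("lastApplied =", lastApplied)
}
```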
Raft
Main goal of Raft: understandability. Paxos was hard to understand.
3 states: leader, follower, candidate (used to elect a new leader)
In normal operation there is exactly one leader and all of the other servers are followers.
Raft divides time into terms of arbitrary length. Each term starts with an election.
If a candidate wins the election, it serves as leader for the rest of the term. In some situations an election results in a split vote; then the term ends with no leader, and a new election begins in a new term.
Communication by RPC:
- RequestVote: sent by candidates during elections
- AppendEntries: By leaders to replicate log entries and to provide a form of heartbeat
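A sketch of the arguments and results of the two RPCs, following Figure 2 of the Raft paper, written as Go structs as in the 6.824 labs; it assumes the LogEntry type from the sketch above and is meant to live in the same package.

```go
type RequestVoteArgs struct {
	Term         int // candidate's term
	CandidateId  int // candidate requesting the vote
	LastLogIndex int // index of candidate's last log entry
	LastLogTerm  int // term of candidate's last log entry
}

type RequestVoteReply struct {
	Term        int  // currentTerm, for the candidate to update itself
	VoteGranted bool // true means the candidate received the vote
}

type AppendEntriesArgs struct {
	Term         int        // leader's term
	LeaderId     int        // so followers can redirect clients
	PrevLogIndex int        // index of the entry immediately preceding the new ones
	PrevLogTerm  int        // term of the PrevLogIndex entry
	Entries      []LogEntry // log entries to store (empty for heartbeat)
	LeaderCommit int        // leader's commitIndex
}

type AppendEntriesReply struct {
	Term    int  // currentTerm, for the leader to update itself
	Success bool // true if the follower matched PrevLogIndex/PrevLogTerm
}
```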
If a follower receives no communication over a period of time called the election timeout, then it assumes there is no viable leader and begins an election to choose a new leader.
Nice visualization: Secret Lives of Data