Distributed And Cloud-Based Storage Systems
https://sedna.cs.umd.edu/818
TTh 12:30-1:45, CSI 3120


The guiding philosophy of this course is that the best way to learn about real systems is to build one. We will gain an in-depth understanding of the issues involved in designing and deploying large-scale distributed file systems. In the course of this investigation we will be tackling a variety of topics, such as peer-to-peer systems, remote procedure calls, multi-threading, consensus protocols, cloud systems, layered systems (supporting high-level consistency guarantees on top of cloud services), and security as it relates to such systems.

Announcements:

Professor

Pete Keleher <keleher@cs.umd.edu> (include "818" in all correspondence)
Office hours: By appt.

Information

The class will consist of lectures by the instructor, student project presentations, a final, and a series of six programming projects, all in the language Go (fear not if you don't know anything about Go, we'll all be learning together). The end goal is to have built a full-scale reliable, highly-available, and secure distributed file system, using both local disks and cloud services as backing stores. My lectures will be split between those describing the tools we will use to build our file systems and those based on recent research in the literature (such as papers from FAST, OSDI, NSDI, and SOSP).

Examples of technologies we may use include FUSE (and MacFUSE), key-value stores like Bolt or gkvlite or diskv or leveldb-go, the Amazon Simple Storage Service (and its Go bindings), Google's Protocol Buffers or JSON (from Go), Google's Go language, Paxos, SQLite, and Snappy.

      Note: this paper list will change by the end of the second week.

Tuesday Thursday
Aug 31
Intro
Reading: A Tour of Go, and Effective Go

Solve the following puzzle, copy your solution into a fresh playground, and send me the "Share" url before class Thursday.

(slides)

Sep 2
Intro/Go

"Immutability Changes Everything"

(slides)

Sep 7
Intro/Go
"The Design and Implementation of a Log-Structured File System"
     and
"A Low-bandwidth Network File System"

 

(slides)

Sep 9
Global system event orderings.
(notes)

Project 1: Learning Go, In-Memory File System due Sunday night.

Sep 14
"MapReduce: simplified data processing on large clusters."
     and
"Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing."

 

(notes)

Sep 16
"The Google File System"
     and
"GFS: Evolution on Fast-forward"

 

(notes)

Sep 21
Versioning
"Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System"
     and
"Deciding when to forget in the Elephant file system"

 

(notes)

Sep 23
"File systems unfit as distributed storage backends: lessons from 10 years of Ceph evolution."
     and
"Lineage stash: fault tolerance off the critical path"

Project 2: Serialization, Persistence, and Immutability due Sunday night.

Sep 28
"Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications" - kaitlyn
     and
"Session types for Rust" - aaron
Sep 30
"OceanStore: An Architecture for Global-Scale Persistent Storage"
     and
"Pond: the OceanStore Prototype"

(slides)
CACM entropy paper

Oct 5
"Scalable Causal Consistency for Wide-Area Storage with COPS"
     and
"High performance I/O for large scale deep learning" - Dhanvee
Oct 7
Databases. No reading

Project 3: Log Synchronization.

Oct 12
More Databases. No reading

 

Correctness Anomalies Under Serializable Isolation (no blog)

Oct 14
"Dynamo: Amazon's highly available key-value store" - akilesh
     and
"Monarch: Google's planet-scale in-memory time series database" - elliot
Oct 19
Versioning
"Read-Write Quorum Systems Made Practical"
     and
"Convergent Causal Consistency for Social Media Posts" - kelsey
Oct 21
"Totally-Ordered Prefix Parallel Snapshot Isolation" - anubhav
     and
"Towards the Synthesis of Coherence/Replication Protocols from Consistency Models via Real-Time Orderings"
Oct 26
Configuration
"Advanced Domain-Driven Design for Consistency in Distributed Data-Intensive Systems"
     and
"Salt: Combining ACID and BASE in a Distributed Database"
Oct 28
Consensus
"Paxos Made Simple"
     and
"Paxos Made Live: an Engineering Perspective"

Project 4: Distributed Consensus.

Nov 2
Consensus
"In search of an understandable consensus algorithm"
     and
"Egalitarian paxos"
Nov 4
"Fast and secure global payments with Stellar" - william
     and
"CALM: when distributed consistency is easy"
Nov 9
MDCC
"Consensus Across Continents"
     and
"FAWN: A fast array of wimpy nodes" - claude
Nov 11
RAMCloud
"The Case for RAMCloud"
     and
"Fast crash recovery in RAMCloud" - nikhil
     and
(journal paper) - nikhil
Nov 16
Spanner
"Spanner: Google's Globally-Distributed Database" - jerry
     and
Background (no blog): "Living Without Atomic Clocks" (CockroachDB)
Nov 18
CRDTs
"Conflict-free replicated data types"

Project 5: High-level Abstractions on Shared Logs.

Nov 23
Building on a shared log
"The Fuzzylog: a Partially Ordered Shared Log"
     and
"Tango: Distributed data structures over a shared log"
Nov 25
Thanksgiving
Nov 30
Abadi
Sections 1-3 from "Calvin: fast distributed transactions for partitioned database systems"
     and
Sections 3.0-3.3 from "SLOG: Serializable, Low-latency, Geo-replicated Transactions"
Dec 2
Fault Tolerance and Security
"Practical Byzantine Fault Tolerance"
     and
"Separating agreement from execution for byzantine fault tolerant services"
Dec 7
Hash Chains
"Bitcoin: A Peer-to-Peer Electronic Cash System"
     and
"Architecture of the hyperledger blockchain fabric"
Dec 9


Project 6: Distributed Transactions over Shared Logs.

Late Policies

All projects will have a due date, and a late due date two days later.
  • Do each project by yourself. Sadly, we can and do detect and fail those who do not abide by this policy each semester. You may ask, and answer, general questions on Piazza.
  • Your grade loses 20% of the max score if the project is turned in after the due date, but by the late due date. Anything turned in after the late due date receives a zero.
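
As a rough illustration of the late penalty, here is a minimal Go sketch. The names lateScore and Timing are hypothetical, and flooring the adjusted score at zero is an assumption that the policy above does not state. The key point is that the 20% is taken from the project's maximum score, not from the score you earned.

```go
package main

import "fmt"

// Timing is a hypothetical label for when a project was turned in.
type Timing int

const (
	OnTime  Timing = iota // on or before the due date
	Late                  // after the due date, but by the late due date
	TooLate               // after the late due date
)

// lateScore applies the late policy to a raw score, in points.
// The penalty is 20% of the project's maximum score, not of the raw score.
func lateScore(raw, maxScore int, t Timing) int {
	switch t {
	case OnTime:
		return raw
	case Late:
		adjusted := raw - maxScore*20/100
		if adjusted < 0 {
			return 0 // assumption: a score never drops below zero
		}
		return adjusted
	default:
		return 0 // anything after the late due date
	}
}

func main() {
	fmt.Println(lateScore(90, 100, OnTime))  // 90
	fmt.Println(lateScore(90, 100, Late))    // 70
	fmt.Println(lateScore(90, 100, TooLate)) // 0
}
```

So a project that would have earned 90 out of 100 earns 70 if it arrives during the two-day late window.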

Attendance and general grading policies

Students are responsible for all material covered, and all announcements, deadlines, policies, etc., discussed in lecture and discussion section, regardless of whether they were in class to hear the information or not. It’s understood that students may occasionally have to miss class for various reasons, but email and office hours are not intended as a replacement for class attendance. Consequently, only students who typically and regularly attend class will receive assistance during office hours.

Coursework will count toward the final grade according to the following percentages:

  1. Projects: 65%
    • There will be six projects, the first worth 10%, the rest 11% each.
    • You must earn at least half credit on each project to pass the course.
  2. Blog entries: 10%
    • You are required to upload a blog entry before each class except the first. More details in class.
  3. Paper presentation / class participation: 5%
  4. Final exam: 20%
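
The weights above can be checked with a short Go sketch. This is only an illustration: courseTotal and the constant names are hypothetical, every score is assumed to be a fraction between 0 and 1, and the half-credit rule is reported separately rather than folded into the total.

```go
package main

import "fmt"

// Weights in percent of the final grade, taken from the list above.
const (
	firstProjectWeight  = 10
	otherProjectWeight  = 11
	blogWeight          = 10
	participationWeight = 5
	finalExamWeight     = 20
)

// courseTotal returns the weighted total (0 to 100) and whether the
// "at least half credit on each project" requirement is met.
// Every argument is a fraction in [0, 1].
func courseTotal(projects [6]float64, blog, participation, final float64) (float64, bool) {
	total := 0.0
	meetsProjectRule := true
	for i, p := range projects {
		w := float64(otherProjectWeight)
		if i == 0 {
			w = firstProjectWeight // the first project is worth slightly less
		}
		total += w * p
		if p < 0.5 {
			meetsProjectRule = false
		}
	}
	total += blogWeight*blog + participationWeight*participation + finalExamWeight*final
	return total, meetsProjectRule
}

func main() {
	perfect := [6]float64{1, 1, 1, 1, 1, 1}
	total, ok := courseTotal(perfect, 1, 1, 1)
	fmt.Println(total, ok) // 100 true: 10 + 5*11 + 10 + 5 + 20
}
```

Note that the project weights sum to 10 + 5 × 11 = 65, matching the 65% listed for projects.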

Academic integrity

The Campus Senate has adopted a policy asking students to include the following statement on each examination or assignment in every course: “I pledge on my honor that I have not given or received any unauthorized assistance on this examination (or assignment).” Consequently, you will be requested to include this pledge on each exam and project. You may review the University’s Code of Academic Integrity for yourself at
https://www.faculty.umd.edu/teach/integrity.html
