Distributed And Cloud-Based Storage Systems
https://sedna.cs.umd.edu/818
MW 2:00pm - 3:15pm, CSI 3118


The guiding philosophy of this course is that the best way to learn about real systems is to build one. We will gain an in-depth understanding of the issues involved in designing and deploying large-scale distributed file systems. In the course of this investigation we will be tackling a variety of topics, such as peer-to-peer systems, remote procedure calls, multi-threading, consensus protocols, cloud systems, layered systems (supporting high-level consistency guarantees on top of cloud services), and security as it relates to such systems.

Announcements:

Professor

Pete Keleher <keleher@cs.umd.edu> (include "828" in all correspondence)
Office hours: W 3:30 - 5:00pm, and by appt. in AVW 4157.

Information

The class will consist of lectures by the instructor, student project presentations, a final, and a series of probably four programming projects, all in the language Go (fear not if you don't know anything about Go; we'll all be learning together). The end goal is to have built a full-scale, reliable, highly available, and secure distributed file system, using both local disks and cloud services as backing stores. My lectures will be split between those describing the tools we will use to build our file systems and those based on recent research in the literature (such as papers from FAST 2014, SOSP 2013, OSDI 2012, and USENIX ATC 2014).

Examples of technologies we may use include FUSE (and MacFUSE), key-value stores like Bolt, gkvlite, diskv, or leveldb-go, the Amazon Simple Storage Service (and its Go binding), Google's Protocol Buffers or JSON (from Go), Google's Go language, Paxos, SQLite, Snappy, and Apple's development kit for the iPad.

Office hours: after class in my office (4157 A.V. Williams).

Note that the following set of papers is only a placeholder: more will come, some will go away.

     

Note: this list may be out of date and will be updated by the first day of class. The papers will continue to be mutable until the week before any given day, so please continue to check.

Tuesday Thursday
Aug 27
Intro
Solve the following puzzle, then copy your solution into a fresh playground, and send me the "Share" url.

(notes)

Aug 29
Intro/Go
Reading:

(notes)

Sep 3
File Systems
"The Design and Implementation of a Log-Structured File System"

 

"A Low-bandwidth Network File System"

 

(notes)

Sep 5
Versioning
"Deciding when to forget in the Elephant file system"

 

"Knockoff: Cheap Versions in the Cloud"

Sep 10
"The Google File System" - justin

 

"GFS: Evolution on Fast-forward"

Sep 12
Global system event orderings.

(notes)
Project 1 due Friday midnight.

Sep 17
"Managing Update Conflicts in Bayou, a Weakly Connected Replicated Storage System" - james

       and

("Flexible update propagation for weakly consistent replication"

       or

"Session Guarantees for Weakly Consistent Replicated Data")

Sep 19
 "Quantifying Eventual Consistency with PBS" - patrick

 

 "Bolt-on causal consistency"

(notes)

Sep 24
Databases. No reading  

(slides)

Sep 26
More Databases. No reading
(see above slides)
Oct 1
"On verifying causal consistency" - michael r

"f4: Facebook’s Warm BLOB Storage System" - yuval

Oct 3
"Highly Available Transactions: Virtues and Limitations" - tasnim

"Salt: Combining ACID and BASE in a Distributed Database" - charles
Project 2 due Sunday midnight.

Oct 8
"Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications" - kyle

 

"OceanStore: An Architecture for Global-Scale Persistent Storage" - Zhehan

Oct 10
"Sinfonia: A New Paradigm for Building Scalable Distributed Systems" - omer

 

"Dynamo: Amazon's highly available key-value store" - ian

Oct 15
"Scalable Causal Consistency for Wide-Area Storage with COPS"

 

"Ambry: LinkedIn’s Scalable Geo-Distributed Object Store" - akshay

Oct 17
Paxos
 "Paxos Made Live: an Engineering Perspective"

"The Part-Time Parliament"

(notes, ousterhout paxos)
Project 3 due Sunday midnight.

Oct 22
Paxos
"Egalitarian paxos"

 

"A Generalised Solution to Distributed Consensus"

(notes)

Oct 24
Paxos
"Egalitarian paxos" redux

 

"Flexible Paxos: Quorum intersection revisited"

Oct 29
 "In search of an understandable consensus algorithm"

Oct 31
"Implementing Linearizability at Large Scale and Low Latency" - kevin

 

"A KVS For Any Scale" - manli

Nov 5
"Spanner: Google's Globally-Distributed Database" - sindhura

"Living Without Atomic Clocks" (cockroachdb ref)

 

Definitions: Linearizability versus Serializability
Project 4 due Wednesday midnight.

Nov 7
"FAWN: A fast array of wimpy nodes"

 

"Scalable Consistency in Scatter" - yi

 

Nov 12
Sections 1-3 from "Calvin: fast distributed transactions for partitioned database systems"

 

Sections 3.0-3.3 from "SLOG: Serializable, Low-latency, Geo-replicated Transactions"

Nov 14
More SLOG

 

"CORFU: A Shared Log Design for Flash Clusters." - brandon

 

Nov 19
Fault Tolerance and Security
"Practical Byzantine Fault Tolerance" - johann

 

"Tango: Distributed data structures over a shared log"

 


Project 4b due Wednesday midnight.

Nov 21
"Lazy database replication with ordering guarantees"
(Correctness Anomalies Under Serializable Isolation)

 

"Fast crash recovery in RAMCloud" - erin

 

Nov 26
"Secure Untrusted Data Repository" - michael reininger

 

"SPORC: Group Collaboration using Untrusted Cloud Resources" - christian

 

Nov 28
Thanksgiving

Dec 3
"Building Consistent Transactions with Inconsistent Replication"

 

"The Fuzzylog: a Partially Ordered Shared Log" - zhichao

Dec 5
"Sharding the shards: managing datastore locality at scale with Akkio"

 

"Camlistore/Perkeep is your personal storage system for life"
Project 5 due December 12 midnight.

Late Policies

All projects will have a due date, and a late due date two days later.
  • Do each project by yourself. Sadly, we can and do detect and fail those that do not abide by this policy each semester. You may ask, and answer, general questions on Piazza.
  • Your grade loses 20% of the max score if the project is turned in after the due date, but by the late due date. Anything after the late due date gives you a zero.

Attendance and general grading policies

Students are responsible for all material covered, and all announcements, deadlines, policies, etc., discussed in lecture and discussion section, regardless of whether they were in class to hear the information or not. It’s understood that students may occasionally have to miss class for various reasons, but email and office hours are not intended as a replacement for class attendance. Consequently, only students who typically and regularly attend class will receive assistance during office hours.

Coursework will count toward the final grade according to the following percentages:

  1. Projects: 70%
    • There will be five to six projects, each with approximately a two-week time window.
    • Projects will be weighted equally.
    • All projects must get at least half credit to pass the course.
  2. Blog entries: 5%
    • You are required to upload a blog entry before each class except the first. More details in class.
  3. Midterm/Final: 25%
    • There will be one test, timing TBD.

Exam

Final exam: TBD

Academic integrity

The Campus Senate has adopted a policy asking students to include the following statement on each examination or assignment in every course: “I pledge on my honor that I have not given or received any unauthorized assistance on this examination (or assignment).” Consequently, you will be requested to include this pledge on each exam and project. You may review the University’s Code of Academic Integrity for yourself at
https://www.faculty.umd.edu/teach/integrity.html
