CS 417 Exam info
The final exam will be held in our regular classroom on December 22, 2011 from 4:00-6:00pm. Please be sure to arrive on time! Expect around 50 multiple-choice questions in the style of those on mid-semester exams.
Remember that the final is optional and will only serve to displace a lower normalized grade on one of the three exams.
Past exams
Get past 417 exams and information about exam taking here.
Study Guide
The study guide is essentially a concatanation of the previous three study guides.
Topics
You are responsible for the material from since the start of the semester.
Topics that you should know and may be on the exam include:
Introductory material
- Metcalfe's law
- classification: Flynn's and beyond (multiprocessors, multicomputers, ...)
- SISD, SIMD, MIMD
- multiprocessors vs. multicomputers (networks of computers)
- symmetric multiprocessing
- bus-based multiprocessors
- cache coherency problems
- snoopy cache
- scalability beyond bus-based multiprocessors: switched multiprocessors
- Crossbars
- NUMA
- distributed system
- single system image
- transparency goals issues (very high-level understanding)
- location
- migration
- replication
- concurrency
- service models
- centralized
- client-server (and workstation)
- peer-to-peer
- processor pool
Networking
- broadband vs. baseband
- packets
- circuit & packet switched networking
- Understand the concept of protocols and layering
- I will not ask you to enumerate the OSI protocol stack but you should understand the functions of the network, transport, presentation, and application layers.
- ethernet transmission
- CSMA/CD
- unreliable, connectionless communication
- clients and servers and services
- transport endpoints (addresses) vs. machine endpoints
- connection-oriented vs. connectionless protocols
- IP (Internet Protocol)
- Interconnection of networks and routing
- IP address vs. ethernet address
- class-based addresses (why did we have classes A, B, and C?)
- I will not ask about "special" IP addresses (e.g. broadcast)
- port number: what is its purpose?
- TCP vs. UDP
- Protocol encapsulation concept
- ethernet device driver: encapsulate IP within an ethernet packet
- what is meant by routing?
- sockets
- concept
- operations: create, bind, connect, listen, shutdown, communication (know what they do, not what the parameters are)
- ARP
Quality of Service
- Service metrics
- Bandwidth
- Delay
- Jitter
- Errors
- Admission control vs. traffic control
- ATM - key points
- control over quality of service: fixed-size packets, path setup
- CBR, VBR, ABR
- IP quality of service
- Nagle's algorithm
- Differentiated service vs. guaranteed service
- Flow detection
- Traffic shaping vs. traffic policing
- DiffServ
- RSVP
- I will not ask about MPLS or 802.1p
- RTCP and RTP: understand the goals/concept
Naming and binding
- terminology (don't bother with directory service vs. name service)
- understand naming, binding, what a name is, what an address is
- compound name
- I will not ask you to define naming convention or context
- DNS
- what does DNS do?
- How do multiple DNS servers together provide domain name service?
- root name servers
- iterative vs. recursive name resolution
Clock synchronization
- Clock drift and skew, linear compensating function (understand the terms)
- Cristian, Berkeley algorithms
- Understand the formula for Cristian's and Berkeley
- Error bounds with Cristian's algorithm. Remember that errors are cumulative.
- NTP
- NTP Synchronization subnet
- NTP strata
- You don't need to memorize the NTP(SNTP) formula but be sure to understand how it gives you the same result as Cristian's (don't hesitate to to the algebraic expansion on paper!)
- You don't need to know the symmetric mode NTP algorithms
- You don't need to know how dispersion and jitter are calculated
- You don't need to know the NTP message structure or validation tests
Logical clocks
- Lamport's algorithm
- goals
- happened-before relationship
- partial ordering
- algorithm
- deficiencies
- remedy for generating unique timestamps
- Vector clocks
- goals
- algorithm
- know how to identify concurrent events
Group communication
- I won't ask about anycast
- closed vs. open groups, hierarchical groups
- reliability: atomic, reliable, unreliable
- ordering: global-time ordered, total ordered, causal ordered, sync ordered, FIFO ordered, unordered multicasts
- sending versus delivering messages, hold-back queue
- IP multicasting
- IP vs ethernet multicast addresses
- how is ethernet multicast implemented?
- know that some bits of an IP class D address are mapped onto an ethernet multicast address (you don't have to remember that it's 23 of the 28 bits)
- hash of address
- exact match on a small list
- driver still has to check whether to reject the packet
- IGMP
- multicast-aware router
- understand the basic protocol
- you don't need to memorize the distinctions between v1, v2, and v3 of IGMP
- just know that v1 didn't require a client to leave a group
Remote procedure calls
- concept of a remote procedure call (RPC)
- language vs. OS construct
- stub functions
- marshalling
- data representation
- explicit versus implicit typing
- semantics of remote procedure calls
- failure of RPCs
- at most once and at least once semantics
- Interface definition language (notation) - purpose
- RPC compiler
- Sun (ONC) RPC
- service registration (portmapper)
- service lookup, capabilities
- DCE RPC (improvements over SUN RPC)
- cell directory
- UUID
- multi-canonical marshaling
- Microsoft COM+/DCOM/ORPC
- improvements in RPC over DCE
- integration with COM
- surrogate processes
- distributed garbage collection: reference counting, pinging
- CORBA (general concepts, CORBA services)
- Java RMI (general concepts, service registration and lookup, serialiazable class, remote class)
- garbage collection in RMI (dirty, clean)
- XML RPC/SOAP (very general concepts)
- Purpose of WSDL
- Microsoft .NET remoting (general concepts)
- channels (binary/TCP, SOAP/HTTP/TCP, binary/named pipes)
- single call, singleton objects, and client-activated objects
- Leasing distributed garbage collector (LDGC)
- AJAX/XMLHTTPRequest, REST (very general concepts)
- I will not ask about the windows communication framework
- We did not cover the enterprise service bus framework, so that will not be on the exam
Scalability
- farm, geoplex (replicated farm)
- service cloning
- service partitioning
- shared disk vs. shared nothing
- active-active vs. active-passive
Mutual exclusion
- Types: centralized, token-based, contention-based
- central-server algorithm
- distributed algorithms:
- Ricart & Agrawala
- Token ring
- Lamport's mutual exclusion algorithm
Election algorithms
- Bully algorithm
- Ring algorithm
- Chang and Roberts ring algorithm optimization
- split-brain problem (network partitioning/segmentation)
Consensus
- Consensus
- Faults
- Fail-stop
- Fail-silent
- Byzantine faults
- Synchronous versus asynchronous systems
- Two army problem
- Byzantine generals problem
- Paxos
- Roles: proposer, acceptor, learner
- Replicated state machine
- active-active, active-passive
- leasing vs. locking
- hierarchical leases
Distributed transactions
- properties of transactions (ACID)
- nested transactions: private work spaces, write-ahead logs
- two-phase commit protocol (understand the need for a log and what happens in each of the phases)
- three-phase commit protocol
- Paxos commit
- Brewer's CAP Theorem
- BASE approach vs. ACID
Distributed deadlocks
- conditions for deadlock, deadlock resolution
- centralized detection, false deadlock (what's meant by this)
- Chandy-Misra-Haas probing algorihm (just the high level)
- prevention: wait-die, wound-wait
Concurrency control
- schedules
- lock manager
- two-phase locking
- cascading aborts
- strict two-phase locking
- separating read and write locks
- optimistic concurrency control
- timestamp ordering
Distributed file systems
- file service, file server
- service models: remote access vs. upload/downloa
- consistency: sequential semantics, session semantics, ambiguous semantics
- stateful vs. stateless design
- caching
- case studies
- NFS
- goals
- mounting filesystems, goals of automounter
- UDP vs TCP transport
- directory and file access protocol (don't memorize the list of RPC calls but understand why they exist, what lookup does, why there's no open)
- caching
- inconsistencies because of caching & validation
- know that a user-level lock manager was added to NFS but otherwise don't bother with features of successive versions.
- don't bother with NFS v4
- AFS (versions 1,2)
- goals
- caching
- service model, semantics
- cache coherence (callbacks)
- CODA
- goals
- replicated volumes
- resolution
- disconnected operation
- reintegration
- DFS/AFS v.3
- goals
- token mechanism
- SMB/CIFS
- goals
- high-level protocol (RPC-like)
- caching model: oplocks
- Miscellaneous
- GmailFS (just the concept)
Cluster file systems
- GFS
- chunkservers
- master
- chunk handle
- operation log
- chunk lease
- data flow vs. control flow
- HDFS
- NameNode
- DataNode
- blocks
- FsImage
Distributed lookup services
- Centralized server
- Flooding
- Distributed Hash Table (DHT)
- Chord
- Content-Addressable Network
Content Delivery Networks
- Flash crowd problem
- dynamic DNS
- domain resolution
MapReduce
- Master vs. Worker
- shards
- Map worker
- Map function
- Partition operation
- Reduce worker
- Sort operation
- Reduce operation
BigTable
- Tablets, column families, columns, column family versions
- Table splitting,
- master server, tablet servers
- chubby
- eventual consistency
Google Cluster Architecture
- system design goals
- areas of scalability: DNS load balancing, web load balancing, index & content partitioning (shards), replication of shards
Amazon Dynamo
- Purpose: distributed, highly available key-value store
- distributed hash table (DHT)
- consistent hashing
- functional interface: get, put
- consistency model for replication
- goals: incremental scalability, symmetry, decentralization, heterogenous systems
- conflict resolution
- who does it?
- use of vector timestamps for conflict detection
- partitioning among multiple nodes
- virtual node
Cryptographic systems (just what we covered in class)
- types of ciphers:
- restricted ciphers: what are they?
- public key cryptography: use of two keys
- symmetric cryptography
- purpose of a secret key
- triple DES versus DES versus double DES
- key exchange (using third party, Diffie-Hellman, and public Keys)
- Diffie-Hellman (know what it does and that it uses an (ab)mod c one-way function; don't memorize the algorithm)
- public key cryptography (how do you use it?)
- understand the difference between public key and symmetric algorithms; what private keys are used for, what public keys are used for.
- how do you use it for encryption, signatures, and key exchange?
- do not memorize algorithms: I will not ask you to recite the RSA or Diffie-Hellman algorithms or to explain what an s-box does in DES. You should know that RSA is based on the difficulty of factoring large products of primes and that it works via an abmod c function (take a look at the slides for Diffie-Hellman and RSA key generation).
- hybrid cryptosystem (understand advantage and how key generation/exchange works)
- difference between public key and symmetric cryptography
- one-way functions (what are they good for?, McCarthy's puzzle)
- hash functions (what are they good for?)
- digital signatures (with public key or symmetric cryptosystems)
- using an arbitrated protocol
- using public keys
- secure communciation
- with symmetric cryptography
- with public key cryptography
- with a hybrid cryptosystem
Authentication
- The four A's
- Authentication: techniques of identification
- Authorization: granting access control (based on user ID and access control list, permissions in a ticket such as Kerberos, or consulting a trusted authorization server)
- Accounting: logging
- Auditing: inspecting source
- multi-factor authentication
- what are the three factors?
- reusable passwords (PAP, password authentication protocol)
- one-time passwords
- challenge handshake authentication prtocol (CHAP)
- authentication with symmetric cryptography
- Kerberos (know how it works - don't worry about memorizing details, know how tickets work and the purpose of the server). See the lecture slides. Don't worry about the distinction between the authentication server and ticket granting server.
- public key authentication
- how does it work? Understand the use of the nonce
- certificates (don't memorize fields but understand the purpose and how a certificate works)
- understand the principles of SSL - it's just a hybrid cryptosystem
- Principles of CAPTCHA: what's the goal?
- biometric authentication
- purpose of the ROC plot
- robustness vs. distinctiveness
- sensing, signal processing, pattern matching (imprecise, never exact)
OpenID & OAuth
- OpenID
- single sign-on
- role of an OpenID provider
- HTTP redirection
- OAuth
- purpose vs. OpenID
- role of provider and consumer
- role of request and access tokens
System security: Firewalls & VPNs
- Firewalls
- packet filtering (how does it work): stateful vs. stateless
- bastion hosts (definition)
- proxies
- screened subnet architecture, DMZ
- deep packet inspection
- Virtual Private Networks (VPNs)
- principles, not details
- tunneling
Virtual Private Networks
- principles, not details
- tunneling
Fault tolerence
- types of faults (transient, intermittent, permanent)
- types of redundancy
- TMR (what does it do?)
- Two-army problem (what does it demonstrate?)
Clusters
- definition
- clustering types: performance, batch processing, high availability, load balancing (just the goals)
- failover types: cold, warm, hot; multidirectional, cascading
- shared disk, shared nothing, distributed lock manager
- what is a heartbeat network?
- What is a system area network?
- What is a storage area network?
- Don't memorize RAID levels but understand that there's disk striping for performance and disk mirroring and error correcting codes (parity) for fault tolerence.
Mobile ad hoc networks
- Mesh (decentralized) network
- Dynamic routing
- Source routing
- Route discovery: flooding
- Table-driven routing
- ZigBee mesh network
- know it's a self-configuring, self-healing network with redundant paths
- coordinator
- router
- end devices
- distance vector
- you don't have to know about encryption, keys, or the trust center
Process migration
- migratory vs. non-migratory processes
- home system
- dealing with state
- don't bother with the algorithms
Distributed shared memory
- implementation: page fault handling, locating pages via directory
- distributed vs. centralized directory
- copysets
- false sharing
- granularity (what's the trade-off?)
- home-based vs. non-home based algorithms
- sequential consistency: program order and write atomicity requirements
- weak consisitency models (sync), release consistency, entry consistency (I won't ask about eager vs. lazy)