Assignment 7

Due Wednesday, April 22, 2015 11:00pm on-line via sakai

Files

FAQ

Check the FAQ.

Overview

A file system is a collection of methods and data structures for storing and organizing data in secondary storage. Typically, file systems are used on block-based hardware, such as disks.

A particularly valuable aspect of file systems is their namespace. Users and programs know how to open and read files. By exposing a service as a file system, a service provider can avoid the need to create and document special-purpose APIs. The process file system, procfs, mounted under /proc on Linux systems is an example of the use of a file system namespace to access information about processes and other kernel structures. This assignment will allow you to get experience implementing a file system without the worries and trouble of kernel programming.

The concept of a file system that does not actually store files may seem strange, but this is a natural extension of the UNIX "everything is a file" philosophy. For more information, see the brief Everything is a file wikipedia article, The Use of Name Spaces in Plan 9, and the file systems lecture notes.

Background

We will be using FUSE (File system in USerspace) to implement a virtual file system. FUSE is an OS kernel module that is mounted under VFS and passes requests back up to a user process. This enables us to implement the functionality required to create a file system entirely in userspace, eliminating the necessity for kernel-level programming.

Group size

The assignment should not take a lot of effort. However, since the class is huge, you may work in a group of up to five students. Be sure to identify the group members in your writeup and in a comments in your code.

You will be held to higher standards if working in a larger group. Be sure to submit a beautifully-written report that summarizes your design and all the test cases you have performed.

Languages

The file system must be implemented in C using the FUSE library.

Assignment

The goal is to develop a specialized read-only file system module for FUSE. The file system that you will implement will not store actual files, but rather, will perform a various mathematical operations based on the names and numbers given to it. We call it mathfs.

You are provided with a minimal source code base, demo, and Makefile to get you started. Some of the key functions will be already defined.

Additionally, you must write a report detailing your project implementation.

Your file system must have the following characteristics:

  1. The root of the mathfs file system must contain seven directories, each representing a mathematical function:

    1. /factor - Computes the prime factors of a number.
    2. /fib - Computes the first n fibonacci numbers.
    3. /add - Adds two numbers
    4. /sub - Subtracts two numbers.
    5. /mul - Multiplies two numbers.
    6. /div - Divides two numbers.
    7. /exp - Raises a number to a given exponent.
  2. An ls command executed on any of these directories should only show a single file: doc. The contents of this file should be the documentation of the particular function. This can be a brief textual description with a couple of examples.
  3. A function can be invoked by opening a "file" under the mathfs mount point and reading its contents. For example:

    1. /factor/138 should act as a file containing the numbers 2, 3, 23, one per line and ending with a newline.
    2. /fib/3 should act as a file containing the numbers 1, 1, 2, one per line and ending with a newline.
    3. /add/5/3 should act as a file containing the number 8 and ending with a newline.
    4. /sub/5/3 should act as a file containing the number 2 and ending with a newline.
    5. /mul/5/3 should act as a file containing the number 15 and ending with a newline.
    6. /div/5/3 should act as a file containing the number 1.6666 and ending with a newline.
    7. /exp/2/3 should act as a file containing the number 8 and ending with a newline.

    Note that these "files" should not be listed by ls, as there are an infinite number of them.

Demo

To get a feeling of what is expected of you, you can play with my compiled version of the code. Download the mathfs.tar file that contains the source code base and makefile. It includes a file named mathfs-demo. This is a version of the file system compiled to run on the iLab Linux machines. To test it out:

  • Unpackage the archive:

    
    tar xvf mathfs.tar
    
  • This will create a directory called mathfs. Go there:

    
    cd mathfs
    
  • Create a mount point for the file system. Let's call it math:

    
    mkdir math
    
  • Run the file system on the mount point. It will run in the background so you won't see anything:

    
    ./mathfs-demo math
    
  • Try some things. A top-level directory listing:

    
    $ ls -l math
    total 0
    drwxr-xr-x 3 root root 0 Apr 10 22:06 add
    drwxr-xr-x 3 root root 0 Apr 10 22:06 div
    drwxr-xr-x 3 root root 0 Apr 10 22:06 exp
    drwxr-xr-x 3 root root 0 Apr 10 22:06 factor
    drwxr-xr-x 3 root root 0 Apr 10 22:06 fib
    drwxr-xr-x 3 root root 0 Apr 10 22:06 mul
    drwxr-xr-x 3 root root 0 Apr 10 22:06 sub
    
  • Show the contents of the div directory:

    
    $ ls -l math/div
    total 0
    -r--r--r-- 1 root root 64 Apr 10 22:07 doc
    
  • Add the numbers 1234 and 5.52. I use sprintf's %g format, so decimals sometimes get rounded. You're welcome to change this to any reasonable behavior.

    
    $ cat math/add/1234/5.52
    1239.52
    
  • Last 10 numbers in a 50-element fibonacci sequence:

    
    $ cat math/fib/50|tail -10
    165580141
    267914296
    433494437
    701408733
    1134903170
    1836311903
    2971215073
    4807526976
    7778742049
    12586269025
    
  • You should handle basic errors, such as divide-by-zero:

    
    $ cat math/div/500/0
    divide by zero error
    
  • Factoring works with integers:

    
    $ cat math/factor/223092870
    2
    3
    5
    7
    11
    13
    17
    19
    23
    
  • But you should barf on factoring floating point numbers:

    
    $ cat math/factor/5.5
    the number to factor must be an integer
    
  • Each top-level subdirectory contains a doc file:

    
    $ ls -l math/*/*
    -r--r--r-- 1 root root 56 Apr 10 22:16 math/add/doc
    -r--r--r-- 1 root root 64 Apr 10 22:16 math/div/doc
    -r--r--r-- 1 root root 85 Apr 10 22:16 math/exp/doc
    -r--r--r-- 1 root root 87 Apr 10 22:16 math/factor/doc
    -r--r--r-- 1 root root 85 Apr 10 22:16 math/fib/doc
    -r--r--r-- 1 root root 65 Apr 10 22:16 math/mul/doc
    -r--r--r-- 1 root root 68 Apr 10 22:16 math/sub/doc
    
  • It documents what each directory does:

    
    $ cat math/exp/doc
    Raise a number to an exponent.
    The file exp/a/b contains a raised to the power of b.
    
  • Important: when you're done, unmount the file system.

    
    $ fusermount -u math
    

Recommended Procedure and Hints

  1. Read the FUSE documentation on the web.

    Try the example hello, world program. Following the instructions on the main documentation page and make sure that you can compile the program and run it under FUSE. The iLab machines have FUSE installed. If you're using your own Linux or Mac systems, you will have to install FUSE. The instructions for compiling the program are in the comment header of the program.

    Do not do anything until you complete the step. You want to make sure that you have no problems in using FUSE. Run the program and your program with the debug option enabled. This is the only way you'll know what's going on. To run debugging, just suffix a -d flag at the end of the command:

    
    ./hello mnt -d
    

    Note that with the -d flag, the program will not run in the background. You can kill it at any time by sending it a kill character (control-C, by default). You will need to use another terminal window that's running a shell to enter commands that access the file system. For example:

    
    ls -l mnt
    
  2. You can start with the empty template but you might find it easier to start with the hello, world program and use that as a template. Change all references to mathfs or something that makes sense.
  3. You will need to implement only the following FUSE API functions:

    1. getattr()
    2. readdir()
    3. open()
    4. read()

    Start with skeletal implementations that do very little. Place a printf statement at the head of each one so you can understand when each function is called and with what paramters. For example:

    
    printf("getattr(\"%s\"\n", path);
    

    You'll notice that the first calls are always to getattr to test if a file exists, if it is a file or directory, and what its size is. You'll also see how it is called with the pathname built up a component at a time.

  4. Get getattr to recognize a different files and directories in your file system. For a start, you can hard-code a few pathnames and do strcmp against path. Set the mode to a regular file or a directory as necessary. Look at what the hello, world example does.

    With getattr working, you can now run the ls command on these names but not yet do directory listings. For example:

    
    	ls -ld math/sub
    	ls -ld math/add/555
    
  5. The next step is to do support the larger set of files. There are several ways you can do this (e.g., initialize a tree structure in main). I took a brain-dead approach and defined a single table that contains a pattern to match and identifies whether the pattern is a file or directory. This table of structs also has a field for each entry to define how the name is treated (e.g., it is an "add/doc" document file, it calls the subtract function, it's a top-level directory, etc.). This way, all searches can be table-driven. Do what works for you. If you find yourself writing a whole lot of if statements or hundreds of lines of code then you're doing something wrong (and ugly!).

  6. Now get readdir working so you can show directory listings. You'll be given a pathname and need to generate a directory listing. At the top level (/, you will present, the entire set of command sub-directories: add, sub, mul, .... Under each function, there will be a doc file. For paths such as add/5 or mul/9.24, you will show an empty directory.
  7. Implement the mathematical functions themselves. Use common sense in the breadth of your support. Normal C double-precision arithmetic with decimal input/output is acceptable. I used strtod for my parsing and sprintf for output. You need not support arbitrary precision numbers, fractions, or support complex numbers. Be sure to handle basic cases of overflow (e.g., fibonacci numbers wrapping around), divide-by-zero, and any other errors that may arise. Your program should not die under any circumstances and file output should be generated under all circumstances.
  8. You need not bother with caching results for this assignment. However, you might find that you need to generate results twice because of that: once for getattr to query the size of the "file" and another time in the read request to get the ressults. Not very efficient but it doesn't matter here; just one or two extra lines of code to get the results in getattr and set the file size.
  9. Test the file system thoroughly. In particular, try misspellings, non-numbers, and numbers that generate overflow or divide-by-zero situations.

Additional Suggestions

  1. Develop incrementally and test often. Problems in one part may propagate to others without being obvious.
  2. Implement things carefully. A little planning will go a long way. Spend some time thinking about how you wnat to embed information about the file system and handle the parsing of pathnames that are sent to you.

How to Submit

  1. After testing your implementation and writing your report you must submit your solution on sakai under Assignment 7.
  2. Rezip/retar the directory structure that I provide for you, including your modifications. Do not include any object (.o) files or executables. Do NOT include the demo code (mathfs-demo).
  3. You should attach your archive file and your report document to the sakai assignment submission.

Be sure to indicate all members of your team in both your report and the opening comments in your code. If it takes us effort to figure out whose program this is, you will lose points. Only one team member needs to submit the assignment but be sure that the comments make it clear who the members are.

Submit only what is required. Do not submit object files, executables, or temporary editor files.

Hand the assignment in via sakai. Only one group member should submit the assignment.

Technical Information

iLab computers

I highly recommend that you use the iLab machines to complete this project. These machines already have FUSE installed and configured properly. You are responsible for making sure that your assignment runs on this platform.

Windows Systems

Please do not attempt to do this project on a Windows system. FUSE is intended to be used for POSIX file systems. If you use Windows and MUST use your personal machine, you can either run a Linux VM or do the assignment on the iLab systems via SSH. The end result MUST compile on the iLab machines.

OS X Systems

If you're working on a Mac, you can install FUSE for OS X. FUSE for OS X offers several APIs, one of which is a superset of the FUSE API. Be sure to test your code on Linux periodically to ensure that you're not straying from a Linux-compatible API.

FUSE Documentation

Recitation notes
April 1 recitation notes (6 per page)
Project website

http://fuse.sourceforge.net

Contains the API documentation, kernel source (in case you want to compile & install), and examples. The "hello, world" file system is here. Other examples are here. That page contains a link to the fuse_operations struct, which contains the set of file operations that FUSE supports.

Additional tutorial

Writing a FUSE Filesystem: a Tutorial, by Joseph J. Pfeiffer, Jr., Ph.D., Emeritus Professor, Department of Computer Science, New Mexico State University

This provides a more detailed explanation of FUSE and contains a walkthrough of a different, but still small, file system.