File Operations

In this section, you’ll design three instance methods to support creating new files or overwriting the contents of existing files, reading file contents, and appending content to the end of existing files.

Design Requirements: Namespacing

Note that different users can have files with the same name. A user’s namespace is defined as all of the filenames they are using. One user’s namespace could contain a filename that another user is also using. In that other user’s namespace, that same filename could refer to a different file (or the same file, if it was shared - more details about sharing later).

This example scenario illustrates how file storage and namespacing works.

  • EvanBot calls StoreFile("foods.txt", "pancakes").
    • Assuming that EvanBot has never stored to foods.txt before, this creates a new file called foods.txt in EvanBot’s personal namespace.
  • EvanBot calls LoadFile("foods.txt") and sees “pancakes”.
  • EvanBot calls StoreFile("foods.txt", "cookies").
    • Because foods.txt is an existing file, this call should overwrite the entire file with the new contents.
  • EvanBot calls LoadFile("foods.txt") and sees “cookies”.
  • EvanBot calls LoadFile("drinks.txt") and sees an error, because there is no file named drinks.txt in EvanBot’s personal namespace.
  • EvanBot calls AppendToFile("foods.txt", " and pancakes").
    • Instead of overwriting the entire file, this should append additional contents to the end of an existing file.
  • EvanBot calls LoadFile("foods.txt") and sees “cookies and pancakes”.
  • EvanBot calls AppendToFile("foods.txt", " and hash browns").
  • EvanBot calls LoadFile("foods.txt") and sees “cookies and pancakes and hash browns”.
  • EvanBot calls StoreFile("foods.txt", "pancakes").
    • This overwrites the entire file (including appends) with the new contents.
  • EvanBot calls LoadFile("foods.txt") and sees “pancakes”.
  • EvanBot calls AppendToFile("drinks.txt", "and cookies") and sees an error, because there is no file named drinks.txt in EvanBot’s personal namespace.
  • CodaBot calls StoreFile("foods.txt", "waffles").
    • Note that this creates a new file in CodaBot’s personal namespace named foods.txt. This should not interfere with the foods.txt file in EvanBot’s namespace, which is a different file.
  • CodaBot calls LoadFile("foods.txt") and sees “waffles”.
  • EvanBot calls LoadFile("foods.txt") and sees “pancakes”.

Design Requirements: Files

Confidentiality of data:

  • You must ensure that no information is leaked about these 3 pieces of data:
    • File contents for all files.
    • Filenames for all files.
    • The length of the filenames for all files.
  • You must also ensure that no information is leaked that could be directly or indirectly used to learn these 3 pieces of data.
    • For example, if you have a secret key that you’re using to encrypt some file contents, you’ll need to ensure that secret key is not leaked either.
  • You may leak information about any other values besides the ones listed above.
    • For example: It’s okay if an adversary learns usernames, length of a file, how many files a user has, etc.

Integrity of data:

  • You must be able to detect when an attacker has tampered with the contents of a file.

Filenames:

  • Filenames can be any string with 0 or more characters (not necessarily alphanumeric, and could be the empty string).
  • Different users can have files with the same filename, but they could refer to different files.

Design Question: File Storage and Retrieval: How does a user store and retrieve files?

EvanBot is logged in and stores a file. How does the file get stored in Datastore? What key(s) and UUIDs do you use? How do you access the file at a later time?

User.StoreFile

This instance method is used to both create a file for the first time, or to overwrite an existing file entirely with new contents. To use this method, the user passes in the filename to identify the file, as well as the contents that they wish to store.

User.StoreFile(filename string, content []byte) (err error)

Given a filename in the personal namespace of the caller, this function persistently stores the given content for future retrieval using the same filename.

If the given filename already exists in the personal namespace of the caller, then the content of the corresponding file is overwritten.

The client must allow content to be any arbitrary sequence of bytes, including the empty sequence.

Note that calling StoreFile after malicious tampering has occurred is undefined behavior, and will not be tested.

Note that calling StoreFile on a file whose access has been revoked is undefined behavior, and will not be tested.

User.LoadFile

User.LoadFile(filename string) (content []byte, err error)

Given a filename in the personal namespace of the caller, this function downloads and returns the content of the corresponding file.

Note that, in the case of sharing files, the corresponding file may or may not be owned by the caller.

Returns an error if:

  1. The given filename does not exist in the personal file namespace of the caller.
  2. The integrity of the downloaded content cannot be verified (indicating there have been unauthorized modifications to the file).
  3. Loading the file cannot succeed due to any other malicious action.

User.AppendToFile

User.AppendToFile(filename string, content []byte) (err error)

Given a filename in the personal namespace of the caller, this function appends the given content to the end of the corresponding file.

content can be any arbitrary sequence of 0 or more bytes.

Note that, in the case of sharing files, the corresponding file may or may not be owned by the caller.

You are not required to check the integrity of the existing file before appending the new content (integrity verification is allowed, but not required).

Returns an error if:

  1. The given filename does not exist in the personal file namespace of the caller.
  2. Appending the file cannot succeed due to any other malicious action.

Design Requirements: Bandwidth & Append Efficiency

All functions except for AppendToFile have no efficiency requirements, as long as they don’t time out the autograder. You can submit your code to Gradescope to check that you aren’t timing out the autograder (~20 minutes).

The efficiency requirement for appending is measured in terms of bandwidth, not in terms of time complexity or space complexity, which you may have seen in other classes. This means that your append can use unlimited amounts of local compute (e.g. you can encrypt and decrypt as much data as you’d like).

Recall that DataStore and KeyStore are remote databases. This means that when you call DataStoreGet, you are downloading all data at the specified UUID from DataStore to the local device running your code. Similarly, when you call DataStoreSet, you are uploading all the specified data from your local device running your code to DataStore. The only efficiency requirement forAppendToFile is that the total amount of data uploaded with calls to DataStoreSet and downloaded with calls to DataStoreGet must be efficient.

The bandwidth used by a call to AppendToFile is defined as the total size of all data in calls to DataStoreSet and DataStoreGet. All calls that are not DataStoreSet or DataStoreGet do not affect the total bandwidth.

The total bandwidth should only scale with the size of the append (i.e. the number of bytes in the content argument to AppendToFile). In other words, if you are appending n bytes to the file, it’s okay (and unavoidable) that you’ll need to upload at least n bytes of data to the Datastore.

Your total append bandwidth can additionally include some small constant factor. We cannot reveal the exact number, but an example of a reasonable constant would be 3,000 bytes on every call to append.

The total bandwidth should not scale with (including but not limited to):

  • Total file size
  • Number of files
  • Length of the filename
  • Number of appends
  • Size of previous append
  • Length of username
  • Length of password
  • Number of users the file is shared with

Here is one way to consider whether your design scales with the number of appends. Suppose we call AppendToFile on a file 10,000 times, appending 1 byte every time. The 1,000th and 10,000th call to AppendToFile should use the same total bandwidth as the 1st append operation. This should also be true for the case of appending an arbitrary number of bytes repetitively, 10,000 times (even 0!). Specifically, when a scheme scales with the number of appends, this means that the amount of bandwidth used is directly proportional to the number of appends that have occurred, not just the fact that appends have happened. This means if someone appended 0 bytes to a file 10,000 times (called AppendToFile with an empty string passed in), the 1,000th and 10,000th call to AppendToFile should use the same total bandwidth as the 1st. If the bandwidth increases even though the string appended is always the empty string (and hence, nothing actually changed about the file), then this design would scale with the number of appends.

Here is one way to consider whether your design scales with the size of the previous append. Suppose we call AppendToFile to append 1 terabyte of data to a file. Then, we call AppendToFile again on the same file to append another 100 bytes. The total bandwidth of the second call to append should not include the 1 terabyte of bandwidth from the previous (first) append.

In general, one way to check for efficiency is to imagine a graph where the x-axis is the potential scaling factor (e.g. file size), and the y-axis is the total bandwidth. The plot of scaling factor vs. total bandwidth should be a flat line, not an upwards sloping line.

As an analogy, imagine that the users of this system have a limited phone data plan. We want to avoid excessive charges to their data plan, so we want to avoid downloading or uploading unnecessary data when appending.

For example, a naive implementation would involve:

  1. The user calls DataStoreGet to download the entire contents of the file.
  2. The user decrypts the file locally.
  3. The user appends the contents locally.
  4. The user encrypts the entire file contents.
  5. The user calls DataStoreSet to upload the entire file to DataStore.

Note for steps 2 & 4: These parts do not count against bandwidth efficiency. Recall, only DataStoreGet and DataStoreSet count for bandwidth calculation, and local computations do not count against efficiency requirements.

This implementation is inefficient because in step 1, the call to DataStoreGet downloads the entire file. This implementation is additionally inefficient due to step 5, where we call DataStoreSet and upload the entire file contents.

For example, if we had a 10 terabyte file, and we wanted to append 100 bytes to the file, the implementation above would have a total bandwidth of 20 terabytes + 100 bytes. An efficient implementation would use 100 bytes of bandwidth (possibly plus some constant).

Design Question: Efficient Append: What is the total bandwidth used in a call to append?

List out every piece of data that you need to upload (DatastoreSet) or download (DatastoreGet) from Datastore in a call to append, and the size of each piece of data. Is the total a constant, or does it scale?