Assignment 3, Java

zvfs.java

Why read chunks of 8192 bytes?

  • Buffer sizes should be a power of 2; otherwise the buffer can end up misaligned with the filesystem blocks provided by the operating system (requiring more blocks to be read per operation, which makes I/O slower)
  • 8192 is larger than 4096, which is the default block size on Windows (⇒ performance gain)
  • 8192 (8 KiB) provides a good balance between performance and memory usage

What does this line private static final long MAX_FILE_SIZE = (1L << 32); // 4GB max do?

  • It shifts 1L left by 32 bits: 1 ⇒ a 1 followed by 32 zero bits ⇒ 2^32 = 4'294'967'296 bytes ≈ 4.29 GB
    • The 1 needs the L suffix (long) because 2^32 is greater than Integer.MAX_VALUE, which is 2^31 − 1. Without the L the shift is an int shift, and Java masks an int shift distance to its lowest 5 bits, so 1 << 32 would silently evaluate to 1 rather than overflowing to a negative value (see the snippet below)
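
A quick standalone demonstration of the int vs. long shift behavior (not part of zvfs.java):

System.out.println(1 << 32);  // 1: an int shift distance is taken mod 32
System.out.println(1 << 31);  // -2147483648: the sign bit gets set
System.out.println(1L << 32); // 4294967296: long shifts use the low 6 bits of the distance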

What does this line return (value + 63) & ~63L; do?

  • Used in the align64 method
  • value + 63: pushes values that are not 64-aligned into the next bucket (e.g. 0 stays in the first bucket, 1 through 64 land in the second bucket, and so on)
  • ~63L ⇒ 63 is bitwise 00111111; ~ inverts it, so it becomes … 11111111 11000000 ⇒ all 1s except the lowest 6 bits
  • We apply the mask with bitwise & ⇒ setting the lowest 6 bits to 0
  • With the lowest 6 bits set to 0, the resulting number is always a multiple of 64!
    • Analogy in decimal: add 99 (one less than 100) to a number and zero the lowest 2 digits of the result; this always yields a multiple of 100 and rounds up correctly: 392 + 99 ⇒ 491 ⇒ 400 ⇒ 4 × 100 (an exact multiple stays put: 400 + 99 ⇒ 499 ⇒ 400)

Remember: To 64-align a number, add 63 to it and clear the lowest 6 bits

Result

(0 + 63) & ~63 => 0
(1 + 63) & ~63 => 64
(64 + 63) & ~63 => 64
(65 + 63) & ~63 => 128
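
The expression (value + 63) & ~63L from the question above is the body of align64; a standalone version reproducing this table:

static long align64(long value) {
  return (value + 63) & ~63L; // round up to the next multiple of 64
}

for (long v : new long[] {0, 1, 64, 65}) {
  System.out.println(v + " -> " + align64(v)); // 0, 64, 64, 128
}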

Where is align64 needed in the code?

  • It is only explicitly used when reading from and writing to the file data block
    • But this is only because the file entries and the header are already implicitly 64-aligned (each is 64 bytes)

Difference between a ByteBuffer and a Byte in Java?

  • A byte is a single 8-bit primitive; a ByteBuffer is an abstraction over a byte[] that allows reading and writing of various primitive types in different byte orders (e.g. reading as little-endian)
  • ByteBuffers operate with three variables all the time:
    • Capacity: Fixed size of the buffer, cannot change over time
    • Position: Where the next read / write happens
    • Limit: The “stop here” boundary
      • Practical implication for our code: in copyRange we check whether the remaining byte count is smaller than buffer.capacity(); if so, we shrink the limit to remaining so that we don't read more than necessary (see the snippet below)
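
A standalone snippet showing the three variables and the limit-shrinking trick (buffer size 8192 as in zvfs.java):

import java.nio.ByteBuffer;

ByteBuffer buffer = ByteBuffer.allocate(8192);
System.out.println(buffer.capacity()); // 8192, fixed for the buffer's lifetime
System.out.println(buffer.position()); // 0, where the next read/write happens
System.out.println(buffer.limit());    // 8192, the "stop here" boundary
buffer.limit(100);                     // now a channel read fills at most 100 bytes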

Explain in this code snippet, why the file is opened in READ and WRITE:

  public static void writeAtPosition(Path fsPath, long position, byte[] data) throws IOException {
    try (FileChannel fc = FileChannel.open(fsPath, StandardOpenOption.READ, StandardOpenOption.WRITE)) {
      fc.position(position);
      fc.write(ByteBuffer.wrap(data));
    }
  }
  • It’s an oversight: here, only WRITE is necessary

What does buffer.flip() do?

  • It flips the buffer from write mode to read mode: limit is set to the current position and position is reset to 0, so exactly the bytes that were just written can now be read (capacity stays unchanged; for the opposite direction, read mode back to write mode, you would use clear() or compact())

Example: Switching from write mode to read mode

ByteBuffer buffer = ByteBuffer.allocate(totalSize);
buffer.order(ByteOrder.LITTLE_ENDIAN);

// Write to buffer
buffer.put(headerBytes);

// Switching from write mode to read mode
buffer.flip();
try (FileChannel channel = FileChannel.open(path, StandardOpenOption.CREATE, StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
  // Read from buffer
  channel.write(buffer);
}

Explain why this code has to be wrapped in a loop:

while (buffer.hasRemaining()) {
  out.write(buffer);
}

// why can't we just do:
out.write(buffer);
  • Because a WritableByteChannel is allowed to write only part of the buffer in a single call; write returns the number of bytes actually written, so we must loop until nothing remains

Explain what this code snippet does:

ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
  • It allocates a ByteBuffer that points to off-heap memory (the ByteBuffer object itself lives on the heap like all Java objects, but it points to off-heap memory). This is called a direct buffer.
  • Direct buffers can be faster when the buffer is reused many times (as we do in copyRange) but are more expensive to instantiate (which is of no concern for us because we only instantiate once)
    • When reading the data from in, the data is read from the filesystem into user space (= the total memory of the current Java program, heap + off-heap memory)
    • With a heap ByteBuffer, Java may need a temporary native buffer and copy data between native memory (= off-heap memory) and the heap buffer. With a direct ByteBuffer, the OS can often read/write directly into that off-heap memory, avoiding the extra copy through an intermediate buffer before and after each invocation.
    • Sidenote: Because the memory of direct buffers lives outside the garbage-collected heap, the memory impact of direct buffers is hard to debug. That's why heap buffers are preferred when speed isn't critical
  • BUT: this is not the fastest way to do it
    • And a major oversight on our part
    • We could have used in.transferTo, which lets the OS handle the whole copy between two files without ever loading the data into our program (sketched below)
    • This would have made a much bigger performance impact than switching from a heap buffer to a direct buffer!
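
A hedged sketch of what a transferTo-based copy could look like (the helper name copyRangeViaTransferTo is made up; transferTo may transfer fewer bytes than requested, hence the loop):

import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

static void copyRangeViaTransferTo(FileChannel in, long start, long length,
                                   WritableByteChannel out) throws IOException {
  long position = start;
  long remaining = length;
  while (remaining > 0) {
    // the OS copies the bytes directly; nothing passes through our buffers
    long transferred = in.transferTo(position, remaining, out);
    if (transferred <= 0) break; // defensive: stop at EOF
    position += transferred;
    remaining -= transferred;
  }
}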

Why can we do int bytesRead = in.read(buffer); in readHeader when we had to use a while loop when writing a buffer in copyRange?

  • It's an oversight on our end: in.read can in theory return fewer bytes than the buffer has room for, even if the file has enough bytes ⇒ there could be a full header in the file, yet we would fail because fewer bytes than the buffer size were read (a fix is sketched below)
  • In practice, a FileChannel read usually returns as many bytes as are available, so the case where enough bytes are present but fewer are read is rare
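
A read loop analogous to the write loop in copyRange would fix this (readFully is a hypothetical helper, not part of zvfs.java):

import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

static void readFully(FileChannel in, ByteBuffer buffer) throws IOException {
  // keep reading until the buffer is full, so a short read is not
  // mistaken for a truncated header
  while (buffer.hasRemaining()) {
    if (in.read(buffer) == -1) {
      throw new EOFException("file ended before the buffer was filled");
    }
  }
}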

What does Math.round( (totalSizeBytes / (1024.0 * 1024.0)) * 1000.0 ) / 1000.0; do?

  • 1024 * 1024 = 1'048'576
    • This is the amount of bytes in a Mebibyte
      • 1024 = 2^10, so 1024 * 1024 = 2^20
    • Storage sizes in CS are often given as powers of two. The corresponding prefixes are "kibibyte", "mebibyte", "gibibyte" and so on
    • They are slightly larger than their decimal counterparts (kilobyte, megabyte, gigabyte…)
    • The correct abbreviation would be MiB (we write MB which is technically wrong)
  • The Math.round(x * 1000.0) / 1000.0 pattern rounds to three decimal places (worked example below)
    • In Java there is no Math.round(x, 3), so we shift by 3 decimal digits, round to the nearest integer, and shift back by 3 (rounding the number at the third decimal digit)
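
Worked example (the byte count is arbitrary):

double totalSizeBytes = 1_234_567;
double mib = Math.round((totalSizeBytes / (1024.0 * 1024.0)) * 1000.0) / 1000.0;
System.out.println(mib); // 1.177 (1'234'567 / 1'048'576 ≈ 1.177375, rounded to 3 decimals)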

What does this code do?

String appendix = (deletedFiles > 0) ? ", " + deletedFiles + " files are deleted, defragment first" : "";
System.err.println("No more free slots " + appendix);
  • It uses the ternary (conditional) operator
  • appendix is ", x files are deleted, defragment first" when there are deleted files in the FS, and the empty string when there are none

Why is the data start offset 2112?

Header (64 bytes) + 32 file entries × 64 bytes each = 64 + 2048 = 2112

Why is entryOffset a long here: long entryOffset = header.getFileTableOffset() + (long) i * FileEntry.FILE_ENTRY_SIZE; ?

  • It doesn't have to be: getFileTableOffset() always returns 64, i is at most 31, and FILE_ENTRY_SIZE is 64, so entryOffset is at most 64 + 31 × 64 = 2'048
  • Technically even a short could be used here, which ranges from −32'768 to 32'767

What happens in rmfs if the file to be deleted is found?

  • Two things: the flag of the FileEntry is set to 1 (deleted), and the header is updated: the deleted-files count is incremented, the file count decremented
  • Neither the file entry nor the data chunk is actually removed!
    • That happens only in defrag

What’s the difference between getfs and catfs ?

  • getfs (writes the data chunk of an entry to a file) and catfs (prints the data chunk to the console) both call copyRange
  • getfs passes a file channel as out, while catfs passes Channels.newChannel(System.out)
    • Both the file channel and the console channel implement WritableByteChannel

In copyRange, why is in a FileChannel and out a WritableByteChannel when a ReadableByteChannel could be used for in? (Why the asymmetry?)

  • In copyRange we want to read from in at an offset and write at position 0 in out
  • ReadableByteChannel does not support reading from an absolute offset
  • So the handling of this offset would have to be pushed out of copyRange to its callers, making the code more complicated everywhere copyRange is called (a reconstruction of copyRange is sketched below)
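
A sketch of copyRange reconstructed from the details scattered through these notes (direct 8192-byte buffer, limit shrinking, absolute read, relative write loop); the actual method in zvfs.java may differ:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

static void copyRange(FileChannel in, long start, long length,
                      WritableByteChannel out) throws IOException {
  ByteBuffer buffer = ByteBuffer.allocateDirect(8192);
  long readPos = start;
  long remaining = length;
  while (remaining > 0) {
    buffer.clear();
    if (remaining < buffer.capacity()) {
      buffer.limit((int) remaining); // don't read past the requested range
    }
    int bytesRead = in.read(buffer, readPos); // absolute read at an offset
    if (bytesRead <= 0) break;                // defensive: stop at EOF
    buffer.flip();
    while (buffer.hasRemaining()) {
      out.write(buffer); // relative write; may be partial, hence the loop
    }
    readPos += bytesRead;
    remaining -= bytesRead;
  }
}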

Explain the date formatting at these two points in the code (lsfs)

SimpleDateFormat simpleDateFormat = new SimpleDateFormat("MM/dd/yyyy' 'HH:mm:ss:S");
...
String dateString = simpleDateFormat.format(entry.getCreated() * 1000L);
  • The first line defines the format we want to print our date in
    • The single S at the end means milliseconds, non-padded!
      • Non-padded: 7 ms ⇒ "7", 70 ms ⇒ "70"
      • Padded (via SSS): 7 ms ⇒ "007", 70 ms ⇒ "070"
        • SSS is the usual pattern, since it gives fixed-length output (see the snippet after this list)
  • The created field in FileEntry is stored as a timestamp in seconds (seconds passed since 1970-01-01); we even divide System.currentTimeMillis() by 1000 to get seconds when making a new FileEntry in addfs
    • This is why we have to multiply the timestamp by 1000 in the second line, and it is why our printed timestamps always end in 000
    • This is a weird pattern to have:
      • When making the FileEntry: get milliseconds, truncate to seconds
      • When printing the FileEntry: take the creation timestamp and multiply by 1000, implying ms precision that was explicitly dropped
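
A standalone illustration of S vs. SSS (the date value is arbitrary; the millisecond field is timezone-independent):

import java.text.SimpleDateFormat;
import java.util.Date;

Date d = new Date(7); // 7 ms after the epoch
System.out.println(new SimpleDateFormat("S").format(d));   // 7
System.out.println(new SimpleDateFormat("SSS").format(d)); // 007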

Explain this piece of code

List<FileEntry> alive = fileEntries.stream()
  .filter(e -> e.getFlag() == 0)
  .toList();
  • It creates a new list containing the FileEntries that are alive (flag == 0), using the Java streams API
  • Stream operations are chained: intermediate operations (filter in this example) return a new stream on which further operations (map, filter and more) can be performed, until a terminal operation such as toList or reduce consumes the stream
  • Intermediate operations do not alter the underlying data (fileEntries is the same before and after the call, as demonstrated below)
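
A small standalone demonstration that the source list is untouched:

import java.util.List;

List<Integer> numbers = List.of(1, 2, 3, 4);
List<Integer> even = numbers.stream()
    .filter(n -> n % 2 == 0)
    .toList();
System.out.println(numbers); // [1, 2, 3, 4] (unchanged)
System.out.println(even);    // [2, 4]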

Explain how the FileChannel in defrag is used

  1. One FileChannel is instantiated for the whole defrag process
    • This channel is neither closed nor reopened while the file data chunks are being rewritten!
  2. The channel is passed in as both input and output into copyRange
    • Effectively reading from the file at one position and writing the contents to another position
  3. The position field of the channel is updated for each entry
    • This works because copyRange reads from an absolute position and writes to a relative position
      • Read: int bytesRead = in.read(buffer, readPos); → Absolute
      • Write: out.write(buffer); → Relative, uses channel.position
    • There are two ways a channel can be written/read:
      1. Relative: channel.read(buffer) → Takes channel.position field for reading/writing
      2. Absolute: channel.read(buffer, position) → Ignores channel.position

⇒ In essence, dfrgfs controls where to read from via the parameters start and length passed into copyRange and controls where to write to via channel.position
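
A standalone snippet showing the relative vs. absolute behavior described above (temp file only for demonstration):

import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

Path p = Files.createTempFile("demo", ".bin");
Files.write(p, new byte[128]);
try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
  ByteBuffer buf = ByteBuffer.allocate(16);
  ch.read(buf, 64);                  // absolute read: position stays untouched
  System.out.println(ch.position()); // 0
  buf.clear();
  ch.read(buf);                      // relative read: advances the position
  System.out.println(ch.position()); // 16
}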

Explain oldNext, newOffset, alignedOffset, padding, finalSize in defrag

  1. oldNext: The end of the data block in the filesystem before defragging. It is used at the end (difference to header.getNextFreeOffset()) to compute the total number of bytes cleaned up. Important: this value is not used anywhere else!
  2. newOffset: The start position for each entry not flagged as deleted. Initially it is set to the start of the data block (the position after the last FileEntry). It grows as defrag progresses, incremented by the size of each block. Important: this position is potentially unaligned at the start of an iteration but gets aligned afterwards (because it is used in conjunction with alignedOffset)!
  3. alignedOffset: The 64-aligned version of newOffset. The aligned and unaligned versions of newOffset both exist in order to compute the required padding.
  4. padding: The difference between newOffset (initially unaligned) and alignedOffset. padding zero bytes are written at the end of the last data chunk to make it aligned. newOffset is set to alignedOffset after the zero bytes have been written, so it is aligned before the contents of the next block are written.

All in all, the cycle looks like this:

  1. Check whether newOffset is 64-aligned
    • If not, write the required amount of zero bytes to get to the next multiple of 64, update newOffset to that aligned position → This pads the data block of the last written file entry!
  2. Write the file contents of the current entry (unaligned), increment newOffset by the number of bytes written
  3. Repeat the cycle

For clarification: alignedOffset and padding could be dropped; the whole loop could be written like this:

// dataStartOffset is always 64-aligned (it is 2112)
long newOffset = header.getDataStartOffset();

for (FileEntry e : aliveSorted) {
  // always aligned at this point
  long dest = newOffset;
  // set the write position for the relative writes inside copyRange
  channel.position(dest);
  copyRange(channel, e.getStart(), e.getLength(), channel);

  e.setStart(dest);

  // newOffset is potentially unaligned at the moment
  newOffset += e.getLength();
  // aligned version of newOffset
  long nextAligned = align64(newOffset);

  // pad the data block with zero bytes at the end of the iteration
  if (nextAligned > newOffset) {
    channel.position(newOffset);
    channel.write(ByteBuffer.allocate((int) (nextAligned - newOffset)));
  }

  // align newOffset for the next iteration after padding
  newOffset = nextAligned;
}