Disk Seeks Considered Harmful

Disk seeks are among the most expensive operations a program can perform: on a rotating disk, each one costs on the order of milliseconds. You might not know this from looking at how many of them we perform, but trust me, they are. Consequently, please refrain from the following suboptimal behavior:

  1. Placing lots of small files all over the disk.

  2. Opening, stating, and reading lots of files all over the disk.

  3. Doing the above on files that are laid out at different times, so as to ensure that they are fragmented and cause even more seeking.

  4. Doing the above on files that are in different directories, so as to ensure that they are in different cylinder groups and cause even more seeking.

  5. Repeatedly doing the above when it only needs to be done once.

Ways in which you can optimize your code to be seek-friendly:

  1. Consolidate data into a single file.

  2. Keep data together in the same directory.

  3. Cache data so that it does not need to be reread constantly.

  4. Share data so as not to have to reread it from disk when each application loads.

  5. Consider caching all of the data in a single binary file that is properly aligned and can be mmapped.
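
As a rough illustration of points 1 and 5, the sketch below packs a set of small files into one cache file and then serves lookups from a read-only mmap of it. Everything about it is an assumption made for the example rather than an existing format: the names pack_files and PackedCache, the header layout (entry count, then name length, offset, length, and name for each entry), and the 8-byte alignment are all hypothetical.

```python
import mmap
import os
import struct

ALIGN = 8                        # hypothetical payload alignment, in bytes
ENTRY = struct.Struct("<HQQ")    # name length, payload offset, payload length


def _align(n):
    """Round n up to the next multiple of ALIGN."""
    return (n + ALIGN - 1) // ALIGN * ALIGN


def pack_files(paths, cache_path):
    """Concatenate many small files into one cache file with an index header."""
    entries = []
    for path in paths:
        with open(path, "rb") as f:
            entries.append((os.path.basename(path).encode("utf-8"), f.read()))

    # The header holds the entry count followed by one index record per file.
    header_size = 4 + sum(ENTRY.size + len(name) for name, _ in entries)
    offset = _align(header_size)

    header = [struct.pack("<I", len(entries))]
    body = []
    for name, data in entries:
        header.append(ENTRY.pack(len(name), offset, len(data)) + name)
        padded = _align(len(data))
        body.append(data.ljust(padded, b"\0"))  # pad so the next payload is aligned
        offset += padded

    with open(cache_path, "wb") as out:
        out.write(b"".join(header).ljust(_align(header_size), b"\0"))
        out.write(b"".join(body))


class PackedCache:
    """Read-only view of a cache file written by pack_files()."""

    def __init__(self, cache_path):
        self._file = open(cache_path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._index = {}
        (count,) = struct.unpack_from("<I", self._map, 0)
        pos = 4
        for _ in range(count):
            name_len, offset, length = ENTRY.unpack_from(self._map, pos)
            pos += ENTRY.size
            name = self._map[pos:pos + name_len].decode("utf-8")
            pos += name_len
            self._index[name] = (offset, length)

    def get(self, name):
        """Return the bytes stored under name, copied out of the mapping."""
        offset, length = self._index[name]
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()
```

The cache is built once with pack_files(); after that, each process opens one file instead of many, and because the mapping is backed by the page cache, processes that map the same file share the data rather than each rereading it from disk.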

The trouble with disk seeks is compounded for reads, which is unfortunately what we are doing. Remember, reads are generally synchronous while writes are asynchronous: the program blocks until each read completes, whereas writes can be buffered and flushed later. Each read is therefore serialized behind the one before it, and every seek adds directly to program latency.