File System Access

Summary

There are a few anti-patterns to consider when accessing the file system. This article assumes knowledge of the standard GFile, GInputStream and GOutputStream APIs.

Asynchronous I/O

Almost all I/O should be performed asynchronously, that is, without blocking the GLib main context. This can be achieved by always using the *_async() and *_finish() variants of each I/O function.

Synchronous I/O blocks the main loop, which means that other events, such as user input, incoming networking packets, timeouts and idle callbacks, are not handled until the blocking function returns.

Synchronous I/O is acceptable in certain circumstances where the overhead of scheduling an asynchronous operation exceeds the cost of local synchronous I/O on Linux, for example when making a small read from a local file or from a virtual file system such as /proc. For such reads, the low level functions g_open(), read() and g_close() should be used rather than GIO.

Files in the user’s home directory do not count as local, as they could be on a networked file system.

Note that the alternative – running synchronous I/O in a separate thread – is highly discouraged; see the threading guidelines for more information.

File Path Construction

File names and paths are not normal strings: on some systems, they can use a character encoding other than UTF-8, while normal strings in GLib are guaranteed to always use UTF-8. For this reason, special functions should be used to build and handle file names and paths. (Modern Linux systems almost universally use UTF-8 for filename encoding, so this is not an issue in practice, but the file path functions should still be used for compatibility with systems such as Windows, which use UTF-16 filenames.)

For example, file paths should be built using g_build_filename() rather than g_strconcat().

Doing so makes it clearer what the code is meant to do, and also eliminates duplicate directory separators where the elements are joined. The returned path is not necessarily absolute, and components such as . and .. are left as they are.

As another example, paths should be disassembled using g_path_get_basename() and g_path_get_dirname() rather than g_strrstr() and other manual searching functions.

Path Validation and Sandboxing

If a filename or path comes from external input, such as a web page or user input, it should be validated to ensure that putting it into a file path will not produce an arbitrary path. For example, if a filename is constructed from the constant string ~/ plus some user input and the user inputs ../../etc/passwd, they can (potentially) gain access to sensitive account information, depending on which user the program is running as and what it does with data loaded from the constructed path.

This can be avoided by validating constructed paths before using them: use g_file_resolve_relative_path() to convert any relative paths to absolute ones, then check that the result is beneath a root sandboxing directory appropriate for the operation. For example, if code downloads a file, it could validate that all paths are beneath ~/Downloads, using g_file_has_parent() when the file must sit directly in that directory, or g_file_has_prefix() when it may sit in a subdirectory.

As a second line of defense, all projects which access the file system should consider providing a mandatory access control profile, using a system such as AppArmor or SELinux, which limits the directories and files they can read from and write to.