// Patrick Louis

File hierarchy

Hierarchy (Transcript of the podcast)

Files, the predominant representation of everything on Unix, how are they scattered around?


  • What - The start and meanings
  • Who owns it now and when
  • Iterative makeup adjustments
  • So what - The Issues
  • Now what - Reviewing it
  • Conclusion

What - The start and meanings

Principle - What are files

The filesystem is represented as a rooted tree of directories where the root is denoted by the / character. When specifying that a file or directory is under another a / is also used as a separator. For example, /bin/init is a file under the root directory which is under the bin directory.
Everything is under that ‘/’, it doesn’t matter if it’s a hard disk, a network card, or your favorite game. They’re all somewhere under that root.

There are 3 main filetypes: ordinary files, directories, and special files such as devices and virtual files.
However, the unix directories don’t literally contain those files. Instead, they contain their names and pair them with a reference to their inode number which contains the file and its metadata. That explains why we can have links hard and soft.

Original Layout

So, with that in mind, what was the original layout or directory structure under the root in the original unix version. Let’s review the layout from the original UNIX Programmer’s manual hier(7) manpage. You can get it from the show notes.

  • /        root
  • /dev devices
  • /bin utility programs
  • /lib object libraries and other stuff
  • /etc essential data and dangerous maintenance utilities (passwd, group, motd, init, rc)
  • /tmp temporary files, usually on a fast device
  • /usr general-purpose directory, usually a mounted file system

This seems highly minimal compared what we currently have. Where does all those new directories come from, why were they added? Let’s first blame someone by digging into whose in control of the current standards.

Who owns it now and when

Amongst all the Unix standards out there, such as the single UNIX specification and the stuffs from the open group, the one that holds the hierarchy is the Filesystem Hierarchy Standard, FHS. It’s under the governance of the Linux Fondation.
Like all standards it’s meant as a guideline and it doesn’t have to be followed thoroughly, it’s just a reference. Until today there are only Linux distributions adhering to it, and thus the BSDs and other unix-like variants don’t necessarily preach its words. So what are they trying to achieve and how?

They want to predict the location of installed files and directory, to make it less chaotic, by specifying what every area of the system should contain, the principle behind them, and the mimimum files and directory requirements. Let’s unpack that idea of principle behind directories.

They wrote down standards for everyone one of the directory that should appear under the root. Another one of the distinction they put forward is the one between shareable, unshearable, static, and variable, directories.
Typically, everything under /var is of variable size, /usr and /opt are shareable and everything else is unshareable and static. Following that logic, /usr and /opt can be left out of a system and the remaining content should be enough to boot, restore, recover, and/or repair the system.

Now let’s jump into the real deal, what are they prescribing we should definitely have under the root directory and what are they mentioning that is special about those specific dirs. Let’s say here that instead of the original 6 directories there is now 14 directories, that’s 8 brand-new directories under the root.

Directory     Description
bin           Essential command binaries
              (They even have a list of must-have commands under that dir,
               such as cat, cp, date, dd, kill, ls, etc. The usual
               stuff that are correspondant with system calls)
boot          Static files of the boot loader
              ( Which was there before but not as a directory but as a
               file, you might have wondered where it was booting from
               in the original unix without that directory, so that
               explains it.)
dev           Device files
etc           Host-specific system configuration
lib           Essential shared libraries and kernel modules
media         Mount point for removable media
mnt           Mount point for mounting a filesystem temporarily
opt           Add-on application software packages
run           Data relevant to running processes
sbin          Essential system binaries
srv           Data for services provided by this system
tmp           Temporary files
usr           Secondary hierarchy
var           Variable data

Why did all those new directories appear? What the hek happened? Aren’t we supposed to be minimalists, where does that come from?

Iterative makeup adjustments

Where did all those new directories spurt out from? It’s all because of an infectious aging process that slowly started in the early unix days.

Back in the days

Remember the 6 directories under root in the original Unix. Well, there’s something that needs to be mentioned. If you noticed, there’s no /home so where did the home directory go? /usr, the general purpose directory was in fact where that directory was stored. It meant “user”, and not “unix system resources”.

This started because there was not enough size on the medium to store the directories that grew fast, such as the user directory. And thus /usr was mounted on a separate filesystem.
Early on, the biggest binaries where moved to that separate filesystem, both because they weren’t essential to boot the system and because they’ve run out of space on the first medium. That was when the /usr/bin, /usr/lib, and /usr/tmp where introduced. All of them non-essential for the minimal system but still needed by users. This notion of splitting across several places became inherent.

After Unix got a bit popular outside Bell Labs, people wanted to install their own utilities without infringing on the default installation. That means keeping their custom programs while they can update the whole system. And thus they introduced another split under /usr, the /usr/local, which contains a replica of the higher hierarchy /usr/local/bin, /usr/local/lib.

Yet again, another split happened when independent software vendors wanted to offer their packages. They didn’t want to interfere with users, and they didn’t want to mess up the whole install and thus places their packages under /usr/local/vendor, as another replica of the hierarchy, /usr/local/vendor/bin, /usr/local/vendor/lib.

That’s a lot of replicas of the same hierarchy in multiple places.

That’s about /usr but there are many other directories that appeared. Let’s talk about /var, where did this come from?

We talked about variable sized directories, right, and about problems with disk space. This is where I’m going at, the log directory, the spool, and tmp directories, were all under /usr before but because they grew in size they were moved to a brand-new directory under the root: /var so that it could be placed on another disk.
Size was a really big issue at the time, it brought more schism in the hierarchy. Having /usr on another disk and moving the binaries wasn’t creating enough space to store the programs.

Anyway, another thing with commercial vendors happened, they suddenly wanted to leave behind the /usr/local/vendor hierarchy and move it to /var/opt because it gave them the flexibility to offer packages that couldn’t be shared across multiple machines incidentally. Also, those packages were more or less optional, think of it like a local repository of proprietary packages.
Soon enough, vendors thought, insidiously, that it would be better to store all those packages under /opt instead. Bringing yet another directory under the root.

In the same manner, because of the new clutter created, new directories started to appear under root, for example the home directory, which was originally /usr and now is independant under root. The /usr had now become too crowded for what it was original designed for. Ironic, isn’t it?
The fancy names “U NIX s ource r epository” or “U NIX s ystem r esources” are all made-up names, and it’s too late to rename them anyway.
By the way, you can still install packages in your home directory, you just have to install them in ~/.local and put the settings in ~/.config or other standard local configuration directory.

More and more artificial justifications appeared and people are now following them. That’s one reason why the FHS exists, to formalize those nebulous ideas we have about those places on our system. Legacy mixed with pompous new creative and destructive ideas.
So if you ask yourself where you should install your stuffs or where you should look for when searching for specific things you better take a look at the FHS because you indeed need a standard to be able to make sense of this.

So what - The Issues

The world didn’t crumble, so what could be the issue? Well, we’ve been stacking problems, piling them up years after years, so it might someday eventually if we don’t manage the mess. There are many arguments on the current issues with the hierarchy. Let’s go over them.

One of them was already mentioned at the end of the previous section, the confusion that this iterative dichotomy has brought. The hierarchy has experienced a continuously increasing entropy on multiple scales. Starting from the hierarchy of the original Unix Seventh Edition everyone has put their hands on it. It’s hard to know where to place your executable because we are saturated with different valid contestants for this place. The confusion has another catalyst and that is legacy.

The history of /usr alone is mind boggling, it was first the user directory, then separated to another disk to host big stuffs, then the home directory was removed from it, and now it’s vaguely if even used as a directory hosted on another partition. And thus they had to reintroduce this concept by creating /usr/share, which would be the shared directory for multiple architecture. And again, this new directory under /usr is not commonly shared amongst machines, bringing more questions about its existence.

On the topic of questioning the current existence of some directories, many others are criticised. For example, the issues with size aren’t relevant as of today, and thus the distinction between /usr/bin and /bin and others is minimal. In so far as booting with a minimal rescue system is concerned, it doesn’t make sense today either, as we can boot from external medium to rescue a system.
Moreover, having multiple binaries in multiple places poses the issue of not knowing if you are invoking the right program at the right time, as there can be many copies of the executable under different places. It obviously also brings 3 obvious problems: simplicity, maintainability, and flexibility, which are at stakes.

Now what - Reviewing it

Ok, it’s not that great but are there alternatives. There are some distributions such as morpheus linux and sabotage linux that try to stay true to the original Unix spirit.

Morpheus Linux is maintained by the guys at suckless.org so it’s peer reviewed by persons that take that topic at heart.
Sabotage Linux takes the approach a bit differently by completely omitting the /usr and /sbin directories.
There’s also those that take the time to review the concept and reshape it, make it more reflective of what we need today, bring it to date. I could find two of those projects:

  • objectroot
  • GoboLinux

Each of them redefine the hierarchy of the filesystem while at the same time retaining backward compatibility, which sounds awesome.
GoboLinux has longer user-friendly names for the directories, for instance executables are all stored under /Programs and under /Programs there are sub-directories for the programs that stores them by versions.
One thing to notice with GoboLinux is that it only has 6 directories under the root, just like the original Seventh Edition. They are the following:

  • Depot
  • Mount
  • System
  • Files
  • Programs
  • Users

Also, it was created by the guy that wrote htop and LuaRocks, so cool.

objectroot is another approach.
While GoboLinux is a Linux distribution, objectroot is more of a new set of rules, easier to apply on current distributions. It has 5 directories under the root and they are:

  • /hosts — Operating system instances. [ More … ]
  • /org — Application and system software. [ More … ]
  • /users — Users’ home directories. [ More … ]
  • /boot — Second-stage bootloader. Optional.
  • /mnt — Temporarily mounted filesystems. Optional.

/boot and /mnt being optional, that makes only 3 essential directories.

That seems minimal, why is that? /users is pretty obvious to understand but what about the others.

/org contains shareable softwares between machines. /hosts brings the concept of distributed computing, it contains subdirectories with specific files for every machine, /hosts/self being the machine you are sitting at. Under that directory you find the typical stuffs.

You can know more about those different hierarchies by taking a look at their respective websites.

Conclusion

That’s about it. Now you’re enlightened about the dark history of the unix hierarchy of today. There’s not much you can do about it though. The least you could possibly do is to take a look at the FHS and follow the standards.

A good thing to mention here is that most unix variants have their own hier(7) manpage and they’re pretty similar, minus the little details. So stick to that. Also, don’t get the bright idea of splitting stuffs into a new directory under the root, just no, stop that. It just won’t make it easier… No, no, I said no, don’t do that, bad idea.

Will we ever learn?

Music

Soundtrack by kronstudio and it’s GEM. Checkout their website they have amazing tunes there.


References:

The man pages:

Attribution

  • Wenceslas Hollar or (more likely) Abraham Bosse. / Public domain



If you want to have a more in depth discussion I'm always available by email or irc. We can discuss and argue about what you like and dislike, about new ideas to consider, opinions, etc..
If you don't feel like "having a discussion" or are intimidated by emails then you can simply say something small in the comment sections below and/or share it with your friends.