A Rant about Filesystems of Today

September 30th, 2006

If you grab the nearest Introduction to Computers book, and that book happens to define a File System, you will find it claiming something like “a File System is responsible for storing and organizing your files.” Sure, it stores the files, and I am not going to discuss here whether file systems do that well enough or not; what I am interested in is the organization claim. They do allow you to organize, definitely, in those hierarchies, but is that good enough? In a day where each file has tons of information inside it, whether that file is text, an image, a video or a complex vector animation. In a day where almost all popular file formats have rich metadata that describes the data they hold. In a day where it no longer takes imagination to anticipate that, sooner rather than later, you will be able to search through images that were analyzed and classified by complex algorithms. In this day, is it really good enough that all we have for organizing our files is one constraint: the name of the file and its location in some tree? I don’t think so.

It may be argued that I should not be ranting about File Systems, but rather about desktop environments. True, ReiserFS never claimed to offer improved usability, and hence one may be inclined to think that it is all up to the desktop environments to offer a better interface for users to access their data.

However, research in this field has not favoured either camp. The Semantic File System by Gifford et al., one of the earliest approaches that attempted to improve the usability of file systems (which, surprisingly, Gifford did not like even in 1990), merely augmented the underlying file system with a user-space daemon that indexes and extracts metadata from files, and an NFS server to provide users with a backward-compatible interface. In his design, Gifford did not change how bits and bytes are written to whatever file system was used back then in 1990. Other projects went in the direction of extending the file systems with additional features that would allow better access to, and richer information about, the stored data. In that context, SHORE actually stored typed objects on disk, rather than plain files. Others, like BFS, implemented metadata indexing within the file system itself.

Hence comes the problem. First of all, in all I have read, I found no literature with a convincing argument for why an integrated solution would be better than an augmented one, or the other way around. Second, these projects seem to come and go. Many papers have been written, and prototypes produced, in the past 15 years with the single goal of improving the usability of hierarchical file systems, which were already seen as limiting back then. And where are we now? Almost exactly where we were 15 years ago.

Almost, I say, almost. The one thing that recently got a push is what is now commonly known as Desktop Search. The best known are Google Desktop Search on Microsoft Windows, Spotlight on Mac OS X and Beagle on GNU/Linux. However, I perceive desktop search as only more efficient programming; it is neither revolutionary nor even attempting to be.

Why is that, you may ask? Let’s see. Why do you need to search? Because you cannot find something. But do you always search? No. You search when you cannot find something, or when you think that it may be difficult to find something. Searching, according to many research findings, is more cognitively difficult than browsing. And although that perfect query with those two ANDs, an OR, and that sentence placed between double quotes could get you that difficult match in the first page of results, I highly doubt your average user would be able to follow suit.

Desktop Search tools are not “bad”. It is just that they are not enough. They sure would come in handy if you are stuck and cannot find your file anywhere, but I do not see how they could become the primary means of retrieving your files.

With the increase in the amount of storage available to users, average users will inevitably reach the point where they no longer delete their files. It is already starting to be the case with many I know, those whose usage of computers is merely for word processing and e-mail. What, then, will be the case in a few years’ time, when storage devices are measured in terabytes, or perhaps more if some new technology hits the streets? If you find it difficult now to find your way through hundreds or thousands of images, then how will it be 100 years from now, when your grand-*-son has some million images of the family history?

And how long will people have to control the revisions of their files on their own? Again, this is a point that has been tackled by researchers but was never pushed to the masses: Version Controlled File Systems. How long will people have to rename “CV.rtf” to “CV2.rtf”, incrementing the name with every few edits so that they do not lose a previous revision? Ah! And when they want to find their CV, they of course do not remember whether the right revision was “CV6.rtf” or “CV7.rtf”.
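
To make the idea concrete, here is a minimal Python sketch of what a version-controlled store could do for the CV example. It is not how any particular versioning file system works; the class VersionedStore and its save and read methods are names I made up for illustration, and everything lives in memory.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VersionedStore:
    """Hypothetical sketch: one name, many revisions, nothing renamed by hand."""
    blobs: Dict[str, bytes] = field(default_factory=dict)        # digest -> content
    history: Dict[str, List[str]] = field(default_factory=dict)  # name -> digests

    def save(self, name: str, content: bytes) -> int:
        digest = hashlib.sha256(content).hexdigest()
        revisions = self.history.setdefault(name, [])
        # Saving unchanged content does not create a new revision.
        if not revisions or revisions[-1] != digest:
            self.blobs[digest] = content
            revisions.append(digest)
        return len(revisions)  # current revision number, starting at 1

    def read(self, name: str, revision: int = -1) -> bytes:
        # revision is 1-based; -1 means "latest".
        revisions = self.history[name]
        digest = revisions[-1] if revision == -1 else revisions[revision - 1]
        return self.blobs[digest]


store = VersionedStore()
store.save("CV.rtf", b"first draft")
store.save("CV.rtf", b"added new job")
latest_revision = store.save("CV.rtf", b"added new job")  # unchanged content
```

The user only ever sees “CV.rtf”; asking for an older revision is a parameter to read, not a guess between file names.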

Not only is the explosion in the number of files a huge issue, but so is the fact that users now maintain their files in separate storage entities, which could be in the same computer, in a neighboring computer, or in a computer on another continent. Why should you always have to be aware of such dispersion? (If you are thinking NFS, wait, we will get to that later.) Why do you have to maintain redundant copies, and why do you have to synchronize files between this and that computer, and between this computer and that portable device? Microsoft WinFS had some solutions for these issues, but they may never see the light of day now that Microsoft has decided to dissect the 16-year-old project into fragments.

What about Metadata? It is like they do not exist. Almost all of the most common file formats nowadays have Metadata that are left for the users to set. You know, this Author field in a PDF that is usually empty, and if not, probably has some cryptic irrelevant text. This is understandable of course, and users should not be blamed, because quite frankly my dear, Metadata means nothing. If they cannot organize based on it, or search with it, why would they spend the time to set it? And even more, why would they take the trouble of going through menus to edit them? Maybe, only a maybe, they would have set them if they were asked to in the Save dialogue of their application. But other than that, why spend the effort!?

While we are on the subject of Metadata, we need to discuss something relevant. Please, settle it. Are you going to store Metadata in the file system or not!! I am simply not going to write an application that relies on storing Metadata on the disk if some file systems do not have such a feature. The problem is not simple. Let us say that you made the perfect Storage/Organization solution and you wanted to allow users to add annotations to their documents. Now, not every file format has room in its specification for such “Comments” data; a plain text file, for one, has none. One solution is to store this “extended” information in the file system. But if you were to store it in the file system, moving the file to another system will move it without this extended information (you can make hacks, but nothing would be clean enough). It is a little heartbreaking that after we almost died convincing users to add semantics to their documents, we would then dump this information the moment they put the file on a removable disk. This brings us back to the topic of augmented versus integrated semantic representation. We need to settle this issue. Will file systems stick to the storage of bits only, or will they march towards additional territories like storing semantics?
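
The loss is easy to reproduce in a small Python sketch. AnnotationStore below is a made-up stand-in for metadata that one file system keeps on the side, keyed by path; it is an assumption for illustration, not a real file system API. The copy itself uses the standard shutil.copy.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Dict, Optional


class AnnotationStore:
    """Hypothetical stand-in for metadata kept by one file system, keyed by path."""

    def __init__(self) -> None:
        self._notes: Dict[str, str] = {}

    def annotate(self, path: Path, note: str) -> None:
        self._notes[str(path)] = note

    def lookup(self, path: Path) -> Optional[str]:
        return self._notes.get(str(path))


with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "home"
    usb = Path(tmp) / "removable_disk"
    home.mkdir()
    usb.mkdir()

    home_store = AnnotationStore()  # metadata held by the home file system
    usb_store = AnnotationStore()   # the removable disk has its own, empty store

    document = home / "todo.txt"
    document.write_text("buy film for the camera\n")
    home_store.annotate(document, "list for the Alexandria trip")

    # Copying moves only the bytes; nothing tells the other store about the note.
    copied = Path(shutil.copy(document, usb))

    note_at_home = home_store.lookup(document)
    note_on_usb = usb_store.lookup(copied)
    bytes_survived = copied.read_text() == document.read_text()
```

The content arrives intact on the removable disk, but the annotation stays behind, because it never was part of the file.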

Mind you, I know that most of the features I spoke of can be more or less glued to a Linux-based distribution. Someone could say, okay, get this library for searching, that library for version control, that on top of NFS to get an abstraction for dispersed storage devices, and… No. It is not that simple, because it never was a technical issue. If it were, we would have already been using all these features today.

I myself do not see it as a big deal whether we implement a layer of semantics on top of the file system or inside the file system itself. Each solution has its advantages and disadvantages. What I care about is standardization. This is basically a usability issue. And it would do no one any good if searching was done by some software under KDE, by another under GNOME and by a third under Xfce. Maybe searching will not be that big of a deal, since as I said before, it never was the bigger issue. However, how would you like it if you had to organize your files in KDE based on Tags you assign and in GNOME based on the Metadata within the files!?

Again, if it was only a technical issue we would have already been semantic’ing around. The hierarchical organization scheme was molded around the traditional means of file cabinet organization. In other words, users can relate to it. Organizing your “files” into “folders” does not require a steep learning curve, at least compared to a semantic-based organization. Heck, I still find it difficult to explain to even the technically-inclined what semantic storage is!!

It is all about usability.

Gifford, in his paper titled Semantic File Systems, implemented what I will call Virtual Folders (the paper itself speaks of virtual directories). Contrary to the folders of a traditional file system, these folders had no predefined content. Each Virtual Folder had a query associated with it; when you opened the folder, the query would be executed and the results placed in the folder at that instant. If you created a new file matching that query and opened the same Virtual Folder an hour later, you would find the new file in it, without having to put it there yourself.
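
A minimal Python sketch of the mechanism, under loud assumptions: the index is a plain dictionary standing in for whatever an indexer would extract, and the VirtualFolder class with its open method is my own naming, not Gifford's interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical in-memory index: path -> attributes extracted by an indexer.
FileIndex = Dict[str, Dict[str, str]]


@dataclass
class VirtualFolder:
    """A folder with no stored contents, only a query evaluated on open."""
    name: str
    query: Callable[[Dict[str, str]], bool]

    def open(self, index: FileIndex) -> List[str]:
        # The query runs at open time, so the listing always reflects
        # the files that exist right now.
        return sorted(path for path, attrs in index.items() if self.query(attrs))


index: FileIndex = {
    "/home/me/picnic.jpg": {"type": "jpg", "tag": "family"},
    "/home/me/notes.txt": {"type": "txt", "tag": ""},
}
family_photos = VirtualFolder(
    "Family Photos",
    lambda attrs: attrs.get("type") == "jpg" and attrs.get("tag") == "family",
)

first_listing = family_photos.open(index)

# An hour later a new matching file is created; nobody "moves" it anywhere.
index["/home/me/birthday.jpg"] = {"type": "jpg", "tag": "family"}
second_listing = family_photos.open(index)
```

The folder object never changes between the two calls; only the index does, and the second listing picks up the new file on its own.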

Virtual Folders can now be seen in Apple’s Spotlight, where they are called Smart Folders.

So, is this really all the user would need!? Let us picture a little scenario. Let us say you frequently edit family videos, and thus, you created a Virtual Folder (or Smart Folder, or whatever the name is) called “Home Videos”, and you set its query to be “type:avi tag:toedit”. Now, since you really are not that geeky about video compression, a friend of yours once gave you the right settings to use when encoding your files, settings like the Bit Rate for example. You put his instructions into a file and you named it “encoding_settings.txt”. That file would of course not appear in the Virtual Folder; it is neither an AVI file nor is it assigned the tag “toedit”. The problem is, you want it to.

One easy solution to this problem is to just edit the query to something like “type:avi tag:toedit or name:encoding_settings.txt”. That would work, assuming there are no duplicates - that file your friend helped you with would appear right in your Virtual Folder, but is that the way it should be?

Well, there are other ways. One of them is to make Exclude and Include Lists for Virtual Folders. In an Include List, you would specify one or more files that must be included in the results; whatever the query returned, these files would appear. An Exclude List would do the opposite. In other words, more fine-grained results at the cost of added complexity for the user.
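
Sketched in Python, under stated assumptions: the class, its field names and the dictionary index are all invented for illustration, and I chose to let the Exclude List win when a file is on both lists, which is a design decision rather than anything the idea dictates.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

Attrs = Dict[str, str]


@dataclass
class VirtualFolder:
    """Hypothetical Virtual Folder with Include and Exclude Lists."""
    query: Callable[[Attrs], bool]
    include: Set[str] = field(default_factory=set)
    exclude: Set[str] = field(default_factory=set)

    def open(self, index: Dict[str, Attrs]) -> List[str]:
        matched = {path for path, attrs in index.items() if self.query(attrs)}
        # Included files appear whatever the query returned, as long as they exist.
        pinned = {path for path in self.include if path in index}
        # Assumption: exclusion is applied last, so it overrides inclusion.
        return sorted((matched | pinned) - self.exclude)


index: Dict[str, Attrs] = {
    "/videos/wedding.avi": {"type": "avi", "tag": "toedit"},
    "/videos/old_draft.avi": {"type": "avi", "tag": "toedit"},
    "/docs/encoding_settings.txt": {"type": "txt", "tag": ""},
    "/docs/CV.rtf": {"type": "rtf", "tag": ""},
}

home_videos = VirtualFolder(
    query=lambda a: a.get("type") == "avi" and a.get("tag") == "toedit",
    include={"/docs/encoding_settings.txt"},
    exclude={"/videos/old_draft.avi"},
)
listing = home_videos.open(index)
```

The query stays as simple as “type:avi tag:toedit”, and the friend's settings file shows up without being smuggled into it. The price is two more lists for the user to understand and maintain.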

Now that you have come to this point, you should have grasped what GLScube is about. If you did not, please allow me. GLScube is about providing users with means to organize and retrieve their files in the easiest possible way. It is about how to allow users to better store their data, how they can better access their data. GLScube is an attempt to provide an extensible standard way for organizing and retrieving your files.

You should realize that many are not yet in dire need of semantic-based organization. But in a few years, they certainly will be. The performance constraints of 15 years ago, which are less relevant now, will become easily ignorable in a few years. The technicalities are not rocket science either. The main issue is how to find a middle ground for usability, and then, how to convince users of the new idea of semantic storage and encourage them to move to it.

This rant is merely a short description of why GLScube was born. Unfortunately, with this driving force, we were unaware that the challenges would not be how to sort out technical issues, but how to solve usability issues. Tag-based organization is certainly fun, but is it the best there is?

I will next be writing more details on the technical issues, and the crazy choices we were faced with in terms of usability.

Entry Filed under: General

Comments

  • 1. TerminalDigit  |  October 9th, 2006 at 6:52 am

    Any idea when the LiveCD will be ready? Been wanting to try this out since it hit Digg in the beginning of July.

  • 2. The Hundredth Monkey Phen…  |  October 9th, 2006 at 5:47 pm

    […] Sean pointed me to the GLScube project a couple months ago. They are four Egyptians putting together a semantic file system for Linux. They have recently made the decision to rewrite their system following a 0.1 version release, and have also started a development blog, which should be interesting to follow. Their latest post, A Rant About Filesystems of Today, does a great job explaining the potential benefit of metadata for the average computer user. As they say, it’s all about usability. What about Metadata? It is like they do not exist. Almost all of the most common file formats nowadays have Metadata that are left for the users to set. You know, this Author field in a PDF that is usually empty, and if not, probably has some cryptic irrelevant text. This is understandable of course, and users should not be blamed, because quite frankly my dear, Metadata means nothing. If they cannot organize based on it, or search with it, why would they spend the time to set it? And even more, why would they exercise the trouble in going through menus to edit them. Maybe, only a maybe, they would have set them if they were asked to in the Save dialogue of their application. But other than that, why spend the effort!? […]

  • 3. Pgan  |  October 12th, 2006 at 6:46 am

    First, congratulations. The videos look very impressive. I am having some issues compiling, so I cannot test GLSCube yet.

    www.GLSCube.org says that you can define relationships between documents (or objects). Might this one day allow searching for something like “the phone numbers of all people who live in Alexandria and with whom I have corresponded in the last year”? This was touted as the mainstay of WinFS.

  • 4. amr.ramadan  |  October 16th, 2006 at 3:59 pm

    TerminalDigit:

    We were recently quite busy, and thus the delay in the Live CD. We will try to put that together soon though, or at least detailed installation instructions.

    Pgan:

    The search query you suggested needs a Natural Language processing engine. This is a field of its own and would anyway be best built on top of a relational (or SQL) search engine. But certainly, this would be the way things should go.



