Saturday, January 5, 2008

PyGFS: implementing a distributed filesystem in python

In this post I try to explain how to implement a secure and robust distributed filesystem in user space with Python.

The advantages of user space are many: no kernel modifications, no OS crashes due to buggy code, easy debugging, etc. Moreover, from the development point of view, user space lets us exploit all the nice features provided by user-space libraries! This means that a few lines of code can provide a lot of interesting features.

So, let's see some potential requirements for our filesystem:
  • the filesystem must support a complete set of standard POSIX APIs,
  • as a distributed filesystem, it must make data accessible to remote hosts,
  • it must be resilient to hardware or network failures,
  • it must be secure (it must provide authentication, authorization and encryption mechanisms to provide secure access over insecure networks).
Even though these requirements sound like a long-term project, it's possible to satisfy all of them with a few lines of code. Let's see how.

User-space operation is provided by FUSE, which makes it possible to implement a full POSIX filesystem without any kernel changes (it provides all the kernel APIs required to register a filesystem, so no kernel-space code is needed). FUSE also provides a secure way for non-privileged users to mount their own filesystems.

A distributed filesystem also needs a communication mechanism (a way to send data to the remote hosts). An interesting project that can help here is Pyro. Pyro lets us skip developing a new network communication protocol, since it provides an elegant and easy-to-use object-oriented form of RPC. It also optionally supports encryption with X.509 certificates, which covers our security requirement nicely.
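To make "object-oriented RPC" concrete, here is a toy, in-process sketch of the idea. It is not Pyro's API: the class names `Dispatcher` and `Proxy` are hypothetical, and the "network" is just a function call. The client-side proxy turns any method call into a pickled `(method, args, kwargs)` message; the server-side dispatcher unpickles it, invokes the method on the real object, and pickles back either the result or the raised exception. Pyro performs this kind of marshalling over a real network connection.

```python
import pickle


class Dispatcher:
    """Server side: decodes a request, calls the real object, encodes the reply."""

    def __init__(self, target):
        self.target = target

    def handle(self, request: bytes) -> bytes:
        method, args, kwargs = pickle.loads(request)
        try:
            reply = ("ok", getattr(self.target, method)(*args, **kwargs))
        except Exception as exc:
            reply = ("error", exc)  # exceptions travel back to the caller
        return pickle.dumps(reply)


class Proxy:
    """Client side: every method call becomes a request message."""

    def __init__(self, transport):
        # `transport` sends request bytes and returns reply bytes. Here it is
        # a plain function call; with Pyro it would be a network connection.
        self._transport = transport

    def __getattr__(self, method):
        def remote_call(*args, **kwargs):
            request = pickle.dumps((method, args, kwargs))
            status, value = pickle.loads(self._transport(request))
            if status == "error":
                raise value
            return value
        return remote_call


class Calculator:
    def add(self, a, b):
        return a + b

    def divide(self, a, b):
        return a / b


proxy = Proxy(Dispatcher(Calculator()).handle)
print(proxy.add(2, 3))  # 5
```

Because exceptions are shipped back, an error raised on the server (for example an `OSError` from a syscall) surfaces on the client as if the call had been local. `pickle` is used here only to keep the toy short; it must never be used to decode data received from untrusted peers.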

At this point the actual filesystem implementation is quite easy: we can use a simple client-server approach like NFS.

The client implements all the POSIX syscalls defined by the FUSE interface as wrappers that call the equivalent OS routines on the remote server (using Pyro RPC); the server executes those routines on the back-end filesystem and passes the client the same result returned by the OS syscall (executed on the server's filesystem).
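A minimal sketch of this split, assuming a single exported directory on the server. The class names `BackendServer` and `ForwardingClient` are hypothetical, and the method names only mirror typical FUSE callbacks (`getattr`, `readdir`, `open`, `read`, `write`, `release`); they are not the exact signatures of any particular FUSE binding. To keep it runnable, the client holds the server object directly; in the design described here that reference would be a Pyro proxy, and each call would travel over the network.

```python
import os
import stat
import tempfile


class BackendServer:
    """Runs on the server: executes the real OS calls on the exported directory."""

    def __init__(self, root):
        self.root = os.path.realpath(root)

    def _full(self, path):
        # Map a client path such as "/a/b.txt" into the exported directory
        # and refuse anything that escapes it (e.g. "/../etc/passwd").
        full = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if os.path.commonpath([full, self.root]) != self.root:
            raise PermissionError(path)
        return full

    def getattr(self, path):
        st = os.lstat(self._full(path))
        # Send back a plain dict of st_* fields rather than an os.stat_result.
        return {k: getattr(st, k) for k in ("st_mode", "st_size", "st_mtime")}

    def readdir(self, path):
        return [".", ".."] + os.listdir(self._full(path))

    def open(self, path, flags, mode=0o644):
        return os.open(self._full(path), flags, mode)

    def read(self, fd, size, offset):
        os.lseek(fd, offset, os.SEEK_SET)
        return os.read(fd, size)

    def write(self, fd, data, offset):
        os.lseek(fd, offset, os.SEEK_SET)
        return os.write(fd, data)

    def release(self, fd):
        os.close(fd)


class ForwardingClient:
    """Runs on the client: each filesystem callback is forwarded to the server."""

    def __init__(self, server):
        self.server = server  # in the real design: a Pyro proxy to BackendServer

    def getattr(self, path):
        return self.server.getattr(path)

    def readdir(self, path):
        return self.server.readdir(path)

    def open(self, path, flags, mode=0o644):
        return self.server.open(path, flags, mode)

    def read(self, fh, size, offset):
        return self.server.read(fh, size, offset)

    def write(self, fh, data, offset):
        return self.server.write(fh, data, offset)

    def release(self, fh):
        return self.server.release(fh)


with tempfile.TemporaryDirectory() as export:
    client = ForwardingClient(BackendServer(export))
    fh = client.open("/hello.txt", os.O_CREAT | os.O_RDWR)
    client.write(fh, b"hello, world", 0)
    print(client.read(fh, 5, 7))  # b'world'
    client.release(fh)
    print(client.readdir("/"))  # ['.', '..', 'hello.txt']
    print(stat.S_ISREG(client.getattr("/hello.txt")["st_mode"]))  # True
```

The path check in `_full` goes beyond what the post describes: since clients send arbitrary paths, it keeps requests such as `/../etc/passwd` from escaping the exported directory. A real FUSE client would also translate server-side `OSError`s into the errno values its binding expects.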

Moreover, to provide reliability features, we can exploit Python's robust exception handling. This way we can detect all communication failures and call an appropriate event handler that re-issues the operations when the server becomes reachable again. We can further increase reliability by using separate client-side and server-side file handles, so that each client-side file handle is mapped to a server-side file handle. If the server goes down, the mapping between the two is simply re-initialized, which allows the clients to continue their operations transparently, as if the server had never stopped.
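The sketch below combines both ideas under stated assumptions. It assumes the transport signals an unreachable server with Python's built-in `ConnectionError` (Pyro raises its own exception classes, which you would catch instead), and it retries a bounded number of times instead of waiting forever. `ResilientClient` and the in-memory `FakeServer` are hypothetical names; the fake server loses all of its handles when it "restarts", so the client re-opens every open file and keeps handing out the same client-side handles.

```python
import itertools
import os
import time

# Don't replay O_CREAT/O_EXCL/O_TRUNC on re-open (O_TRUNC would wipe the file).
REOPEN_MASK = ~(os.O_CREAT | os.O_EXCL | os.O_TRUNC)


class ResilientClient:
    """Maps stable client-side handles to (possibly changing) server handles."""

    def __init__(self, server, retries=5, delay=0.5):
        self.server = server  # hypothetical: a proxy to the remote server
        self.retries = retries
        self.delay = delay
        self.handles = {}  # client fh -> [path, flags, server fh]
        self._next_fh = itertools.count(1)

    def _remap(self):
        # The server was restarted and its old handles are gone:
        # re-open every file and record the new server-side handle.
        for entry in self.handles.values():
            path, flags, _ = entry
            entry[2] = self.server.open(path, flags & REOPEN_MASK)

    def _with_retry(self, operation):
        stale = False
        for _ in range(self.retries):
            try:
                if stale:
                    self._remap()
                    stale = False
                return operation()
            except ConnectionError:
                stale = True  # server unreachable: wait, then remap and retry
                time.sleep(self.delay)
        raise ConnectionError(f"server unreachable after {self.retries} attempts")

    def open(self, path, flags):
        server_fh = self._with_retry(lambda: self.server.open(path, flags))
        fh = next(self._next_fh)
        self.handles[fh] = [path, flags, server_fh]
        return fh

    def read(self, fh, size, offset):
        # The lambda looks up the server handle each time it runs.
        return self._with_retry(
            lambda: self.server.read(self.handles[fh][2], size, offset))

    def release(self, fh):
        _, _, server_fh = self.handles.pop(fh)
        try:
            self.server.release(server_fh)
        except ConnectionError:
            pass  # a restarted server has forgotten this handle anyway


class FakeServer:
    def __init__(self):
        self.files = {"/data.txt": b"persistent content"}
        self.fds = {}
        self.outage = 0
        self._fd = itertools.count(100)

    def crash(self, failed_calls):
        # The next `failed_calls` requests fail, then the server restarts.
        self.outage = failed_calls

    def _check(self):
        if self.outage:
            self.outage -= 1
            if self.outage == 0:
                self.fds.clear()  # fresh process: every open handle is lost
            raise ConnectionError("server unreachable")

    def open(self, path, flags):
        self._check()
        fd = next(self._fd)
        self.fds[fd] = path
        return fd

    def read(self, fd, size, offset):
        self._check()
        return self.files[self.fds[fd]][offset:offset + size]

    def release(self, fd):
        self._check()
        del self.fds[fd]


server = FakeServer()
client = ResilientClient(server, delay=0.01)
fh = client.open("/data.txt", os.O_RDONLY)
print(client.read(fh, 10, 0))  # b'persistent'
server.crash(failed_calls=2)
print(client.read(fh, 7, 11))  # b'content', via a new server-side handle
client.release(fh)
```

Keeping each operation as a closure that looks up the server handle at call time is what lets a retried call pick up the fresh handle produced by `_remap`. A real implementation would also need to restore file positions, clean up handles left behind by a remap that fails halfway, and decide what to do with writes that may or may not have reached the server before the failure.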

So, I tried to implement a real example of this filesystem and called it PyGFS (it stands for something like "Python grid file system"... in the future I'd like to extend it with multiple servers to mirror or unify several filesystems on different hosts, just like a real grid filesystem...). The source code is available to anyone interested in it... If you have ideas for new features, let me know... ;-)


m1l3n said...

I am currently looking for a simple solution for real-time replication between two hosts. This sounds very experimental, but I'll give it a try anyway and post an update when I have results.

Anonymous said...

Here's my attempt at making one (I call it PyDFS).

It's on GitHub too.