My God, it's full of files

pythonic filesystem abstractions

Overview of different filesystem(-like) APIs in Python and attempts at unifying them

There are a lot of different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.

I intend to cover at least:

Me

So what's a filesystem?

Filesystems

ext3, HFS+, VFAT, NTFS

NFS, CIFS

9P

Ceph, Allmydata Tahoe, Hadoop FS, ...

SFTP

S3

CouchDB

Venti + Fossil

git

FUSE

http://fuse.sourceforge.net/wiki/index.php/FUSE%20Python%20tutorial

GnomeVFS

KDE Input/Output (KIO)

APIs

Current Python API

Spread all over the place: built-ins and miscellaneous stdlib modules
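As a rough illustration of how scattered the current API is, here is a short standard-library-only sketch that creates a directory tree, writes a file, finds it again and cleans up. A single small task already touches the built-in open() plus os, os.path, glob, shutil and tempfile.

```python
import glob
import os
import shutil
import tempfile

# One small task touches a built-in and several stdlib modules.
base = tempfile.mkdtemp()                               # tempfile
target = os.path.join(base, "toplevel", "foo")          # os.path
os.makedirs(target)                                     # os
with open(os.path.join(target, "bar"), "w") as f:       # built-in open()
    f.write("foo bar\n")
found = glob.glob(os.path.join(base, "*", "*", "bar"))  # glob
shutil.rmtree(base)                                     # shutil
```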

... Considered Hurtful

Where the pain started

Twisted

async

network

other process

uncontrollable delays

Deferred

twisted.vfs

good idea, but don't use

(sorry Andy)

Conch ISFTPServer

limited-purpose reimplementation of part of twisted.vfs

Things learned

async doesn't look like sync

Deferred not going to stdlib?

♪♫ pig and elephant DNA
just won't splice ♪♫

Concentrating on sync

(for now) (but don't forget async)

(network still important)

Why replace?

Lots of reasons

Quick list

More detail later

Here we go

Non-native filesystems

Mockability

Mock writing to /etc

Fault injection

See Petardfs for a generic FUSE fault-injecting filesystem.

Not nice for unit tests, but maybe for system tests.

Security

No .. from user

twisted.python.filepath

t.p.filepath

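from twisted.python.filepath import FilePath

top = FilePath('toplevel')   # assumes 'toplevel' already exists
sub = top.child('foo')
sub.createDirectory()
p = sub.child('bar')
# FilePath.open() always opens in binary mode, so write bytes
with p.open('w') as f:
    f.write(b'foo bar\n')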

Security

Invisible dotfiles

Security

Virtual chroot

Security

Custom ACL

Nicer API

More features, where they exist

Transactions

Current API won't do

Global is bad

→ Accessible via single object

f = self.fs.open("myfile")
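A minimal sketch of the "single object" idea, assuming a hypothetical OSFS class that forwards to the real filesystem and a hypothetical Report consumer. The point is only that the consumer reaches the filesystem through self.fs and never through globals, so a test can hand it something else.

```python
import os


class OSFS:
    """Hypothetical filesystem object backed by the real OS filesystem."""

    def open(self, path, mode="r"):
        return open(path, mode)  # the built-in open(), not this method

    def listdir(self, path="."):
        return os.listdir(path)


class Report:
    """Hypothetical consumer: all filesystem access goes through self.fs."""

    def __init__(self, fs):
        self.fs = fs  # injected, so a test can pass in a fake instead

    def save(self, text):
        with self.fs.open("myfile", "w") as f:
            f.write(text)
```

In production this would be Report(OSFS()); in a unit test, Report(fake_fs) with whatever fake the test needs.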

→ Mockability

implement needed part, KISS, croak on anything unwanted
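One way such a mock could look, under the assumption that the code under test only opens files for reading and writing: the hypothetical MemoryFS below keeps file contents in a dict, implements just those two modes, and raises NotImplementedError for every other operation instead of silently touching the real disk. Paths such as /etc/hosts are only dictionary keys here.

```python
import io


class _WriteBuffer(io.StringIO):
    """StringIO that stores its contents in the backing dict when closed."""

    def __init__(self, files, path):
        super().__init__()
        self._files = files
        self._path = path

    def close(self):
        if not self.closed:
            self._files[self._path] = self.getvalue()
        super().close()


class MemoryFS:
    """Hypothetical in-memory filesystem object for unit tests."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def open(self, path, mode="r"):
        if mode == "r":
            if path not in self.files:
                raise FileNotFoundError(path)
            return io.StringIO(self.files[path])
        if mode == "w":
            return _WriteBuffer(self.files, path)
        raise NotImplementedError("MemoryFS does not support mode %r" % (mode,))

    def __getattr__(self, name):
        # Any other filesystem operation is unwanted in this test: croak.
        raise NotImplementedError("MemoryFS does not implement %r" % (name,))
```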

→ Fault injection

wrap another implementation
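A sketch of fault injection by wrapping, assuming the wrapped object is any filesystem object with ordinary methods: the hypothetical FaultyFS forwards every attribute to the inner filesystem, but makes the n-th call of selected operations raise OSError (ENOSPC by default). The class name and its parameters are illustrative, not an existing library.

```python
import errno
import os


class FaultyFS:
    """Hypothetical wrapper that injects failures into another fs object."""

    def __init__(self, fs, fail_on, nth=1, error=None):
        self._fs = fs
        self._fail_on = set(fail_on)
        self._nth = nth
        self._error = error or OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
        self._calls = {}

    def __getattr__(self, name):
        target = getattr(self._fs, name)
        if name not in self._fail_on or not callable(target):
            return target  # untouched pass-through

        def wrapper(*args, **kwargs):
            # Count calls per operation and fail exactly on the n-th one.
            count = self._calls.get(name, 0) + 1
            self._calls[name] = count
            if count == self._nth:
                raise self._error
            return target(*args, **kwargs)

        return wrapper
```

For example, FaultyFS(fs, fail_on=["open"], nth=3) lets the first two open() calls through and fails the third, which is hard to arrange against a real disk.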

→ No .. from user

p = self.fs.path("/my/safe/area")
p = p.child(user_input)
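A sketch of what child() could check, assuming a hypothetical SafePath class: it accepts exactly one ordinary path component and refuses "..", ".", empty names, NUL bytes and anything containing a separator. Twisted's FilePath.child() performs a comparable check and raises its own InsecurePath; the class and exception below are stand-ins, not Twisted's code.

```python
import os


class InsecurePath(Exception):
    """Raised when a user-supplied name would escape its parent."""


class SafePath:
    """Hypothetical path object whose child() accepts one plain segment only."""

    def __init__(self, path):
        self.path = path

    def child(self, name):
        seps = {os.sep, "/"} | ({os.altsep} if os.altsep else set())
        # Reject "..", ".", empty names, NUL bytes and embedded separators.
        if name in ("", ".", "..") or "\0" in name or any(s in name for s in seps):
            raise InsecurePath("%r is not a safe child name" % (name,))
        return SafePath(os.path.join(self.path, name))
```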

→ Invisible dotfiles

self.fs = NoDotfilesFS(self.fs)
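A minimal sketch of the dotfile-hiding wrapper, assuming POSIX-style "/"-separated paths and an inner filesystem object with open() and listdir(). NoDotfilesFS is the hypothetical name from the slide; reporting hidden paths as FileNotFoundError rather than a permission error is one possible design choice, picked so the wrapper does not reveal that they exist.

```python
class NoDotfilesFS:
    """Hypothetical wrapper hiding every path component that starts with '.'."""

    def __init__(self, fs):
        self._fs = fs

    @staticmethod
    def _check(path):
        for part in path.split("/"):
            if part.startswith(".") and part not in (".", ".."):
                raise FileNotFoundError(path)  # pretend it does not exist

    def listdir(self, path):
        self._check(path)
        return [n for n in self._fs.listdir(path) if not n.startswith(".")]

    def open(self, path, mode="r"):
        self._check(path)
        return self._fs.open(path, mode)
```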

→ Virtual chroot

self.fs = ChrootFS(
    fs=self.fs,
    root="/my/safe/area",
    )
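A sketch of ChrootFS under the same assumptions ("/"-separated paths, an inner filesystem with open() and listdir()): the virtual path is normalized as an absolute path first, so ".." can never climb above the virtual "/", and only then joined onto root. This is purely lexical; it does not stop a symlink inside root from pointing outside it.

```python
import posixpath


class ChrootFS:
    """Hypothetical wrapper that makes `root` look like "/" to its callers."""

    def __init__(self, fs, root):
        self._fs = fs
        self._root = root.rstrip("/") or "/"

    def _real(self, path):
        # normpath on an absolute path drops leading "..": "/../x" -> "/x"
        rel = posixpath.normpath("/" + path).lstrip("/")
        return posixpath.join(self._root, rel) if rel else self._root

    def open(self, path, mode="r"):
        return self._fs.open(self._real(path), mode)

    def listdir(self, path="/"):
        return self._fs.listdir(self._real(path))
```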

→ Custom ACL

self.fs = AccessControlFS(
    fs=self.fs,
    acl=acl_rules,
    )

→ Nicer API

path.py?

The path.py website at http://www.jorendorff.com/articles/python/path fails to load unless you do it on a full moon, in front of a mirror, and reload three times.

In the end, I don't think path.py is a suitable base for this:

  1. It tries to be a string.
  2. It's cluttered; it even includes md5sum calculation and shutil calls. I think a base common API shouldn't include rmtree.

It's probably a nice pragmatic helper, just not a good common API.

path.py

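from path import path   # the original path.py module; newer releases name the class Path

top = path('toplevel')   # assumes 'toplevel' already exists
sub = top / 'foo'
sub.mkdir()
p = sub / 'bar'
with p.open('w') as f:
    f.write('foo bar\n')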

→ Transactions

POSIX: atomic replace of a single file

could do atomic-write-on-close
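A minimal sketch of atomic-write-on-close on POSIX, assuming Python 3.3+ for os.replace(): writes go to a temporary file in the same directory, and close() fsyncs it and renames it over the target, so readers see either the old or the new contents. AtomicWriter is a hypothetical name. Two caveats: the rename is only atomic within one filesystem (hence the same directory), and mkstemp() creates the file with mode 0600, so the original file's permissions are not preserved.

```python
import os
import tempfile


class AtomicWriter:
    """Hypothetical file-like object: contents appear at `path` only on close."""

    def __init__(self, path, mode="w"):
        self._path = path
        dirname = os.path.dirname(os.path.abspath(path))
        fd, self._tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
        self._file = os.fdopen(fd, mode)

    def write(self, data):
        return self._file.write(data)

    def close(self):
        if self._file.closed:
            return
        self._file.flush()
        os.fsync(self._file.fileno())      # data on disk before the rename
        self._file.close()
        os.replace(self._tmp, self._path)  # atomic rename on POSIX

    def abort(self):
        # Discard the temporary file; the target is left untouched.
        self._file.close()
        os.unlink(self._tmp)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.close()
        else:
            self.abort()
        return False
```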

→ Transactions

with git: arbitrary!
(SQL-style redo on conflict)
with self.fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")

→ Can implement near-identical async API
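A sketch of what a near-identical async API could look like, with one loud substitution: it uses asyncio.to_thread() (Python 3.9+) rather than Twisted Deferreds. The hypothetical AsyncFS wraps any sync filesystem object with an open() method and offers whole-file operations whose names and arguments would match a sync version, the only difference being await. Returning open file objects is where the two would start to diverge, since every read and write would then need to be awaited too.

```python
import asyncio


class AsyncFS:
    """Hypothetical async twin of a sync filesystem object."""

    def __init__(self, fs):
        self._fs = fs

    async def read_text(self, path):
        def work():
            with self._fs.open(path, "r") as f:
                return f.read()
        # Run the blocking I/O in a worker thread, keeping the loop free.
        return await asyncio.to_thread(work)

    async def write_text(self, path, data):
        def work():
            with self._fs.open(path, "w") as f:
                f.write(data)
        await asyncio.to_thread(work)
```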

Thank You

Questions? Opinions? Rants?

Find me during the conference or sprints to talk more.

Slides etc. are up on eagain.net