My God, it's full of files

pythonic filesystem abstractions

Overview of different filesystem(-like) APIs in Python and attempts at unifying them

There are many different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.

I intend to cover at least:

  • sync/async nature of access
  • path.py
  • twisted.vfs
  • twisted.filepath
  • FUSE
  • git (including my own project)
  • Allmydata Tahoe
  • CouchDB
  • SFTP, NFS, Plan 9

Me

So what's a filesystem

  • well, it has files
  • folders (or some such)
  • open, read/write, close
  • file handle, seek position
  • unlink, rename, mmap?

9P

S3

git

Current Python API

Spread all over the place, built-ins and miscellaneous stdlib

  • file() (Python 2 only), open(), file objects
  • os.listdir(), os.walk()
  • os.path.exists(), os.chdir()
  • os.unlink(), os.rename()
  • stat, mmap
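
To make "spread all over the place" concrete, here is a small sketch of one everyday task done with the current API. It uses only standard library calls; the file names are made up for the example.

```python
import os
import tempfile

# One small task, yet it touches a built-in, os, and os.path.
with tempfile.TemporaryDirectory() as top:
    sub = os.path.join(top, "foo")          # os.path
    os.mkdir(sub)                           # os
    target = os.path.join(sub, "bar")
    with open(target, "w") as f:            # built-in
        f.write("foo bar\n")
    os.rename(target, target + ".old")      # os
    names = os.listdir(sub)                 # os
    exists = os.path.exists(target)         # os.path
```

Nothing here goes through a single object, so there is no one place to substitute a fake or a wrapper.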

twisted.vfs

good idea, but don't use

(sorry Andy)

Conch ISFTPServer

limited-purpose reimplementation of part of twisted.vfs

Things learned

async doesn't look like sync

Deferred not going to stdlib?

♪♫ pig and elephant DNA
just won't splice ♪♫

Concentrating on sync

(for now) (but don't forget async)

(network still important)

Mockability

Mock writing to /etc

Fault injection

See Petardfs, a generic fault-injecting FUSE filesystem.

Not nice for unit tests, but maybe for system tests.

Security

No .. from user

twisted.python.filepath

t.p.filepath

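from twisted.python.filepath import FilePath

top = FilePath('toplevel')  # assumes 'toplevel' already exists
sub = top.child('foo')
sub.createDirectory()
p = sub.child('bar')
# FilePath.open() always opens in binary mode, so write bytes on Python 3
with p.open('w') as f:
    f.write(b'foo bar\n')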

Security

Invisible dotfiles

Security

Virtual chroot

Security

Custom ACL

→ Accessible via single object

f = self.fs.open("myfile")

→ Mockability

implement needed part, KISS, croak on anything unwanted
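
A minimal sketch of what such a mock could look like, assuming the code under test only ever calls fs.open(). InMemoryFS and _WriteBuffer are hypothetical names invented for this example, not part of any existing library.

```python
import io


class InMemoryFS:
    """Hypothetical test double: only open() is implemented."""

    def __init__(self):
        self.files = {}  # path -> str contents

    def open(self, path, mode="r"):
        if mode == "r":
            return io.StringIO(self.files[path])
        if mode == "w":
            return _WriteBuffer(self.files, path)
        raise NotImplementedError("mode %r is not mocked" % (mode,))

    def __getattr__(self, name):
        # Croak on anything the test did not expect.
        raise NotImplementedError("fs.%s is not mocked" % name)


class _WriteBuffer(io.StringIO):
    """Stores its contents back into the dict when closed."""

    def __init__(self, files, path):
        super().__init__()
        self._files = files
        self._path = path

    def close(self):
        if not self.closed:
            self._files[self._path] = self.getvalue()
        super().close()


# "Writing to /etc" without touching the real /etc.
fs = InMemoryFS()
with fs.open("/etc/motd", "w") as f:
    f.write("hello\n")
```

Anything outside open(), for example fs.rename(), fails loudly instead of silently doing the wrong thing.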

→ Fault injection

wrap another implementation
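
One way the wrapping could look, as a sketch only. FaultyFS and DictFS are hypothetical names, and failing with EIO on selected paths is just one possible fault to inject.

```python
import errno
import io


class DictFS:
    """Stand-in inner filesystem for the example: path -> contents."""

    def __init__(self, files):
        self.files = files

    def open(self, path, mode="r"):
        return io.StringIO(self.files[path])


class FaultyFS:
    """Hypothetical wrapper: open() fails for chosen paths, the rest is delegated."""

    def __init__(self, fs, fail_paths, error=errno.EIO):
        self.fs = fs
        self.fail_paths = set(fail_paths)
        self.error = error

    def open(self, path, *args, **kwargs):
        if path in self.fail_paths:
            raise OSError(self.error, "injected fault", path)
        return self.fs.open(path, *args, **kwargs)

    def __getattr__(self, name):
        # Everything that is not overridden goes to the wrapped filesystem.
        return getattr(self.fs, name)


fs = FaultyFS(DictFS({"good": "1", "bad": "2"}), fail_paths=["bad"])
```

Because the code under test only sees self.fs, it cannot tell the wrapper from the real thing.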

→ No .. from user

p = self.fs.path("/my/safe/area")
p = p.child(user_input)
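
The point of child() is that it accepts exactly one path segment and refuses everything else. A rough sketch of that check, with SafePath as a hypothetical class for illustration (twisted's FilePath.child() works in a similar spirit and raises InsecurePath):

```python
import os


class InsecurePath(Exception):
    """Raised when a child name would escape the parent directory."""


class SafePath:
    """Hypothetical path object for this example."""

    def __init__(self, path):
        self.path = path

    def child(self, name):
        separators = {"/", os.sep}
        if os.altsep:
            separators.add(os.altsep)
        # Accept exactly one plain segment: no "..", no separators, no NUL.
        if name in ("", ".", "..") or "\0" in name:
            raise InsecurePath(name)
        if any(sep in name for sep in separators):
            raise InsecurePath(name)
        return SafePath(os.path.join(self.path, name))


p = SafePath("/my/safe/area")
ok = p.child("report.txt")
```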

→ Invisible dotfiles

self.fs = NoDotfilesFS(self.fs)
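
A sketch of how such a wrapper might behave, kept flat (no directories) to stay short. Both classes and their method signatures are assumptions made for the example.

```python
import io


class DictFS:
    """Stand-in flat filesystem for the example: name -> contents."""

    def __init__(self, files):
        self.files = files

    def listdir(self):
        return sorted(self.files)

    def open(self, name):
        return io.StringIO(self.files[name])


class NoDotfilesFS:
    """Hypothetical wrapper that pretends dotfiles do not exist."""

    def __init__(self, fs):
        self.fs = fs

    def listdir(self):
        return [n for n in self.fs.listdir() if not n.startswith(".")]

    def open(self, name):
        if name.startswith("."):
            # Same error as for a file that really is missing.
            raise FileNotFoundError(name)
        return self.fs.open(name)


fs = NoDotfilesFS(DictFS({".secret": "x", "notes": "y"}))
visible = fs.listdir()
```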

→ Virtual chroot

self.fs = ChrootFS(
    fs=self.fs,
    root="/my/safe/area",
    )
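
A minimal sketch of the idea, assuming the wrapped object exposes open(path, mode). LocalFS, the _translate helper, and the choice of PermissionError are all made up for this example.

```python
import os
import tempfile


class LocalFS:
    """Stand-in for the real filesystem object."""

    def open(self, path, mode="r"):
        return open(path, mode)


class ChrootFS:
    """Hypothetical wrapper: every path is resolved below root."""

    def __init__(self, fs, root):
        self.fs = fs
        self.root = os.path.realpath(root)

    def _translate(self, path):
        # Treat the path as relative to the virtual root, resolve it,
        # then verify that it did not escape (via ".." or symlinks).
        real = os.path.realpath(os.path.join(self.root, path.lstrip("/")))
        if os.path.commonpath([self.root, real]) != self.root:
            raise PermissionError(path)
        return real

    def open(self, path, mode="r"):
        return self.fs.open(self._translate(path), mode)


with tempfile.TemporaryDirectory() as top:
    root = os.path.join(top, "safe")
    os.mkdir(root)
    with open(os.path.join(root, "hello.txt"), "w") as f:
        f.write("hi\n")

    fs = ChrootFS(fs=LocalFS(), root=root)
    with fs.open("/hello.txt") as f:
        inside = f.read()
    try:
        fs.open("/../secret.txt")
        escaped = True
    except PermissionError:
        escaped = False
```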

→ Custom ACL

self.fs = AccessControlFS(
    fs=self.fs,
    acl=acl_rules,
    )
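
The slides do not say what acl_rules looks like. As one possible shape, this sketch assumes it is a callable taking (path, mode) and returning True or False; the classes are hypothetical too.

```python
import io


class DictFS:
    """Stand-in inner filesystem for the example: path -> contents."""

    def __init__(self, files):
        self.files = files

    def open(self, path, mode="r"):
        return io.StringIO(self.files[path])


class AccessControlFS:
    """Hypothetical wrapper that asks the ACL before every open()."""

    def __init__(self, fs, acl):
        self.fs = fs
        self.acl = acl

    def open(self, path, mode="r"):
        if not self.acl(path, mode):
            raise PermissionError(path)
        return self.fs.open(path, mode)


def acl_rules(path, mode):
    # Example policy: read-only access, and only below /pub/.
    return mode == "r" and path.startswith("/pub/")


fs = AccessControlFS(
    fs=DictFS({"/pub/readme": "hello", "/private/key": "s3cret"}),
    acl=acl_rules,
)
```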

→ Nicer API

path.py?

The path.py website at http://www.jorendorff.com/articles/python/path fails to load unless you do it on a full moon, in front of a mirror, and reload three times.

In the end, I don't think path.py is a suitable base for this:

  1. It tries to be a string.
  2. It's cluttered; it even includes md5sum calculation and shutil calls. I think a base common API shouldn't include rmtree.

It's probably a nice pragmatic helper, just not a good common API.

path.py

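from path import path  # older path.py releases; newer ones export Path instead

top = path('toplevel')  # assumes 'toplevel' already exists
sub = top / 'foo'
sub.mkdir()
p = sub / 'bar'
with p.open('w') as f:
    f.write('foo bar\n')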

→ Transactions

POSIX: atomic replace of a single file

could do atomic-write-on-close
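
A sketch of atomic-write-on-close using only the standard library: write to a temporary file in the same directory, then rename it over the target. The name atomic_write is made up for this example.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def atomic_write(path):
    """Write to a temp file; replace the target only on clean close."""
    dirname = os.path.dirname(os.path.abspath(path))
    # Same directory, so the final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w") as f:
            yield f
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # atomic on POSIX
    except BaseException:
        # On any failure the old contents stay untouched.
        os.unlink(tmp)
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "foo")
    with atomic_write(target) as f:
        f.write("bar\n")
    with open(target) as f:
        result = f.read()
    leftovers = os.listdir(d)
```

Readers see either the old file or the complete new one, never a half-written file.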

→ Transactions

with git: arbitrary!
(sql-style redo on conflict)
with self.fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")

Thank You

Questions? Opinions? Rants?

Find me during the conference or sprints to talk more.

Slides etc up on eagain.net