My God, it's full of files

pythonic filesystem abstractions


Overview of different filesystem(-like) APIs in Python and attempts to unify them

There are many different filesystem(-like) APIs in Python. I intend to give an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.

I intend to cover at least:

  • sync/async nature of access
  • path.py
  • twisted.vfs
  • twisted.filepath
  • FUSE
  • git (including my own project)
  • Allmydata Tahoe
  • CouchDB
  • SFTP, NFS, Plan 9

Me


So what's a filesystem

  • well, it has files
  • folders (or some such)
  • open, read/write, close
  • file handle, seek position
  • unlink, rename, mmap?

Filesystems

ext3, HFS+, VFAT, NTFS

NFS, CIFS

9P

Ceph, Allmydata Tahoe, Hadoop FS, ...

SFTP

S3

CouchDB

Venti + Fossil

git

FUSE

http://fuse.sourceforge.net/wiki/index.php/FUSE%20Python%20tutorial

GnomeVFS

KDE Input/Output (KIO)


APIs

Current Python API

Spread all over the place, built-ins and miscellaneous stdlib

  • file(), open(), file objects
  • os.listdir(), os.walk()
  • os.path.exists(), os.chdir()
  • os.unlink(), os.rename()
  • stat, mmap
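
To make "spread all over the place" concrete, here is a small sketch of one trivial task done with today's API. The function name demo and the temporary directory are only there for illustration; the point is how many different modules and built-ins one short task touches.

```python
import os
import tempfile


def demo():
    """Create a directory and a file, rename the file, list the directory."""
    top = tempfile.mkdtemp()            # tempfile: scratch area for the demo
    sub = os.path.join(top, "foo")      # os.path: path manipulation
    os.mkdir(sub)                       # os: directory creation
    p = os.path.join(sub, "bar")
    with open(p, "w") as f:             # built-in: opening files
        f.write("foo bar\n")            # file object: writing
    os.rename(p, p + ".old")            # os: renaming
    return os.listdir(sub)              # os: listing
```

Every call here reaches for the one global, native filesystem; nothing can be swapped out without patching module globals.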

... Considered Hurtful

Where the pain started

Twisted

async

network

other process

uncontrollable delays

Deferred

twisted.vfs

good idea, but don't use

(sorry Andy)

Conch ISFTPServer

limited-purpose reimplementation of part of twisted.vfs

Things learned

async doesn't look like sync

Deferred not going to stdlib?

♪♫ pig and elephant DNA
just won't splice ♪♫

Concentrating on sync

(for now)

(but don't forget async)

(network still important)


Why replace?

Lots of reasons

Quick list

More detail later

Here we go

Non-native filesystems

Mockability

Mock writing to /etc

Fault injection

See Petardfs for a generic FUSE fault injecting filesystem.

Not nice for unit tests, but maybe for system tests.

Security

No .. from user

twisted.python.filepath

t.p.filepath

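from twisted.python.filepath import FilePath

top = FilePath('toplevel')
sub = top.child('foo')
sub.createDirectory()  # 'toplevel' itself must already exist
p = sub.child('bar')
# FilePath.open() always opens in binary mode, so write bytes
with p.open('w') as f:
    f.write(b'foo bar\n')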

Security

Invisible dotfiles

Security

Virtual chroot

Security

Custom ACL

Nicer API

More features, where they exist

Transactions


Current API won't do

Global is bad

→ Accessible via single object

f = self.fs.open("myfile")
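
A minimal sketch of what "accessible via single object" could look like in application code. RealFS, Greeter and demo are hypothetical names made up for this example; the only idea taken from the slide is that code calls self.fs.open() instead of the global open().

```python
import os
import tempfile


class RealFS(object):
    """Hypothetical filesystem object that simply delegates to the OS."""

    def open(self, path, mode="r"):
        return open(path, mode)


class Greeter(object):
    """Application code that only touches files through self.fs."""

    def __init__(self, fs):
        self.fs = fs

    def read_greeting(self, path):
        with self.fs.open(path) as f:
            return f.read()


def demo():
    """Write a file the ordinary way, then read it back through the object."""
    path = os.path.join(tempfile.mkdtemp(), "myfile")
    with open(path, "w") as f:
        f.write("hello\n")
    return Greeter(RealFS()).read_greeting(path)
```

Because Greeter receives its filesystem in the constructor, a test or a wrapper can hand it something other than RealFS without touching any globals.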

→ Mockability

implement needed part, KISS, croak on anything unwanted
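
As a sketch of that approach, here is a fake that supports exactly one thing, opening a file for writing, and croaks on everything else. FakeFS, _CapturedFile and write_hostname are invented for this example; it shows how code that writes under /etc could be tested without touching the real /etc.

```python
import io


class _CapturedFile(io.StringIO):
    """In-memory file that stores its contents in the fake on close."""

    def __init__(self, store, path):
        io.StringIO.__init__(self)
        self._store = store
        self._path = path

    def close(self):
        if not self.closed:
            self._store[self._path] = self.getvalue()
        io.StringIO.close(self)


class FakeFS(object):
    """Hypothetical fake: supports open(path, "w") and nothing else."""

    def __init__(self):
        self.written = {}

    def open(self, path, mode="r"):
        if mode != "w":
            raise NotImplementedError("FakeFS only supports mode 'w'")
        return _CapturedFile(self.written, path)

    def __getattr__(self, name):
        # Croak on any operation the test did not plan for.
        raise NotImplementedError("FakeFS has no %s()" % name)


def write_hostname(fs, name):
    """Code under test: it only ever touches files through fs."""
    with fs.open("/etc/hostname", "w") as f:
        f.write(name + "\n")
```

After calling write_hostname(fs, "testbox") on a FakeFS, the test can inspect fs.written; an unexpected call such as fs.unlink(...) fails loudly instead of silently doing nothing.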

→ Fault injection

wrap another implementation
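
One way the wrapping could look, as a sketch only: a wrapper that passes everything through to another filesystem object but makes the Nth write fail. DictFS, FaultyFS, write_file and read_file are hypothetical names, and the dict-backed backend exists just to keep the example self-contained.

```python
import errno


class DictFS(object):
    """Hypothetical minimal backend: path -> contents, kept in a dict."""

    def __init__(self):
        self.files = {}

    def write_file(self, path, data):
        self.files[path] = data

    def read_file(self, path):
        return self.files[path]


class FaultyFS(object):
    """Wrap another filesystem object and fail the Nth write."""

    def __init__(self, fs, fail_on_write):
        self._fs = fs
        self._fail_on_write = fail_on_write
        self._writes = 0

    def write_file(self, path, data):
        self._writes += 1
        if self._writes == self._fail_on_write:
            raise IOError(errno.ENOSPC, "No space left on device (injected)")
        return self._fs.write_file(path, data)

    def __getattr__(self, name):
        # Everything else passes straight through to the wrapped object.
        return getattr(self._fs, name)
```

With FaultyFS(DictFS(), fail_on_write=2), the first and third writes succeed and the second raises, which lets a unit test exercise the error-handling path deterministically.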

→ No .. from user

p = self.fs.path("/my/safe/area")
p = p.child(user_input)
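
A rough sketch of how child() can make this safe: it accepts a single path segment and refuses anything that could climb out of the parent. SafePath and this InsecurePath exception are written for the example; twisted.python.filepath performs a similar check, but this is not its code.

```python
import os


class InsecurePath(Exception):
    """Raised when a child name would escape its parent."""


class SafePath(object):
    """Hypothetical path object whose child() takes one segment only."""

    def __init__(self, path):
        self.path = path

    def child(self, name):
        # Reject empty names, "." and "..".
        if name in ("", os.curdir, os.pardir):
            raise InsecurePath(name)
        # Reject anything containing a path separator.
        if os.sep in name or (os.altsep and os.altsep in name):
            raise InsecurePath(name)
        return SafePath(os.path.join(self.path, name))
```

SafePath("/my/safe/area").child("report.txt") works, while child("..") or child("../etc/passwd") raises InsecurePath instead of quietly walking out of the safe area.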

→ Invisible dotfiles

self.fs = NoDotfilesFS(self.fs)
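
What such a wrapper might do internally, sketched under the assumption that the filesystem object offers listdir() and exists(). Those method names and the ListingFS backend are made up for the example; only the NoDotfilesFS name comes from the slide.

```python
class ListingFS(object):
    """Hypothetical backend: a fixed mapping of directory -> names."""

    def __init__(self, listing):
        self._listing = listing

    def listdir(self, path):
        return list(self._listing[path])

    def exists(self, path):
        directory, _, name = path.rpartition("/")
        return name in self._listing.get(directory or "/", ())


class NoDotfilesFS(object):
    """Wrap another filesystem object and pretend dotfiles are absent."""

    def __init__(self, fs):
        self._fs = fs

    @staticmethod
    def _hidden(path):
        # Any component starting with "." hides the whole path.
        return any(part.startswith(".") for part in path.split("/"))

    def listdir(self, path):
        return [n for n in self._fs.listdir(path) if not n.startswith(".")]

    def exists(self, path):
        return not self._hidden(path) and self._fs.exists(path)
```

Code that only sees the wrapped object cannot list or probe .ssh, .bashrc and friends, even though they exist in the backend.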

→ Virtual chroot

self.fs = ChrootFS(
	fs=self.fs,
	root="/my/safe/area",
	)
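
A sketch of the path translation a virtual chroot needs. The constructor signature follows the slide; the _real helper, the RecordingFS backend and the use of "/"-separated paths are assumptions for this example.

```python
import posixpath


class RecordingFS(object):
    """Hypothetical backend that only records which paths were opened."""

    def __init__(self):
        self.opened = []

    def open(self, path, mode="r"):
        self.opened.append(path)


class ChrootFS(object):
    """Wrap another filesystem object and confine all paths under root."""

    def __init__(self, fs, root):
        self._fs = fs
        self._root = root

    def _real(self, path):
        # Normalize against "/" first, so ".." can never climb above it.
        inner = posixpath.normpath(posixpath.join("/", path))
        return posixpath.join(self._root, inner.lstrip("/"))

    def open(self, path, mode="r"):
        return self._fs.open(self._real(path), mode)
```

Opening "/etc/passwd" or "../../etc/passwd" through this wrapper reaches "/my/safe/area/etc/passwd" in the backend. Note that this sketch only rewrites path strings; symlinks inside the root would need separate handling.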

→ Custom ACL

self.fs = AccessControlFS(
	fs=self.fs,
	acl=acl_rules,
	)
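
The slides leave the shape of acl_rules open. As one possible sketch, assume it is a list of (path prefix, allowed operations) pairs where the first matching prefix wins and everything unmatched is denied. That rule format, the "read"/"write" operation names and the OpenLogFS backend are all assumptions of this example.

```python
import errno


class OpenLogFS(object):
    """Hypothetical backend that only records successful opens."""

    def __init__(self):
        self.opened = []

    def open(self, path, mode="r"):
        self.opened.append((path, mode))


class AccessControlFS(object):
    """Wrap another filesystem object and check an ACL before each open."""

    def __init__(self, fs, acl):
        self._fs = fs
        self._acl = acl

    def _check(self, path, operation):
        for prefix, allowed in self._acl:
            if path.startswith(prefix):
                if operation in allowed:
                    return
                break  # first matching rule wins, and it said no
        raise IOError(errno.EACCES, "Permission denied", path)

    def open(self, path, mode="r"):
        operation = "read" if mode in ("r", "rb") else "write"
        self._check(path, operation)
        return self._fs.open(path, mode)
```

With rules such as [("/srv/pub/incoming/", {"read", "write"}), ("/srv/pub/", {"read"})], uploads are limited to the incoming directory, the rest of /srv/pub is read-only, and everything outside is denied.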

→ Nicer API

path.py?

The path.py website at http://www.jorendorff.com/articles/python/path fails to load unless you do it on a full moon, in front of a mirror, and reload three times.

In the end, I don't think path.py is a suitable base for this:

  1. It tries to be a string.

  2. It's cluttered; it even includes md5sum calculation and shutil calls. I think a base common API shouldn't include rmtree.

    It's probably a nice pragmatic helper, just not a good common API.

path.py

from path import path  # newer path.py releases spell this Path

top = path('toplevel')
sub = top / 'foo'
sub.mkdir()
p = sub / 'bar'
with p.open('w') as f:
    f.write('foo bar\n')

→ Transactions

POSIX: atomic replace of a single file

could do atomic-write-on-close
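
Atomic-write-on-close could be sketched as follows: write to a temporary file in the same directory, and rename it over the target only if the block finished cleanly. AtomicFile is a hypothetical helper written for this example; os.replace is the standard-library rename that overwrites the target (on POSIX, os.rename behaves the same way).

```python
import os
import tempfile


class AtomicFile(object):
    """Write to a temporary sibling; rename over the target on clean exit."""

    def __init__(self, path):
        self._path = path
        directory = os.path.dirname(os.path.abspath(path))
        # Same directory, so the final rename stays on one filesystem.
        fd, self._tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        self._file = os.fdopen(fd, "w")

    def __enter__(self):
        return self._file

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._file.flush()
            os.fsync(self._file.fileno())
            self._file.close()
            os.replace(self._tmp, self._path)  # readers see old or new
        else:
            self._file.close()
            os.unlink(self._tmp)  # failed write: leave the target untouched
        return False
```

If the body of the with block raises, the original file keeps its old contents and the temporary file is removed. One caveat of this sketch: mkstemp creates the file with owner-only permissions, so the replaced file does not inherit the old file's mode.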

→ Transactions

with git: arbitrary!

(sql-style redo on conflict)

with self.fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")
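
To show the commit-or-nothing behaviour without needing git, here is an in-memory sketch. It is not the git-backed implementation: MemoryFS and the helper classes are invented, write() stands in for open("w") to keep it short, and conflict detection with redo is left out entirely.

```python
class _TxPath(object):
    """Path inside a transaction; operates on the staged copy only."""

    def __init__(self, staged, name):
        self._staged = staged
        self._name = name

    def rename(self, new_name):
        self._staged[new_name] = self._staged.pop(self._name)

    def write(self, data):
        self._staged[self._name] = data


class _Transaction(object):
    def __init__(self, fs):
        self._fs = fs

    def __enter__(self):
        self._staged = dict(self._fs.files)  # private snapshot
        return self

    def path(self, name):
        return _TxPath(self._staged, name)

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self._fs.files = self._staged  # commit everything at once
        return False  # on error, the snapshot is simply dropped


class MemoryFS(object):
    """Hypothetical dict-backed filesystem with transact()."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def transact(self):
        return _Transaction(self)
```

Either both the rename and the write become visible in fs.files, or, if anything inside the with block raises, neither does.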

→ Can implement near-identical async API
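
A small sketch of what "near-identical" might mean in practice. It uses the standard library's asyncio as a stand-in for Twisted's Deferreds, which is a substitution made for this example and not what the talk covered; both classes and both caller functions are hypothetical.

```python
import asyncio


class SyncMemoryFS(object):
    """Hypothetical blocking filesystem object."""

    def __init__(self, files):
        self._files = dict(files)

    def read(self, path):
        return self._files[path]


class AsyncMemoryFS(object):
    """Same method names and arguments, but results must be awaited."""

    def __init__(self, files):
        self._files = dict(files)

    async def read(self, path):
        await asyncio.sleep(0)  # stand-in for a network round trip
        return self._files[path]


def sync_caller(fs):
    return fs.read("foo").upper()


async def async_caller(fs):
    return (await fs.read("foo")).upper()
```

The method names and arguments match, so documentation and mental model carry over, but the calling code still has to differ: async does not look like sync.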


Thank You

Questions? Opinions? Rants?

Find me during the conference or sprints to talk more.

Slides etc up on eagain.net