My God, it's full of files
pythonic filesystem abstractions
Download as mp4.
Overview of different filesystem(-like) APIs in Python and attempts for unifying them
Pythonic filesystem abstractions: An overview of different filesystem(-like) APIs in Python and attempts for unifying them.
There's a lot of different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.
I intend to cover at least:
- sync/async nature of access
- git (including my own project)
- Allmydata Tahoe
- SFTP, NFS, Plan 9
So what's a filesystem
- well, it has files
- folders (or some such)
- open, read/write, close
- file handle, seek position
- unlink, rename, mmap?
Ceph, Allmydata Tahoe, Hadoop FS, ...
Venti + Fossil
KDE Input/Output (KIO)
Current Python API
Spread all over the place, built-ins and miscellaneous stdlib
... Considered Hurtful
Where the pain started
good idea, but don't use
limited-purpose reimplementation of part of twisted.vfs
async doesn't look like sync
Deferred not going to stdlib?
♪♫ pig and elephant DNA
just won't splice ♪♫
Concentrating on sync
(but don't forget async)
(network still important)
Lots of reasons
More detail later
Here we go
Mock writing to
See Petardfs for a generic FUSE fault injecting filesystem.
Not nice for unit tests, but maybe for system tests.
.. from user
top = FilePath('toplevel') sub = top.child('foo') sub.createDirectory() p = sub.child('bar') with p.open('w') as f: f.write('foo bar\n')
More features, where they exist
Current API won't do
Global is bad
→ Accessible via single object
f = self.fs.open("myfile")
implement needed part, KISS, croak on anything unwanted
→ Fault injection
wrap another implementation
.. from user
p = self.fs.path("/my/safe/area") p = p.child(user_input)
→ Invisible dotfiles
self.fs = NoDotfilesFS(self.fs)
self.fs = ChrootFS( fs=self.fs, root="/my/safe/area", )
→ Custom ACL
self.fs = AccessControlFS( fs=self.fs, acl=acl_rules, )
→ Nicer API
path.py website at
fails to load unless
you do it on a full moon, in front of a mirror, and reload three
In the end, I don't think
path.py is a suitable base for this:
It tries to be a string.
It's cluttered; it even includes md5sum calculation and shutil calls. I think a base common API shouldn't include
It's probably a nice pragmatic helper, just not a good common API.
top = path('toplevel') sub = top / 'foo' sub.mkdir() p = sub / 'bar' with p.open('w') as f: f.write('foo bar\n')
POSIX: atomic replace of single-file
could do atomic-write-on-close
(sql-style redo on conflict)
with self.fs.transact() as t: t.path("foo").rename("foo.old") with t.path("foo").open("w") as f: f.write("bar\n")
→ Can implement near-identical async API
Questions? Opinions? Rants?
Find me during the conference or sprints to talk more.
Slides etc up on eagain.net