My God, it's full of files
pythonic filesystem abstractions
Pythonic filesystem abstractions: An overview of different filesystem(-like) APIs in Python and attempts at unifying them.
There are a lot of different filesystem(-like) APIs in Python. I intend to provide an overview of existing projects, their status and capabilities, and hopefully inspire you to work on improving things.
I intend to cover at least:
- sync/async nature of access
- path.py
- twisted.vfs
- twisted.filepath
- FUSE
- git (including my own project)
- Allmydata Tahoe
- CouchDB
- SFTP, NFS, Plan 9
Me
So what's a filesystem?
- well, it has files
- folders (or some such)
- open, read/write, close
- file handle, seek position
- unlink, rename, mmap?
Filesystems
ext3, HFS+, VFAT, NTFS
NFS, CIFS
9P
Ceph, Allmydata Tahoe, Hadoop FS, ...
SFTP
S3
CouchDB
Venti + Fossil
git
FUSE
http://fuse.sourceforge.net/wiki/index.php/FUSE%20Python%20tutorial
GnomeVFS
KDE Input/Output (KIO)
APIs
Current Python API
Spread all over the place: built-ins and miscellaneous stdlib modules
file(), open(), file objects
os.listdir(), os.walk()
os.path.exists(), os.chdir()
os.unlink(), os.rename()
stat, mmap
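For illustration, here is one round trip through today's API: create, list, check, stat, map, rename and delete a single file. The helper name `round_trip` and the file names are invented for this sketch; the point is only that even this small job touches builtins, `os`, `os.path`, `stat` and `mmap`.

```python
import mmap
import os
import stat
import tempfile


def round_trip(workdir):
    """Create, inspect, map, rename and delete one file using the stdlib."""
    path = os.path.join(workdir, "demo.txt")
    with open(path, "w") as f:                          # builtin open()
        f.write("foo bar\n")
    names = os.listdir(workdir)                         # os
    exists = os.path.exists(path)                       # os.path
    is_regular = stat.S_ISREG(os.stat(path).st_mode)    # os + stat
    with open(path, "rb") as f:                         # mmap wants a real fd
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            first_word = m[:3]                          # mmap
    new_path = os.path.join(workdir, "demo.old")
    os.rename(path, new_path)                           # os
    os.unlink(new_path)                                 # os
    return names, exists, is_regular, first_word


with tempfile.TemporaryDirectory() as d:
    print(round_trip(d))
```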
... Considered Hurtful
Where the pain started
Twisted
async
network
other process
uncontrollable delays
Deferred
twisted.vfs
good idea, but don't use
(sorry Andy)
Conch ISFTPServer
limited-purpose reimplementation of part of twisted.vfs
Things learned
async doesn't look like sync
Deferred not going to stdlib?
♪♫ pig and elephant DNA
just won't splice ♪♫
Concentrating on sync
(for now)
(but don't forget async)
(network still important)
Why replace?
Lots of reasons
Quick list
More detail later
Here we go
Non-native filesystems
Mockability
Mock writing to /etc
Fault injection
See Petardfs for a generic fault-injecting FUSE filesystem.
Not nice for unit tests, but maybe for system tests.
Security
No .. from user
twisted.python.filepath
t.p.filepath
from twisted.python.filepath import FilePath

top = FilePath('toplevel')
sub = top.child('foo')
sub.createDirectory()
p = sub.child('bar')
with p.open('w') as f:
    f.write(b'foo bar\n')  # FilePath.open() always opens in binary mode
Security
Invisible dotfiles
Security
Virtual chroot
Security
Custom ACL
Nicer API
More features, where they exist
Transactions
Current API won't do
Global is bad
→ Accessible via single object
f = self.fs.open("myfile")
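A minimal sketch of the single-object idea, assuming a hypothetical `LocalFS` class (not an existing library) that simply forwards to the real filesystem. Application code such as the made-up `Greeter` only ever calls `self.fs`, so nothing reaches for module-level globals.

```python
import builtins
import os
import tempfile


class LocalFS:
    """Hypothetical thin wrapper exposing the real filesystem as one object."""

    def open(self, name, mode="r"):
        return builtins.open(name, mode)

    def listdir(self, name):
        return os.listdir(name)


class Greeter:
    """Application code: uses whatever fs object it was handed."""

    def __init__(self, fs):
        self.fs = fs

    def save(self, name, who):
        with self.fs.open(name, "w") as f:
            f.write("hello, %s\n" % who)

    def load(self, name):
        with self.fs.open(name) as f:
            return f.read()


with tempfile.TemporaryDirectory() as d:
    g = Greeter(fs=LocalFS())
    g.save(os.path.join(d, "myfile"), "world")
    print(g.load(os.path.join(d, "myfile")), end="")
```

Handing `Greeter` any other object with the same `open()` method redirects all of its I/O without touching its code; the wrappers in the following slides all rely on that.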
→ Mockability
implement needed part, KISS, croak on anything unwanted
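One way to read "implement needed part, KISS, croak on anything unwanted", sketched as a hypothetical in-memory `DictFS`. It supports only reading and writing whole text files; every other operation raises `NotImplementedError`, so a test fails loudly instead of quietly falling through to the real disk. Mock writing to /etc becomes harmless.

```python
import io


class _WriteBuffer(io.StringIO):
    """Collects writes in memory and stores them in the dict on close()."""

    def __init__(self, files, name):
        super().__init__()
        self._files = files
        self._name = name

    def close(self):
        if not self.closed:
            self._files[self._name] = self.getvalue()
        super().close()


class DictFS:
    """Hypothetical in-memory filesystem: whole text files in a dict."""

    def __init__(self, files=None):
        self.files = dict(files or {})

    def open(self, name, mode="r"):
        if mode == "r":
            if name not in self.files:
                raise FileNotFoundError(name)
            return io.StringIO(self.files[name])
        if mode == "w":
            return _WriteBuffer(self.files, name)
        raise NotImplementedError("DictFS: unsupported mode %r" % mode)

    def __getattr__(self, attr):
        # Croak on any operation this mock was not written for
        # (unlink, rename, listdir, ...) instead of guessing.
        raise NotImplementedError("DictFS: %s() not implemented" % attr)


fs = DictFS({"/etc/motd": "welcome\n"})
with fs.open("/etc/hosts", "w") as f:
    f.write("127.0.0.1 localhost\n")
```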
→ Fault injection
wrap another implementation
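A sketch of fault injection by wrapping, assuming the wrapped object has an `open(name, mode)` method. The hypothetical `FaultyFS` makes writes to chosen paths fail with `ENOSPC` (disk full) and forwards everything else untouched; that kind of error is hard to provoke on a real disk inside a unit test.

```python
import errno
import os


class FaultyFS:
    """Hypothetical wrapper: opening listed paths for writing fails with ENOSPC.

    Everything else is forwarded to the wrapped filesystem object.
    """

    def __init__(self, fs, fail_on):
        self._fs = fs
        self._fail_on = set(fail_on)

    def open(self, name, mode="r"):
        writing = any(c in mode for c in "wax+")
        if writing and name in self._fail_on:
            # Same shape as the error a genuinely full disk would produce.
            raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC), name)
        return self._fs.open(name, mode)

    def __getattr__(self, attr):
        # Forward listdir(), rename(), ... to the wrapped fs unchanged.
        return getattr(self._fs, attr)
```

In a test this would be used as, for example, `self.fs = FaultyFS(self.fs, fail_on=["/var/log/app.log"])`, where the path is just an example.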
→ No .. from user
p = self.fs.path("/my/safe/area")
p = p.child(user_input)
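A hypothetical path object showing the idea behind `p.child(user_input)`: `child()` accepts exactly one plain name and refuses anything that could climb out, such as `..` or a string containing a separator. `twisted.python.filepath` takes a similar approach, raising `InsecurePath` from its `child()`, but `SafePath` and its exact rules here are only an illustration, assuming POSIX-style `/` separators.

```python
import posixpath


class InsecurePath(ValueError):
    """Raised when user input tries to step outside the current directory."""


class SafePath:
    """Hypothetical path object: child() accepts exactly one plain name."""

    def __init__(self, path):
        self.path = path

    def child(self, name):
        # Reject empty names, "." and "..", separators and NUL bytes.
        if not name or name in (".", "..") or "/" in name or "\0" in name:
            raise InsecurePath(name)
        return SafePath(posixpath.join(self.path, name))


area = SafePath("/my/safe/area")
print(area.child("report.txt").path)
```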
→ Invisible dotfiles
self.fs = NoDotfilesFS(self.fs)
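A sketch of a hypothetical `NoDotfilesFS`, assuming the wrapped object offers `listdir(path)` and `open(path, mode)` over `/`-separated paths. Dotfiles are filtered out of listings, and opening one (or anything inside a dot-directory) reports "not found" rather than "forbidden", so the user cannot even learn that the file exists.

```python
class NoDotfilesFS:
    """Hypothetical wrapper that hides every name starting with "."."""

    def __init__(self, fs):
        self._fs = fs

    @staticmethod
    def _is_hidden(path):
        # Hidden if any path component starts with "." (this also
        # rejects "." and ".." components).
        return any(part.startswith(".") for part in path.split("/") if part)

    def listdir(self, path):
        if self._is_hidden(path):
            raise FileNotFoundError(path)
        return [n for n in self._fs.listdir(path) if not n.startswith(".")]

    def open(self, path, mode="r"):
        if self._is_hidden(path):
            # Pretend it does not exist rather than admitting it is hidden.
            raise FileNotFoundError(path)
        return self._fs.open(path, mode)
```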
→ Virtual chroot
self.fs = ChrootFS(
fs=self.fs,
root="/my/safe/area",
)
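A minimal `ChrootFS` sketch matching the constructor shown above, assuming a wrapped object with `open()` and `listdir()` and POSIX-style paths. Each virtual path is first normalized as if it started at `/`, which swallows any leading `..`, and only then grafted onto `root`. This is purely lexical; symlinks inside the root are not checked.

```python
import posixpath


class ChrootFS:
    """Hypothetical wrapper that confines every path under `root`.

    Purely lexical: symlinks inside the root are not resolved, so a real
    implementation would also have to stop links pointing outside it.
    """

    def __init__(self, fs, root):
        self._fs = fs
        self._root = posixpath.normpath(root)

    def _real(self, path):
        # Normalize relative to a virtual "/" first: normpath() drops any
        # ".." that would go above "/", so the result cannot escape root.
        virtual = posixpath.normpath(posixpath.join("/", path))
        rel = virtual.lstrip("/")
        return posixpath.join(self._root, rel) if rel else self._root

    def open(self, path, mode="r"):
        return self._fs.open(self._real(path), mode)

    def listdir(self, path):
        return self._fs.listdir(self._real(path))
```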
→ Custom ACL
self.fs = AccessControlFS(
fs=self.fs,
acl=acl_rules,
)
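A sketch of `AccessControlFS`, assuming an invented rule format for `acl_rules`: a list of `(directory_prefix, allowed_actions)` pairs where actions are `"read"` and `"write"`. The first rule whose prefix contains the path decides, and anything unmatched is denied. Only `open()` is guarded here; a fuller version would wrap every method.

```python
import posixpath


class AccessControlFS:
    """Hypothetical wrapper that checks each open() against ACL rules.

    `acl` is an assumed format: a list of (directory_prefix, actions)
    pairs, actions being "read" and/or "write". The first rule whose
    prefix contains the path decides; unmatched paths are denied.
    """

    def __init__(self, fs, acl):
        self._fs = fs
        self._acl = acl

    def _allowed(self, path, action):
        path = posixpath.normpath(path)
        for prefix, actions in self._acl:
            prefix = posixpath.normpath(prefix)
            base = prefix.rstrip("/") + "/"
            if path == prefix or path.startswith(base):
                return action in actions
        return False

    def open(self, path, mode="r"):
        # Plain "r"/"rb"/"rt" is a read; anything else ("w", "a", "r+") may write.
        action = "read" if set(mode) <= set("rbt") else "write"
        if not self._allowed(path, action):
            raise PermissionError("%s access to %s denied" % (action, path))
        return self._fs.open(path, mode)
```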
→ Nicer API
path.py?
The path.py website at http://www.jorendorff.com/articles/python/path fails to load unless you do it on a full moon, in front of a mirror, and reload three times.
In the end, I don't think path.py is a suitable base for this:
- It tries to be a string.
- It's cluttered; it even includes md5sum calculation and shutil calls. I think a common base API shouldn't include rmtree.
It's probably a nice pragmatic helper, just not a good common API.
path.py
top = path('toplevel')
sub = top / 'foo'
sub.mkdir()
p = sub / 'bar'
with p.open('w') as f:
    f.write('foo bar\n')
→ Transactions
POSIX: atomic replace of a single file
could do atomic-write-on-close
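POSIX `rename()` atomically replaces its target, so atomic-write-on-close can be sketched with the standard library alone: write to a temporary file in the same directory, then `os.replace()` it over the real name on a clean close. `AtomicWriter` is a hypothetical name, and the temporary file lives in the same directory because a rename across filesystems is not atomic.

```python
import os
import tempfile


class AtomicWriter:
    """Hypothetical file-like writer with atomic-write-on-close semantics.

    Data goes to a temporary file next to the target; close() renames it
    over the target in one step, so readers see the old or the new
    contents, never a half-written file.
    """

    def __init__(self, path):
        self._path = path
        directory = os.path.dirname(os.path.abspath(path))
        # Same directory => same filesystem, which rename() needs to be atomic.
        fd, self._tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
        self._file = os.fdopen(fd, "w")

    def write(self, data):
        return self._file.write(data)

    def close(self):
        """Flush to disk, then atomically replace the target."""
        self._file.flush()
        os.fsync(self._file.fileno())
        self._file.close()
        os.replace(self._tmp, self._path)

    def abort(self):
        """Discard everything written; the target is left untouched."""
        self._file.close()
        os.unlink(self._tmp)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.close()
        else:
            self.abort()
        return False
```

One side effect worth knowing: `tempfile.mkstemp()` creates the file with mode 0600, so the replaced file loses its old permissions unless they are copied over first.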
→ Transactions
with git: arbitrary!
(SQL-style redo on conflict)
with self.fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")
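The transaction calls above can run against the hypothetical in-memory `TransactionalFS` below, which stands in for a git-backed store. Each transaction works on a private copy; a clean exit commits it in one step, an exception discards it, and a commit made by someone else in the meantime raises `ConflictError`, after which the caller would redo the work SQL-style. The class names, the version counter and the text-only `open()` are all assumptions of this sketch.

```python
import contextlib
import io


class ConflictError(Exception):
    """Someone else committed while this transaction was open."""


class _TxPath:
    def __init__(self, tx, name):
        self._tx, self._name = tx, name

    def rename(self, new_name):
        files = self._tx.files
        files[new_name] = files.pop(self._name)

    @contextlib.contextmanager
    def open(self, mode="r"):
        if mode == "r":
            yield io.StringIO(self._tx.files[self._name])
        elif mode == "w":
            buf = io.StringIO()
            yield buf
            # Only reached if the with-body finished without an exception.
            self._tx.files[self._name] = buf.getvalue()
        else:
            raise NotImplementedError(mode)


class _Transaction:
    def __init__(self, files):
        self.files = dict(files)  # private working copy

    def path(self, name):
        return _TxPath(self, name)


class TransactionalFS:
    """Hypothetical in-memory fs whose changes are all-or-nothing."""

    def __init__(self, files=None):
        self.files = dict(files or {})
        self.version = 0

    @contextlib.contextmanager
    def transact(self):
        start = self.version
        tx = _Transaction(self.files)
        yield tx  # an exception raised here discards tx.files
        if self.version != start:
            raise ConflictError("concurrent commit; redo the transaction")
        self.files, self.version = tx.files, self.version + 1


fs = TransactionalFS({"foo": "old\n"})
with fs.transact() as t:
    t.path("foo").rename("foo.old")
    with t.path("foo").open("w") as f:
        f.write("bar\n")
```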
→ Can implement near-identical async API
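A sketch of the near-identical async API, assuming a sync filesystem with whole-file `read()` and `write()` methods (both invented here). The hypothetical `AsyncFS` exposes the same method names but runs each blocking call in a worker thread via `asyncio.to_thread()` (Python 3.9+), so async callers differ from sync ones only by `await`. This swaps in asyncio for illustration; a Twisted version would return Deferreds instead.

```python
import asyncio


class SyncFS:
    """Tiny in-memory sync filesystem, invented just for this demo."""

    def __init__(self):
        self.files = {}

    def write(self, name, data):
        self.files[name] = data

    def read(self, name):
        return self.files[name]


class AsyncFS:
    """Hypothetical async facade: same method names, awaitable results.

    Each blocking call runs in a worker thread via asyncio.to_thread().
    """

    def __init__(self, fs):
        self._fs = fs

    async def write(self, name, data):
        await asyncio.to_thread(self._fs.write, name, data)

    async def read(self, name):
        return await asyncio.to_thread(self._fs.read, name)


def sync_user(fs):
    fs.write("myfile", "bar\n")
    return fs.read("myfile")


async def async_user(fs):
    # Identical logic; only the awaits differ.
    await fs.write("myfile", "bar\n")
    return await fs.read("myfile")


print(sync_user(SyncFS()) == asyncio.run(async_user(AsyncFS(SyncFS()))))  # True
```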
Thank You
Questions? Opinions? Rants?
Find me during the conference or sprints to talk more.
Slides etc. up on eagain.net