Review of Foundations of Python Network Programming
A review of the book Foundations of Python Network Programming by John Goerzen.
I've known John Goerzen from Debian and his OfflineIMAP project since long before this book came out, so I'm probably biased toward thinking of him as a talented person. I also believe I played a role in introducing him to Twisted.
I'm also one of the people working on the Twisted framework, and would probably say good things about any book that mentioned Twisted.
I would also like to state that I am not really in the target audience
for this book -- I've done my fair share of kernelspace networking in C,
read my fair share of TCP RFCs, and written
poll loops for a
The goal I set for myself was not to write a glowing, fluffy review, give a link to Amazon and increase sales -- even if I was bribed with a shiny book -- but to really dig into this book and see how good it is. Blame it on my being an average pessimistic Finn, or something.
The book looks neat and tidy; the cover is colorful and abstract, but at the same time professional-looking. The content layout is spacious and clear, sometimes almost spartan.
The first thing that really caught my eye was fonts. The book uses
quite distinct fonts -- comparing it to an O'Reilly book I had at
hand, I can honestly say I think the body text is more readable in
Foundations. The font used in headings and the table of contents is a
lot less readable for me -- especially the letter Q and the comma. The Q
looks like an O with dirt under it (not even touching the letter
itself), so what I first read from the table of contents was
Python, as in So-Out-of-Luck.
The first hundred pages -- one fifth of the total book -- are a good, quick introduction to networking basics such as TCP, UDP and DNS (not the protocol, but the query API and things like hostname spoof protection). The last twenty pages of that section rush through brief explanations of many somewhat more obscure networking topics, such as half-open sockets, string termination, network byte order, broadcast and so on.
After a brief 50-page trip through XML and XML-RPC, the book starts to lose its tutorial style and becomes more useful as a reference as well. In 30 pages, parsing and generating email messages are explained very clearly. SMTP and POP get roughly 10 pages each, which seems adequate for covering everything important.
The author's experience with IMAP shows in the next 50 pages. I think these 50 pages alone are worth the purchase. This is the part of the book I was personally most interested in, and I was not disappointed.
A good 20 pages are spent on FTP. Personally, I would rather have seen more on HTTP. FTP is dead to me, and I hope it realizes that and stops moving soon. Database access gets another 20 pages, and so does SSL.
Different server-side programming frameworks are discussed next. Spending 10-20 pages on each, the author takes us on a whirlwind tour of SocketServer, SimpleXMLRPCServer, CGI, and mod_python. I have no clue why Twisted isn't on this list.
The rest of the book is about different ways to achieve the goal of serving multiple clients at once. Whereas in the earlier chapters forking and threading were things a library did for you, now you get to understand what was happening under the hood.
Separate processes, threads and asynchronous event loops are each
discussed in roughly 25 pages, with the last of these briefly introducing
Twisted by reimplementing an earlier example with it. Of course,
practically everything in the IMAP chapter was specifically about
Twisted's IMAP library, so concepts such as
Deferred were already
The Python in the Book
I think that in any book so closely tied to a programming language, the examples matter a lot for the readability of the whole. Overall, the examples were readable, kept as light as possible while still covering the subject area well enough. However, there were a few things that kept distracting me.
The Python examples include at least one recurring pattern I consider unpythonic. It is small, but it kept annoying me throughout the book.
```python
# Part of `gopherclient.py` from Chapter 1, page 10.
while 1:
    buf = s.recv(2048)
    if not len(buf):
        break
    sys.stdout.write(buf)
```
Why would one use an explicit
len() call in the above? The string returned by
s.recv() can be tested for emptiness directly, with just
if not buf: ...
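To make the idiom concrete, here is a minimal sketch in Python 3 (the book's code is Python 2, where recv() returns str rather than bytes, but the truthiness rule is the same). The helper name read_until_eof is made up for illustration, and socket.socketpair() stands in for a real network connection:

```python
import socket


def read_until_eof(sock, bufsize=2048):
    """Read from sock until the peer closes its side; return all bytes."""
    chunks = []
    while True:
        buf = sock.recv(bufsize)
        if not buf:  # an empty bytes object is falsy: the peer closed
            break
        chunks.append(buf)
    return b"".join(chunks)


if __name__ == "__main__":
    a, b = socket.socketpair()
    a.sendall(b"hello, gopher\r\n")
    a.close()  # closing the sender makes recv() on b eventually return b""
    print(read_until_eof(b))
    b.close()
```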
Another thing I noted was the heavy use of global variables in some of the examples. Most of these uses seemed like they would be more readable and pythonic if an object explicitly stored the state (e.g. page 106).
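As a sketch of what I mean, assuming the state in question is something like a table of connected clients and their partially received data (the class and method names are hypothetical, not taken from the book), in Python 3:

```python
class ClientRegistry:
    """Hypothetical state holder replacing module-level globals such as a
    dict of connected clients and their not-yet-processed input."""

    def __init__(self):
        # maps a client key (e.g. a socket's file descriptor) to buffered bytes
        self.buffers = {}

    def add(self, key):
        self.buffers[key] = b""

    def feed(self, key, data):
        self.buffers[key] += data

    def remove(self, key):
        """Forget a client and return whatever input was still buffered."""
        return self.buffers.pop(key, b"")

    def __len__(self):
        return len(self.buffers)
```

Compared with module-level globals, every piece of state is reachable through one object, and a second instance, for example in a test, starts out clean.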
There are a lot of
var == None comparisons in the book; I prefer
var is None, and consider it more pythonic.
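The difference is not just taste. None is a singleton, so is compares identity, which no class can override, while == calls __eq__, which any class can. A deliberately contrived Python 3 example:

```python
class AlwaysEqual:
    """Contrived class whose __eq__ claims to equal anything, None included."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()
print(value == None)  # True: == calls the overridden __eq__
print(value is None)  # False: identity comparison cannot be overridden
```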
One thing that also caught my eye was the format string
The requested document
'%s' was not found, instead of the better-behaving
...document %r was... (page 345).
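A small Python 3 illustration of why, assuming the document name comes from the client and may contain quotes or control characters (the sample name is invented):

```python
name = "it's\nhere"  # hostile-looking input: a quote and a newline

with_s = "The requested document '%s' was not found" % name
with_r = "The requested document %r was not found" % name

print(with_s)  # split over two lines, with mismatched-looking quotes
print(with_r)  # one line; repr() picked double quotes and escaped the newline
```

With %r, Python adds the quotes itself, choosing a style that does not clash with the content, and escapes the newline, so the message stays on one line and is unambiguous.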
The book contains multiple references to using
pickle in a network
protocol. Sometimes, but definitely not always, the author remembers
to warn of the security concerns. Using
pickle for anything not
directly controlled by the process itself is so horribly bad that I feel
the book should never have mentioned the possibility at all. If the
author wanted a non-XML-RPC example in a more DIY sense, he
could have pointed to e.g.
twisted.spread.jelly. (e.g. pages 159,
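To show how bad it is, here is a harmless Python 3 demonstration of the mechanism: a pickle can name any importable callable to be invoked while loading. The Payload class is made up for this sketch, and a real attacker would name something far less polite than eval. JSON, at the end, is my own choice of a data-only contrast, not what the review suggests (that was twisted.spread.jelly):

```python
import json
import pickle


class Payload:
    """Stands in for data an attacker sends over the wire."""

    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        return (eval, ("6 * 7",))


wire_bytes = pickle.dumps(Payload())
print(pickle.loads(wire_bytes))  # 42: eval() ran inside pickle.loads

# A data-only format can only ever produce plain values.
decoded = json.loads('{"uid": 42, "flags": ["Seen"]}')
print(decoded)
```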
Although the book often explicitly cautions against e.g. storing files in memory, this principle is not applied consistently. Here's part of an example that gathers a list of message UIDs in memory while it fetches the messages via IMAP, and a small refactoring of it that trades memory use for increased protocol traffic -- other tradeoffs, such as gathering 100 UIDs before sending a command, are equally possible.
```python
# Part of `tdownload-and-delete.py` from Chapter 12, page 252.
def handleuids(self, uids):
    self.uidlist = MessageSet()
    dlist = []
    destfd = open(sys.argv, "at")
    for num, data in uids.items():
        uid = data['UID']
        d = self.proto.fetchSpecific(uid, uid = 1, peek = 1)
        d.addCallback(self.gotmessage, destfd, uid)
        dlist.append(d)
    dl = defer.DeferredList(dlist)
    dl.addCallback(lambda x, fd: fd.close(), destfd)
    return dl

def gotmessage(self, data, destfd, uid):
    print "Received message UID", uid
    for key, value in data.items():
        print "Writing message", key
        i = value.index('BODY') + 2
        msg = email.message_from_string(value[i])
        destfd.write(msg.as_string(unixfrom = 1))
        destfd.write("\n")
    self.uidlist.add(int(uid))

def deletemessages(self, data = None):
    print "Deleting messages", str(self.uidlist)
    d = self.proto.addFlags(str(self.uidlist), ["\\Deleted"], uid = 1)
    d.addCallback(lambda x: self.proto.expunge())
    return d
```
```python
# My version of `tdownload-and-delete.py`
def handleuids(self, uids):
    dlist = []
    destfd = open(sys.argv, "at")
    for num, data in uids.items():
        uid = data['UID']
        d = self.proto.fetchSpecific(uid, uid = 1, peek = 1)
        d.addCallback(self.gotmessage, destfd, uid)
        dlist.append(d)
    dl = defer.DeferredList(dlist)
    dl.addCallback(lambda x, fd: fd.close(), destfd)
    return dl

def gotmessage(self, data, destfd, uid):
    print "Received message UID", uid
    for key, value in data.items():
        print "Writing message", key
        i = value.index('BODY') + 2
        msg = email.message_from_string(value[i])
        destfd.write(msg.as_string(unixfrom = 1))
        destfd.write("\n")
    d = self.proto.addFlags(uid, ["\\Deleted"], uid = 1)
    return d

def deletemessages(self, data = None):
    print "Deleting messages"
    d = self.proto.expunge()
    return d
```
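The middle ground mentioned above, flushing in batches, could look roughly like this. It is a plain Python 3 sketch that does not touch Twisted; UIDBatcher, to_sequence_set and the flush callback are hypothetical names, and in the example above flush would be something that sends one addFlags(..., uid = 1) command per batch. Consecutive UIDs are collapsed using IMAP sequence-set syntax such as 1:3,7:

```python
def to_sequence_set(uids):
    """Format UIDs as an IMAP sequence set, e.g. [7, 1, 2, 3] -> "1:3,7"."""
    runs = []  # [first, last] pairs of consecutive UIDs
    for uid in sorted(set(uids)):
        if runs and uid == runs[-1][1] + 1:
            runs[-1][1] = uid  # extend the current run
        else:
            runs.append([uid, uid])  # start a new run
    return ",".join(str(a) if a == b else "%d:%d" % (a, b) for a, b in runs)


class UIDBatcher:
    """Collect UIDs and pass them to `flush` at most `size` at a time.

    `flush` receives a sequence-set string; what it does with it (for
    example, one command marking the whole batch as deleted) is up to
    the caller."""

    def __init__(self, flush, size=100):
        self.flush = flush
        self.size = size
        self.pending = []

    def add(self, uid):
        self.pending.append(int(uid))  # the original example also uses int()
        if len(self.pending) >= self.size:
            self.drain()

    def drain(self):
        """Flush whatever is pending; call once more when all fetches are done."""
        if self.pending:
            batch, self.pending = self.pending, []
            self.flush(to_sequence_set(batch))
```

Memory use is bounded by the batch size, and the number of commands drops by roughly that factor.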
I was also a bit disappointed by the lack of any visual helpers in the layout of the example code. I would have appreciated moderate use of bold, italics and so on. I also found I had more trouble following the indentation than usual -- maybe a containing box around the code would have given the eye something to fixate on. The left margin is so far away that trying to quickly discern the relative indentation of blocks of code gets tiring quite fast.
The examples are all available online, which is a great help -- while reading a book is usually more comfortable, and above all more flexible, than reading on a computer, for code I vastly prefer the syntax highlighting and search functionality of my Emacs.
The Twisted in the Book
The Twisted code in the book is not of the same quality as the plain
Python code. There are mishandled
Deferreds, race conditions and
whatnot. Let me try to illustrate by showing you the first example
from the chapter discussing Twisted:
```python
#!/usr/bin/env python
# `tconn.py` from Chapter 12, page 226.
# Basic connection with Twisted - Chapter 12 - tconn.py
# Note: This example assumes you have Twisted 1.1.0 or above installed.
from twisted.internet import defer, reactor, protocol
from twisted.protocols.imap4 import IMAP4Client
import sys

class IMAPClient(IMAP4Client):
    def connectionMade(self):
        print "I have successfully connected to the server!"
        d = self.getCapabilities()
        d.addCallback(self.gotcapabilities)

    def gotcapabilities(self, caps):
        if caps == None:
            print "Server did not return a capability list."
        else:
            for key, value in caps.items():
                print "%s: %s" % (key, str(value))

        # This is the last thing, so stop the reactor.
        self.logout()
        reactor.stop()

class IMAPFactory(protocol.ClientFactory):
    protocol = IMAPClient

    def clientConnectionFailed(self, connector, reason):
        print "Client connection failed:", reason
        reactor.stop()

reactor.connectTCP(sys.argv[1], 143, IMAPFactory())
reactor.run()
```
That's a reasonably small program, but nonetheless, alarm bells are
ringing in my head: there are no errbacks anywhere, and the reactor is
only stopped on a successful code path. And the return value of
self.logout(), which is a
Deferred, is thrown away and the reactor
stopped immediately. There's no guarantee the logout message even got
as far as the socket; the program just shuts down.
Let's fix that right now! And while we're at it, let's make it a bit more Twisted:
```python
#!/usr/bin/env python
# `my-tconn.py`
# Basic connection with Twisted - Chapter 12 - tconn.py
# Note: This example assumes you have Twisted 1.1.0 or above installed.
from twisted.internet import defer, reactor, protocol, error
from twisted.protocols.imap4 import IMAP4Client
import sys

class IMAPGetCapabilities(IMAP4Client):
    def _doLogout(self, r):
        d = self.logout()
        d.addCallback(lambda _: r)
        return d

    def connectionMade(self):
        d = self.getCapabilities()
        d.addBoth(self._doLogout)
        d.chainDeferred(self.factory.deferred)

class IMAPGetCapabilitiesFactory(protocol.ClientFactory):
    protocol = IMAPGetCapabilities

    def __init__(self):
        self.deferred = defer.Deferred()

    def clientConnectionFailed(self, connector, reason):
        self.deferred.errback(reason)

    def clientConnectionLost(self, connector, reason):
        if reason.check(error.ConnectionDone):
            # only an error if premature
            if not self.deferred.called:
                self.deferred.errback(reason)
        else:
            self.deferred.errback(reason)

def _showCapabilities(caps):
    if not caps:
        print "Server did not return a capability list."
    else:
        for key, value in caps.items():
            print "%s: %s" % (key, str(value))

def _showError(reason):
    print reason.getErrorMessage()

f = IMAPGetCapabilitiesFactory()
f.deferred.addCallback(_showCapabilities)
f.deferred.addErrback(_showError)
f.deferred.addBoth(lambda _: reactor.stop())
reactor.connectTCP(sys.argv[1], 143, f)
reactor.run()
```
While fixing the bugs I happened to see in that example, I also took the opportunity to separate the application logic from the protocol implementation, stop the reactor in only one place (the main program body), and generally clean up the program.
Let me also assure you this is not an isolated incident; here are some more examples:
Page 232, example
If the connection is lost, the program hangs.
loginerror function should start with
Page 247, example
Deferreds in the
DeferredList have no final errbacks. There should probably be a
consumeErrors=True argument given to the
There is no error checking on the result of the
Page 257, example
There is no error checking on the result of the
In case of error, the example dies a misleading death trying to access the
items attribute of a
All in all, I am happy to see Twisted get some of the spotlight, but I guess it is too big to be handled as a subtopic of a single chapter.
The Networking in the Book
Overall, I feel the book has succeeded very well in its goal of introducing networking to people. There are some places, though, that state incorrect things or that can lead the reader into false assumptions. I have full confidence the author is familiar with these concepts, and that the only reasons for these points of confusion are rushing to finish, blindness to one's own text, and trying to keep the book reasonably short. Still, I am sad to see such things slip by.
Here are some examples:
After showing an example that binds to
127.0.0.1:51423 and not just
0.0.0.0:51423, the author states:
If you have a host with an IP address other than 127.0.0.1, you could normally connect to port 51423 on that address; now you cannot.
This sentence looks suspiciously like the result of replacing some IP address the author's machine happened to have with
127.0.0.1. As written, it is quite confusing, considering that every host has the IP address
127.0.0.1, in addition to any other addresses it may have. What the author apparently meant is, of course, that other hosts cannot connect to the service (page 103).
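A minimal Python 3 sketch of the intended point. It binds to port 0 rather than the book's 51423 so the OS picks a free port; the part about other hosts being refused cannot be shown in a self-contained snippet, so it is only stated in a comment:

```python
import socket

# Bound to the loopback address only: reachable from this host, not from others.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
server.listen(1)
host, port = server.getsockname()

# A client on the same machine connects over loopback as usual.
client = socket.create_connection((host, port))
conn, peer = server.accept()
print("listening on %s:%d, accepted a connection from %s" % (host, port, peer[0]))

for s in (client, conn, server):
    s.close()

# Binding to ("0.0.0.0", port) instead would accept connections arriving on
# any of the host's addresses, including those other machines can reach.
```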
The book lets novice readers assume that TCP preserves message boundaries. Here's a snippet from a longer example:
```python
# Part of `pollclient.py` from Chapter 5, page 106.
data = s.recv(4096)
if not len(data):
    print("\rRemote end closed connection; exiting.")
    break
# Only one item in here -- if there's anything, it's for us.
sys.stdout.write("\rReceived: " + data)
sys.stdout.flush()
```
Once again, it is easy to believe more detailed explanations were skipped due to space restrictions, but I have personally seen way too many programs that assume
read(2) returns full lines. The fact that this behaviour is highly likely when testing with manual input, using a line-buffering client application, only makes the situation worse.
```python
# My version of `pollclient.py`.
data = s.recv(4096)
if not data:
    print "Remote end closed connection; exiting."
    break
print "Received: %r" % data
```
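My version above still prints whatever fragment recv() happens to return. If the protocol is line-oriented, the receiver has to reassemble lines itself. A Python 3 sketch, assuming newline-delimited messages (the book's example defines no framing, and LineReader is a hypothetical name):

```python
import socket


class LineReader:
    """Reassemble newline-terminated messages from a stream socket.

    TCP delivers a byte stream: one recv() may return half a line, several
    lines, or anything in between, so bytes are buffered until a full line
    is available."""

    def __init__(self, sock):
        self.sock = sock
        self.buffer = b""

    def read_line(self):
        """Return the next line without its newline, or None at end of stream."""
        while b"\n" not in self.buffer:
            data = self.sock.recv(4096)
            if not data:
                return None  # peer closed; a trailing partial line is dropped
            self.buffer += data
        line, _, self.buffer = self.buffer.partition(b"\n")
        return line


if __name__ == "__main__":
    a, b = socket.socketpair()
    for fragment in (b"hel", b"lo\nwor", b"ld\n"):  # boundaries unrelated to lines
        a.sendall(fragment)
    a.close()
    reader = LineReader(b)
    while True:
        line = reader.read_line()
        if line is None:
            break
        print("Received: %r" % line)
    b.close()
```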
Page 87 refers to any protocol with binary content (as opposed to ASCII) as "C-based".
When speaking of CGI-generated HTTP content, the author points out that many scripts do not supply a
Content-Length header, and that in this case HTTP signals end of file by closing the connection. He goes on to say that "there's no way to detect a truncated file" (page 125).
I would really love it if people finally embraced HTTP/1.1 and
Transfer-Encoding: chunked, which was created for exactly this purpose. Getting people to realize a solution exists is half the battle, so I would have appreciated a brief mention of chunking here.
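A sketch of why chunking fixes the truncation problem: the body ends with an explicit zero-length chunk, so a connection that closes early can be detected. This Python 3 sketch assumes the whole body is already in memory (a real client would parse incrementally), does not support trailer fields, and the function names are mine:

```python
def encode_chunked(pieces):
    """Encode an iterable of byte strings as an HTTP/1.1 chunked body."""
    out = b""
    for piece in pieces:
        if piece:  # an empty chunk would be read as the end of the body
            out += b"%X\r\n" % len(piece) + piece + b"\r\n"  # hex size, CRLF, data, CRLF
    return out + b"0\r\n\r\n"  # zero-size last chunk, no trailers, final CRLF


def decode_chunked(body):
    """Decode a complete chunked body, raising ValueError if it is truncated.

    Simplifications: the whole body is in memory, chunk extensions after ';'
    are skipped, and trailer fields are not supported."""
    result = b""
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol < 0:
            raise ValueError("truncated: incomplete chunk-size line")
        size = int(body[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:
            if not body.startswith(b"\r\n", pos):
                raise ValueError("truncated: missing final CRLF")
            return result
        data_end = pos + size
        if body[data_end:data_end + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        result += body[pos:data_end]
        pos = data_end + 2


if __name__ == "__main__":
    body = encode_chunked([b"Hello, ", b"world"])
    print(body)  # b'7\r\nHello, \r\n5\r\nworld\r\n0\r\n\r\n'
    print(decode_chunked(body))  # b'Hello, world'
```

Every proper prefix of the encoded body fails to decode, which is exactly the property a close-delimited response without Content-Length lacks.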
I would also have appreciated a brief explanation of how the peer sees an
s.shutdown() call on a socket, as this is a topic that is often
misunderstood (page 88).
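The gist, as a Python 3 sketch using socket.socketpair() (a Unix-domain pair on POSIX systems) in place of a TCP connection; over TCP, SHUT_WR sends a FIN, and the peer likewise sees end-of-file while the other direction stays open:

```python
import socket

a, b = socket.socketpair()

a.sendall(b"request")
a.shutdown(socket.SHUT_WR)  # "I'm done sending"; a can still receive

first = b.recv(1024)  # the data sent before the shutdown
eof = b.recv(1024)    # b"": to the peer, the shutdown looks like end-of-file

b.sendall(b"response")  # the b -> a direction is still open
reply = a.recv(1024)

print(first, eof, reply)  # b'request' b'' b'response'

a.close()
b.close()
```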
I know this review may seem harsh to Americans with overly white smiles. I'm picking out individual items and pointing out their problems. But do not misunderstand me -- I read through the book quite carefully, and these were the only things I wasn't totally happy with.
This book is 99% good, and the only reason it isn't 100% is the wide scope of the book, which, then again, is also a good thing.
I'm not a big book buyer, I have no shelf full of references, but I am happy to have this book on my shelf. I will happily recommend it to friends looking for a generic Python networking book.
Suggestions for Errata
I kept notes while reading the book, and also wrote down things that seemed like typos or other minor errors to me. Here's a list of those, mostly to help the publisher should they choose to print a second edition of the book:
Page 79, when talking about the DNS query type
ANY, states things like:
there's a special case for the query type
ANY in that it sometimes misses
MX records (and others) if they aren't requested first
...it only returns information cached by your local name servers, which may be incomplete
This unnecessarily clouds the workings of DNS with magic. Merely pointing out that
ANY means "whatever information a cache or authoritative server may have" (as opposed to all information) would be sufficient.
Pages 106-108, examples
For some reason, these examples use
print("..."), where everything else is more pythonic and leaves out the parens.
Page 132, example
Could handle hexadecimal character references like
&#xAE; (®), and use
unichr to support values >255.
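Something along these lines, as a Python 3 sketch (decode_char_refs is a made-up name). In Python 3, chr() covers all of Unicode, which is what unichr() was for in Python 2; for real use, html.unescape() in the standard library also handles named entities:

```python
import re

# Decimal (&#174;) and hexadecimal (&#xAE; or &#XAE;) character references.
CHAR_REF = re.compile(r"&#([xX][0-9A-Fa-f]+|[0-9]+);")


def decode_char_refs(text):
    def replace(match):
        ref = match.group(1)
        if ref[0] in "xX":
            codepoint = int(ref[1:], 16)  # hexadecimal form
        else:
            codepoint = int(ref)  # decimal form
        return chr(codepoint)  # Python 3 chr() is not limited to 0-255
    return CHAR_REF.sub(replace, text)


print(decode_char_refs("Twisted&#174; vs Twisted&#xAE;"))  # Twisted® vs Twisted®
print(decode_char_refs("&#x20AC;"))  # €, U+20AC, far beyond 255
```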
"Different languages use different meanings for the same character"
While this is technically true (German ä and Finnish ä are different things), what was probably meant here is something like:
"Different languages are written with different characters, and different character sets are used to represent hundreds of alternative characters with only 256 integer values."
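To illustrate the suggested wording: the same byte value names a different character in different 8-bit character sets. The three charsets are just examples (Python 3):

```python
raw = bytes([0xE4])  # one byte, value 228

for charset in ("latin-1", "cp1251", "iso8859-5"):
    print(charset, raw.decode(charset))
# latin-1 -> ä (a-umlaut), cp1251 -> д (Cyrillic de), iso8859-5 -> ф (Cyrillic ef)
```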
Page 185, example
Includes some of the ugliest HTML tag soup I've seen in a while.
There's really no excuse not to use correct XHTML, or at least balanced tags, these days.
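For illustration only (this is not how the book's example is written): even without a template system, the standard library's xml.etree.ElementTree can produce markup in which every tag is closed and text is escaped. render_table is a hypothetical helper (Python 3):

```python
import xml.etree.ElementTree as ET


def render_table(rows):
    """Build a small HTML table; the serializer closes every tag it opens."""
    table = ET.Element("table")
    for cells in rows:
        tr = ET.SubElement(table, "tr")
        for cell in cells:
            td = ET.SubElement(tr, "td")
            td.text = cell  # <, > and & in text are escaped on output
    return ET.tostring(table, encoding="unicode")


print(render_table([["UID", "Subject"], ["42", "<spam> & eggs"]]))
```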
Page 211, third line from the bottom:
"read-only" should probably be "read only"; the context is about marking messages as read, and doing so only once they have been downloaded.
"This means that the result of loggedin() -- or the last
Deferred that it returns -- is passed to stopreactor()."
-- should say
"...or the result of the last..."
Page 259, at the bottom:
"Since these both return callbacks", probably meant "
Page 270, at the bottom:
Missing period from
"build-in" should probably be "built-in".
osslverify.py refers to variable
The example is called
cgi.py, but the URL in the text is
ScriptAlias is missing spaces.
There is a race condition in the
SIGCHLD handling on platforms where the handler is reset on triggering. I did not expect a full treatment of
SIGCHLD problems, but I did expect a warning to tread lightly.
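The usual defensive pattern, sketched in Python 3 with made-up function names: reinstall the handler first thing (for platforms where it is reset on delivery), and reap in a loop with WNOHANG, since several children exiting close together may produce only one signal. As far as I know, CPython installs its handlers with sigaction() where available, so the reinstall mostly matters for the equivalent C code:

```python
import os
import signal


def reap_children():
    """Reap every exited child and return their PIDs.

    Several children exiting close together may be reported by a single
    SIGCHLD, so calling waitpid() just once per signal can leave zombies."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:  # no child processes left at all
            break
        if pid == 0:  # children exist, but none of them has exited yet
            break
        reaped.append(pid)
    return reaped


def on_sigchld(signum, frame):
    # Reinstall first: where the handler is reset to the default on delivery,
    # this narrows the window in which a second SIGCHLD would be missed.
    signal.signal(signal.SIGCHLD, on_sigchld)
    reap_children()  # the loop also collects children whose signal was missed
```

On POSIX systems it would be installed with signal.signal(signal.SIGCHLD, on_sigchld).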
DeferredList API is nonintuitive and has led many people into writing bad code. Unfortunately, no one has been able to formulate a better API for combining multiple