Introduction to Python: Class 7: Python on the Web

Page Contents

The Standard Python Web Modules
Some Non-Standard Python Web Modules
- The HTMLgen Module
- bobo, the Persistent Object Publisher

The Standard Python Web Modules

The modules below are only some of the standard Python modules that are applicable to the web; see the Library Reference Manual for more.

The `cgi` Module

The Python cgi module makes it easy to write CGI scripts. Mainly, it provides good abstractions for accessing the environment variables and standard input of a CGI script, which makes it easy to handle input from forms (but see bobo below for a much more sophisticated way to write web applications). Remember, too, that the cgi module contains the useful function escape, which quotes characters (like the ampersand and angle brackets) for use in HTML; it saves you from writing a regexp.
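In current Python 3, cgi.escape is gone (removed in Python 3.8, and the whole cgi module was removed in 3.13), so the following minimal sketch does the same job with html.escape for the quoting and urllib.parse.parse_qs for decoding a form's query string. The environ dictionary stands in for os.environ, and the form field "name" and the function name are hypothetical.

```python
import html
from urllib.parse import parse_qs

def render_greeting(environ):
    """Return a CGI response (header plus HTML body) for a GET form.

    `environ` stands in for os.environ; "name" is a hypothetical form
    field chosen for this sketch.
    """
    # parse_qs maps each field to a list of values: {"name": ["..."]}
    fields = parse_qs(environ.get("QUERY_STRING", ""))
    name = fields.get("name", ["stranger"])[0]
    # html.escape turns &, < and > (and, by default, quotes) into entities
    body = "<p>Hello, %s!</p>" % html.escape(name)
    # A blank line separates the CGI header from the body
    return "Content-Type: text/html\n\n" + body
```

In a real CGI script you would call print(render_greeting(os.environ)); a query string of name=Tom+%26+Jerry comes out as Tom &amp; Jerry in the HTML.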

The `urlparse` Module

The Python urlparse module contains functions for destructuring and restructuring URLs. Here's a function that normalizes a possibly incomplete URL to support the kind of non-standard shorthand that people use all the time; in addition, it strips query and fragment parts, because I needed to do that in the application I wrote it for:

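      try:
          from urllib.parse import urlparse, urlunparse   # Python 3
      except ImportError:
          from urlparse import urlparse, urlunparse       # Python 2

      def normalize(url):
          """Normalize url by stripping any query and fragment parts; also,
          if original url was of the form `www.foo.com', convert this to
          `http://www.foo.com'.
          """
          (scheme, netloc, path, _, _, _) = urlparse(url, "http")
          if not netloc and path:
              # No "//" in the input, so urlparse left the host name in
              # path; move it into the network-location slot.
              return urlunparse((scheme, path, "", "", "", ""))
          else:
              return urlunparse((scheme, netloc, path, "", "", ""))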
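To see the pieces a function like normalize works with, here is a small Python 3 sketch (where the module lives on as urllib.parse) that exposes the six components urlparse returns and rebuilds a URL without its query and fragment. The helper names and the example URL are illustrative only.

```python
from urllib.parse import urlparse, urlunparse

def url_parts(url):
    """Return the six components urlparse splits url into, as a dict."""
    p = urlparse(url)
    return {"scheme": p.scheme, "netloc": p.netloc, "path": p.path,
            "params": p.params, "query": p.query, "fragment": p.fragment}

def strip_query(url):
    """Rebuild url with its query and fragment removed."""
    p = urlparse(url)
    # The result of urlparse is a named tuple, so _replace swaps single fields
    return urlunparse(p._replace(query="", fragment=""))
```

For http://www.example.com/a/b;p?q=1#frag, url_parts reports path /a/b and params p, and strip_query returns http://www.example.com/a/b;p. Unlike normalize, strip_query keeps any ;params part.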

The `urllib` Module

The Python urllib module implements a fairly high-level abstraction that makes any web object with a URL act like a Python file: you open it and get back an object with read and readline methods (and so on).
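A short sketch of that file-like interface in Python 3, where the opening function is urllib.request.urlopen. So that it runs without a network, it opens a data: URL (which urlopen has handled since Python 3.4); an http: URL is read exactly the same way. The function name is hypothetical.

```python
from urllib.request import urlopen

def first_lines(url, n=2):
    """Open url like a file and return its first n lines as text."""
    with urlopen(url) as f:          # f supports read(), readline(), etc.
        return [f.readline().decode("utf-8").rstrip("\n") for _ in range(n)]
```

For example, first_lines("data:text/plain,line%20one%0Aline%20two%0A") returns ['line one', 'line two'].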

Here's a program I wrote that uses urllib to store (cryptographic) checksums of web pages in a database; you can then run the program periodically to get a report of which pages have changed.

notify source code
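The notify source isn't reproduced here, but the idea can be sketched in a few lines of Python 3. This is an illustration, not the original program: it assumes SHA-256 from hashlib as the cryptographic checksum and takes any dict-like mapping from URL to hex digest as the "database" (a plain dict, or a shelve file from shelve.open). A page seen for the first time is reported as changed.

```python
import hashlib
from urllib.request import urlopen

def page_digest(url):
    """Fetch url and return the hex SHA-256 digest of its body."""
    with urlopen(url) as f:
        return hashlib.sha256(f.read()).hexdigest()

def changed_pages(urls, db):
    """Return the urls whose digest differs from the one stored in db.

    db maps url -> hex digest and is updated in place, so the next run
    compares against this run's digests.
    """
    changed = []
    for url in urls:
        digest = page_digest(url)
        if db.get(url) != digest:    # None for a url never seen before
            changed.append(url)
            db[url] = digest
    return changed
```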

The `httplib` Module

The httplib module implements the client side of the HTTP protocol. It can be used with similar effect to urllib, but only for http: URLs and it takes more coding. However, it's a lower-level abstraction that gives you much more control over details of the HTTP protocol, like the MIME headers.

Here's a short program I wrote that uses httplib to do a stress test of an HTTP server:

stressout.py source code
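stressout.py isn't reproduced here; the following is a minimal Python 3 sketch of the same kind of test using http.client, httplib's current name. The function name, the sequential request loop, and the User-Agent value are assumptions for illustration; a heavier stress test would probably issue requests concurrently.

```python
import http.client
import time

def stress(host, port, path="/", n=10):
    """Issue n sequential GET requests; return (status counts, elapsed seconds)."""
    counts = {}
    start = time.perf_counter()
    for _ in range(n):
        conn = http.client.HTTPConnection(host, port, timeout=10)
        try:
            # Setting headers directly is the kind of control urllib hides
            conn.request("GET", path, headers={"User-Agent": "stress-sketch"})
            resp = conn.getresponse()
            resp.read()              # drain the body before closing
            counts[resp.status] = counts.get(resp.status, 0) + 1
        finally:
            conn.close()
    return counts, time.perf_counter() - start
```

A call like stress("localhost", 8080, n=100) returns something like ({200: 100}, elapsed_seconds).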

The `ftplib` Module

The ftplib module implements the client side of the FTP protocol.

Here's a short program I wrote that Binding uses to do weekly FTP uploads and downloads of data to and from a vendor:

bindingftp.py source code
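bindingftp.py isn't reproduced here; this is a hedged Python 3 sketch of a weekly upload-and-download job with ftplib. The host, credentials, and file lists are parameters you would supply, and the helper names are hypothetical. Binary mode (storbinary and retrbinary) is used so data files pass through unchanged.

```python
import ftplib

def upload(ftp, local_path, remote_name):
    """Send a local file to the server in binary mode (STOR)."""
    with open(local_path, "rb") as f:
        return ftp.storbinary("STOR " + remote_name, f)

def download(ftp, remote_name, local_path):
    """Fetch a remote file in binary mode (RETR) and write it locally."""
    with open(local_path, "wb") as f:
        return ftp.retrbinary("RETR " + remote_name, f.write)

def weekly_exchange(host, user, password, outgoing, incoming):
    """Upload each (local, remote) pair in outgoing, then fetch each
    (remote, local) pair in incoming, over a single FTP session."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        for local, remote in outgoing:
            upload(ftp, local, remote)
        for remote, local in incoming:
            download(ftp, remote, local)
```

Because upload and download only need an object with storbinary and retrbinary, they can be exercised against a stand-in object without a live server.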

The `BaseHTTPServer` Module

The BaseHTTPServer module implements the server side of the HTTP protocol. It is a true HTTP server, and thanks to its object-oriented design you can make a custom web server simply by subclassing its request handler class, BaseHTTPRequestHandler, and defining methods such as do_GET.

Here's a short program I wrote called plain that implements persistent URLs: it's just a simple HTTP server that does nothing but HTTP redirects, using a mapping in a database. These 97 lines of Python (about 2K of code) do pretty much the same thing as OCLC's PURL system, which ships as 7M (that's megabytes...) of C and Perl.

A plain server is running on www.lib.uchicago.edu if you want to try it out.

plain.py source code
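The core of a redirect server like plain can be sketched with Python 3's http.server (BaseHTTPServer's current name) by subclassing BaseHTTPRequestHandler and defining do_GET. In this sketch a hypothetical in-memory dictionary replaces plain's database, and answering with 302 (Found) is a choice made here, not necessarily what plain sends.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from persistent paths to current locations;
# plain kept its mapping in a database instead.
REDIRECTS = {
    "/docs": "http://www.example.com/current/docs.html",
}

class RedirectHandler(BaseHTTPRequestHandler):
    """Answer each GET with a redirect looked up in REDIRECTS, or a 404."""

    def do_GET(self):
        target = REDIRECTS.get(self.path)
        if target is None:
            self.send_error(404, "No such persistent URL")
            return
        self.send_response(302)                 # 302 Found
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()
```

To serve it, run something like HTTPServer(("", 8000), RedirectHandler).serve_forever(); a request for /docs then comes back as a redirect to the mapped location.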

Some Non-Standard Python Web Modules

The `HTMLgen` Module

`bobo`, the Persistent Object Publisher

The Python Persistent Object Publisher, affectionately known as bobo, implements an extremely sophisticated alternative to CGI. It's an ORB, or Object Request Broker, that translates URLs into direct calls of Python objects, allowing you to write code that's actually independent of the CGI protocol (this makes it trivial to run your code natively inside the Python-based Medusa web server, for example).
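bobo's own API isn't shown here. As a toy illustration of the core idea only, this Python 3 sketch walks a URL's path segments as attribute lookups starting from a root object, then calls the final object with the query-string fields as keyword arguments. The publish function and the Site and Reserves classes are hypothetical; a real publisher would also deal with security, errors, and building the HTTP response.

```python
from urllib.parse import urlsplit, parse_qsl

def publish(root, url):
    """Resolve url's path against root by attribute lookup, then call the
    result with the query-string fields as keyword arguments."""
    parts = urlsplit(url)
    obj = root
    for name in filter(None, parts.path.split("/")):   # skip empty segments
        if name.startswith("_"):
            raise LookupError("not published: " + name)  # hide private names
        obj = getattr(obj, name)
    kwargs = dict(parse_qsl(parts.query))
    return obj(**kwargs) if callable(obj) else obj

class Reserves:
    """Hypothetical published object."""
    def lookup(self, course="none"):
        return "reserves for " + course

class Site:
    """Hypothetical root object; its attributes are the top-level URLs."""
    reserves = Reserves()
```

With these definitions, publish(Site(), "http://host/reserves/lookup?course=HIST101") returns "reserves for HIST101": the URL path names the method and the query string supplies its argument, with no CGI-specific code in Reserves.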

Here's an aborted (for political reasons) project that demos a web-based interface to electronic reserves in the UofC Library. While this is just a prototype, the data is being retrieved live and in real time from Horizon, using bobo and the Python Sybase module.

A much more ambitious (if also sadly not quite finished...) bobo application is my own personal web-based guide to the beers of Austria. This is a bobo application where each page is formatted live from data in a MySQL relational database; it also uses bobo's DocumentTemplate facility to generate the HTML.
