A Python class is created by a class definition, has an associated name space, supports attribute reference, and is callable.
class name[(expr[,expr]*)]: suite
The class definition is an executable statement and as such can be used wherever an executable statement may occur. When executed, each expr is evaluated and must evaluate to a class; then the suite is executed in a new local name space. The assumption is that the statements in the suite will make bindings in this name space, so typically the statements in the suite are assignments and function definitions. After execution of the suite, name is bound to the new class in the outer (calling) name space, and the new name space of the class definition is associated with the class object.
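Because the class definition is executed like any other statement, it can appear inside a function body. The following is a minimal sketch, assuming a Python 3 interpreter (where print is a function); the names make_counter_class and Counter are hypothetical and chosen only for illustration.

```python
# Assumes Python 3. make_counter_class and Counter are hypothetical names.
def make_counter_class(start):
    # The class statement runs every time the function is called,
    # so each call produces a brand new class object.
    class Counter:
        initial = start              # a binding made in the class name space
        def value(self):
            return self.initial
    return Counter

A = make_counter_class(1)
B = make_counter_class(10)
print(A is B)                        # False: two distinct class objects
print(A().value(), B().value())      # 1 10
```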
Classes have five predefined attributes:
Attribute | Type | Read/Write | Description |
---|---|---|---|
__dict__ | dictionary | R/W | The class name space. |
__name__ | string | R/O | The name of the class. |
__bases__ | tuple of classes | R/O | The classes from which this class inherits. |
__doc__ | string OR None | R/W | The class documentation string. |
__module__ | string | R/W | The name of the module in which this class was defined. |
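The attributes in the table can be inspected directly. This is a small sketch assuming Python 3; it only reads the attributes, since which of them are writable can differ between interpreter versions. The classes base and foo are hypothetical examples.

```python
# Assumes Python 3; base and foo are hypothetical example classes.
class base:
    pass

class foo(base):
    "A small example class."
    a = 1

print(foo.__name__)              # foo
print(foo.__bases__ == (base,))  # True
print(foo.__doc__)               # A small example class.
print("a" in foo.__dict__)       # True: a lives in the class name space
print(foo.__module__)            # name of the defining module
```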
The simplest use of classes is as simple Cartesian product types, e.g., the records of Pascal or the structs of C.
class foo: a, b, c = 0, "bar", (1,2)
A class is instantiated by calling the class object:
i = foo()
print i.a, i.b, i.c
In the above, i is an instance of the class foo. The slots a, b, and c of the instance can be modified by assignment:
i.a = 12
i.new = "yikes" # dynamic attribute creation!
Note that new slots, which weren't defined when the class was defined, can be created at will simply by assignment. Indeed, when working interactively, an empty class definition is often handy:
class foo: pass
foo.a = 1
foo.b = 2
Instances have two predefined attributes:
Attribute | Type | Read/Write | Description |
---|---|---|---|
__dict__ | dictionary | R/W | The instance name space |
__class__ | class | R/W | The class of this instance |
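Both instance attributes can be examined on a live object. A minimal sketch, assuming Python 3 and a hypothetical class foo:

```python
# Assumes Python 3; foo is a hypothetical example class.
class foo:
    a = 1

i = foo()
i.b = 2
print(i.__dict__)            # {'b': 2}: only what was assigned via the instance
print(i.__class__ is foo)    # True
print("a" in i.__dict__)     # False: a is stored in the class, not the instance
```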
It's important to understand the difference between class and instance attributes, especially since class attributes are accessible via instances.
An attribute defined in the class, either textually in a class definition or later by assignment to an attribute reference of the class, is a class attribute. It is stored in the class's name space (its __dict__).
An attribute defined in the instance, by assignment, is an instance attribute and is stored in the instance's name space -- even if there was a class attribute with the same name! Assignment via the instance results in an instance attribute that shadows the class attribute:
class foo:
    a = 1
i = foo()
foo.a => 1
i.a => 1
i.a = "inst"
foo.a => 1
i.a => "inst"
It is possible to modify a class attribute from an instance, but you need to exploit Python's revelation of the respective name spaces:
foo.a => 1
i.__class__.__dict__["a"] = "class"
foo.a => "class"
i.a => "inst"
When an attribute of an instance is referenced via the dot operator, Python first checks the instance name space, and then, if the attribute is not bound, it checks the class's name space. Here is the instance attribute lookup algorithm expressed in Python (N.B.: this is a simplified version of the real algorithm; we'll refine it when we introduce inheritance):
def instlookup(inst, name):
    # simplified algorithm...
    if inst.__dict__.has_key(name):
        return inst.__dict__[name]
    else:
        return inst.__class__.__dict__[name]
Note how this function will raise an exception if the attribute is defined neither in the instance nor the class, much as Python does: the failed dictionary lookup raises a KeyError where Python itself would raise an AttributeError.
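For readers on a newer interpreter, here is a sketch of the same simplified lookup, assuming Python 3: it uses the in operator in place of has_key and explicitly converts the dictionary's KeyError into an AttributeError. It still ignores inheritance, and the class foo is a hypothetical example.

```python
# Assumes Python 3. Still the simplified algorithm: inheritance is ignored.
def instlookup(inst, name):
    if name in inst.__dict__:                 # instance name space first
        return inst.__dict__[name]
    try:
        return inst.__class__.__dict__[name]  # then the class name space
    except KeyError:
        raise AttributeError(name)

class foo:                                    # hypothetical example class
    a = 1

i = foo()
print(instlookup(i, "a"))    # 1, found in the class
i.a = "inst"
print(instlookup(i, "a"))    # inst, the instance attribute shadows the class one
```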
Suppose we have Cartesian points:
cpt = (3,4)
and a function to compute the distance of the point to the origin:
def distanceToOrigin(p):
    from math import floor, sqrt
    return floor(sqrt(p[0]**2 + p[1]**2))
Now in our program, when manipulating points, we just call the function:
print distanceToOrigin(cpt)
Now suppose we introduce a new kind of point, a Manhattan point:
mpt = (3,4)
which has a different distance function. We immediately want to rename our first distance function:
CartesianDistanceToOrigin = distanceToOrigin
so that we can define the Manhattan version:
def ManhattanDistanceToOrigin(p): return abs(p[0]) + abs(p[1])
This illustrates a name space problem: we should store our Cartesian and Manhattan functions in different name spaces. We could use Python's modules for this (cartesian.distanceToOrigin, manhattan.distanceToOrigin), but we would still have a problem: how do we know which points are which? We need to add a type tag to each tuple:
CARTESIAN, MANHATTAN = 0, 1
cpt = (CARTESIAN, 3, 4)
mpt = (MANHATTAN, 3, 4)
(of course, since our objects' attributes are defined positionally, we now need to recode our distance functions: but that's not the problem we're considering...) and, worse, we need to write type checking code everywhere we use the points:
if pt[0] == CARTESIAN:
    print cartesian.distanceToOrigin(pt)
elif pt[0] == MANHATTAN:
    print manhattan.distanceToOrigin(pt)
else:
    raise TypeError, pt
To get around this problem we could write a generic distanceToOrigin function so that we could keep the conditional in one place, but we'd still have to update that conditional every time we added a new type of point. And if the author of the new point type isn't the author of the generic function, that can be a problem (the author of the new point type probably doesn't even know of all the generic point-manipulation functions out there, each of which will have a conditional that needs updating). The solution is to associate the functions that manipulate each type of object with the object itself. Such functions are called the methods of the object:
from math import floor, sqrt
cpt = (3,4, lambda p: floor(sqrt(p[0]**2 + p[1]**2)))
Now to find the distance to the origin for any kind of point pt, we no longer need the conditional: each point knows how to compute its own distance: pt[2](pt).
print cpt[2](cpt)
If the object carries around its own functions, we don't need a conditional, nor the type information (at least, not for this purpose) and the author of a new type of point doesn't need to change somebody else's generic functions.
mpt = (3,4, lambda p: abs(p[0]) + abs(p[1]))
print mpt[2](mpt)
This is the fundamental idea of object-oriented programming.
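The idea can be tried end to end. The sketch below assumes Python 3 and treats the third tuple slot as the point's distance function, as in the text; the loop handles both kinds of point without a conditional or a type tag.

```python
# Assumes Python 3. Each tuple carries its own distance function in slot 2.
from math import floor, sqrt

cpt = (3, 4, lambda p: floor(sqrt(p[0]**2 + p[1]**2)))   # Cartesian point
mpt = (3, 4, lambda p: abs(p[0]) + abs(p[1]))            # Manhattan point

for pt in (cpt, mpt):
    # No type check needed: each point knows how to measure itself.
    print(pt[2](pt))     # 5, then 7
```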
One of the biggest problems with this demonstration is the use of tuples and their positional indexing. Clearly the use of dictionaries would be a big improvement:
cpt = {
    "x": 3,
    "y": 4,
    "distanceToOrigin": lambda p: floor(sqrt(p["x"]**2 + p["y"]**2))
}
print cpt["distanceToOrigin"](cpt)
but using dictionaries doesn't give us any templating facility: with dictionaries, for each point we define, we'd need to copy in the definition of distanceToOrigin. What we want are the records of Pascal or the structs of C, and Python has the equivalent of these in its classes:
class cartesian:
    x, y = 0, 0
    def distanceToOrigin(p):
        return floor(sqrt(p.x**2 + p.y**2))
cpt = cartesian()
cpt.x, cpt.y = 3,4
# WARNING: the following is not correct Python code...
print cpt.distanceToOrigin(cpt)
This is a lot better, but it's kind of annoying to always have to pass the object itself to its methods, especially since objects are first class and may be the value of complex expressions, e.g.:
x[y].distanceToOrigin(x[y])
This would be so error prone and potentially inefficient (due to reevaluation) that it would require us to always assign complex object expressions to local variables. So Python helps us out with a little bit of syntactic sugar: if you define a function in a class, it is assumed that you intend this function to be a method of the class's instances, and therefore when you call such a function via an instance, Python implicitly passes in the instance as the first parameter. The correct way to call the distanceToOrigin method is therefore simply:
print cpt.distanceToOrigin()
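To make the implicit first parameter concrete, the following sketch (assuming Python 3) calls the same function two ways: through the instance, where Python supplies the instance automatically, and through the class, where it must be passed by hand.

```python
# Assumes Python 3.
from math import floor, sqrt

class cartesian:
    x, y = 0, 0
    def distanceToOrigin(p):
        return floor(sqrt(p.x**2 + p.y**2))

cpt = cartesian()
cpt.x, cpt.y = 3, 4
print(cpt.distanceToOrigin())            # 5: cpt is passed implicitly as p
print(cartesian.distanceToOrigin(cpt))   # 5: same function, instance passed explicitly
```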
It's conventional in Python to name the first parameter of a method self, e.g.:
class cartesian:
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))
This name isn't mandatory, but your code will look very strange to other Python hackers if you use another name.
Python allows you to customize your objects by defining some methods with special names:
def __init__(self, parameters): suite
The parameters are as for ordinary functions, and support all the variants: positional, default, keyword, etc. When a class has an __init__ method, you pass parameters to the class when instantiating it, and the __init__ method will be called with these parameters. Usually the method will set various instance variables via self.
class cartesian:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
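A short usage sketch, assuming Python 3, showing the class being instantiated with default, positional, and keyword arguments, all of which are forwarded to __init__:

```python
# Assumes Python 3.
class cartesian:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

a = cartesian()          # both defaults used
b = cartesian(3, 4)      # positional arguments
c = cartesian(y=7)       # keyword argument, x keeps its default
print((a.x, a.y), (b.x, b.y), (c.x, c.y))   # (0, 0) (3, 4) (0, 7)
```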
def __del__(self): suite
A __del__ method is called when an object is deleted, which is when the garbage collector decides that there are no more references to the object. Note that this is not necessarily when the object is explicitly deleted with the del statement. The __del__ method takes exactly one parameter, self. Due to a weirdness in the current C implementation of Python, exceptions are ignored in __del__ methods: instead, an error will be printed to standard error.
def __repr__(self): suite
A __repr__ method takes exactly one parameter, self, and must return a string. This string is intended to be a representation of the object, suitable for display to the programmer, for instance when working in the interactive interpreter. __repr__ will be called anytime the builtin repr function is applied to an object; this function is also called when the backquote operator is used.
def __str__(self): suite
The __str__ method is exactly like __repr__ except that it is called when the builtin str function is applied to an object; this function is also called for the %s escape of the % operator. In general, the string returned by __str__ is meant for the user of an application to see, while the string returned by __repr__ is meant for the programmer to see, as in debugging and development: but there are no hard and fast rules about this. You're best off just thinking, __str__ for %s, __repr__ for backquotes.
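The two methods can be compared side by side. This sketch assumes Python 3, which has no backquote operator, so it calls the builtin repr function directly; the particular output formats are arbitrary choices made for illustration.

```python
# Assumes Python 3 (no backquote operator, so repr() is called directly).
class cartesian:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def __repr__(self):
        # Programmer-oriented form (an arbitrary illustrative format).
        return "cartesian(%r, %r)" % (self.x, self.y)
    def __str__(self):
        # User-oriented form (also an arbitrary illustrative format).
        return "(%s, %s)" % (self.x, self.y)

p = cartesian(3, 4)
print(repr(p))       # cartesian(3, 4)
print(str(p))        # (3, 4)
print("%s" % p)      # (3, 4): the %s escape uses __str__
```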
Using classes to define objects provides a templating facility: class attributes and methods need only be defined once, and you can then instantiate any number of objects, all sharing the same methods.
But we could benefit from more sharing opportunities. Lots of times classes of related objects differ only slightly from one another. Consider the full definitions of our two classes of points:
class cartesian:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))
class manhattan:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)
Both of these classes share the same __init__ method, yet we have to code it twice. We can solve this problem by abstracting the common method into a new, more generic class called point:
class point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y
Now we can redefine cartesian and manhattan and specify that they inherit from point:
class cartesian(point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))
class manhattan(point):
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)
We can define all behavior common to all types of points in the point class, and then define any number of subclasses of point which inherit from it. We could go farther and define subclasses of cartesian or manhattan if that were appropriate.
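Putting the pieces together, here is a runnable sketch of the hierarchy, assuming Python 3. Neither subclass defines __init__, so both obtain it from point; the Manhattan distance uses abs, matching the ManhattanDistanceToOrigin function shown earlier.

```python
# Assumes Python 3.
from math import floor, sqrt

class point:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

class cartesian(point):
    def distanceToOrigin(self):
        return floor(sqrt(self.x**2 + self.y**2))

class manhattan(point):
    def distanceToOrigin(self):
        return abs(self.x) + abs(self.y)

# Both subclasses are initialized by the __init__ inherited from point.
for pt in (cartesian(3, 4), manhattan(3, 4)):
    print(pt.__class__.__name__, pt.distanceToOrigin())
# cartesian 5
# manhattan 7
```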
In some object-oriented languages (e.g., Java), point would be an abstract class: in other words, a class that's used only to inherit from, and not itself directly instantiated. Python doesn't make this distinction: if you want to instantiate point, go right ahead!
Let's look at the class definition syntax again:
class name[(expr[,expr]*)]: suite
As mentioned earlier, each expr, if given, must evaluate to a class, and now we know why: these are called the base classes, and are the classes that the new class inherits from. If multiple base classes are given, the new class inherits from all of them: this is called multiple inheritance. See the next section for an explanation of how attribute reference works in the presence of multiple inheritance.
Now we can explain class and instance attribute reference in detail.
When looking up an attribute via a class object C, Python first searches the class's name space (C.__dict__); if it doesn't find the attribute, it then recursively searches the class's base classes, left to right and depth first.
When looking up an attribute via an instance object i, Python first searches the instance's name space (i.__dict__); if it doesn't find the attribute, it then searches the instance's class (i.__class__) as described in the previous paragraph.
Here are the complete algorithms for class attribute lookup and instance attribute lookup. These functions each return a 2-tuple whose first element is a truth value indicating the success of the lookup, and whose second element is the value of the attribute, if the lookup was successful, or None if not:
def classlookup(C, name):
    if C.__dict__.has_key(name):
        return (1, C.__dict__[name])
    else:
        for b in C.__bases__:
            success, value = classlookup(b, name)
            if success:
                return (1, value)
            else:
                pass
        else:
            return (0, None)
def instlookup(I, name):
    if I.__dict__.has_key(name):
        return (1, I.__dict__[name])
    else:
        return classlookup(I.__class__, name)
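The two functions can be exercised on a small multiple-inheritance hierarchy. The sketch below is a port assuming Python 3 (in replaces has_key, and True/False replace 1/0); the classes A, B, and C are hypothetical. Note that Python 3 itself uses a different resolution order for some diamond-shaped hierarchies, so this sketch only mirrors the depth-first, left-to-right rule described in the text.

```python
# Assumes Python 3. A, B and C are hypothetical example classes.
def classlookup(C, name):
    if name in C.__dict__:
        return (True, C.__dict__[name])
    for b in C.__bases__:                  # left to right...
        success, value = classlookup(b, name)   # ...and depth first
        if success:
            return (True, value)
    return (False, None)

def instlookup(I, name):
    if name in I.__dict__:                 # instance name space first
        return (True, I.__dict__[name])
    return classlookup(I.__class__, name)

class A:
    x = "A.x"

class B:
    x = "B.x"
    y = "B.y"

class C(A, B):
    pass

i = C()
print(instlookup(i, "x"))   # (True, 'A.x'): A is searched before B
print(instlookup(i, "y"))   # (True, 'B.y'): not in A, so B is searched
print(instlookup(i, "z"))   # (False, None)
i.x = "inst"
print(instlookup(i, "x"))   # (True, 'inst'): the instance shadows the classes
```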
Some B&D-oriented languages prevent access to the attributes of a class or instance, the idea being that if the author of the class didn't define a method to manipulate an attribute, then the user of the instance has no right to examine or change it. As you might have already guessed, Python doesn't take this approach. Attribute reference syntax can be used to access most instance and class attributes, and __dict__ attributes give the entire show away. The assumption is that you know what you're doing, and if you want to shoot yourself in the foot, that's your affair.
That said, Python does support name mangling: if a method or other attribute name defined in a class starts with two underscores and does not end with two (e.g., __secret), Python magically changes the name so that references to this attribute made in the usual way from outside the class will fail:
class foo:
    def __secret(self): pass
foo.__secret => AttributeError: __secret
This protection is purely advisory, however: if we examine the class name space we can see what Python is up to:
foo.__dict__ => {'_foo__secret': <function __secret at fc328>, '__module__': '__main__', '__doc__': None}
The method name has been changed, or mangled, into _foo__secret: i.e., prefixed with underscore and the class name. Since this is documented behavior, you can use this name, either going through the __dict__ directly, or just via attribute reference (foo._foo__secret), to access the attribute.
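The mangling can be observed directly. A minimal sketch, assuming Python 3 and code running outside any class body (inside a class body, the reference itself would be mangled too):

```python
# Assumes Python 3, executed outside any class body.
class foo:
    def __secret(self):
        return "hidden"

try:
    foo.__secret                        # the plain name is not in the class
except AttributeError:
    print("AttributeError")

print("_foo__secret" in foo.__dict__)   # True: stored under the mangled name
print(foo()._foo__secret())             # hidden: the mangled name works
```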
Another important attribute of an object-oriented programming language is polymorphism: the ability to use the same syntax for objects of different types. (Strictly speaking, this is ad-hoc polymorphism.) For example, in Python, the square bracket operator is used to perform indexing of various sequence types (list[3], dict["foo"]); polymorphism allows us to define our own types, as classes, that emulate builtin Python types like sequences and which therefore can use e.g. square brackets for indexing.
We'll start by showing how to override the behavior of the dot operator, which does attribute reference in classes and instances. By customizing attribute reference, an object can perform an arbitrary action whenever one of its attributes is referenced, such as type checking.
def __getattr__(self, name):
This method, if defined, is called when attribute lookup fails. For example, consider the following:
class foo:
    a = 0
    def __getattr__(self, name):
        return "%s: DEFAULT" % name
i = foo()
i.b = 1
Since a is a class attribute visible through the instance i, and b is an instance attribute of i, the __getattr__ method isn't called when either of these is accessed:
i.a, i.b => 0, 1
But if we try to access an undefined attribute, say c, __getattr__ is called, with the attribute name as a parameter:
i.c => "c: DEFAULT"
Note that __getattr__ won't be called if attribute lookup succeeds via inheritance.
The __getattr__ method should either return a value (of any type) or raise an AttributeError exception.
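As a sketch of the second option, the following __getattr__ returns a value for names it knows about and raises AttributeError for everything else. It assumes Python 3; the defaults table and the attribute names color and size are hypothetical.

```python
# Assumes Python 3. The defaults table and its contents are hypothetical.
class foo:
    a = 0
    defaults = {"color": "red"}
    def __getattr__(self, name):
        # Only reached when the normal lookup has already failed.
        try:
            return foo.defaults[name]
        except KeyError:
            raise AttributeError(name)

i = foo()
i.b = 1
print(i.a, i.b, i.color)     # 0 1 red
print(hasattr(i, "size"))    # False: __getattr__ raised AttributeError
```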
def __setattr__(self, name, value):
__setattr__ is called whenever an attribute assignment is attempted, regardless of whether or not the attribute is already bound in the instance or class. This happens instead of the normal mechanism of storing the value in the instance dictionary. This method can be used, for example, to perform type checking on a value before assigning it.
The __setattr__ method should not try to assign a value to an attribute in the usual way, i.e., self.name = value, as this will result in an infinite number of recursive calls to __setattr__; instead, the instance dictionary should be used directly:
def __setattr__(self, name, value):
    self.__dict__[name] = value
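Here is a sketch of the type checking use mentioned above, assuming Python 3. The rule that x and y must be numbers is a hypothetical policy chosen for illustration; the value is stored through the instance dictionary to avoid the recursion.

```python
# Assumes Python 3. The "x and y must be numbers" rule is a hypothetical policy.
class point:
    def __setattr__(self, name, value):
        if name in ("x", "y") and not isinstance(value, (int, float)):
            raise TypeError("%s must be a number" % name)
        # Store via the instance dictionary to avoid recursing into __setattr__.
        self.__dict__[name] = value

p = point()
p.x = 3
try:
    p.y = "four"
except TypeError as e:
    print(e)             # y must be a number
print(p.__dict__)        # {'x': 3}: the rejected assignment stored nothing
```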
def __delattr__(self, name):
This method is called when an attribute is deleted via the del statement.
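A minimal sketch of __delattr__, assuming Python 3. It records each deletion in a hypothetical module-level list named deleted, then removes the attribute from the instance dictionary directly.

```python
# Assumes Python 3. The deleted list is a hypothetical log for illustration.
deleted = []

class foo:
    def __delattr__(self, name):
        deleted.append(name)         # arbitrary action performed on deletion
        del self.__dict__[name]      # then actually remove the attribute

i = foo()
i.a = 1
del i.a                      # invokes foo.__delattr__(i, "a")
print(deleted)               # ['a']
print(hasattr(i, "a"))       # False
```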