Some of us are born to python, some rise to python, and others have python thrust upon ‘em. Let’s learn you a python.
This document assumes you’ve seen computer programming before, but tries to be kind in how it is paced. Everything here is for Python 3.
In some sense, the two core actions of computer programming are abstraction and naming. That is: we’re going to try and make code that expresses an idea; we have to give that code and its constituents clear and meaningful names. This is, I think, a tricky idea to get your head all the way around without thorough exposure. My intention is that this document will point out some places where we abstract ourselves from something so the ideas sink in well.
We’ve briefly covered the notion of a type. Python offers us some foundational types to work with:
Type | Specification | Example |
---|---|---|
int | Integer; effectively unlimited size | 1, 5, 12487129420 |
float | Double-precision floating point number | 0.219, 50.6 |
complex | Complex numbers | 2j, 3+4j |
bool | Boolean | True/False |
str | String | ‘cat’, ‘house boat’ |
Python is a strongly, dynamically typed language. This means we almost never have to care about what the type of a thing is when we declare or receive it, but we cannot use types interchangeably in some contexts. For instance:
>>> 2 + 2
4
>>> 'cat' + 'dog'
'catdog'
>>> 2 + 'dog'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
Between ints, + means “addition”; between strings, it means “concatenation”. But between an int and a string, python cannot and will not guess what + means, and throws a type error. We must “cast”, changing the type of one operand to match the other, so that python knows how to + everything together correctly:
>>> str(2) + 'dog'
'2dog'
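Casting works in the other direction, too. Here's a small sketch (the variable names are made up for illustration) that uses str() and int() so that + always has something unambiguous to do:

```python
# Casting in both directions; the names here are illustrative only.
count = 2
label = 'dog'

as_text = str(count) + label     # int -> str, then concatenation
as_number = int('40') + count    # str -> int, then addition

print(as_text)    # 2dog
print(as_number)  # 42
```

Note that int() only works on strings that actually look like integers; int('dog') raises a ValueError.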
Ok, so we’re going to name things. In python, we name things by giving them an identifier. A valid identifier in python follows these rules:
- It can be any combination of upper and lowercase letters, numbers, and the _ character.
- It must start with a letter or the _ character (never a number).
So, my_swe3t_l33t_IdEnTiFi3r is valid (but don’t ever do that); 3rd_item is not.
Python usually follows these conventions:
- Identifiers used for variables are in snake case: all lower-case letters with words separated by the underscore character. E.g. a_variable_named_foo
- Identifiers for classes are in title case: each word with its first letter capitalized, no spaces or underscores. E.g. MyFooClass
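Python code is made up of statements. A statement can be a whoooole lot of things. A statement might be variable assignment, or creating a function. Simply listing a value or an object doesn’t do anything: python evaluates the expression and moves on. (Pedantically, the language reference calls a bare expression an “expression statement”, but it isn’t a statement in the useful sense we mean here; the interpreter just echoes the value back.)
So, in an interpreter:
>>> 5      # just an expression; the interpreter echoes its value
5
>>> x = 5  # an assignment statement; nothing is echoed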
Variable assignment is one of the most standard features a programming language can have. In python, variable assignment is as simple as can be:
x = 5
print(x)
Put another way: we’re binding the value 5 to the identifier x. We can bind any value we want to any valid identifier this way.
Now: it’s important that you understand that there is a thing called scope, which affects when and how variables can be accessed. We’re going to get to scope soon, but we need a few more ideas before we can fully explain it.
Before we get too far, there’s a thing about Python you should know – which is a thing that’s true of many programming languages, so it’s useful to be clear on. This is the notion of reserved words. It goes like this:
When we write code, we express to a computer what we want it to do. The language we use to express ourselves is our programming language. That language has some syntax, made of words and symbols, that allows us to get our ideas and intentions written down. Certain words and symbols are baked in to the language, very deeply – their meaning cannot be changed by us, and we have to respect and use these words only in very specific ways.
(Nota bene: in python, “reserved words” are typically referred to as “keywords.” Same idea, slightly different name.)
What this means in practice is that we cannot use a reserved word as an identifier. For instance:
False = 5 # NOPE
import = 7 # SUPER NOPE
The python keywords are:
False, class, finally, is, return, None, continue, for, lambda, try, True, def, from, nonlocal, while, and, del, global, not, with, as, elif, if, or, yield, assert, else, import, pass, break, except, in, raise
(Python 3.7 and later also reserve async and await.)
We will get in to what most of these do as we work through this document! Hang in there.
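If you're ever unsure whether a name is usable, python can tell you. This sketch uses the standard library's keyword module and the built-in str.isidentifier method; the helper is_usable_name is a made-up name for illustration, not something python provides:

```python
import keyword

# keyword.iskeyword() answers "is this word reserved?" for the running interpreter.
print(keyword.iskeyword('import'))  # True
print(keyword.iskeyword('cat'))     # False


def is_usable_name(candidate):
    """Hypothetical helper: valid identifier AND not a reserved word."""
    # str.isidentifier() checks the naming rules but knows nothing about
    # keywords, so we combine the two checks.
    return candidate.isidentifier() and not keyword.iskeyword(candidate)


print(is_usable_name('my_variable'))  # True
print(is_usable_name('3rd_item'))     # False
print(is_usable_name('class'))        # False
```

The full list for your interpreter lives in keyword.kwlist.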
Let’s say we want to make a logical statement about the comparison of two values. If we’re dealing with numbers, python provides a set of built-in operators to help us do precisely this. We can explore this in the python interpreter:
>>> 5 < 6
True
>>> 1 > 100
False
Note our first two keywords: True and False.
Python also supports greater-than-or-equal to, so:
>>> 5 >= 9
False
>>> 9 >= 9
True
Or we can test equality:
>>> 10 == 10
True
As in many languages, the exclamation point captures the idea of negation in a symbol. So, “not equal” is written:
>>> 4 != 5
True
>>> 4 != 4
False
Python also provides the keyword not, which negates any Boolean expression following it. (Unlike some languages, python has no standalone ! operator; ! only shows up as part of !=.)
>>> not True
False
>>> not 4 == 5
True
Note that python also has nice English keywords for Boolean operators: and and or:
>>> False or True
True
>>> False and False
False
>>> False and True
False
>>> True and True
True
Along with equality operators (e.g. ==), python provides an identity operator. While extremely useful, the identity operator can also lead to some very subtle bugs. This is in part because the identity operator is is, and thus has a much more natural language syntax than ==. However, observe:
>>> a = 19998989890
>>> b = 19998989889 + 1
>>> a == b
True
>>> a is b
False
Equality compares the value of two things; identity checks to see if two things are literally the same object in memory.
As a general rule, is can always be used to compare with True, False, and None. This is because these three values (all keywords, notice) are singleton objects – there is only one True object, ever, period, so equality and identity are effectively interchangeable. For more complex kinds of values, it’s often better to stick to ==. Thus:
>>> x = True
>>> x is True
True
>>> x is not False
True
>>> y = 10
>>> y == 10
True
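The gap between equality and identity is easiest to see with mutable values. A minimal sketch (the names are arbitrary) using two lists that hold the same contents:

```python
first = [1, 2, 3]
second = [1, 2, 3]
alias = first            # a second name bound to the very same list

print(first == second)   # True: same contents
print(first is second)   # False: two separate list objects
print(first is alias)    # True: two names, one object

# Because alias and first are the same object, a change through one
# name is visible through the other.
alias.append(4)
print(first)             # [1, 2, 3, 4]
```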
If we have a notion of Boolean values and truthiness, we can now decide to change the way our program works based on some Boolean condition. This is called control flow, and it is very nice.
The single most common control flow structure is the if / else block. Python elides the common else if phrase in to elif, for no reason in particular.
x = 5
if x > 10:
    print('X is greater than 10!')
elif x == 10:
    print('X is exactly 10')
else:
    print('X must be less than ten')
These checks can get quite complex:
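# Compare values with ==, not is: identity checks on strings and numbers
# are unreliable (and newer Pythons warn about them).
if x < 5 or y == 'cow':
    print('woah')
elif (x == 5 and y == 5 and z == 5) or skip_the_fives:
    print('okay double woah')
else:
    print('whew')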
A thing to notice: instead of doing an explicit comparison, we can use the Truthiness of a term directly:
if 5:
    print('it must be 5')
Seen slightly less frequently, but still fairly common, is the while construct, which loops “while” some term is truthy:
x = 0
while x < 10:
    print(x)
    x = x + 1
Note two things:
1. If x weren’t updated inside the loop, the loop would loop forever.
2. You can use a while loop to loop forever, on purpose.
#2 is not uncommonly seen for the “main loop” of a program. That is: if we consider a computer “program” to be a thing that sits idle until some action occurs, then goes back to being idle, we could express that idea like so:
while True:
    if check_for_user_input():
        respond_appropriately()
Python has a broad notion of what we often call “truthiness”. That is: certain values are implicitly considered to be roughly equivalent to True or False when used in control flow expressions.
So:
- Truthy Values are
  - True
  - Any string with length greater than 0
  - All non-zero numbers
  - All non-empty collections
  - Most object instances (we’ll get in to what this is in a little bit)
- Falsy Values are
  - False
  - Empty string
  - Zero, in any numeric type (0, 0.0, 0j)
  - Empty collections
  - None
We use them like:
a_list = []
if not a_list:
    print('it is empty!')
else:
    print('it is full')
Or:
full_string = 'this is a string'
empty_string = ''
if full_string:
    print('there was some string!')
if empty_string:
    print('you should be surprised if this prints')
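If you ever want to check how python will treat a value, the built-in bool() gives the same verdict that an if would. A quick sketch over a handful of sample values (the samples list is only for illustration):

```python
samples = [True, 'cat', 7, [0], False, '', 0, 0.0, [], {}, None]

# bool() applies the same truthiness rules that if and while use.
for value in samples:
    print(repr(value), '->', bool(value))

truthy = [value for value in samples if value]
falsy = [value for value in samples if not value]
print(truthy)      # [True, 'cat', 7, [0]]
print(len(falsy))  # 7
```

Notice that [0] is truthy: the list is non-empty, even though the one thing inside it is falsy.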
A “collection” is, as the name implies, a kind of container or group of Things. Python comes with four main collection types built-in; in practice, we use two of them vastly more than the others. For every collection, python provides a literal syntax, which is a shorthand way of creating a new collection.
Note: python’s ordered collections (tuples and lists, and strings too) are zero indexed. This means that the very first element in a collection is the 0 element, the second is the 1 element, etc. This takes a little getting used to, but is also very common.
Also note: all python collections are heterogeneous – they can contain Things of any combination of types, including other collections. (The one catch, which we’ll meet shortly: set elements and dict keys must be hashable.)
A tuple is an immutable, and usually small, collection. It is used to group together a small number of things we implicitly assert are related to one another. The tuple literal is a set of parens (). We access the elements of a tuple by their index.
x = ('cat', 'dog', 'phone')
print(x[0])
print(x[1])
print(x[2])
Note a python oddity: to make a single-element tuple, a comma is needed after the first element – e.g. ('cat',).
A list is one of the data structures we interact with alllllll the time in python. We can make a list with the list function, but it’s more common to do it with the list literal, which is a set of square brackets [].
Lists are ordered and mutable. We access the elements of a list by their index.
a_list = [5, False, 'gazpacho']
print(a_list[2])
A dict captures the notion of key-value pairs in python; the name is short for dictionary, which gives us a very good hint about its use. Dicts offer us very fast lookup of elements. There is a dict function, but we more commonly use the curly-brace literal, {}, with the internal format keyname, colon, space, value of key (E.g. {name_of_key: value}.)
The key of a dict is typically a string, but sometimes, tuples or integers are used.[fn:6]
We access the keys of a dict using an instance[fn:2] method called keys(); in Python 3 this returns a “view” of the keys that we can loop over or hand to list(). We access values by the name of their key. Like so:
the_dict = {'googoo': 'cachoo',
'hocus': 'pocus',
'Marlon': 'Brando'}
print(the_dict.keys())
print(the_dict['hocus'])
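A few more everyday dict moves, sketched under the assumption that we're working with a dict like the one above (grid and its contents are invented for the example): checking for a key with in, reading safely with get, adding a key, and using tuples as keys.

```python
the_dict = {'googoo': 'cachoo', 'hocus': 'pocus'}

# Membership tests look at the keys.
print('hocus' in the_dict)               # True

# the_dict['nope'] would raise a KeyError; get() returns None, or a
# fallback of our choosing, instead.
print(the_dict.get('nope'))              # None
print(the_dict.get('nope', 'fallback'))  # fallback

# Assigning to a new key adds it.
the_dict['Marlon'] = 'Brando'
print(list(the_dict.keys()))             # ['googoo', 'hocus', 'Marlon']

# Tuples are hashable, so they can be keys too.
grid = {(0, 0): 'origin', (1, 2): 'somewhere else'}
print(grid[(1, 2)])                      # somewhere else
```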
A set is a very handy data type with a special property: every element of the set is guaranteed unique. Sets are, thus, used for uniquing, and for maintaining collections of unique elements. You can use the set function, or you can use the set literal, which is, slightly confusingly, also curly braces {}. (If there are no colons inside the braces, python knows it’s a set, not a dict. The one exception is a bare {}, which is always an empty dict; for an empty set, use set().)
When you create a set, all of the elements will be uniqued correctly. This is done by… wait for it… hashing each element, which means each element in a set must be hashable.
list_with_duplicates = [1, 1, 1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 5]
the_set = set(list_with_duplicates)
print(the_set)
For those of you with a math bent, you might be thinking, “I wonder if we can take the union, difference, and intersection of Python’s sets?” Good news! You absolutely can. The interface is exposed as instance methods on a given set.
first_set = {1, 2, 3}
second_set = {3, 4, 5}
# The union of two sets is all the unique elements of both sets together in one
print(first_set.union(second_set))
# The intersection is only those elements found in both sets
print(first_set.intersection(second_set))
# The difference is all the elements from the calling set not found in the
# argument set -- in this case, all the elements in first_set not found in
# second_set
print(first_set.difference(second_set))
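The same operations are also available as operators, which some people find reads closer to the math. A short sketch using the same two sets; the ^ line shows symmetric difference, which isn't covered above but follows the same pattern:

```python
first_set = {1, 2, 3}
second_set = {3, 4, 5}

print(first_set | second_set)  # union: {1, 2, 3, 4, 5}
print(first_set & second_set)  # intersection: {3}
print(first_set - second_set)  # difference: {1, 2}
print(first_set ^ second_set)  # symmetric difference: {1, 2, 4, 5}
```

One difference worth knowing: the methods accept any iterable as their argument, while the operators require both sides to be sets.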
Collections can do a lot of handy things for us. It is, for instance, awfully useful to be able to group like units of stuff together. A common example of this is a settings file, which can be loaded in to your application as a dict. Wanna know the value of a setting? If all your settings are in a dict, you can access them by key. Easy peasy.
Another very common use case is the need to take some action on every Thing inside a collection. Python supports this through the for construct, like this:
a_list = [1, 2, 3, 4, 5]
for number in a_list:
    print(number * number)
number is an arbitrary name I chose; you can pick any valid python identifier here, so pick something descriptive for what’s in your list.
So, how does python know what kinds of things can be used in a for loop? The answer is: much as anything with a __hash__ method is hashable, anything with an __iter__ method is iterable. (We’ll cover this more when we go over magic methods.) In practice: all of the core python collection types – tuples, lists, dicts, and sets – are iterable.
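To peek at what a for loop is doing for us, we can drive the iteration by hand. This is only a sketch of the idea using the built-ins iter() and next(); in real code you'd almost always just write the for loop.

```python
letters = ['a', 'b']

# iter() asks the list for an iterator (via its __iter__ method).
iterator = iter(letters)

# next() pulls one item at a time, which is what for does on each pass.
print(next(iterator))  # a
print(next(iterator))  # b

# When the items run out, next() raises StopIteration; a for loop
# catches that quietly and ends.
try:
    next(iterator)
except StopIteration:
    print('nothing left')

# Things without __iter__ can't be looped over.
print(hasattr(letters, '__iter__'))  # True
print(hasattr(5, '__iter__'))        # False
```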
The cagey observer might wonder: what does it mean to iterate over a dict? Great question. To control what we get when we iterate over a dict, we have several approaches:
demo_dict = {'first_key': 'first_value',
             'second_key': 'second_value',
             'third_key': 'third_value'}
# Iterating only the keys can be done two ways:
for key in demo_dict.keys():
    print(key)
# Iterating over the keys is also the "default" behavior if no method is
# called:
for key in demo_dict:
    print(key)
# But maybe you'd rather iterate over the values!
for value in demo_dict.values():
    print(value)
# Or maybe you want, wait for it, BOTH AT ONCE:
for key, value in demo_dict.items():
    print('The key: ' + str(key) + ' maps to value: ' + str(value))
This last example uses a technique we haven’t talked about called Tuple Destructuring, which we will get to Soon™.
One last handy trick: sometimes you want to know the index of each value as you iterate. Observe!
a_list = ['cat', 'dog', 'butter']
tpl = '{} has index {}'
for idx, item in enumerate(a_list):
    strang = tpl.format(item, idx)
    print(strang)
(I’ve slipped in an early first example of python’s String Formatting system. We’ll get in to it more later!)
Python has a rich and very powerful faculty called comprehensions, which combine the notion of iteration and collection creation in to a single tidy syntax.
Consider a contrived example: let’s take all the numbers from 0 up to (but not including) 50, square them, and keep only those squares divisible by 2. We’ll do this first with a for loop:
res = []
for i in range(0, 50):
    squared = i * i
    if squared % 2 == 0:
        res.append(squared)
print(res)
We’re using a technique here called an accumulator – as we go, when we find a number we want to keep, we keep it by appending it on to res, which we then print.
Or, we could write it like this:
print([i * i for i in range(0, 50) if (i * i) % 2 == 0])
Blam. Same result, but much shorter. Comprehensions allow us to create a new collection by iterating over any iterable; we can optionally filter as we go.
We can iterate two things at once:
print([(x, y) for x in ['a', 'b', 'c'] for y in [1, 2, 3]])
(Note that we generate all combinations, not just [('a', 1), ('b', 2), ('c', 3)])
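If what you wanted was the pairwise version, the built-in zip is the usual tool. A small sketch comparing the two (letters and numbers are just the lists from the example, given names):

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

# Two for clauses in one comprehension nest, like a loop inside a loop.
all_combinations = [(x, y) for x in letters for y in numbers]
print(len(all_combinations))  # 9

# zip walks both lists in step, pairing first with first, and so on.
pairs = list(zip(letters, numbers))
print(pairs)                  # [('a', 1), ('b', 2), ('c', 3)]
```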
There are also comprehensions for other collection types. We can create a dict, from our earlier example, in which the key is the original number and the value is the square:
print({i : i * i for i in range(0, 50) if i * i % 2 == 0})
<3 comprehensions. So good! Do note, however, that as a comprehension grows longer and more complex, it becomes less and less of a good idea. If you find you’re packing a lot of logic in to a comprehension, consider switching back to a plain, easy to read for-loop.
We’ve got a ton to work with so far. Heck – we could write some pretty complex python scripts with just what we’ve done so far. We’ve got the notion of storing a thing to a variable; we’ve got the notion of a collection, a group of Things. The next item on our agenda is my personal favorite: the function.
Functions are created using the keyword def, like this:
def do_nothing():
    """
    An optional docstring
    """
    pass
So here’s a function that… does nothing. (Our next keyword, pass, is the noop keyword – pass means, “just keep on steppin’”.) Sure? Check it out: it’s time for our first real taste of abstraction. Say we want to multiply numbers by two, and we want to use functions. We could do it like this:
def one_times_two():
    return 1 * 2
def two_times_two():
    return 2 * 2
def three_times_two():
    return 3 * 2
def four_times_two():
    return 4 * 2
Perhaps you can see how quickly this will fall apart. It’s functional, but not practical. We can do better. Let’s make our function take an argument:
def times_two(integer):
    return integer * 2
We now have a function that takes some argument and returns that argument multiplied by two. Is this a super trivial example? Well, yes. And: it’s also an easy demonstration. We are abstracting the notion of multiplying by two. By using a function argument, we can now multiply really anything by two! It’s a small abstraction, but the idea is important – the function is both a little more generic and a little more specialized.
Most of the time, a function should be called and then give back some value. We do this, in most cases, with the return keyword.[fn:3] A function can contain multiple return statements, or none at all. Like so:
def check_out_this_x(x):
    if x > 500:
        return 'It is a biggish X'
    elif x < 250:
        return 'I guess it could be a kinda big X but probably it is not'
Let’s think this through. If X is 600, we’ll get back the string “It is a biggish X” – all well and good. If X is, say, 5, we’ll get back the second, much longer string. And if X is 300? What then?
Answer: we’ll get back None. Any function which doesn’t specify an explicit return returns None.
(Also notice: we didn’t specify an else for our if block. This is poor form ;-P The correct way to write this function would be to explicitly return None from an else).
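Here's one way the tidier version might look, with the else spelled out. It behaves exactly the same as the original; the only change is that the "nothing to say" case is now visible to the reader:

```python
def check_out_this_x(x):
    if x > 500:
        return 'It is a biggish X'
    elif x < 250:
        return 'I guess it could be a kinda big X but probably it is not'
    else:
        # Anything from 250 to 500 lands here; we say so out loud
        # instead of silently falling off the end of the function.
        return None


print(check_out_this_x(600))  # It is a biggish X
print(check_out_this_x(300))  # None
```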
Docstrings are optional, but great. Why are they great? One, using Sphinx, you can generate very nice online documentation that includes your docstrings. For a great example of this, have a look at the documentation for an operations tool called Fabric. Here’s a page of clean, compiled documentation; here is the source code that generated the docs. Pretty cool, eh?
The other thing we can do is learn about functions and classes from inside the python interpreter. For instance, say you wanna know about the len function:
>>> help(len)
Help on built-in function len in module builtins:
len(obj, /)
    Return the number of items in a container.
Good stuff, eh?
Here’s a trick I love: what if you usually want an argument to always have the same value, but sometimes you wanna change it?
def usually_multiply_by_two(integer, mult_by=2):
    return integer * mult_by
This function can be called as usually_multiply_by_two(5), or it can be called with a second argument, which will then be used – usually_multiply_by_two(5, 5) will return 25, not 10.
Now, a thing to pay attention to: if a function has multiple optional arguments, you can either specify them positionally, or using the name, but don’t do both.
That is:
def multiple_optionals(foo=5, bar=6, baz=10, blep=123):
    tpl = """
    I was called with:
    - foo = {foo}
    - bar = {bar}
    - baz = {baz}
    - blep = {blep}
    """
    return tpl.format(foo=foo, bar=bar, baz=baz, blep=blep)
print(multiple_optionals('hi', 'cow'))
# But, if I only want to change the value of baz:
print(multiple_optionals(baz='Cowabunga'))
Also note: it is a syntax error to list optional arguments before required arguments in a function:
# Do this:
def foo(bar, baz=None):
    pass
# Not this! No no no!
def foo(baz=None, bar):
    pass
Especially if you look at really any python documentation, you’re gonna see a pattern over and over that will throw you off the first few times, like this:
def foo(bar, *args, **kwargs):
    pass
args and kwargs are a little weird at first, but they do cool things, and unlock cool powers. Let’s dig in.
Both args and kwargs are for times when you aren’t sure in advance what arguments your function will need to take. args is used when you aren’t sure how many arguments there will be (it arrives as a tuple of the extra positional arguments); kwargs is a dict containing any unspecified keyword arguments to your function. Let’s see this in action:
def so_many_args(foo, bar, baz, *args, **kwargs):
    tpl = "The {}, the {}, and the {}".format(foo, bar, baz)
    print(tpl)
    print(args)
    print(kwargs)
so_many_args('this', 'that', 'the other')
so_many_args('hi', 'hi', 'hi', 'hi', 'hi', 'hi', 'hi!') # so many 'hi's!
so_many_args('hi', 'hi', 'hi', TheFroz='kazoo', Spork='nugget')
So our function arguments foo, bar, and baz are assigned the first three values; *args winds up with the rest – thus we see it empty in the first invocation, but with four “hi”s in the second. **kwargs is empty in invocation one and two because we have no unexpected named arguments. In invocation three, we have no extra positional args, but we do have two spare keyword args.
If we truly don’t care how many Things are handed to a function, we could use *args on its own and be done with it:
def add_em_up(*nums):
    res = 0
    for num in nums:
        res = res + num
    return res
print(add_em_up(1, 2, 3, 4, 5, 6, 7, 123))
Plot twist: I changed the name of *args to *nums! “args” and “kwargs” are names based purely on convention. Like any convention, you should both use it most of the time and feel free to bend it when it stops making sense.
Back to **kwargs, what about this:
def foo(**kwargs):
    tpl = '\t-{} with val {}'
    print('Hello! I was called with:')
    for key, val in kwargs.items():
        print(tpl.format(key, val))
foo(panda='panda', another_panda='yep it is another panda')
So this is nice and also completely terrible. On the one hand, this is very powerful – we can write functions the effects of which we cannot even predict! On the other hand: we can write functions the effects of which we cannot even predict :/
Think of it another way: argument names to functions are themselves documentation. If you encounter a function called save_an_item_to_a_database(item, database), you can form a pretty clear intuition about what that function does. On the other hand, a function called save_an_item_to_a_database(**kwargs) is… uh. What… do you give it? Now imagine that function has no docstring. Now imagine yourself with a migraine. Yeaaaaaaah.
These are good powers, but don’t abuse them, yeah?
* and ** have a last cool use that kicks in when we use them to call functions. * can “explode” a list, turning it in to positional arguments in a function call; ** can break apart a dict, matching the keywords inside it to named arguments of the function.
Whew, okay, that sounds weird. Let’s see it in practice.
First *:
three_things = ['foo', 'bar', 'baz']
def print_three_things(first, second, third):
    print(first)
    print(second)
    print(third)
print_three_things(*three_things)
Each item has been “slotted in” to the function. Oooh!
Now **:
a_dict = {'foo': 'Hello from the foo!',
          'bar': 'The bar also says hello!'}
def print_a_dict(foo='Nope', bar='Also nope'):
    print(foo)
    print(bar)
print_a_dict(**a_dict)
Say it with me: ooooh! aaaaah!
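The two can be used together in a single call. A quick sketch, with a made-up function describe and made-up argument names, just to show the mechanics:

```python
def describe(first, second, sep=', ', end='!'):
    """Hypothetical function: glue two strings together."""
    return first + sep + second + end


positional = ['hello', 'world']
options = {'sep': ' ~ ', 'end': '?'}

# * fills first and second from the list; ** fills sep and end from the dict.
print(describe(*positional, **options))  # hello ~ world?

# Leaving the dict out just means the defaults apply.
print(describe(*positional))             # hello, world!
```

If the list has the wrong number of items, or the dict has a key the function doesn't accept, python raises a TypeError, same as if you'd typed the bad call out by hand.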
lambda is the python keyword for an anonymous function. Effectively, a lambda is kind of a magic instant throw-away function. To be honest, this technique isn’t used super frequently in python outside of python’s (somewhat limited) functional programming interface, which looks like this:
Say I want to multiply every number in a list by 7. Voila:
the_list = [1, 2, 3, 4, 5]
res = list(map(lambda x: x * 7, the_list))
print(res)
map takes a function and an iterable (here, a list), and calls the function on every element of the input. In Python 3, map hands back a lazy iterator rather than a list, so we wrap it in list() to see the results. The end result is equivalent to:
def times_seven(x):
    return x * 7
the_list = [1, 2, 3, 4, 5]
res = [times_seven(i) for i in the_list]
print(res)
Note that our lambda implicitly returns – we don’t use the return keyword.
What else are lambdas good for? Well, think a little more about what we just saw. We passed a lambda as the first argument to the map function! Neat! In python, functions are “first class” values, meaning they can be used anywhere, say, 5 can be used – we can store a function to a variable, we can pass a function to another function as an argument, and we can return a function from a function. Here’s a slightly less contrived use for a lambda using python’s String Formatting system. We’ll talk about it more in depth in a bit, but here are the salient points:
- Curly braces in a string get replaced by arguments to str.format
- If there’s a name inside the curly brace, it becomes a keyword arg – e.g., Hi there, {name} should be called with format(name='Bartholomew').
def make_dict_formatter(template):
    return lambda the_dict: template.format(**the_dict)
one_template = 'The baz: {baz} The blep: {blep}'
a_dict = {'baz': 'I am the baz!', 'blep': 'I am the blep!'}
the_formatter = make_dict_formatter(one_template)
formatted_string = the_formatter(a_dict)
print(formatted_string)
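Probably the most common place you'll meet a lambda in the wild is as the key argument to sorted. Here's a sketch of that, plus a function that takes another function as an argument (apply_twice is a name invented for this example):

```python
animals = ['hippopotamus', 'cat', 'zebra']

# sorted() calls our lambda on each item and sorts by the results.
by_length = sorted(animals, key=lambda word: len(word))
print(by_length)  # ['cat', 'zebra', 'hippopotamus']


def apply_twice(func, value):
    """Hypothetical helper: call func on value, then on the result."""
    return func(func(value))


# A function stored in a variable, then handed to another function.
times_seven = lambda n: n * 7
print(apply_twice(times_seven, 1))  # 49
```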
There’s a little bit of a subtle shenanigan going on in our make_dict_formatter example; let’s dig in to that. To get our heads around it, though, we need to understand the idea of scope. Let’s consider:
assertion = 'Cats are mortal, Aristotle was mortal, therefore Aristotle was a cat.'
def how_about():
    print(assertion)
how_about()
def but_then():
    assertion = 'That whole Aristotle-cat thing is a syllogism.'
    correctly = "Cats are mortal, Aristotle was mortal, go home syllogisms, you're drunk."
    print(assertion)
but_then()
print(assertion)
print(correctly)
So, we start with an assertion. We call how_about. What happens?
Next, we define a function but_then that also defines an assertion. What value does it print?
Finally, we attempt to print the value of correctly. What happens?
What we’re dealing with here is the question of scope, which is to say, “when does One Thing in a programming language have access to a particular set of variables and when doesn’t it?” There is a lot more to say on this topic than we have time for. We’re going to spend like four sentences on the theory behind what’s going on, and then we’re going straight to the pragmatics.
What’s happening here on a theoretical level goes like this: python is statically scoped (this is the most “normal” kind of scoping you can have if you are a modern programming language). Further, it has lexical scope.
- Static scope: as opposed to dynamic scope. In a statically scoped language, we can tell which definition a name refers to just by reading the source code, before the program runs. In a dynamically scoped language, that depends on which functions happen to be calling which at runtime. (Note that this is not the same thing as, though it is analogous to, python being dynamically typed.)
- Lexical scope: closely tied to static scope (the two terms are often used interchangeably), lexical scoping means that we have certain kinds of semantic blocks of code which create their own scope. The most important, and most common, example of this is functions, which always create their own scope, but which can also always see names from the enclosing scope.
Whew. Okay. Let’s do that again, but in a much more pragmatic way:
First, we define assertion. assertion is in our “global” scope – it is at the “top level” of the code snippet. It isn’t inside a function or any other kind of lexical block – it’s just there.
Next, we define how_about. how_about creates a new scope, but it inherits from the parent scope – so it has access to our “global” assertion. Great.
Now we define but_then. but_then also defines an assertion, and its assertion “wins”, seamlessly shadowing the “global” value, but only inside the function block. We confirm this by calling but_then, and then immediately checking the value of assertion, which is unchanged.
Finally, we attempt to access the value of correctly, which was defined inside the but_then function, from the global scope. We get an error (a NameError), because the inheritance of scope goes one-way – but_then inherits the parent scope, but the parent scope is unaltered.
Scope is a subtle, but important point – it allows us to do things like safely re-use common variable names inside functions, and to not have our functions “leak”, mutating the world outside of their intended purview.
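Here's a condensed sketch of the same experiment that runs to completion instead of crashing. The strings are shortened, and the try / except (which we haven't covered yet) is only there to catch the error and show it:

```python
assertion = 'global value'


def but_then():
    assertion = 'local value'                   # shadows the global, in here only
    correctly = 'only visible inside but_then'  # exists only while this call runs
    return assertion


inside = but_then()
print(inside)     # local value
print(assertion)  # global value

# correctly was never defined at the top level, so looking it up fails.
try:
    print(correctly)
except NameError as error:
    print('NameError:', error)
```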
So, what’s going on with our make_dict_formatter function? We’re using scope to our advantage with a technique called a closure. template is an argument to the parent make_dict_formatter function; it is then available inside the body of a new function. Here – it might be easier to see like this:
def make_dict_formatter(template):
    def formatter(the_dict):
        return template.format(**the_dict)
    return formatter
We open a new scope with make_dict_formatter, then we open another new scope with our inner function formatter (a lambda behaves identically, but never receives a name). The formatter function has access to template from its parent scope, but the template variable never leaks – we have provided a private configuration to a function.
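The same trick works for any kind of "configuration". As a sketch, here's a closure-based take on our earlier multiplying example; make_multiplier is a name invented here, not anything built in:

```python
def make_multiplier(factor):
    """Hypothetical factory: build a function that multiplies by factor."""
    def multiply(number):
        # factor comes from the enclosing call to make_multiplier.
        return number * factor
    return multiply


times_two = make_multiplier(2)
times_seven = make_multiplier(7)

# Each returned function remembers its own factor.
print(times_two(5))    # 10
print(times_seven(5))  # 35
```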
Now, back to our assertion example. Sometimes, it can be handy to modify global state from inside a function. To this end, python provides the global keyword. We use it like this:
a_global = 'shazango'
def change_global(new_val):
    global a_global
    a_global = new_val
print(a_global)
change_global('woopwoop')
print(a_global)
Inside our function, we tell python, “we don’t want to create a new local variable, we want the same variable we inherited from the main scope.” Pow.
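For contrast, here's a sketch of what happens if we forget the global line. The function name try_to_change is made up; everything else mirrors the example above:

```python
a_global = 'shazango'


def try_to_change(new_val):
    # No global statement, so this assignment creates a brand new local
    # variable that happens to share the name.
    a_global = new_val
    return a_global


print(try_to_change('woopwoop'))  # woopwoop
print(a_global)                   # shazango
```

The top-level a_global never budges, which is the safe default the scope rules give us.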
Functions are how we model actions – verbs, if you will – in programming. Classes, then, are how we model nouns. Yes, there are gray areas – nouns can sometimes take actions – but as we’ll see, they do that by having access to their own functions (verbs).
To really grok classes, we need to take a moment to understand instances. If a class models a noun, an instance represents an actual one of that noun. So for example: there is a class called dict. When we make a dict using {} syntax, we are instantiating a new instance of the dict class. The dict class is general, the pattern on which all dicts are based; our instance is specific. We create instances either using literal syntax (as with {}), or by calling the class as if it were a function (dict() is exactly this), which runs a specialized kind of function called a constructor. Using a constructor looks like this:
foo = Foo()
To define a new class in python we use – wait for it – the class keyword:
class Fruit():
    """
    I am a model of a fruit!
    """
    carbon_based = True
    def __init__(self, name, taste, color, climate):
        """
        The constructor of new fruit!
        """
        self.name = name
        self.taste = taste
        self.color = color
        self.climate = climate
    def which(self):
        """
        I will print the name of this fruit!
        """
        print('I am a {}!'.format(self.name))
Let’s take this a piece at a time. First, we declare our class and give it a name. By python convention, our class name will be in TitleCase – in this instance, Fruit. The open-and-close parens following the name deal with Inheritance, which we’ll get to next – for now, just note we aren’t inheriting anything here.
Next, we can, optionally, provide a docstring (always a good idea). And now: as many statements as we feel like making. We’ll make three – our assignment of carbon_based and two functions. Terminology alert: when a function belongs to a class, we call it a method.
Before we go much further, it’ll help to see this in action:
>>> banana = Fruit('banana', 'awful', 'yellow', 'somewhere too hot')
>>> banana.carbon_based
True
>>> banana.taste
'awful'
>>> banana.which()
I am a banana!
So: we instantiate a new Fruit by calling its constructor, which is called… Fruit(). We give it arguments, which become part of our class instance (we’ll explore the mechanism for this in just a moment, hang in there.)
From here, we can see that our statements have become part of our class instance. carbon_based is, as we’d expect, set to True. We set properties like self.taste, and now we can access them. We also have access to the which method, which tells us our instance is a banana. Great.
Now let’s look at something:
>>> Fruit.carbon_based
True
>>> Fruit.name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'Fruit' has no attribute 'name'
When we use the Fruit class directly, we can access the carbon_based property, but not the name property. What do?
The answer is in the difference between static and instance properties. carbon_based = True is a statement we make at the class level, and it becomes a static property of the class – which means we can access it directly on the class definition. On the other hand, name is only assigned when we create an instance, and is thus not available on the class. We’ll see a similar, but slightly more confusing, error if we try to call the which method on the class:
>>> Fruit.which()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Fruit.which() missing 1 required positional argument: 'self'
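Note that the function signatures of both __init__ and which begin with the name self. self is a reference to the current instance, and in python, an instance method is defined by taking that reference as its first argument. (self is a very strong convention rather than a keyword: python passes the instance as the first argument no matter what we call it.)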
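Here is a small sketch of both ideas at once: class-level versus instance-level properties, and what self is actually doing. The Pet class and its names are made up for this illustration; they are not part of the Fruit example.

```python
class Pet:
    legs = 4  # class-level: shared by every Pet

    def __init__(self, name):
        self.name = name  # instance-level: each Pet gets its own

    def describe(self):
        return '{} has {} legs'.format(self.name, self.legs)


rex = Pet('Rex')
tweety = Pet('Tweety')
tweety.legs = 2  # shadows the class value on this one instance only

# Calling through the instance passes `self` for us...
via_instance = rex.describe()
# ...which is the same as calling through the class and passing it ourselves.
via_class = Pet.describe(rex)
```

Both calls give the same answer, which is why calling a method on the bare class with no arguments complains about a missing self.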
Which brings us to: our constructor, __init__!
__init__ is a python “magic method”; it identifies a special kind of function called a constructor. Constructors are used to create class instances. So, when we define an __init__ method on a class, we have the power to specify exactly how that class gets created. Are properties set? Methods called? Songs sung? Only we get to say.
An __init__ method can do anything to the self reference it wants to, but do be wary that you are still creating the object. For instance, this will asplode:
class OhNo():
    def __init__(self):
        self.beep = self.boop()

    def boop(self):
        return self.beep
>>> uh_oh = OhNo()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gastove/Code/pythonathon/pythonathon.org[*Org Src pythonathon.org[ python ]*]", line 3, in __init__
  File "/Users/gastove/Code/pythonathon/pythonathon.org[*Org Src pythonathon.org[ python ]*]", line 6, in boop
AttributeError: 'OhNo' object has no attribute 'beep'
We reference self.beep before it is given a value! Sad day.
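The fix is mostly a matter of ordering. A minimal sketch, using a made-up OhYes class and a made-up boop_value attribute: set the attribute first, then call any method that reads it.

```python
class OhYes:
    def __init__(self):
        self.beep = 'beep'             # set the attribute first...
        self.boop_value = self.boop()  # ...then call methods that read it

    def boop(self):
        return self.beep


all_good = OhYes()
```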
“Inheritance” is a common design pattern in modern object oriented languages. It can be single or multiple; python is the latter, and we’ll explore the ramifications of that next.
Inheritance works like this:
Imagine we’re trying to create classes to model different kinds of vehicles. We could do it a buuuunch of different ways. Here’s one:
class Car():
    wheels = 4
    has_engine = True

    def __init__(self, top_speed):
        self.top_speed = top_speed

class Motorcycle():
    wheels = 2
    has_engine = True

    def __init__(self, top_speed):
        self.top_speed = top_speed

class Bicycle():
    wheels = 2
    has_engine = False

    def __init__(self, top_speed):
        self.top_speed = top_speed
Hopefully, this smells a little funny to you. We’re repeating ourselves a looooooot. Everything has the same init method! Properties are repeated! Erg. You know what we need? A way to abstract over the idea of a set of nouns in a hierarchy with shared properties.
Behold, inheritance:
class Vehicle():
    wheels = 0
    has_engine = True

    def __init__(self, top_speed):
        self.top_speed = top_speed

class Car(Vehicle):
    wheels = 4

class TwoWheeledVehicle(Vehicle):
    wheels = 2

class Motorcycle(TwoWheeledVehicle):
    pass

class Bicycle(TwoWheeledVehicle):
    has_engine = False
Woooooooah. What even is this. Let’s investigate:
First we define a base Vehicle, which captures all the ideas we need to describe a Vehicle. Next, we define a Car – the syntax Car(Vehicle) means that Car is inheriting from Vehicle. (This is often called an “is-a” relationship – Car is-a Vehicle.[fn:4])
In our Car class, all we do is specify the number of wheels. Everything else is inherited from the parent, or base, class, including all methods. When we go to create a car, the __init__ method from Vehicle will be called. Neat, eh?
Now we derive a class for TwoWheeledVehicle, and we derive two variants of it. A Motorcycle doesn’t need to change anything at all – two wheels, has engine, an init from the base class – Motorcycle is all set. Bicycle just needs to set has_engine to False.
Boom.
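To see the hierarchy doing its job, here is a trimmed-down, runnable sketch of the same classes (Car and Motorcycle are left out to keep it short). The variable names at the bottom are just for illustration.

```python
class Vehicle:
    wheels = 0
    has_engine = True

    def __init__(self, top_speed):
        self.top_speed = top_speed


class TwoWheeledVehicle(Vehicle):
    wheels = 2


class Bicycle(TwoWheeledVehicle):
    has_engine = False


bike = Bicycle(25)  # runs Vehicle.__init__, since nothing closer defines one

summary = (bike.wheels, bike.has_engine, bike.top_speed)
bike_is_vehicle = isinstance(bike, Vehicle)        # a Bicycle is-a Vehicle
vehicle_is_bike = isinstance(Vehicle(0), Bicycle)  # but not the other way round
```

Each property comes from the nearest class that defines it: wheels from TwoWheeledVehicle, has_engine from Bicycle, and top_speed from the inherited constructor.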
Python technically supports a property called “multiple inheritance.” Mostly, this is very bad news, because it can be very confusing. You’ve already seen this in action, in our http-demo:
Base = declarative_base()

class IdPrimaryKeyMixin(object):
    id = Column(Integer, primary_key=True)

class DateTimeMixin(object):
    created_on = Column(DateTime, default=datetime.now)
    updated_on = Column(DateTime, default=datetime.now, onupdate=datetime.now)

class Person(Base, IdPrimaryKeyMixin, DateTimeMixin):
    __tablename__ = 'people'
    first_name = Column(String(20), nullable=False)
    last_name = Column(String(30), nullable=False)

    def __repr__(self):
        tpl = 'Person<id: {id}, {first_name} {last_name}>'
        formatted = tpl.format(id=self.id, first_name=self.first_name,
                               last_name=self.last_name)
        return formatted
Note: in Python 2, we had to explicitly inherit from object in order to make a correct, new object – in Python 3, we don’t have to do this.
So – we make a set of classes labeled as Mixins, because you’d never instantiate them directly – they’re only useful to add Extra Properties to another class.[fn:5] Now, the Person class has an id property and both created_on and updated_on properties – clean and tidy.
This can get really weird:
class Beep():
    def sound(self):
        return self.beep

class BeepPrinter():
    def print_beep(self):
        return 'I go: ' + self.sound()

class BeepBooper(Beep, BeepPrinter):
    def oh_no(self):
        print(self.print_beep())
>>> b = BeepBooper()
>>> b.oh_no()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/gastove/Code/pythonathon/pythonathon.org[*Org Src pythonathon.org[ python ]*]", line 13, in oh_no
  File "/Users/gastove/Code/pythonathon/pythonathon.org[*Org Src pythonathon.org[ python ]*]", line 8, in print_beep
  File "/Users/gastove/Code/pythonathon/pythonathon.org[*Org Src pythonathon.org[ python ]*]", line 3, in sound
AttributeError: 'BeepBooper' object has no attribute 'beep'
In this example, the bug is that none of the three classes defines a beep property. But which one should? Where is the bug? As the class hierarchy grows larger, this problem gets worse and worse and worse. Be careful of it!
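One more source of confusion worth knowing about: when two parents define the same method, python has to pick one. A small sketch, with made-up Beep and Boop classes that differ from the ones above. Python searches base classes left to right, and every class records that search order in its __mro__ attribute.

```python
class Beep:
    def sound(self):
        return 'beep'


class Boop:
    def sound(self):
        return 'boop'


class BeepBooper(Beep, Boop):
    pass


# Base classes are searched left to right, so Beep's version wins.
lookup_order = [cls.__name__ for cls in BeepBooper.__mro__]
winner = BeepBooper().sound()
```

Swap the parents to BeepBooper(Boop, Beep) and the other sound wins, which is exactly the kind of action-at-a-distance that makes big hierarchies hard to reason about.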
You’ve almost certainly hit exceptions before. Exceptions are how python – and many, many other languages – thinks about errors and error handling. They very often have the word “error” or “exception” in the name. For instance, in our discussion of Multiple Inheritance, we encountered an AttributeError, which happens when you attempt to access an attribute of an object that doesn’t exist.
Language: exceptions are either raised or thrown when they are created, and caught when they are received within code. An exception doesn’t necessarily have to crash your program, but it often will, and should. To handle exceptions, python uses the (very common) notion of a try block, which is created with the keyword – wait for it – try.
First, an uncaught exception:
def crasher():
    raise RuntimeError()

crasher()
This simply won’t run – it just asplodes every time. Which is, in all honesty, not a bad thing to have happen with an exception. A very common thing to need to do, however, is to provide some kind of output about the exception and take some form of emergency action – exiting with an appropriate status code, for instance. For this, we can use a try:
def crasher():
    raise RuntimeError('OH YEAH!')

def elegant_crasher():
    try:
        crasher()
    except RuntimeError as e:
        print("Oh no.")
        raise e

elegant_crasher()
There’s a series of things to note here. First, we can have as many except clauses as we like, each handling a different exception or set of exceptions – we can also have a final catch-all that handles any exceptions we didn’t think of. Also note that we can provide helpful error messages when we raise exceptions – this is a very good practice indeed. Nothing ruins a day quite like hitting some garbage like:
IncomprehensibleException: a bad is there. No I don't know where. Stop asking.
Just to give a clear example of handling Lots of Bads, we could have something like:
# `db` stands in for whatever database module you happen to be using
def foo(arg):
    try:
        db_conn = db.get_connection()
        db.query(arg)
    except ConnectionError:
        print('Arg, failed to connect to the db')
        return None
    except (ValueError, KeyError):  # A tuple catches either of these errors
        print('DB failed to find what we need somehow for arg {}'.format(arg))
    except Exception as e:  # This case catches anything we haven't anticipated
        print('There was a bad!')
        print(e)
Now, let’s imagine that there is, in fact, no exception! In that case, our try block skips straight over the except clauses.
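Here is a version of the same shape you can actually run, since it needs no database. The lookup_age function and the people dict are invented for this sketch.

```python
def lookup_age(people, name):
    try:
        age = int(people[name])
    except KeyError:
        # people[name] failed: nobody by that name
        return 'no such person: {}'.format(name)
    except (ValueError, TypeError):
        # int(...) failed: the stored value is not a number
        return 'bad age on file for {}'.format(name)
    return age  # only reached when nothing was raised


people = {'ada': '36', 'bob': 'unknown'}

good = lookup_age(people, 'ada')
bad_value = lookup_age(people, 'bob')
missing = lookup_age(people, 'cy')
```

For 'ada' nothing goes wrong, so both except clauses are skipped and we fall through to the final return.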
Exceptions and error-handling are very real parts of programming in most languages. And, there are better and worse ways to use them. The very worst is a thing called “control flow by exception”. The question you should ask yourself is: “am I using a try/except block like an if/else?” If you are: stop and reconsider your choices.
Here’s a handy trick: python functions can return multiple values (strictly speaking, they come back bundled as a single tuple), which python can then “unpack” in to multiple variables.
def return_many():
    return 'cat', 'dog', 'horse'

first_thing, second_thing, third_thing = return_many()
print(first_thing)
print(third_thing)
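A quick sketch of what is going on under the hood, using a made-up min_and_max function: the “multiple values” are one tuple, and unpacking works on any sequence, not just on function results.

```python
def min_and_max(numbers):
    return min(numbers), max(numbers)  # really one tuple: (min, max)


packed = min_and_max([3, 1, 4])             # keep the tuple whole
smallest, largest = min_and_max([3, 1, 4])  # or unpack it on the spot
first, *rest = [10, 20, 30]  # a starred name collects the leftovers in a list
```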
Python’s docs refer to the string formatting system as a “mini language”. This is… not great news. The docs aren’t great either. Or rather – they’re so abstruse as to be nearly useless.
So, point the first: for a handy string format reference, check out https://pyformat.info/
The string format method lets us do a lot of handy stuff. Here’s a short once-over:
print('Format fills in {} with {}'.format('curly braces', 'words'))
print('Words can be {verb} into position using {modifier} arguments; the {modifier} arguments can be repeated'.format(modifier='named or keyword', verb='put'))
print('Places can also be {0} and used as {1} args, even repeated so long as they are {0}'.format('numbered', 'positional'))
Need to print actual {}s? Escape them with a second set of {}:
print('Here are some curly braces: {{}}. Also, here is a {}'.format('cow.'))
String formatting can format damn near anything – it’s seriously ridiculously powerful. Which also means I have to always look it up. You might too. Remember: https://pyformat.info. Good stuff.
A closing example: formatting long numbers with thousands-place commas:
print('{:,}'.format(1239085830383))
wow
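A few more format specs that come up all the time, as a rough cheat sheet rather than a complete tour. The variable names are just labels for this sketch.

```python
price = '{:.2f}'.format(3.14159)    # two decimal places
padded = '{:>8}'.format('cat')      # right-align in 8 characters
centered = '{:^7}'.format('dog')    # center in 7 characters
zero_filled = '{:05d}'.format(42)   # pad with zeros to width 5
percent = '{:.1%}'.format(0.256)    # multiply by 100 and add a % sign
```

Everything after the colon inside the braces is the “mini language”: fill, alignment, width, precision and type, in roughly that order.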
Context managers are a clean way of expressing this pattern:
open_file = open(path, 'r')
lines_of_file = open_file.readlines()
open_file.close()
We have some resource – a file, a database, a URL – which we want to open, interact with, and then close. To provide for this, python provides a mechanism called a context manager, and they are neat as heck. Context managers use the keyword with, and have the general form with resource_name; optionally, you can bind your new resource to an alias using as alias. It looks like this:
with open(file_path, 'r') as file_handle:
    lines = file_handle.readlines()
Python will handle making sure our resource is closed when execution leaves the with block.
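“When execution leaves the block” includes leaving because of an exception, which is the real win over calling close() by hand. A small sketch that checks this, using a throwaway temp file; the file contents and variable names are made up.

```python
import os
import tempfile

# Make a throwaway file to read back.
fd, file_path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as out:
    out.write('one\ntwo\n')

with open(file_path, 'r') as file_handle:
    lines = file_handle.readlines()
closed_after_block = file_handle.closed  # closed on normal exit

try:
    with open(file_path, 'r') as file_handle:
        raise RuntimeError('something went wrong mid-read')
except RuntimeError:
    closed_after_error = file_handle.closed  # closed on the way out, too

os.remove(file_path)
```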
We’ve seen a lot of things wrapped in “double underbars” – often written
dunderbars – go by. Dunderbars are used to denote identifiers and method names
of special significance to python itself. These methods, sometimes called “magic
methods”, are part of the neat internal glue that makes python work coherently.
Many of the magic methods, as the name suggests, are attached to classes. For instance, __init__ is a special method that tells python how to construct a new instance of a class.
Let’s look at the __str__ and __repr__ methods with a motivating example. Imagine we have this class, and try to “see” it with two different kinds of printing:
class PrintingDemo:
    name = "The Printing Demo"

demo = PrintingDemo()
print(demo)
print('{!r}'.format(demo))
Blah! Both useless. When we print it, implicitly casting to string, we get the memory address of the instance; when we try to format it using its __repr__ method, we… still just get the memory address of the instance. We can fix this:
class PrintingDemo:
    name = "The Printing Demo"

    def __str__(self):
        return 'Hello, my name is {name}'.format(name=self.name)

    def __repr__(self):
        return '<PrintingDemo name={name}>'.format(name=self.name)

demo = PrintingDemo()
print(demo)
print('{!r}'.format(demo))
Much better.
What if we want to know if two PrintingDemo objects are the same?
class PrintingDemo:
    name = "The Printing Demo"

    def __str__(self):
        return 'Hello, my name is {name}'.format(name=self.name)

    def __repr__(self):
        return '<PrintingDemo name={name}>'.format(name=self.name)

demo1 = PrintingDemo()
demo2 = PrintingDemo()
print(demo1 == demo2)
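Right now, all python can do is glance at the memory address and say, “different addresses, different objects, not equal”. We can fix it by defining the __eq__ and __ne__ methods. (In Python 3, defining __eq__ alone is enough, because != defaults to the opposite of ==; __ne__ is spelled out here so you can see it.)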
class PrintingDemo:
    name = "The Printing Demo"

    def __str__(self):
        return 'Hello, my name is {name}'.format(name=self.name)

    def __repr__(self):
        return '<PrintingDemo name={name}>'.format(name=self.name)

    def __eq__(self, other):
        return self.name == other.name

    def __ne__(self, other):
        return not self.__eq__(other)

demo1 = PrintingDemo()
demo2 = PrintingDemo()
print(demo1 == demo2)
Yis.
There are… a lot of magic methods. As a general rule, if you think, “how do I define <behavior> for my class”, the answer is often a magic method. For instance, here’s a very very partial list:
Method | Purpose |
---|---|
__getitem__ | Handles things like dict[key] retrieval |
__lt__ | “less than” operator behavior |
__gt__ | “greater than” operator behavior |
__add__ | plus operator behavior |
__and__ | Behavior of the & operator (bitwise “and”; the and keyword itself can’t be overridden) |
__or__ | Behavior of the vertical-bar operator (bitwise “or”; the or keyword itself can’t be overridden) |
__call__ | Allows a class instance to be called as a function |
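To make a few of those rows concrete, here is a sketch with two made-up classes, Money and Wallet. The comment next to each method shows the syntax that triggers it.

```python
class Money:
    def __init__(self, cents):
        self.cents = cents

    def __add__(self, other):    # money_a + money_b
        return Money(self.cents + other.cents)

    def __lt__(self, other):     # money_a < money_b
        return self.cents < other.cents

    def __eq__(self, other):     # money_a == money_b
        return self.cents == other.cents

    def __call__(self):          # money_a()
        return '${:.2f}'.format(self.cents / 100)


class Wallet:
    def __init__(self, **slots):
        self.slots = slots

    def __getitem__(self, key):  # wallet['key']
        return self.slots[key]


wallet = Wallet(cash=Money(500), change=Money(75))
total = wallet['cash'] + wallet['change']
```

None of these methods is ever called by name in the last two lines; python routes the square brackets and the plus sign to them for us.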
Imagine you have a directory full of code and you want to run it as a single Thing. We can do this with a __main__.py file, which tells python, “if this directory gets given to you to run, here’s how to do it.” We’ve actually seen this already, in passing, in http-demo. It has a __main__.py that looks like this:
cat http-demo/__main__.py
#!/usr/bin/env python
import main
main.app.run()
Our __main__.py is found by python and is executed; it in turn imports the main module and runs our app.
We can achieve this in scripts using an “if main” statement, which looks like this:
if __name__ == '__main__':
    do_the_thing()
A statement like that at the bottom of a file tells python how to run that file. Neat!
Python has a mechanism you should know about but might not use for a while. The mechanism is called generators. Let’s consider a motivating problem.
Say you wanna count all the lines in a file that have the word “http” in them. Our file – we’ll call it somefile.txt – is small. The regular approach would look like this:
path = '/path/to/somefile.txt'
with open(path, 'r') as h:
    lines = h.readlines()

matching = [line for line in lines if 'http' in line]
print(len(matching))
This approach works by reading the entire file in to memory, then counting all the lines. This works just great for small files. In fact, it works great as long as the file is small enough to fit in to RAM.
Now, what if the file is 46 gigabytes? We almost certainly don’t have that much RAM. What now?
What if we could efficiently check one line at a time without ever pulling the whole file in to memory? Generators are for exactly this.
A generator is a special kind of function using the keyword yield instead of return. Python sees this keyword and converts the function in to a generator. A generator is like a list we can only read once; on every iteration, python resumes the function where it last left off and runs it until the next yield, which hands back the next item.
It looks like this:
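path = '/path/to/somefile.txt'

def line_reader():
    with open(path, 'r') as h:
        line = h.readline()
        while line:  # readline() returns '' once the file runs out
            yield line
            line = h.readline()

# Count matches one line at a time instead of collecting them in a list
matching = sum(1 for line in line_reader() if 'http' in line)
print(matching)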
Generators take some work to get our brains around, but they are good when data gets big.
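If the pausing-and-resuming business is hard to picture, this tiny sketch may help. The count_up_to generator is made up for the purpose; next() is the built-in that asks a generator for one more item.

```python
def count_up_to(limit):
    n = 1
    while n <= limit:
        yield n  # hand back one value, then pause right here
        n += 1


counter = count_up_to(3)
first = next(counter)        # runs the body up to the first yield
second = next(counter)       # resumes after the yield and loops once more
remaining = list(counter)    # drains whatever is left
second_pass = list(counter)  # already exhausted, so nothing comes out
```

Only one number exists at a time, which is the same reason a generator can walk a 46 gigabyte file without holding it all in RAM.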
…in fact, they are so good that they are built in to the python file API ;-P You can actually solve the above like so:
path = '/path/to/somefile.txt'
with open(path, 'r') as h:
    matching = [line for line in h if 'http' in line]
Decorators are not likely to be something you’ll use a lot any time soon – but they come up, and you’ll see them out in the world, so you should know what they are. (The place where you’re most likely to find them is during testing, particularly with the py.test library.)
First note: decorators are a design pattern you’ll see in more languages than just Python – Ruby, in particular, leaps to mind.
A decorator is an example of a higher-order function. A higher-order function takes a function as one of its arguments. In the decorator pattern, we define a function which we use to “decorate” some number of others, augmenting them with some Extra Behavior. In Python, we do this by defining our decorator, and then using an @ when we define the function it should “decorate.”
Here’s a 100% contrived example:
def call_with_5(func):
    def new_func(*args, **kwargs):
        new_args = args + (5,)
        # Return the result so decorated functions keep their return values
        return func(*new_args, **kwargs)
    return new_func

@call_with_5
def foo(*args, **kwargs):
    print(args)
    print(kwargs)

@call_with_5
def bar(the_cow, *args):
    print(the_cow)
    if args:
        print(args)

foo(arg='blerp')
bar('here is the cow')
OK so, definitely not the most useful example, but it demonstrates the machinery, which is a combination of many of the elements we’ve seen:
- We define a higher-order function, which will take the function we wanna decorate and return a new function with the new behavior.
- We use *args and **kwargs, because we don’t know in advance what arguments our function will be called with – and we’d rather not care.
We can decorate any number of functions. A decorator captures the notion of wrapping an existing function in a new behavior.
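It can help to see that the @ line is only shorthand for “pass this function to the decorator and keep whatever comes back under the same name”. In this sketch, shout, greet and plain_greet are made-up names; functools.wraps is a standard library helper that copies the original function’s name and docstring on to the wrapper.

```python
import functools


def shout(func):
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return 'hello, {}'.format(name)


def plain_greet(name):
    return 'hello, {}'.format(name)


# The @shout line above does the same job as this by-hand version.
manual_greet = shout(plain_greet)
```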
Now that we’ve seen the parts, let’s consider a vastly more useful example. Say we’re writing an application, and we know there exist a set of functions so important that we want to be emailed if they have any problems. Check it out:
def email_me_if_it_breaks(func):
    def responder(*args, **kwargs):
        try:
            # Pass the result through so callers still get the return value
            return func(*args, **kwargs)
        except Exception as e:
            email_me(e)
    return responder

@email_me_if_it_breaks
def super_important_func_one():
    did_it_work = do_the_super_important_thing()
    if not did_it_work:
        raise RuntimeError('It did not work')
    else:
        return did_it_work
Any function wrapped like this will email us! Woot. Woot? Woot.
Let’s say you’re writing a script that will manipulate many paths to files. You think to yourself, “ah, I know that python has an excellent standard library”. You find that there is a thing called os which contains a bunch of path utilities in a thing called path. Good start.
Let’s get some clearer terminology. os is a module. Within os is another module called path. If we want to use it in our code, we can use the keyword import. We can do this a lot of different ways. Let’s clarify our example like this: inside the path module is a function called join, which will correctly join elements together with slashes between them to form a valid file path, like this:
>>> path.join('/Users', 'gastove', 'Documents')
'/Users/gastove/Documents'
Let’s look at all the ways we can import the join function.
First, we can import os and fully qualify the whole name:
import os
joined = os.path.join('/Users', 'gastove', 'Documents')
print(joined)
That’s great, but a bit clunky. We can use from ... import syntax to bring just the path module in to scope:
from os import path
joined = path.join('/Users', 'gastove', 'Documents')
print(joined)
Also great. If we’re really sure we only want the join function, we can import only it using the same syntax:
from os.path import join
joined = join('/Users', 'gastove', 'Documents')
print(joined)
Imagine we’ve already got a function called join, and we don’t want the names to collide. We can alias anything we import using as:
from os.path import join as path_join
joined = path_join('/Users', 'gastove', 'Documents')
print(joined)
Perhaps we actually want to import several things? We can do that too. As the list gets longer, it’s much easier to read if we use a set of parens and some newlines:
from os.path import (
    abspath as absolute_path,
    exists,
    expanduser
)
Okay so: we can import things. Good! os is part of the python standard library. But what if we want to import code we wrote ourselves? What then?
The rules go like this:
First: if two files are in the same directory, one can import from another.
If we have a file, /tmp/demo/one.py:
def foo():
    return 'foo'
And a second file, /tmp/demo/two.py:
import one
print(one.foo())
We’re all set – nothing special needs to be done.
Imagine now, however, we have a directory we want to put files in, /tmp/demo/baz/. To be able to import from the baz directory, we must make it in to a module. Don’t worry! Making a module is not hard. We simply add a file named __init__.py to the directory that should now be importable. Our demonstration dirs should now look like this:
tree /tmp/demo
We can now import baz in to one.py and two.py.
So: we’ve got these __init__.py files all over the place. They tell python a module is there; what else? Do they do anything?
It turns out: yes! __init__.py files control what Things in our modules get exposed, and how. Imagine we have a file called song.py in a directory called song, and it contains this:
class Song():
    def __init__(self, lyrics, score):
        """
        I am the singiest song
        """
        self.lyrics = lyrics
        self.score = score

def sing_a_song(song):
    print(song.lyrics)
If it’s in a module with an empty __init__.py, we would import things like this:
from song.song import sing_a_song, Song
Directory name, file name, Thing (function or class) name.
Feels a little redundant, right? We could add this to our __init__.py (note the leading dot, which tells Python 3 to look for song.py inside the current directory’s module):
from .song import Song, sing_a_song
And now, we could do the import elsewhere like so:
from song import Song, sing_a_song
Shorter! Tidier! Also: optional. But good to know it’s there.
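If you want to poke at this without littering your disk, here is a self-contained sketch that builds a tiny module directory in a temp folder and imports it. Everything in it is made up for the demonstration: the demo_song directory name, the one-line sing_a_song, and the sys.path juggling, which stands in for “running python from the directory above”.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / 'demo_song'
    pkg.mkdir()
    # demo_song/song.py holds the actual code...
    (pkg / 'song.py').write_text(
        'def sing_a_song(lyrics):\n'
        '    return lyrics.upper()\n'
    )
    # ...and demo_song/__init__.py re-exports it one level up.
    (pkg / '__init__.py').write_text('from .song import sing_a_song\n')

    sys.path.insert(0, root)  # let python find demo_song
    importlib.invalidate_caches()
    try:
        demo_song = importlib.import_module('demo_song')
        # Thanks to __init__.py, no demo_song.song.sing_a_song needed.
        result = demo_song.sing_a_song('la la la')
    finally:
        sys.path.remove(root)
```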
Imagine you have a project shaped a little like this:
tree /tmp/demo
What if we want to import code from scoot.py into poot.py? Python provides two approaches you’ll encounter: relative imports and absolute imports.
Absolute imports are based on where you’ll eventually be running the code from. That is, if we will eventually be running a command in our terminal like python demo, then we could think of imports as having demo as the root, and we import from there, like this:
# We are in poot.py
from demo.module_one import scoot
The other approach you’ll see is relative imports. These look a little like relative file paths, with dots in place of slashes: one leading dot means “the module directory I’m in”, two dots means “the one above it”:
# We are in poot.py
from ..module_one import scoot
My usual habit is this: if I’m importing one module in to the other, I use absolute imports. If I’m importing one file within the same module in to another, or a submodule in the same dir, I use a relative import. For instance: if we are in module_one/__init__.py, our imports could look like this:
from . import scoot
from . import module_three as m3
from demo.module_two import groot, poot
Let’s have another look at your friend and mine, http-demo
:
tree -L 2
This is a pretty standard python project setup. It has an unusual number of requirements.txt files – a habit of mine, because I like separating things. It’s also missing a testing dir. The truly prototypical setup would look like this:
tree -L 2
Now it has a test dir at the correct level, and the requirements files are just a little tidier, kept together in a dir.
[fn:6] The key of a dict can be any hashable type. What types are hashable, you ask? Well: any of the primitive types, as well as any class defining the __hash__ method. Overwhelmingly, the most common thing to use as the key of a dict is a string. But note: we can also use a tuple, as long as all the elements inside are themselves hashable.
[fn:5] Multiple inheritance is an attempt to solve the same problem languages like Java solve with a technique called interfaces. Alas: interfaces are vastly superior. So it goes.
[fn:4] Note however that is-a relationships are importantly one-way – a Vehicle is not a Car.
[fn:3] We’ll cover the exception to this when we talk about Generators
[fn:2] We’ll cover instance methods in Classes.
[fn:1] A hash function is a function that takes an input of variable length and produces an output of fixed length. “Hashability”, in this specific python context, means that there is a function implemented on the tuple type that lets python compute a hash of that tuple, which means the tuple can be used in a variety of special places – most importantly, places where it’s important that python be able to tell if a thing is unique or not.