If you use the shell extensively, I hope you know terminator by now. But, you
know, this whole “writing a blog” thing will be pretty pointless if you already
knew everything I wanted to say, so… I should probably elaborate.
Whenever I use a shell, I probably need several side by side – in one, I
compile; in a second, I check the system log; in a third, I have an SSH to a
staging server; in a fourth, I have htop showing how awesome my machine is at
compiling – etc. You know what they say about shell windows – “Bet you can’t
have just one!”. Wait… Nevermind.
terminator supports split windows, tabs, profiles, saved layouts (which I’ll
come to in a minute), etc. I use it every day.
If you look online for this issue, you’ll probably come across recommendations
for screen or tmux. These tools are useful, but require lots of key
combination memorization. I’m all in for using the keyboard, but I prefer to
have a GUI option.
One of terminator’s useful features is layouts. When you work on a certain
project, you usually want the same structure of windows – one in the root of
your projects to run your build system and / or source control commands, one in
the build directory, and the last one tail -fing the system log. terminator
lets you save these structures as layouts.
This is what the terminator layout settings window looks like:
See that little “Custom command” textbox on the upper right side? terminator
will run that command in the selected window. Most probably, we’ll want to run
some shell (zsh for me) with a couple of commands specific to the window,
like setting a Python virtualenv, running a certain script, etc.
However, there are several problems with this feature.
Continuous Commands – say you want to automatically start a continuous
command, for example tail -f /var/log/syslog. terminator will run it just
fine, but if for some reason you want to end the command (to restart it, or run
a different command in the same window) – as soon as you press
Ctrl+C, the window will disappear.
Side Effects – a common command I like to run automatically for a given
project is one that sets up an environment for it. For me, it usually means
activating a certain Python virtualenv with the workon <project> command.
However, entering that command for terminator to run will not take effect,
even if I tell it to run bash afterwards.
History – let’s say I want to automatically run some DB migration on my
Django project when starting terminator. I’ll probably want to
run them again while working, so I want to be able to search backwards in my
shell history for the command (in my terminal, this happens automatically if I
start to type
and then press the “Up” key).
Commands entered here will not appear in the shell history.
To solve all of these problems, I put the following script in my ~/.zshrc (or
~/.bashrc, if you’re so inclined):
```shell
echo $INIT_CMD
if [ ! -z "$INIT_CMD" ]; then
    OLD_IFS=$IFS
    setopt shwordsplit
    IFS=';'  # This is the Internal Field Separator, it is used in the for loop.
    for cmd in $INIT_CMD; do
        print -s "$cmd"  # Add the command to the history file.
        eval $cmd        # Execute the command.
    done
    unset INIT_CMD
    IFS=$OLD_IFS  # Restore the old IFS, so further scripts won't be affected.
fi
```
And the following snippet goes in the “Custom command” textbox:
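Its exact contents depend on what you want the window to run. One plausible shape, assuming the INIT_CMD convention used by the script (the project name and commands here are illustrative):

```shell
env INIT_CMD='workon myproject; tail -f /var/log/syslog' zsh
```

The script picks INIT_CMD up from the environment, records each semicolon-separated command in the history, runs it, and then leaves you in a normal interactive shell.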
PEP257 is a Python Enhancement Proposal that centers around docstring conventions.
If you’ve been using Python for a while, you must have heard about PEP8. PEP8
is a Python Enhancement Proposal (PEP) that lays down conventions for how
Python code should look. There’s also a pep8.py project that enforces the
PEP8 conventions by reading Python source files and outputting a list of errors
in your code. Well, PEP257 is a lesser known PEP that centers around docstring
conventions.
Since last November, I’ve been the maintainer of pep257.py – a script that
enforces these docstring conventions in the same way that pep8.py enforces
regular coding conventions. pep257.py was authored by Vladimir Keleshev, who
was kind enough to give me the reins for the project.
On June 1st, I gave a lightning talk about PEP257 at a local Tel-Aviv meetup
called PyWeb-IL. It’s a once-a-month meetup about everything related to Python,
web development, and more.
A few days ago I got one of the best assignments I could think of at work – to
check the web out for interesting Django packages and find out if any of them
could be of use to us. It’s… pretty awesome.
So, in the last few days I’ve been reading a lot of package documentation.
My goal was to get a list of popular and recommended Django packages, briefly
review each one to get a feel of what it does and decide which ones are worth
taking a deeper look into. The “briefly” part is where my expectations met
reality.
What I found is that most (if not all) Django package developers are pretty
responsible and provide pretty comprehensive documentation. Thumbs up for
that! Unfortunately, the docs are mostly just comprehensive, which is great if
you’re using the package and are looking for help with something, but not so
great if you just want quick information on what it does.
Let’s look at an example. There’s a Django package called django-reversion.
Here’s the introduction from its README file:
django-reversion is an extension to the Django web framework that
provides comprehensive version control facilities.
Roll back to any point in a model’s history – an unlimited undo facility!
Recover deleted models – never lose data again!
Admin integration for maximum usability.
Group related changes into revisions that can be rolled back in a single
transaction.
Automatically save a new version whenever your model changes using Django’s
flexible signalling framework.
Automate your revision management with easy-to-use middleware.
This looks nice – except that I couldn’t for the life of me understand what
“version control” for models is. I understand that it’s probably either
keeping track of data changes in your database over time (a sort of database
backup, so you could revert it to two weeks ago, before that damn bug
occurred) or keeping track of Django model changes, i.e., model schemas, like
south does. There is
nothing in this introduction to give me a hint, so let’s look a bit further.
There’s a “Getting Started” section in the docs, and that’s always helpful, so
we’ll check it out for quick examples, right?
Getting started with django-reversion
To install django-reversion, follow these steps:
django-reversion can be used to add a powerful rollback and
recovery facility to your admin site. To enable this, simply register your
models with a subclass of reversion.VersionAdmin:
Whenever you register a model with the VersionAdmin class, be sure to run
the ./manage.py createinitialrevisions command to populate the version
database with an initial set of model data. Depending on the number of rows
in your database, this command could take a while to execute.
For more information about admin integration, please read the admin
documentation.
Low Level API
You can use django-reversion’s API to build powerful
version-controlled views. For more information, please read the low level API
documentation.
There’s a code snippet here, which is excellent. Examples are great when you
want to know what using a certain package looks like. I could further infer
that reversion probably refers to model data and not schema from that bit
about the rows in the database – but it’s purely coincidental.
But that “Admin Integration” bit is a bit weird. It seems reasonable that
reverting to an earlier version of a certain model would be done in the Admin
page, but from this paragraph it’s not clear what this would look like or how
usable it is. Or to put it another way – pics or it didn’t happen! Why
can’t I get a nice screenshot here? A solid screenshot of what the admin page
looks like would probably tell me more about what reversion does than the
many pages in this documentation.
All in all, it took me about 20 minutes of looking around the documentation to
get a feel of what it does. This includes searching Google for “django
reversion screenshot” so I can see how it looks (partial luck with that search,
by the way). This is way too long.
A package’s documentation should serve (at least) two purposes:
It should act as a “pitch” – allowing new users to quickly assess the
purpose, usage and benefits of the package.
It should act as a reference for users already using it, in case they have
questions or issues with the package.
Most packages focus on the second point, but it’s only useful if you get people
to use your software, and you get them with the pitch. So how do you do it?
There should be a specific section of your documentation dedicated to the
pitch. It should be named clearly as such – “Getting Started”, “Quick Start”,
“Tutorial” are all good, indicative names.
That section should either be in the landing page for your docs or clearly
linked from it (think bold letters).
The pitch should clearly state what the software does. Ideally, it should
also state what problems it solves. Think about drug commercials. Start with
symptoms, then the solution.
If there are different packages that do the same as yours – this is the
place to let the user know what makes your package different.
If your package has ANY graphical interface you HAVE to include a
screenshot. You wouldn’t buy a painting just from a description, would you?
You should include a minimum working example of your package. The example
should be simple, yet meaningful. If you’re declaring a class called Foo,
you’re doing it wrong. The example doesn’t only answer the question of HOW
to use your software, it should also answer WHY.
You make great software – help people use it!
Don’t take this as anything against the reversion package – I haven’t tried it and I can’t attest to its quality. The documentation is probably pretty good as reference material.↩
This boilerplate code is very common, very ugly and – as you’ll find out in a
minute – very avoidable. First let’s understand what we’re doing here, in plain
English:
1.1. Read a block from f.
1.2. If the value was '', break from the loop.
1.3. Do something with the read value.
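In code, the pattern looks something like this (the file object, block size and the process function are illustrative; an in-memory stream stands in for a real file):

```python
import io

def process(block):
    """Stand-in for whatever work we do with each block."""
    print(block, end='')

f = io.StringIO('some example data')  # stands in for open('data.txt')
while True:               # "Do forever" -- the loop's real scope is hidden
    block = f.read(4)     # 1.1. Read a block from f
    if block == '':       # 1.2. '' signals the end of the data
        break
    process(block)        # 1.3. Do something with the read value
```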
Why is this bad? There are two reasons:
Usually, when we iterate over objects or until a condition happens, we
understand the scope of the loop from its first line. e.g., when reading a loop
that starts with for book in books we realize we’re iterating over all the
books. When we see a loop that starts with while not battery.empty() we
realize that the scope of the loop is for as long as we still have battery.
When we say “Do forever” (i.e., while True), it’s obvious that this scope is
a lie. So it requires us to hold that thought in our head and search the rest
of the code for a statement that’ll get us out of it. We are entering the loop
with less information and so it is less readable.
We are essentially iterating over chunks of bytes. Out of the 4 lines in the
loop, only one line refers to those bytes. So that’s a bad signal-to-noise
ratio, which also affects readability. For a reader unfamiliar with this
code-form, it’s not clear that if block == '' is a technical,
implementation-driven detail. It might seem like a semantic value returned from
the read.
You might recall there’s a function called iter. It can accept an argument
that supports iteration and returns an iterator for it. Used like that,
it seems pretty useless, as you could just iterate over that collection without
iter. But it also accepts another argument – a sentinel value:
In computer programming, a sentinel value [..] is a special value
whose presence guarantees termination of a loop that processes structured
(especially sequential) data. The sentinel value makes it possible to detect
the end of the data when no other means to do so (such as an explicit size
indication) is provided. The value should be selected in such a way that it
is guaranteed to be distinct from all legal data values, since otherwise the
presence of such values would prematurely signal the end of the data.
The sentinel value in this case is an empty string – since any successful read
from an I/O device will return a non-empty string, it is guaranteed that no
successful read will return this value.
When a sentinel value is supplied to iter, it will still return an iterator,
but it will interpret its first argument differently – it will assume it’s
callable (without arguments) and will call it repeatedly until it returns the
sentinel value. Afterwards, the iterator would stop.
The trouble is that usually read functions do take an argument – usually the
size to read (in bytes, lines, etc.), so we need to create a new function which
takes no input and reads a constant size. We have two main tools for the job:
partial (imported from functools) and lambda (a built-in keyword). The
two following lines are equivalent 2:
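Presumably something along these lines (32 is an arbitrary block size; the in-memory stream stands in for a real file object):

```python
import io
from functools import partial

f = io.StringIO('x' * 100)  # stands in for a real file

# The two following lines each build a no-argument callable
# that reads a 32-byte block:
read_block = partial(f.read, 32)
read_block = lambda: f.read(32)
```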
partial is specifically designed to take functions that accept arguments and
create “smaller” functions with some of those arguments set as constants.
lambda is a little bit more flexible, but can also be useful for the same
purpose.
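Putting it all together, the whole boilerplate loop collapses into something like this (names and block size illustrative):

```python
import io
from functools import partial

def process(block):
    print(block, end='')

f = io.StringIO('some example data')  # stands in for a real file
# iter calls partial(f.read, 4) repeatedly until it returns '',
# so the loop's scope is stated up front.
for block in iter(partial(f.read, 4), ''):
    process(block)
```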
I learned this Python tip from Raymond Hettinger’s excellent talk “Transforming Code into Beautiful, Idiomatic Python”. I use his examples as well, and you should really just watch the talk instead of reading this. I’m putting this out there for two reasons: one – because writing about something helps me remember it, and two – because text is more searchable and skimmable than video.↩
There are some minor differences between the two generated functions. Alon Horev’s blog post on the subject is a very interesting read.↩
I work on Django projects both at work and at home, in the form of side projects (I created HTMLify as a first experiment in full-stack developing and am now working on a minimalistic feed reader).
The other day I got really bummed out at work. At first, I didn’t really understand why – I was just depressed. I got talking with a co-worker and he suggested that I introspect and try to pin-point why. After a while of thinking about it, I realized what was wrong. I am embarrassed by the product I’m making.
Now, this isn’t to say we’re building a bad product, or that it doesn’t work. My problem is that it has very rough edges regarding user experience and we almost never get around to fixing these issues. Sure, if there are bugs in functionality we usually solve them first, but there is a class of “comfort” problems which aren’t really bugs and we have a lot of them.
This is partly because the backend and frontend are split between two different software groups. But I’m not blaming this on the frontend guys. The problem is the attitude that more features that are more-or-less stable are better than less features that are rock solid.
Now, I get that sometimes there are time-critical features that give value to the customer (this is a big phrase for us, as we work in Scrum), I do. But I realized that I work very differently at work from how I work at home. When I’m working on something at home and I see something that bothers me I usually fix it immediately (I, of course, finish what I’m currently working on first). At work, it’s a completely different story. When I see an “injustice”, I send an email to our product owner. He adds it to the backlog. We discuss it in meetings and prioritize it.
Whenever I “walk past” the bit of code that’s responsible, I get upset. I want to fix it NOW! The existence of this issue is like a thorn in my side. I hate it.
The Heart Wants What the Heart Wants
I feel it’s like technical debt, but it’s not exactly the same. These issues are bugs to me, but our product owner doesn’t see them this way.
I don’t really know how to deal with being bummed out about this. I’d appreciate opinions from other developers who experience similar feelings. How do you settle the need for setting and discussing issues and priorities with your need to fix things as you see them? How do you deal with getting others to feel the same? If you practice Scrum, I would very much like to hear your methods and ideas.
Feel free to give advice in the comment section below or at hackernews.
When you start to work on even a rudimentary Python application, the first thing
you usually do is import some package you’re using. There are many ways to
import packages and modules – some are extremely common (found in pretty much
every Python file ever written) and some less so. In this post I will cover
different ways to import or reload modules, some conventions regarding
importing, import loops and some import easter-eggs you can find in Python.
1. import foo
The basic Python import. The statement import foo looks for a foo module, loads
it into memory and creates a module object called foo. How does Python know
where to find the foo module?
When a module named spam is imported, the interpreter first searches for a
built-in module with that name. If not found, it then searches for a file
named spam.py in a list of directories given by the
variable sys.path. sys.path is initialized from these locations:
the directory containing the input script (or the current directory).
PYTHONPATH (a list of directory names, with the same syntax as the shell
variable PATH).
If there is a bar object (which could be anything from a function to a
submodule) it can be accessed like a member: foo.bar. You can also import
several modules in one line by doing import foo, bar, but it is considered
good practice to put each import on a separate line.
2. import foo.bar
This makes foo.bar available without importing other stuff from foo. The
difference from import foo is that if foo also had a baz member, it
won’t be accessible.
3. from foo import bar
This statement imports bar, which could be anything that is declared in the
module. It could be a function definition, a class (albeit a
not-conventionally-named one) or even a submodule (which makes foo a package).
Notice that if bar is a submodule of foo, this statement acts as if we
simply imported bar (if it was in Python’s search path). This means that a
bar object is created, and its type is 'module'. No foo object is created
in this statement.
Multiple members of foo can be imported in the same line like so:
4. from foo import bar, baz
The meaning of this is pretty intuitive: it imports both
bar and baz from the module foo. bar and baz aren’t neccessarily the
same types: baz could be a submodule and bar could be function, for that
matter. Unlike importing unrelated modules, it’s perfectly acceptable to
import everything from one module in the same line.
5. from foo import *
Sometimes foo contains so many things, that it becomes cumbersome to import
them manually. Instead, you can just import * to import them all at the same
time. Don’t do this unless you know what you’re doing! It may seem convenient
to just import * instead of specific members, but it is considered bad
practice. The reason is that you are in fact “contaminating” your global
namespace. Imagine that you do import * on a package where someone
unwittingly declared the following function:
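For instance, imagine the module contains something like this (a hypothetical example; the name and body are mine):

```python
# Somewhere inside the imported module -- an unfortunate choice of name:
def list(iterable):
    """Return the items joined as a comma-separated string."""
    return ', '.join(str(item) for item in iterable)
```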
When you do import *, this list definition will override the global,
built-in list type and you’ll get very, very unexpected errors. So it’s always
better to know exactly what you’re importing. If you’re importing too much
stuff from a certain package, you can either just suck it up or just import the
package itself (import foo) and use the foo qualifier for every use. An
interesting good use for import * is in Django settings file hierarchy. It’s
convenient there because you actually do want to manipulate the global
namespace with imported settings.
6. from foo import bar as fizz
This one is far less common than what we covered so far, but still well known.
It acts like from foo import bar, except instead of creating a bar object,
it creates a fizz object with the same meaning. There are two main reasons to
use this kind of statement: the first is when you’re importing two similarly
named objects from two different modules. You then use import as to
differentiate them, like so:
The other reason, which I’ve seen used a few times is when you import a
lone-named function (or class) and use it extensively throughout your code and
want to shorten its name.
7. from .foo import bar
Well, this escalated quickly.
This one is pretty rare and a lot of people are completely unaware of it. The
only difference in this statement is that it uses a modified search path for
modules. Namely, instead of searching the entire PYTHONPATH, it searches in
the directory where the importing file lives. So if you have two files called
fizz.py and foo.py, you can use this import in fizz, and it will import the
correct file, even if you have another foo module in your PYTHONPATH. What
is this good for? Well, sometimes you create modules with generic names like
common, but you might also have a common package in the base of your
project. Instead of giving different names, you can explicitly import the one
closest to you. You can also use this method to load modules from an ancestor
in the directory tree by putting several dots. For example, from ..foo import
Foo will search one directory up, from ...foo import Foo will search two
directories up, etc.
8. foo = __import__("foo")
Ever wondered how can you import a module dynamically? This is how. Obviously
you wouldn’t use it with an explicit string, but rather with a variable of some
kind. Also notice that you have to explicitly assign the imported module to a
variable, or you won’t have access to its attributes.
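For example, importing a module whose name is only known at runtime (json here is just a stand-in for whatever name your variable holds):

```python
module_name = 'json'  # could come from a config file, user input, etc.
json_module = __import__(module_name)
print(json_module.dumps([1, 2, 3]))  # [1, 2, 3]
```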
9. reload(foo)
This statement does exactly what it looks like. It reloads the foo module.
It’s pretty useful when you have a console open playing with a bit of code
you’re tweaking and want to continue without resetting your interpreter.
Note: If you used from foo import bar, it’s not enough to reload foo for
bar to update. You need to both reload foo and call from foo import bar
again.
An import loop would occur in Python if you import two or more modules in a
cycle. For example, if in foo.py you would from bar import Bar and in
you from foo import Foo, you will get an import loop:
```
Traceback (most recent call last):
  File "foo.py", line 1, in <module>
    from bar import Bar
  File "/tmp/bar.py", line 1, in <module>
    from foo import Foo
  File "/tmp/foo.py", line 1, in <module>
    from bar import Bar
ImportError: cannot import name Bar
```
When this happens to you, the solution is usually to move the common objects
from foo.py and bar.py to a different file (say common.py). However,
sometime there’s actually a real loop of dependencies. For example, a method in
Bar needs to create a Foo instance and vice versa. When the dependency is
in a limited scope, you should remember that you can use the import command
wherever you want. Putting imports at the top of the file is the common
convention, but sometimes you can solve import loops by importing in a smaller
scope, like in a method definition.
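A sketch of the idea, with the two modules written out to a temporary directory so the example is self-contained (the module and class names are illustrative):

```python
import os
import sys
import tempfile

tmp = tempfile.mkdtemp()

# foo.py imports bar at module level...
with open(os.path.join(tmp, 'foo.py'), 'w') as fh:
    fh.write(
        'from bar import Bar\n'
        'class Foo:\n'
        '    def make_bar(self):\n'
        '        return Bar()\n'
    )

# ...while bar.py defers its import of foo into the method that
# needs it, breaking the module-level cycle.
with open(os.path.join(tmp, 'bar.py'), 'w') as fh:
    fh.write(
        'class Bar:\n'
        '    def make_foo(self):\n'
        '        from foo import Foo  # imported only when called\n'
        '        return Foo()\n'
    )

sys.path.insert(0, tmp)
import foo  # works: no cycle is triggered at import time

print(type(foo.Foo().make_bar()).__name__)  # Bar
```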
What fun would an easter egg be if you didn’t try it by yourself? Try these and
see what happens:
Edit: Emotions were running high when I wrote this and the original
blog post that lead to this one. I realize now it was unprofessional to
publicly vent like this, so I shelved the original blog post (which was 100%
venting) and I edited this post to contain less venting and more constructive
content.
I hope you enjoy it.
Those of you who follow my blog may remember I recently posted about the worst
work day of my life. To get you all up to speed, the story is that I work on
the backend of a web application and the frontend guys weren’t willing to work
with our source control (Mercurial) or even operating system (Linux). When we
handled the source control for them, they aggressively accused us of ruining
their workspace (which we did, by mistake).
Well, I am happy to report that this saga has come to an end. Here’s what
happened.
The frontend guys suggested this solution to our problem (excerpt from the
meeting):
The frontend guys would work on the VM we gave them. We’re still “in charge”
of it, but we’re not allowed to modify any files. When we want to share
versions of our code, we’ll use a third, public directory and copy our code
manually to and from it. The merges will be done by hand from their side, and
with Mercurial from our side.
This solution is bad. It’s bad because it’s a reinvention of the source control
idea, only done manually. The upshot is that we would spend
hours on end tinkering with diffs (removed files, especially, will be a
nuisance) instead of actually developing. My team was pretty upset in the days
after this meeting. We were all frustrated with the new system we had to put up
with, and we felt it was only because of internal politics and not actual
technological reasons. We didn’t want to work with the frontend guys anymore
and we were unhappy. We wanted to escalate the issue to upper management so we
don’t have to work like this.
But our group leader (actually, his replacement) insisted that we give their
way a chance. Not because their suggestion was good, but because it wasn’t just
bad, it was painstakingly bad. It was so bad that it could not possibly work
out for anyone. And that’s what he was counting on.
So we gave it a chance. The first thing we did was to show we were doing our
part. We asked them to send us their version of the code and we merged it into
our code base. It was annoying, complicated and it took a lot of time, but we
did it without complaining (to them – we complained the hell out of it to our
GL). Then, we sent them our code base, in its entirety. While they are only
responsible for a small directory with some HTML/CSS/JS files, which they could
email us, our project is several hundred thousand lines of code, which contains
an entire Linux framework, with C, Cython, Python and a custom build system
that also compiles the Linux kernel. They needed it all because the backend
uses this framework and they couldn’t run the server otherwise (as we explained
to them in the original meeting). So, the zip was pretty big…
And now, we waited. They got the zip and used a diff tool to do the merge.
After several hours, they started calling us for help. At day’s end, they asked
us, on a “one-time” basis, to use Mercurial to recreate their project for them.
We refused. We told them it wasn’t what we agreed to, and it’s their way we’re
trying now. The next day, they still hadn’t gotten around to building the
project as they still had trouble unzipping the directory. At one point they
backed up their directory, wiped it clean and started over, and still they were
stuck.
Another day went by, and we agreed to recreate their project. We sent out an
email calling for another meeting to work out a new solution, but we got the
following reply:
There’s no need to set up another meeting.
We didn’t realize the size of the project we were dealing with. We’ll be
happy if you teach us how to use Mercurial so we can work with that from now
on.
Wasn’t that a sight for sore eyes.
We felt victorious at last. Not only did we teach them Mercurial, they also
started using it by connecting (via NX) to their dedicated Ubuntu VM, so in a
way they’re also using Linux now. And you know what? They’re pretty happy with
the new situation. They got up to speed really fast and started pushing changes
to the code base.
The lesson to be learned here (other than the fact the source control is a
necessity), is that sometimes the best way to make someone follow your path is
to make him make his own mistakes and learn from them. I appreciate my GL very
highly for keeping his cool and resolving this issue in such a peaceful and
constructive way.
Despite the title, my intention is not to start a flame war. I want to discuss docstring conventions in Python, but just as a case study for conventions in general.
I suggested a while ago in my team at work that we should probably agree on
docstring conventions. Each of us had different conventions in mind, me
included. Instead of just, you know, thinking for myself, I Googled “python
docstring conventions” and lo and behold – the first result was PEP 257. Some
of its guidelines:
Triple quotes are used even though the string fits on one line. This makes it easy to later expand it.
The closing quotes are on the same line as the opening quotes. This looks better for one-liners.
There’s no blank line either before or after the docstring.
The docstring is a phrase ending in a period. It prescribes the function or method’s effect as a command (“Do this”, “Return that”), not as a description; e.g. don’t write “Returns the pathname …”.
The one-line docstring should NOT be a “signature” reiterating the function/method parameters (which can be obtained by introspection).
And it goes on. Now, you may agree with some of these items and disagree with
other (personally – the “phrase as a command” I love; the “put a blank line
before the ending quotes”, not so much), but I think the value of this document
is that it simply exists and is agreed upon. Caring about how many spaces to
put before a curly brace is a phase. It’s one of the steps of teaching
yourself how to program:
Get involved in a language standardization effort. It could be the ANSI C++ committee, or it could be deciding if your local coding style will have 2 or 4 space indentation levels. Either way, you learn about what other people like in a language, how deeply they feel so, and perhaps even a little about why they feel so.
Have the good sense to get off the language standardization effort as quickly as possible.
However, caring about which standardization to use is wholly different from
caring about whether you’re complying with any. Standards are arbitrary,
yes, but they’re important. I think PEP 257 is great. Not because it gets the
spacing right, but because it’s important to have an agreed upon document
(signed by the BDFL – a bonus) that does just this. So when I review other
people’s code, I correct them and I refer them to this PEP. If they complain I
say “I don’t care where we put our spaces, but we should be consistent.
Somebody already did the work of putting a document together, so why not use
it?”
In this post I’ll explain a bit about what QuerySets are and how they work (if
you’re already familiar with them, you can jump to the second part), I’ll argue
that you should always return a QuerySet object if it’s possible and I’ll
talk about how to do just that.
QuerySets Are Awesome
A QuerySet, in essence, is a list of objects of a given model. I say ‘list’
and not ‘group’ or the more formal ‘set’ because it is ordered. In fact, you’re
probably already familiar with how to get QuerySets because that’s what you
get when you call various Book.objects.XXX() methods. For example, consider
the following statement: Book.objects.all()
What all() returns is a QuerySet of Book instances which happens to
include all Book instances that exist. There are other calls which you probably
know:
```python
# Return all books published since 1990
Book.objects.filter(year_published__gt=1990)
# Return all books *not* written by Richard Dawkins
Book.objects.exclude(author='Richard Dawkins')
# Return all books, ordered by author name, then
# chronologically, with the newer ones first.
Book.objects.order_by('author', '-year_published')
```
The cool thing about QuerySets is that, since each of these functions both
operates on and returns a QuerySet, you can chain them up:
```python
# Return all books published after 1990, except for
# ones written by Richard Dawkins. Order them by
# author name, then chronologically, with the newer
# ones first.
Book.objects.filter(year_published__gt=1990) \
    .exclude(author='Richard Dawkins') \
    .order_by('author', '-year_published')
```
And that’s not all! It’s also fast:
Internally, a QuerySet can be constructed, filtered, sliced, and generally
passed around without actually hitting the database. No database activity
actually occurs until you do something to evaluate the queryset.
So we’ve established that QuerySets are cool. Now what?
Return QuerySets Wherever Possible
I’ve recently worked on a Django app where I had a Model that represented a
tree (the data structure, not the Christmas decoration). This meant that every
instance had a link to its parent in the tree. It looked something like this:
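Roughly like this (a sketch: the model and field names here are my guesses, and it needs a configured Django project to actually run):

```python
from django.db import models

class Node(models.Model):
    parent = models.ForeignKey('self', null=True, blank=True,
                               related_name='children')
    value = models.IntegerField()

    def get_ancestors(self):
        """Return a list of all ancestors, nearest first."""
        if self.parent is None:
            return []
        return [self.parent] + self.parent.get_ancestors()
```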
This worked pretty well. Trouble was, I had to add another method,
get_larger_ancestors, which should return all the ancestors whose value was
larger than the value of the current node. This is how I could have implemented
it:
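Presumably with a plain comprehension over the list (a sketch, assuming the tree model described above):

```python
def get_larger_ancestors(self):
    # Iterate over the list that get_ancestors built for us.
    return [node for node in self.get_ancestors()
            if node.value > self.value]
```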
The problem with this is that I’m essentially going over the list twice – once
by Django and once by me. It got me thinking – what if
get_ancestors returned a QuerySet instead of a list?
Pretty straightforward. The important thing here is that I’m not looping over
the objects. I could perform however many filters I want on what
get_larger_ancestors returned and feel safe that I’m not re-running over a list
of objects of unknown size. The key advantage here is that I keep using the
same interface for querying. When the user gets a bunch of objects, we don’t
know how he’ll want to slice and dice them. When we return QuerySet objects
we guarantee that the user will know how to handle it.
But how do I implement get_ancestors to return a QuerySet? That’s a little
bit trickier. It’s not possible to collect the data we want with a single
query, nor is it possible with any pre-determined number of queries. The nature
of what we’re looking for is dynamic and the alternative implementation will
look pretty similar to what it is now. Here’s the alternative, better
implementation:
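A sketch of such an implementation, assuming the tree model described earlier (again, this needs a real Django project to run):

```python
def get_ancestors(self):
    """Return a QuerySet of all ancestors of this node."""
    if self.parent is None:
        return Node.objects.none()  # an empty QuerySet, not an empty list
    # Union of the parent itself and, recursively, its ancestors.
    return (Node.objects.filter(pk=self.parent.pk) |
            self.parent.get_ancestors())

def get_larger_ancestors(self):
    # Chaining a filter performs an intersection -- no Python-side loop.
    return self.get_ancestors().filter(value__gt=self.value)
```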
Take a while, soak it in. I’ll go over the specifics in just a minute.
The point I’m trying to make here is that whenever you return a bunch of
objects – you should always try to return a QuerySet instead. Doing so will
allow the user to freely filter, splice and order the result in a way that’s
easy, familiar and provides better performance.
(On a side note – I am hitting the database in get_ancestors, since I’m
using self.parent recursively. There is an extra hit on the database here –
once when executing the function and another in the future, when actually
inspecting the results. We do get the performance upside when we perform
further filters on the results which would have meant more hits on the database
or heavy in-memory operations. The example here is to show how to turn
non-trivial operations into QuerySets).
Common QuerySet Manipulations
So, returning a QuerySet where we perform a simple query is easy. When we
want to implement something with a little more zazz, we need to perform
relational operations (and some helpers, too). Here’s a handy cheat sheet (as
an exercise, try to understand my implementation of get_larger_ancestors).
Union – The union operator for QuerySets is |, the pipe symbol. qs1 | qs2
returns a QuerySet with all the items from qs1 and all the items in qs2
while handling duplicates (items that are in both QuerySets will only appear
once in the result).
Intersection – there is no special operator for intersection, because you
already know how to do it! Chaining functions like filter and exclude is in
fact performing an intersection between the original QuerySet and the new
conditions.
Difference – a difference (mathematically written as qs1 \ qs2) is all the
items in qs1 that do not exist in qs2. Note that this operation is
asymmetrical (as opposed to the previous operations). I’m afraid there is no
built-in way to do this in Django, but you can do this:
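One way to express the difference, assuming both QuerySets are over the same model with the default pk field:

```python
# All items in qs1 that are not also in qs2:
difference = qs1.exclude(pk__in=qs2.values_list('pk', flat=True))
```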
Nothing – seems useless, but it actually isn’t, as the above example
shows. A lot of the time, when you’re dynamically building a QuerySet with
unions, you need to start off with what would have been an empty list. This
is how to get it: MyModel.objects.none().