Reddit Thinks I’m a Spammer

When I published my post about else statements in Python, I got lots of comments and a score of over 400 on reddit.com/r/programming (okay, I realize I’m “bragging” again, but this actually is a part of the story). My next several posts got measly viewership, although I posted them on reddit as well. I was surprised to discover they weren’t showing up on the New page at all. I thought I just wasn’t seeing them for some reason, but then I contacted a moderator and, well, here is the reply:

You’re not banned, but the spam filter has identified you (quite accurately) as a spammer. You have essentially only submitted your own domain, and have only commented on your own submissions. This is exactly the kind of behavior that the spam filter looks for, and accounts very much like yours get submitted to /r/reportthespammers all day long.

Please see Identifying Spammers 101 for some insight into what mods look for to identify spammers, and how you might be able to “legitimize” your activity on the site.

Well, I can’t blame the moderator, as that’s their policy. I just have to wonder if they shouldn’t give some weight to how the community receives the “spam” posts. Surely, I’m not posting cat photos from wesindicatecatphotos.com, am I?

Reddit thinks I’m a spammer, and in a way they’re right. It shocked me a little to discover this, as I consider myself a lawful citizen of the Internet. While I respect their need to protect their community and try to maintain members that are active in the community as a whole (as opposed to utilitarian parasites like me), I’m not sure if I’m going to try to legitimize myself for their spam filter. My main go-to technology articles site is HackerNews, and I don’t think I’ll abandon it for reddit (even though reddit is responsible for most of my viewership).

You Can’t Handle the Truth!

I got a chance to review some other people’s Python code recently, and there’s one comment I almost always have to give, which is:

if x and if x is not None are not the same!
corollary: if not x and if x is None are also quite different, obviously.

This usually happens when someone assigns None to a variable (say, x) as a sentinel value, and then x may or may not be assigned to later. The test is meant to check whether x was ever assigned a real value.

When you write if x is None, you use the is operator, which checks the identity of x. None is a singleton in Python, so every None is the exact same instance. When you write if x, something different happens. if expects a boolean, and assuming x is not a boolean, Python automatically calls x’s __nonzero__ method (falling back to __len__ if the class doesn’t define one), i.e., if x is actually executed as if x.__nonzero__() (or if bool(x)). __nonzero__ is pretty poorly named*, but it’s the method that evaluates an object as a boolean value. It’s one of Python’s Magic Methods. The confusing thing is that bool(None) returns False, so when x is None, if x works as you expect it to. However, there are other values that evaluate to False. The most prominent example is an empty list: bool([]) returns False as well. Usually, an empty list has a meaning that is different from None; None means no value while an empty list means zero values. Semantically, they are different. I guess people are just unaware of the semantic difference between the two ways to write the condition.

Here are some useful snippets to demonstrate:

Testing None

>>> x = None 
... if x:
...     print 'if x'
... if x is not None:
...     print 'if x is not None'

Testing an Empty List

>>> x = [] 
... if x:
...     print 'if x'
... if x is not None:
...     print 'if x is not None'
if x is not None

Testing a Normal Value

>>> x = 42 
... if x:
...     print 'if x'
... if x is not None:
...     print 'if x is not None'
if x
if x is not None

Testing a Custom Class

>>> class Foo(object): 
...     def __nonzero__(self):
...         print 'Foo is evaluated to a boolean!'
...         return True
...
... x = Foo()
... if x:
...     print 'if x'
... if x is not None:
...     print 'if x is not None'
Foo is evaluated to a boolean!
if x
if x is not None

* Fortunately, the folks working on Python had the sense to change this to __bool__ in Python 3.x!
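A typical place where the difference bites is an optional argument that defaults to None. The sketch below uses Python 3 syntax (print as a function), and the function names are made up for illustration:

```python
def describe_buggy(items=None):
    # Truthiness test: an empty list is wrongly treated as "not given".
    if not items:
        return 'no value'
    return '%d value(s)' % len(items)


def describe(items=None):
    # Identity test: only the None sentinel means "not given".
    if items is None:
        return 'no value'
    return '%d value(s)' % len(items)


print(describe_buggy([]))  # no value
print(describe([]))        # 0 value(s)
print(describe())          # no value
```

The two functions only disagree on falsy values that are not None, such as [], 0 or ''.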

Gathering the Comments of the Web

My recent post about the else keyword in Python was a tremendous success (relative to my other posts), reaching 20,000+ unique visitors within about 24 hours. It was especially successful in Reddit’s /r/programming, where it got over 400 total upvotes and 130 comments. However, in the post’s own Disqus thread, there were only 3 comments (one of them was mine). Now, I get that people like to comment in the community where they came from - Reddit folks want to comment on the Reddit thread, HackerNews people want to mingle among themselves, people want to post their comments as Twitter replies, etc. - but as a blogger, I want people coming in from anywhere to see that there are 100+ comments on my post.

A friend of mine told me that he saw the post (which I also posted on my facebook wall) and that he didn’t know where to respond - on my wall, on my twitter account, on the Disqus thread - so many options!

It got us thinking together on how to solve this issue. Obviously we still want to maintain the separation of comment threads on their respective sites, but as a site owner / blogger, I would like my website to show all of them. So the solution is quite simple: a comment system like Disqus, which allows users to comment as usual on the thread, which is also tabbed with different communities. When you finish reading, you can see immediately how many comments there are in different communities. There was already a post I saw in HackerNews that used the HN API to display HN comments in your blog. I suggest taking it a step forward and creating a dynamic tool which works (with plugins, I imagine) for different communities.
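As a very rough sketch of the plugin idea (Python 3 syntax), each community could be a small class that knows how to count comments for a post, and the widget would just ask every plugin in turn. Everything here is hypothetical: the class names, the build_tabs function and the example URL are invented, and StaticSource returns fixed numbers (the Disqus and Reddit counts mentioned above) instead of calling any real API.

```python
class CommentSource:
    """Hypothetical plugin interface: one subclass per community."""

    name = 'unknown'

    def count_comments(self, post_url):
        raise NotImplementedError


class StaticSource(CommentSource):
    """Stand-in plugin that returns a fixed count instead of querying a site."""

    def __init__(self, name, count):
        self.name = name
        self._count = count

    def count_comments(self, post_url):
        return self._count


def build_tabs(post_url, sources):
    # One (community, comment count) pair per tab, in plugin order.
    return [(source.name, source.count_comments(post_url)) for source in sources]


sources = [StaticSource('Disqus', 3), StaticSource('Reddit', 130)]
tabs = build_tabs('https://example.com/my-post', sources)
total = sum(count for _, count in tabs)
print(tabs)   # [('Disqus', 3), ('Reddit', 130)]
print(total)  # 133
```

A real plugin would replace StaticSource with something that talks to the community in question.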

Self Printing Programs in Python

Let’s talk about self printing programs (or, quines). A self printing program is, as its name suggests, self explanatory: it prints its own source code. Today I thought about how to implement a quine in Python. I whipped up a solution on my own and then posed the challenge to some people in the office. These are the results:

My Version 

s = r"print 's = r\"{0}\"'.format(s), '\n', s" 
print 's = r\"{0}\"'.format(s), '\n', s

This exemplifies a common way to implement a quine. The main problem is that you have to use some sort of function that prints, but of course you also have to print that function, and so on. The way to deal with this is to put the entire program in a string, except the assignment to that string, then print the assignment (where you can use the string itself to avoid explicitly writing it again, thus avoiding the recursion), and then the string. Notice there’s a lot of playing with quotation marks. Next is a version that tries to solve this.

Using chr for Quotation Marks

a = "b = chr(97) + chr(32) + chr(61) + chr(32) + chr(34); b += a; print b + chr(34); print a"
b = chr(97) + chr(32) + chr(61) + chr(32) + chr(34); b += a; print b + chr(34); print a

This version (while a bit cumbersome) solves the quotation marks problem by just explicitly printing the ASCII characters.

Using exec Instead of Repeating the print

s = r"print 's = r\"' + s + '\"' + '\nexec(s)'"
exec(s)

I really like this version, as it mostly avoids repeating the code in the two lines.

The Smartass Approach

Well, the first person I introduced this challenge to had a pretty wiseass but, overall, clever approach. He did this:

print open(__file__).read()

This works, and is pretty clever, but it obviously isn’t what quines are all about. It also wouldn’t work in an interactive shell.

The “Google” Way

After I got the above answers, I just had to google Python quines and see what comes up. A StackOverflow thread points out this (pretty cool) snippet:

_='_=%r;print _%%_';print _%_

It assigns to the variable _ a string which contains the entire code, except for its own value, which is replaced by a formatting instruction, and then prints _ while feeding it into its own formatting. Looks obfuscated, but it’s pretty cool when you take a deeper look.
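To see why this works, it helps to build the program text by hand. The sketch below uses Python 3 syntax, so the print statement becomes a call and the quine itself reads _='_=%r;print(_%%_)';print(_%_). The names template, source and output are only for illustration. %r inserts the repr of the string (quotes included), and %% collapses to a single %.

```python
import contextlib
import io

# The string the quine stores in `_`: the whole program, with %r standing in
# for the string's own quoted value and %% standing in for a literal %.
template = '_=%r;print(_%%_)'

# Formatting the template with itself yields the full program text.
source = template % template
print(source)  # _='_=%r;print(_%%_)';print(_%_)

# Run that program and capture what it prints.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(source, {})
output = buffer.getvalue()

# The program printed exactly its own source (plus print's trailing newline).
print(output == source + '\n')  # True
```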

What else is there in Python?

We all use the else keyword in Python, usually accompanying an if statement:

if x > 0: 
    print 'positive'
elif x < 0:
    print 'negative'
else:
    print 'zero'

but Python has a few other uses for the else keyword that most people are unfamiliar with.
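For instance, else can also follow a for loop (it runs only if the loop finished without hitting break) or a try block (it runs only if no exception was raised). A minimal sketch in Python 3 syntax, with function names made up for illustration:

```python
def first_negative(numbers):
    for n in numbers:
        if n < 0:
            found = n
            break
    else:
        # Reached only when the loop ended without a break.
        found = None
    return found


def half(text):
    try:
        value = int(text)
    except ValueError:
        return None
    else:
        # Reached only when int() did not raise.
        return value / 2


print(first_negative([3, -1, 4]))  # -1
print(first_negative([3, 1, 4]))   # None
print(half('10'))                  # 5.0
print(half('ten'))                 # None
```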


Cool Syntax and Weird Documentation - Fun with Scapy

I was looking for a way to parse TCP/IP packets in Python, when a friend recommended Scapy. Scapy is a nice Python package that’s got a very cool interface using the “div” operator, and is used like so:

packet = IP()/TCP()/"GET / HTTP/1.0\r\n\r\n"
str(packet) # returns the packet's binary data 

which is pretty cool and creative. It makes the layers concept pretty visual. Now, I was looking for a way to parse packets, i.e., the other way around, so we looked in Scapy’s documentation. The section on “dissecting” seemed like it might be what we wanted, and here’s the introduction:

Dissecting

Layers are only list of fields, but what is the glue between each field, and after, between each layer. These are the mysteries explain in this section.

I ended up not needing to parse packets, but I did use it to generate TCP/IP packets, and I gotta say, it couldn’t be any easier. Go on, check it out. Their documentation also teaches Python.
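The “div” stacking syntax is ordinary operator overloading, and a toy version is easy to write. The sketch below is not Scapy’s implementation: Layer and its methods are invented for illustration, it mutates the left operand instead of copying it, and it uses Python 3’s __truediv__ (in Python 2 the hook for / is __div__).

```python
class Layer:
    """Toy stand-in for a protocol layer; not Scapy's actual class."""

    def __init__(self, name):
        self.name = name
        self.payload = None

    def __truediv__(self, other):
        # Wrap plain strings so they can sit at the bottom of the stack.
        if isinstance(other, str):
            other = Layer('Raw(%r)' % other)
        # Attach `other` beneath the innermost layer of this stack.
        innermost = self
        while innermost.payload is not None:
            innermost = innermost.payload
        innermost.payload = other
        return self

    def layers(self):
        # Walk the stack from the outermost layer down.
        names, current = [], self
        while current is not None:
            names.append(current.name)
            current = current.payload
        return names


packet = Layer('IP') / Layer('TCP') / 'GET'
print(packet.layers())  # ['IP', 'TCP', "Raw('GET')"]
```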

A Late Introduction to Jools

A while back I wrote a small Java tools library called Jools. I never really presented it to “the world” (except for hosting it on Github). So I figured, now is my chance. Here are the features Jools provides:

Python-like Range Objects

This one is pretty straightforward:

for (final int i : new Range(5)) { 
    players.add(new SmartPlayer(names.get(i)));
}

Generating Permutations

PermutationGenerator<String> pg = new PermutationGenerator<String>("a", "b", "c"); 
for (List<String> permutation : pg) {
    System.out.println("Permutation: " + permutation);
}

This prints:

Permutation: [a, b, c] 
Permutation: [a, c, b]
Permutation: [b, a, c]
Permutation: [b, c, a]
Permutation: [c, a, b]
Permutation: [c, b, a]

The cool thing about this class is that it also lets you get a permutation by its index (calculated efficiently in O(n) time, where n is the number of items, not the number of permutations):

System.out.println("Permutation #3: " + pg.get(3));

This yields:

Permutation #3: [b, c, a]
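One common way to jump straight to the permutation at a given index is the factorial number system: each digit of the index, written in factorial base, picks the next element from the remaining pool. The sketch below is in Python rather than Java and is not taken from Jools. permutation_at is a made-up name, and because it uses list.pop it takes O(n²) time in the worst case rather than the O(n) quoted above.

```python
from math import factorial


def permutation_at(items, index):
    """Return the index-th permutation (0-based, lexicographic by position)."""
    pool = list(items)
    if not 0 <= index < factorial(len(pool)):
        raise IndexError('permutation index out of range')
    result = []
    for remaining in range(len(pool), 0, -1):
        # Each choice for the next slot accounts for (remaining - 1)! permutations.
        block = factorial(remaining - 1)
        position, index = divmod(index, block)
        result.append(pool.pop(position))
    return result


print(permutation_at(['a', 'b', 'c'], 3))  # ['b', 'c', 'a']
```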

Iterable Wrapper For Looping with Indices

Similar to Python’s enumerate function:

final List<String> list = Arrays.asList("A", "B", "C", "D", "E", "F", "G"); 
for (final IndexedElement<String> element : new Indexer<String>(list)) {
    System.out.println(element.getElement() + " : " + element.getIndex());
}

That’s it. Not a big library, as I said, but it’s been helpful to me, and I hope it can be helpful to you as well!

Double Iteration in List Comprehension

Here’s something I didn’t know was possible in Python: iteration over more than one iterable in a list comprehension:

>>> seq_x = [1, 2, 3, 4]
>>> seq_y = 'abc'
>>> [(x,y) for x in seq_x for y in seq_y]
[(1, 'a'), 
(1, 'b'),
(1, 'c'),
(2, 'a'),
(2, 'b'),
(2, 'c'),
(3, 'a'),
(3, 'b'),
(3, 'c'),
(4, 'a'),
(4, 'b'),
(4, 'c')]

Cool, isn’t it? It’s equivalent to the following snippet:

>>> result = [] 
... for x in seq_x:
...     for y in seq_y:
...         result.append((x,y))
>>> result
[(1, 'a'),
(1, 'b'),
(1, 'c'),
(2, 'a'),
(2, 'b'),
(2, 'c'),
(3, 'a'),
(3, 'b'),
(3, 'c'),
(4, 'a'),
(4, 'b'),
(4, 'c')]

It also supports both “if” statements and referencing the outer iterator from the inner one, like so:

>>> seq = ['abc', 'def', 'g', 'hi'] 
... [y for x in seq if len(seq) > 1 for y in x if y != 'e']
['a', 'b', 'c', 'd', 'f', 'g', 'h', 'i']

This is equivalent to the snippet:

>>> result = [] 
... for x in seq:
...     if len(seq) > 1:
...         for y in x:
...             if y != 'e':
...                 result.append(y)
>>> result
['a', 'b', 'c', 'd', 'f', 'g', 'h', 'i']

The thing you should notice here, is that the outer loop is the first ‘for’ loop in the list comprehension. This was a little confusing for me at first because when I nest list comprehensions it’s the other way around.
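To make the ordering difference concrete, here is a small side-by-side sketch (Python 3 syntax; the variable names are only for illustration). In the flat form the outer loop is written first, while in the nested form the outer loop is written last, after the inner comprehension:

```python
seq_x = [1, 2]
seq_y = 'ab'

# Flat: a single list; `for x` (the outer loop) comes first.
flat = [(x, y) for x in seq_x for y in seq_y]
print(flat)    # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

# Nested: a list of lists; `for x` (still the outer loop) comes last.
nested = [[(x, y) for y in seq_y] for x in seq_x]
print(nested)  # [[(1, 'a'), (1, 'b')], [(2, 'a'), (2, 'b')]]
```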

To read more about this feature, check out this StackOverflow thread or the Python Manual.