Jury duty this month, but that was more than made up for with tasty, tasty pizza – thank you Shari! February had lots of great stuff.


Google Summer of Code

We have been accepted!

Got a project to pitch? It’s not too late! Students are already rolling into irc so if you want to mentor this summer the sooner you let me know the better.


Community Council

Tag teaming with Alison on a formal proposal for our community council. Initial council attempt hit some potholes but I’m glad for it since it’s given us a much better idea of what we want the council to become.

Fingers crossed that it’s successful!


Besides this just a handful of Stem changes…

  • Patrick migrated our cryptographic descriptor validation from pycrypto (which is unmaintained) to cryptography.
  • Further sped up our tests. Unit and integ tests now run 15% faster, and @only_run_once is fixed so the ‘RUN_ALL’ target is dramatically faster.
  • Corrected unescaping of controller responses. Thanks to this, authentication cookies with non-English characters in their path now work.

See ya all in Amsterdam!

Oh for the love of… I’m having horrible luck this winter. Third bug to knock me out. From the fever guessing this one was the flu. Ok virulent antigens, you’ve had your fun with Damian. Time to leave him alone now…

None the less, this was a good month.


Tor Internal Bylaws

We finally have a ratified voting policy!

Thanks everyone! Alison and I will put together some initial proposals but first gonna take a little breather.


Stem Test Speed

During my morning commute I’ve been poking at Stem’s test performance. Faster tests mean quicker development cycles. Always a good investment.

This took some unexpected turns resulting in…

  • 22% faster integration tests.
  • 25% faster descriptor parsing when validating.
  • Our stem.util.test_tools module now provides a TimedTestRunner class that gives individual test runtimes. Sadly this is something Python’s built-in unittest module doesn’t have.

    You can run our tests with the ‘--verbose’ argument to see this information…

    verbose test output
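
For a rough idea of how per-test timing can be bolted onto unittest, here is a minimal sketch. It is not Stem's TimedTestRunner: the TimedResult and ExampleTest names are made up for illustration, and it simply assumes that recording the clock in startTest() and stopTest() is good enough.

```python
import io
import time
import unittest

class TimedResult(unittest.TextTestResult):
  """Records each test's runtime (hypothetical helper, not Stem's implementation)."""

  def __init__(self, *args, **kwargs):
    super().__init__(*args, **kwargs)
    self.runtimes = {}  # test id => seconds taken

  def startTest(self, test):
    self._started = time.perf_counter()
    super().startTest(test)

  def stopTest(self, test):
    self.runtimes[test.id()] = time.perf_counter() - self._started
    super().stopTest(test)

class ExampleTest(unittest.TestCase):
  def test_sleep(self):
    time.sleep(0.01)  # stand-in for real work

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
runner = unittest.TextTestRunner(stream = io.StringIO(), resultclass = TimedResult)
result = runner.run(suite)

for test_id, runtime in result.runtimes.items():
  print('%s took %0.3fs' % (test_id, runtime))
```

Hooking the result class rather than the runner keeps the timing independent of how tests are discovered or reported.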


Other neat things this month include…

Happy end to 2016! Shockingly Christmas holidays weren’t the most productive for me. Family aside a particularly juicy cold knocked me flat for a week. That said, here’s the tor things I was up to in December…


Descriptor Protocol Support

Added Stem support for new tor additions, biggest being tor’s new protocol fields. While doing this work ran into some issues where tor’s actual behavior conflicts with the spec. Now just waiting for another pair of eyes.


Tor Internal Bylaws

Progress! Taking this slow to do my best to make sure we get it right but think we’re narrowing in on a voting procedure folks are comfortable with.

Hi all! Small Nyx/Stem changes aside highlights for this month are surprisingly non-coding focused…


Stem 1.5 Release

Damn long overdue. This is a big release encompassing seventeen months of work. Check it out!

Highlights include vastly improved python 3.x performance, manual information, and fallback directory data. But this is just the tip of the iceberg.


Tor Internal Bylaws

Tried my best to ratify Alison’s community membership doc but seems that’s not to be. Pity. With our internal community I’m not optimistic we’ll ever be able to agree on a large overarching policy like it. As such trying my hand at more narrowly focused bylaws instead.

Presently working on our voting procedure. Assuming we can agree on that I’ll follow it up with a proposal to again try to fix tor-internal@. Fingers crossed it goes better than last time!

Damn this was long overdue. I’m delighted to announce Stem 1.5.2, the accumulation of seventeen months of improvements.

What is Stem, you ask? For those who aren’t familiar with it Stem is a Python library for interacting with Tor. With it you can script against your relay, descriptor data, or even write applications similar to Nyx and Vidalia.

https://stem.torproject.org/

So what’s new in this release? Short answer: a lot.


Improved Python 3.x Performance

With reads from tor’s control port now 800x faster, Python 3.x users will find this release a dramatic improvement. By ‘dramatic’ I mean multiple orders of magnitude.

python3 performance difference


Tor Manual Information

Stem’s new stem.manual module provides programmatic access to Tor manual information. For example, say we want a little script that tells us what our torrc options do…

from stem.manual import Manual
from stem.util import term

try:
  print("Downloading tor's manual information, please wait...")
  manual = Manual.from_remote()
  print("  done\n")
except IOError as exc:
  print("  unsuccessful (%s), using information provided with stem\n" % exc)
  manual = Manual.from_cache()  # fall back to our bundled manual information

print('Which tor configuration would you like to learn about?  (press ctrl+c to quit)\n')

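try:
  read_line = raw_input  # Python 2.x
except NameError:
  read_line = input  # Python 3.x renamed raw_input to input

try:
  while True:
    requested_option = read_line('> ').strip()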

    if requested_option:
      if requested_option in manual.config_options:
        option = manual.config_options[requested_option]
        print(term.format('%s %s' % (option.name, option.usage), term.Color.GREEN, term.Attr.BOLD))
        print(term.format(option.summary, term.Color.GREEN))  # brief description provided by stem

        print(term.format('\nFull Description:\n', term.Color.GREEN, term.Attr.BOLD))
        print(term.format(option.description + '\n', term.Color.GREEN))
      else:
        print(term.format("Sorry, we don't have any information about %s. Are you sure it's an option?" % requested_option, term.Color.RED))
except KeyboardInterrupt:
  pass  # user pressed ctrl+c

manual info demo


Fallback Directory Information

Relieving load from the directory authorities, Stem can retrieve information about and use tor’s fallback directory mirrors…

import time
from stem.descriptor.remote import DescriptorDownloader, FallbackDirectory

downloader = DescriptorDownloader()

for fallback_directory in FallbackDirectory.from_cache().values():
  start = time.time()
  downloader.get_consensus(endpoints = [(fallback_directory.address, fallback_directory.dir_port)]).run()
  print('Downloading the consensus took %0.2f from %s' % (time.time() - start, fallback_directory.fingerprint))

fallback directory fetching demo


As always this is just the tip of the iceberg. For a full rundown on the myriad of improvements and fixes in this release see our changelog.

Hi all! For the first time since I started writing these in 2010 missed a couple. Health issues knocked me out of commission but finally all sorted out. Here’s the funner stuff I was up to in August through October.


Nyx Rewrite

Done! Er… well, done-ish.

Finished rewriting the last of Nyx’s legacy codebase (menu and controller modules). For users this means we now have the tested, python 3.x compatible codebase that will be Nyx 2.0.0.

So why no release? Couple reasons…

  1. Stability. Slick as the new codebase is it introduced regressions causing non-deterministic terminal glitches. Unfortunately these are gonna be a real pita to track down…
  2. New website. This is actually pretty far along. Content is there, work left is mostly styling. My web-dev fu isn’t too strong so this may take a while.

As some of you may have noticed Stem hasn’t received a release in over a year. This isn’t due to idleness. In fact, Stem has enough changes to pack several releases.

Rather, I’ve kept delaying Stem’s 1.5.0 release over and over to coincide with Nyx’s. ‘Just one more month!’ I told myself. Ooooh, such cute optimism.

So Stem will get its release. Probably this week.


Few other noteworthy things…

  • Tor dev meeting. Thanks Jon for making it a success!
  • Finished running this year’s GSoC. Program went great, though little sad I wasn’t up for the mentor summit.
  • Stem can now download microdescriptor consensuses.
  • Helped Roger investigate relays failing to serve current consensus data.
  • Moved DocTor from Tonga to Bifroest among other monitoring updates.
  • Neat non-tor stuff. Dota Internationals, Midsummer Renaissance Festival, and vacation up to Bellingham for the Sparks Museum and Boeing Factory Tour.

Hi all! Getting in lots of family time as I sort out ongoing health issues, but none the less got some neat things to report this month!


Interpreter Panel

Part of Sambuddha’s GSoC project, half my month has gone toward reintroducing nyx’s interpreter and I’m pleased to say it’s turned out great!

Built upon Stem’s tor-prompt this expands upon the capabilities of the last arm release, providing an interactive python interpreter along with updated tor capabilities.


Curses Test Coverage

Concluding a four month overhaul, nyx now has a high degree of test coverage for its curses display capabilities which is good cuz… well, that’s kinda what nyx does. To answer the obvious question no, still no release date but this is a major milestone. Remaining work I have planned includes…

  • Refactor nyx’s menu and controller modules.
  • Put together a new site for nyx.
  • Run an open beta to solicit testing and ideas from our community.

Stem was in the works for two years before its first release. Nyx too will be ready when it’s ready.

  “Always do things right. This will gratify some people and astonish the rest.” -Mark Twain

Summer is here! Family, festivals, and other totally-not-tor things are occupying much of my time so this is gonna be short…

  • Expanded Nyx tests to include the graph and log panels. With this we have coverage of 66% of the curses components.
  • Stem support for new tor additions, highlights of this month being shared randomness and ADD_ONION basic auth.
  • GSoC midterm evaluations – everyone passed!

That’s all – now off to enjoy the sun!

Eeek I’m late! Wanted to get the PyCon trip report out first but… ok, maybe I went a tad overboard there. Oh well, better late than never. Between summer festivals and health issues tor took the back seat again this month but still some neat stuff…


PyCon

Probably wrote way too much on this already so I’ll spare ya. If you haven’t skimmed the report yet then check it out – PyCon was a neat event!


Nyx Event Selection Dialog

Remember arm’s bizarre and clumsy event selection dialog? Remember how confusing it was?

Yeah, it sucked. Sambuddha and I have been tossing pull requests back and forth all month as part of his GSoC project. This is requiring a lot of my direct involvement but oh well, the new dialog has turned out nicely!


Few other noteworthy things…

I’ve been to quite a few conferences. LinuxFest Northwest, SeaGL, PETS, Toorcamp, Defcon (prior trip report), but PyCon was particularly impressive. At over three thousand attendees with five parallel tracks of talks the word ‘busy’ hardly seems to do the conference justice.

Top TL;DR highlights for me were new capabilities in the Python 3.x series and HTTP 2.0. In particular…

  • Python 3.6 releases on Christmas, finally adding string interpolation!

    >>> name, job = 'Damian', 'software engineer'
    >>> print(f'{name} is a {job}')
    Damian is a software engineer
    

  • Python 2.x support will be completely discontinued in 2020.

  • New async/await keywords in Python 3.5 provide built-in support for Twisted-style async IO.

  • Gradual type syntax in Python 3.5 makes code even more self-documenting and supportive of static analysis.

  • First major protocol update since 1999, HTTP 2.0 is now supported by all modern browsers and 60% of users in the wild. Connection multiplexing allows all site assets to be retrieved over a single connection, improving latency on the order of 50%. The new protocol also negates any need for the clever performance hacks we’ve developed over the years like asset minimization and sprite maps!
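
To give a feel for the async/await and type annotation bullets above, here is a small sketch that uses both. The fetch_length() coroutine is a made-up stand-in for real network IO, with asyncio.sleep() playing the part of a slow request.

```python
import asyncio
from typing import List

async def fetch_length(name: str, delay: float) -> int:
  """Pretends to do slow IO, then returns the length of the given name."""
  await asyncio.sleep(delay)  # yields to the event loop rather than blocking
  return len(name)

async def main() -> List[int]:
  # both coroutines wait concurrently, results come back in argument order
  return await asyncio.gather(fetch_length('stem', 0.02), fetch_length('nyx', 0.01))

loop = asyncio.new_event_loop()

try:
  lengths = loop.run_until_complete(main())
finally:
  loop.close()

print(lengths)
```

The annotations are not enforced at runtime. They exist for readers and for static checkers such as mypy.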

PyCon 2017 will be in Portland one more time before moving on to another venue, so if the following sounds interesting then check it out!

 


 

Serendipity is delightful. My first time taking the train, I strongly suggest Amtrak (particularly the Coast Starlight) if heading down to Portland. Comfortable, scenic, and by happy coincidence sat with Sarah Leivers: PyCon speaker with roots in the UK deaf community.

Sarah made the interesting point that even for deaf communities in English speaking countries English is often a second language. Signing is their native tongue, putting them at a disadvantage when it comes to involvement in our communities. Part of the larger ESL puzzle, our discussion was a nice reminder of why it’s important to keep documentation as linguistically simple and accessible as we can.

In the observation car the Parks Department described sights we passed, my favorite being the Centralia train station. Completed right around the time these newfangled ‘airplane’ things were taking off, to celebrate they decided to christen the building with champagne. Three bottles were loaded onto a plane and dropped. The first couple bottles missed but the third hit dead on, puncturing right through the roof.

Spoiler alert: this was the last building they christened in such a way.

Go to a conference without exploring the area and you’re doing it wrong. My train left me a few hours to explore the city, starting with the Portland Saturday Market. Easily comparable to Pike Place, the market is four city blocks jam packed with all the essentials of life: hand-carved bark houses, tie-dye, and of course fancy hats!

Next hit the Lan Su Chinese Garden, beautiful gem nestled into the heart of downtown…

Of course visited Ground Kontrol just a block away. Classic arcade that successfully reminded me just how much I suck at Marble Madness. In my defense haven’t played since my good old Amiga 2000…

Finally, hidden below my hotel lurked a black light pirate themed putt-putt course. So… seems that’s a thing!

With that out of the way, on to the conference!


File Descriptors, Unix Sockets and other POSIX wizardry

First talk of the first day, Christian Heimes gave a crash course on *nix file descriptors. In python descriptors are fetched with f.fileno() and Christian demoed interacting with them directly to open his cd tray.

Christian’s talk focused on file descriptor basics (which honestly I’m rustier on than I should be)…

  • Descriptors 0-2 are reserved for stdin/stdout/stderr with -1 for errors.
  • Fork clones the current process, with the child’s descriptors pointing to the same entries in the kernel’s global open file table.
  • Exec replaces the current program, inheriting the prior descriptors (which is why pipes continue to work).
  • Descriptors can be delegated. This is useful in sandboxing situations like seccomp, allowing a broker to open files/sockets on a sandboxed process’ behalf.

Lastly Christian walked through a little strace example that illustrates how descriptors are used in a basic scenario…

% cat reader.py
with open('/home/atagar/Desktop/reader.py') as my_file:
  print(my_file.read())
% strace python reader.py
...
open("/home/atagar/Desktop/reader.py", O_RDONLY|O_LARGEFILE) = 3
read(3, "with open('/home/atagar/Desktop/"..., 4096) = 80
read(3, "", 4096)                       = 0
close(3)                                = 0
write(1, "with open('/home/atagar/Desktop/"..., 81) = 81
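
The same calls can be driven by hand from Python's os module. The following is only a sketch: both helper names are made up for illustration, and it assumes a POSIX system since os.fork() is unavailable on Windows. The first helper repeats the open/read/close sequence from the strace, while the second shows a forked child writing to a pipe descriptor it inherited from its parent.

```python
import os

def read_via_descriptor(path):
  """Reads a file with raw descriptor calls rather than a file object."""
  fd = os.open(path, os.O_RDONLY)  # small integer, typically 3 or higher
  chunks = []

  try:
    while True:
      chunk = os.read(fd, 4096)

      if not chunk:  # empty read means end-of-file, like 'read(3, "", 4096) = 0'
        break

      chunks.append(chunk)
  finally:
    os.close(fd)

  return b''.join(chunks)

def pipe_through_fork(message):
  """Forks, has the child write to an inherited pipe, and returns what the parent reads."""
  read_fd, write_fd = os.pipe()
  pid = os.fork()

  if pid == 0:  # child: shares the parent's open descriptors
    os.close(read_fd)
    os.write(write_fd, message)
    os._exit(0)

  os.close(write_fd)  # parent: drop our write end so reads can hit end-of-file
  received = b''

  while True:
    chunk = os.read(read_fd, 4096)

    if not chunk:
      break

    received += chunk

  os.close(read_fd)
  os.waitpid(pid, 0)  # reap the child
  return received

print(pipe_through_fork(b'hello from the child'))
```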

Refactoring Python: Why and how to restructure your code

Nice presentation by Brett Slatkin, the author of Effective Python, on how and when to make code more maintainable. As developers we optimize for making things work in our first pass, and for many of us that’s where the story ends. To make code that’s truly easy to follow requires time and patience to take follow-up passes that optimize for maintainability. Something most developers don’t do.

To illustrate this Brett asked: how much of your coding time goes toward implementation? 90%? 75%? The few developers he knows that write easy to follow code only do so because they spend fully half their time refactoring anything they write. Maintainability isn’t cheap, and when faced with deadlines it’s often the first thing to go.

Brett’s other main takeaway was that without tests you’re DOA. Refactoring requires a willingness to make mistakes, and without high coverage any major overhaul of production systems is in practice impossible.

This dovetailed nicely with the following talk, Code Unto Others, which gave a few tips…

  • When it comes to maintainability remember that you don’t scale. Any rough code you write is something you’ll need to explain over and over to every engineer that touches it. That’s not really how you want to spend your time, is it?
  • Commonly people can track 5-9 things at a time which is why phone numbers are seven digits. Subdivide modules to take advantage of this. As a counter-example they used Mercurial’s Repository class, a 17,000 line headache for newcomers.
  • Be wary when describing your module uses the word ‘and’ (“it does this and that”). If you need that word you’re probably doing it wrong. After reading the first half of a class you should be able to take an educated guess at what you’ll see in the second.

Finding closure with closures

Peek under the hood at how Python implements closures…

>>> import os
>>> def print_greeting(first_name):
...   def msg(last_name):
...     platform = os.uname()[0]
...     return "Hi %s %s, you're running %s" % (first_name, last_name, platform)
...   print(msg('Johnson'))
...   print("co_varnames: %s" % ', '.join(msg.__code__.co_varnames))
...   print("co_names: %s" % ', '.join(msg.__code__.co_names))
...   print("co_freevars: %s" % ', '.join(msg.__code__.co_freevars))
... 
>>> print_greeting('Damian')
Hi Damian Johnson, you're running Linux
co_varnames: last_name, platform
co_names: os, uname
co_freevars: first_name

varnames are local variables while freevars are variables we’re closing over from the outer scope. A gotcha that’s probably bitten every python dev is that assignment to a closed over variable overwrites it with a local…

>>> import random
>>> def get_score():
...   total = 0
...   def add_points():
...     total += random.randint(0, 5)
...   for i in range(3):
...     add_points()
...   return total
... 
>>> get_score()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 6, in get_score
  File "<stdin>", line 4, in add_points
UnboundLocalError: local variable 'total' referenced before assignment

Python 3.x adds a new ‘nonlocal’ keyword for re-binding closures but for those of us stuck in the past our best option is to use the mutable hack. Gross, but it works.

>>> def get_score():
...   total = [0]
...   def add_points():
...     total[0] = total[0] + random.randint(0, 5)
...   for i in range(3):
...     add_points()
...   return total[0]
... 
>>> get_score()
8
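
For comparison, here is a sketch of the same function using Python 3.x's nonlocal keyword. The rolls argument is an addition for illustration. Declaring total as nonlocal tells the compiler to re-bind the enclosing variable rather than create a new local.

```python
import random

def get_score(rolls = 3):
  total = 0

  def add_points():
    nonlocal total  # re-bind get_score's variable instead of making a local
    total += random.randint(0, 5)

  for _ in range(rolls):
    add_points()

  return total

print(get_score())
```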

What is and what can be: an exploration from ‘type’ to Metaclasses

Owww, my head. This and another talk the previous day by Mike Graham introduced audiences to the wonderful world of python metaclasses…

    “The subject of metaclasses in Python has caused hairs to raise and even brains to explode.” -Guido

A method for redefining the fundamental behavior of objects (and in doing so tearing the fabric of reality), metaclasses are what you invoke each time you extend object. Dustin demonstrated this by defining his own metaclass that transparently causes method invocations to be accompanied by a bark…

from functools import wraps
from inspect import isfunction

def bark(f):
  @wraps(f)
  def wrapper(*args, **kwargs):
    print("bark!")
    return f(*args, **kwargs)

  return wrapper

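class MetaDog(type):
  def __new__(meta, name, bases, attrs):
    # separate loop variable so we don't clobber the class' name, and a copy
    # of the items so we aren't modifying the dict we're iterating over

    for attr_name, attr in list(attrs.items()):
      if isfunction(attr):
        attrs[attr_name] = bark(attr)

    return type.__new__(meta, name, bases, attrs)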

class Dog(metaclass = MetaDog):
  def sit(self):
    print("*sitting*")

  def stay(self):
    print("*staying*")

d = Dog()
d.sit()

So why will you use this? Well… hopefully you won’t. Besides the obvious unforgivability of this sin upon your coworkers, this is the kind of black magic Ruby folks do all the time but Python devs know better. Like redefining builtins, just don’t.

That aside, it was interesting to learn a little more about the abstract base class module and how python works under the hood.
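
As a small illustration of the 'under the hood' part, a class statement is roughly shorthand for calling the metaclass, which is type by default. The Plain and Dynamic names below are made up for this sketch.

```python
class Plain(object):
  greeting = 'hi'

# roughly what the class statement above does for us: type(name, bases, attrs)
Dynamic = type('Dynamic', (object,), {'greeting': 'hi'})

print(type(Plain) is type)  # classes are instances of their metaclass
print(type(Dynamic) is type)
print(Dynamic().greeting)
```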


Building protocol libraries the right way

Cory Benfield, a maintainer of Requests, urllib3, and other core I/O libraries, discussed a common pitfall that afflicts protocol libraries: mixing I/O with parsing.

Python has as many HTTP parsers as there are I/O libraries. Urllib variants, aiohttp, Twisted, Tornado, and friends all reinvent this wheel. Code re-use is particularly great when you have a well defined problem with a single correct solution. Arithmetic, compression, and parsing are all examples of this, so why don’t they all share a unified parser?

The problem is that we tangle network I/O with parsing of the messages we read. As such all these projects trip over the same obscure edge cases and re-implement the same optimizations.

Cory’s message was simple: keep parsing separate. Besides code reuse this greatly improves testability because you don’t need to invoke your I/O stack for coverage.

Personally I found this talk interesting because this is exactly something I ran into with Stem. To work our I/O handler needs enough understanding of the control-spec to delimit message boundaries, but beyond that parsing is a completely separate module. This has been a great boon for testing…

TEST_MESSAGE = """\
250-version=0.2.3.11-alpha-dev
250 OK"""

def test_single_getinfo_response(self):
  """
  Parses a GETINFO reply response for a single parameter.
  """

  control_message = stem.response.ControlMessage.from_str(TEST_MESSAGE, msg_type = 'GETINFO')
  self.assertEqual({'version': b'0.2.3.11-alpha-dev'}, control_message.entries)
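
To sketch the general idea outside of Stem, here is a tiny parser that never touches a socket. This is not Stem's code: ReplyParser is a hypothetical class, and it assumes a heavily simplified reply format where every line ends with CRLF and a space after the three digit status code marks the last line of a reply.

```python
class ReplyParser(object):
  """Buffers raw bytes and emits complete replies. Performs no I/O itself."""

  def __init__(self):
    self._buffer = b''
    self._lines = []

  def feed(self, data):
    """Accepts bytes from any transport, returning the replies they complete."""

    self._buffer += data
    replies = []

    while b'\r\n' in self._buffer:
      line, self._buffer = self._buffer.split(b'\r\n', 1)
      self._lines.append(line.decode('ascii'))

      # simplified framing: '250-...' continues a reply, '250 ...' ends it

      if line[3:4] == b' ':
        replies.append(self._lines)
        self._lines = []

    return replies

parser = ReplyParser()
print(parser.feed(b'250-version=0.2.3.11-al'))       # partial read, nothing yet
print(parser.feed(b'pha-dev\r\n250 OK\r\n'))         # rest arrives, one reply
```

Since feed() only takes bytes, the same parser can sit behind a blocking socket, an asyncio transport, or a unit test that hands it strings.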

HTTP can do that?!

Whimsical look at lesser known bits of the HTTP specification…

  • Need just metadata of a GET request? Use HEAD instead for a far lighter response.
  • Calling OPTIONS will tell you the HTTP operations a resource supports.
  • Besides normal CRUD operations (GET, POST, PUT, DELETE) the HTTP spec has PATCH to update just part of a resource.
  • The specification also has TRACE, LINK, and UNLINK methods. Nobody uses them but hey, they’re there.
  • Few interesting headers include ETag for versioning resources, If-Modified-Since to only solicit a response if the resource has changed, and Cache-Control to define cacheability. Actually, the specification even has a From header in case you want to tell everybody in the world your email address…
  • Few standard but infrequently used response codes are…
    • 410 – That resource used to be here but now it’s gone.
    • 304 – You asked to get this resource if it’s been modified but it hasn’t.
    • 451 – Unavailable for legal reasons. Mostly comes up with censorship firewalls.
  • Unsurprisingly you can make up your own status codes and reason strings. Sumana had several amusing ones she’s found in the wild.
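
A few of these are easy to try locally with nothing but the standard library. The sketch below spins up a throwaway server on localhost (DemoHandler, the body, and the ETag value are all made up for illustration), then issues a HEAD, a plain GET, and a conditional GET. It uses If-None-Match, the ETag counterpart of If-Modified-Since, to trigger the 304.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b'hello world'
ETAG = '"v1"'  # hypothetical version tag for our one resource

class DemoHandler(BaseHTTPRequestHandler):
  def _respond(self, status, body = b''):
    self.send_response(status)
    self.send_header('ETag', ETAG)

    if status == 200:
      self.send_header('Content-Length', str(len(BODY)))

    self.end_headers()
    self.wfile.write(body)

  def do_HEAD(self):
    self._respond(200)  # same headers as a GET, but no body

  def do_GET(self):
    if self.headers.get('If-None-Match') == ETAG:
      self._respond(304)  # client's copy is still current
    else:
      self._respond(200, BODY)

  def log_message(self, format, *args):
    pass  # keep the demo quiet

def request(port, method, headers = None):
  conn = http.client.HTTPConnection('127.0.0.1', port)
  conn.request(method, '/', headers = headers or {})
  response = conn.getresponse()
  result = (response.status, response.getheader('ETag'), response.read())
  conn.close()
  return result

server = HTTPServer(('127.0.0.1', 0), DemoHandler)  # port 0 picks a free port
threading.Thread(target = server.serve_forever, daemon = True).start()
port = server.server_address[1]

head_result = request(port, 'HEAD')
get_result = request(port, 'GET')
cached_result = request(port, 'GET', {'If-None-Match': ETAG})

server.shutdown()
server.server_close()

for result in (head_result, get_result, cached_result):
  print(result)
```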

Playing with Python bytecode

Amusing demonstration of executing raw bytecodes in python, including runtime manipulation to switch a functor’s addition operation to multiplication. Interesting in a ‘oh god, you can do that?’ sense but even the presenters said ‘kids, don’t do this at home’. Few (if any?) practical applications, and opcodes change even between minor Python interpreter version bumps making any such hacks a maintenance nightmare.
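
A safer way to poke at this is to just look rather than touch. The sketch below uses the standard dis module to list the opcodes behind a trivial add() function (made up for illustration). No expected output is shown since, as noted above, the opcode names differ between interpreter versions.

```python
import dis

def add(a, b):
  return a + b

# opnames vary by interpreter version, for instance the addition has shown up
# as both BINARY_ADD and BINARY_OP depending on the release

opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```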


SQLite: Gotchas and Gimmes

Tips by Dave Sawyer for SQLite, mostly focusing on the advantages over pickles (performance, safety, etc), common pitfalls, and locking strategies…

  • Deferred – Multiple readers/writers.
  • Immediate – Multiple readers/single writer
  • Exclusive – Single reader/writer.

WAL (Write-Ahead Logging) is an alternative where readers are never blocked, with the writer appending deltas to a separate log. Upon checkpoints SQLite halts all reads/writes to apply the deltas as a batch.
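
Both knobs are reachable from Python's built-in sqlite3 module. This is a minimal sketch rather than anything from Dave's talk: the table and values are made up, and it writes to a throwaway file since WAL needs an on-disk database.

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), 'demo.db')  # throwaway database

# isolation_level of None means autocommit, so we issue BEGIN/COMMIT ourselves
conn = sqlite3.connect(db_path, isolation_level = None)

# the pragma replies with the journal mode that's now in effect
journal_mode = conn.execute('PRAGMA journal_mode=WAL').fetchone()[0]

conn.execute('CREATE TABLE scores (name TEXT, points INTEGER)')

conn.execute('BEGIN IMMEDIATE')  # take the write lock up front
conn.execute('INSERT INTO scores VALUES (?, ?)', ('atagar', 8))
conn.execute('COMMIT')

rows = conn.execute('SELECT name, points FROM scores').fetchall()
conn.close()

print(journal_mode, rows)
```

Swapping 'BEGIN IMMEDIATE' for 'BEGIN DEFERRED' or 'BEGIN EXCLUSIVE' selects the other two strategies.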


See Python, See Python Go, Go Python Go

Last talk I attended and the one I wanted to see most. Imagine a world where performance critical code could be written in Go rather than C. No more memory leaks. No compilers. Sounds great, right? Well, keep dreaming.

Both Python and Go can drop to C and Andrey gave a demo of doing so as a bridge between them, and in the process explained why this is a terrible idea. The CPython Extension interface requires a bit of boilerplate but can work with no dependencies while CFFI requires some magic but provides a more portable solution. But in either case crossing both the Go-to-C and C-to-Python boundaries drops you to the least common denominator. This means no Go interfaces or goroutines, and no Python classes or generators.

GC, GIL, and JIT all add their own headaches but worse, you need to implement your own memory management. Sharing between Go and Python risks release of memory the other side still references. Andrey got around this by passing his own dereferenceable pointers but… ick.

In the end Andrey’s demo worked and in fact was just as performant as a direct Go implementation, but made it clear there be dragons. Frustratingly, it’s still better to just call os.system().

 


 

This being my first PyCon I focused on talks rather than the hallway track but none the less had some nice finds…

  • Seattle is home to quite a few technical meetups. Hardware hacking, TA3M, Ruby Brigade, you name it and there’s probably a group for it. SeaPig has been a fun local python group but sadly it’s gone dormant in recent years. Among the booths however I ran into members of PuPPy, another local python group that seems to be quite alive and well!
  • Didn’t realize in advance but AWS networking ran a booth during the job fair. Fun chats with Shawn – he has a great approach for exciting folks to apply.
  • Crossed paths with meejah several times. Together we whipped up a recipe combining our libraries so users can read stem-parsed event objects from txtorcon. Neat stuff!

Simply a great conference, I look forward to hitting PyCon again next year!