Filed under Blog

Pickleball revisited

Who would have thought the world of Pickleball is so interesting it deserves two blog posts? The last time we looked at pickleball it was to determine the optimal mixed double tournament structure to maximize fairness. This time, we're going to solve for the match pairings without the gender distinction, but still as a doubles tournament. Also, we want to solve the matches so that we maximize court usage throughout the tournament. (Okay, this doesn't relate specifically to Pickleball, but whatever.)

The Problem

Put another way, how to optimally choose matchups for a doubles tournament of n people, where each person plays n-1 matches, to optimize pairing so that each person is only paired with another once, and plays against another person at most twice. Also want to solve as a tournament so we can maximize court usage (e.g., if there are 8 players, at each round we should be using two courts.)

Again, we're trying to solve this in the ballpark of n=8.

Approach

This is still an exact cover problem, but with different constraints as last time. We could still use the grid style structure, but it becomes awkward when trying to implement the court usage constraint.

Let's first generalize our exact cover solver, then restructure our "board state" data structure to implement the above contraints.

Exact Cover Problems

Last time, the heart of our constraint-propogating depth-first over-hyphen-using exact cover solver was this function:

def chooser(g, pm, n):
    if unable_to_solve(g):
        return None
    if is_solved(g, pm):
        return g
    iplace = canplace(g, pm)
    if iplace is not None:
        return chooser(*place(iplace, g[iplace], g, pm, n))
    # Have to make a choice
    ichoice = findfirst(g, lambda x: len(x) > 2)
    for v in aschoices(g[ichoice]):
        gcopy = chooser(*place(ichoice, v, deepcopy(g), deepcopy(pm), n))
        if gcopy:
            return gcopy
    return None

The arguments are the "board state", representing what choices we've made so far and what choices remain. We first check to see if the current board state is in an end condition - either unsolvable or solved. If it's neither of those, we check if there are any automatic choices to be made - choices with only one option. This is the second half of constraint propogation. If there is only one choice to make, choose that choice in the board state (which should internally enforce some constraints on the other choices), then continue.

If there aren't any freebie choices to make, we need to continue with the depth first search. Here is where we:

  1. Save the board state
  2. Try the choice and recurse
  3. If that choice didn't result in a solution, undo the choice, and try the next one

This is a similar structure to most depth first solvers. The tricky part is the undo step - because internally there could be lots of changes resulting from the constraint propogation, it's often easier to just store a copy of the board state before making the choice, then revert back to that copy if that branch didn't work out. This can be slow, however, since you are always copying chunks of memory. Also, to make sure you take advantage of constraint propogation, you need to make sure it is not taking the same code path as that copy.

Other approaches remove the need for saving the board state by allowing an undo choice operation, or "backtracking". This can result in large speed increases by reducing the memory footprint. The best example of this is Knuth's Dancing Links (DLX) technique for solving exact cover problems. Maybe we'll do a pickleball solver with DLX some other time.

So, how do we generalize this thing? It's pretty close already - we just need to implement the interface functions

  • unable_to_solve - is the board state unsolvable?
  • is_solved - is the board state solved?
  • canplace - is there an automatic choice to make?
  • findfirst - find the first position that has multiple choices
  • place - make a choice in a certain position

That sounds like a class. So, rewriting assuming a BoardState interface:

def chooser(board):
    if board.unable_to_solve():
        return None
    if board.is_solved():
        return board
    constrained_pos = board.canplace()
    if constrained_pos is not None:
        return chooser(board.place(constrained_pos, board.choices(constrained_pos)[0]))
    # Have to make a choice
    pos = board.first_position_with_choices()
    for choice in board.choices(pos):
        boardcopy = chooser(deepcopy(board).place(pos, choice))
        if boardcopy is not None:
            return boardcopy
    return None

Again, this assumes we are using the "copy" method of backtracking, instead of an explicit undo operation. Also, the "place" method contains the clever constraint propogation parts. This seems okay, since if we had less clever constraint propogation, we'd just do more depth first searching, without any changes to this code.

Pickleball Specifics

Now let's write the pickle ball board state class. As I mentioned before, the grid-style representation last time doesn't work as well when you only have players in a single category, as opposed to split into two distinctly competing groups (gender), and you also want to solve for the actual court usage during the tournament.

After thinking this over for a while, let's try representing our board state as an actual tournament bracket, like so:

Court 1    Court 2
+----+     +----+
|    |     |    |
|----|     |----|     Match 1
|    |     |    |
+----+     +----+

+----+     +----+
|    |     |    |
|----|     |----|     Match 2
|    |     |    |
+----+     +----+

+----+     +----+
|    |     |    |
|----|     |----|     Match 3
|    |     |    |
+----+     +----+

...etc.

With n=8 players, we'll need 8/4 = 2 courts and 8-1 = 7 matches for the tournament.

The goal will be to fill in each court in each match to meet all of our constraints. Namely:

  • Each player is paired on the same team with every other player exactly once
  • Each player will never play against another player more than twice

So, our finished solution will look something like:

Court 1    Court 2
+----+     +----+
| AB |     | EF |
|----|     |----|     Match 1
| CD |     | GH |
+----+     +----+

+----+     +----+
| AC |     | FG |
|----|     |----|     Match 2
| DE |     | BH |
+----+     +----+

...etc.

meaning that in Match 1, player A and player B play against player C and player D, and player E and player F play against player G and player H, and so on.

Note that we are not restricting the court and the position on the court (for now). So, for this solution, we are just trying to get the pairing for each match - which court and position each pair ends up on doesn't matter (can easily be switched at game time). Put another way, we are assuming that each position on each court are exactly the same (which is a pretty big assumption if you know that Court 1 is nearer to the door, so it's really drafty, but we have to start somewhere.)

So, without loss of generality, we can assign one player to the same position in each court for each match on initialization, and cycle through all partners:

Court 1    Court 2
+----+     +----+
| AB |     |    |
|----|     |----|     Match 1
|    |     |    |
+----+     +----+

+----+     +----+
| AC |     |    |
|----|     |----|     Match 2
|    |     |    |
+----+     +----+

+----+     +----+
| AD |     |    |
|----|     |----|     Match 3
|    |     |    |
+----+     +----+

...etc.

How are we going to represent this in code? Let's try representing the tournament as a list of matches, and each match as a list of pairs, where pairs[0] plays pairs[1] on Court 1 and pairs[2] plays pairs[3] on Court 2. Actually, each pairs entry will be a list of remaining pairs that can go in that position. Something like:

matches = [
    # Match 1
    [['AB'], ['BC', 'BD', ...],              # Choices for court 1
     ['EF', 'EH', ...], ['FG', 'FH', ...]],  # Choices for court 2
    # Match 2
    [['AC'], ['BD', 'BE', ...],              # Choices for court 1
     ['EF', 'EH', ...], ['FG', 'FH', ...]],  # Choices for court 2
    ...
    ]

Implementation

Now we're getting somewhere. So, the board state class will look something like:

PLAYER_IDS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

class PickleBallBoardState(object):
    def __init__(self, n):
        allpairs = []
        for i in xrange(n):
            for j in xrange(i+1, n):
                allpairs.append(PLAYER_IDS[i] + PLAYER_IDS[j])
        print allpairs

        n_courts = n / 4
        positions_per_court = 2
        n_matches = n - 1
        self.matches = []
        for i_match in xrange(n_matches):
            match = []
            for i_position in xrange(n_courts * positions_per_court):
                match.append(list(allpairs))
            self.matches.append(match)

I've found that a "is_placed" data structure is helpful when doing these problems, to easily distinguish the case where a position has a single choice available, but hasn't yet had the constraints propogated. So, let's add a placed structure of boolean values with a similar match/position structure:

        self.placed = []
        for i_match in xrange(n_matches):
            self.placed = [False] * n_courts * positions_per_court

Finally, let's fill in those initial choices that prevent certain types of duplicate solutions. Meaning, AB will always be in match 1, AC in match 2, AD in match 3, and so on. We'll use the "place" function which will start removing constrained choices.

        for i_match in xrange(n_matches):
            self.place((i_match, 0), PLAYER_IDS[0] + PLAYER_IDS[i_match + 1])

Now for some of the good stuff. Let's do some easy functions first, then come back to the constraint enforcing place function.

    def unable_to_solve(self):
        return any(not choices for choices in iflatten2d(self.matches))

    def is_solved(self):
        return all(len(choices) == 1 for choices in iflatten2d(self.matches)) \
               and all(iflatten2d(self.placed))

Okay, I can't resist a little premature pythoptimization. I've used a little flatten utility that returns an iterator so we don't have to double loop through matches then positions. For the following functions, we'll actually need to do the double loop, though:

    def canplace(g):
        for i_match, match in enumerate(self.matches):
            for i_pos, choices in enumerate(match):
                if len(choices) == 1 and not self.placed[i_match][i_pos]:
                    return (i_match, i_pos)
        return None

    def first_position_with_choices(self):
        for i_match, match in enumerate(self.matches):
            for i_pos, choices in enumerate(match):
                if len(choices) > 1:
                    assert not self.placed[i_match][i_pos]
                    return (i_match, i_pos)
        return None

   def choices(self, location):
       i_match, i_pos = location
       return self.matches[i_match][i_pos]

Now it's just the place function. It should consist of setting the fixed choice for that position, then removing all the other choices due to the constraints. It gets a little complicated, though, since one of the constraints depends on the number of times one player has played another. So, lets add that as a member variable and only update it when place is called

    def _filter_out_players(self, i_match, i_pos, players):
        if not self.placed[i_match][i_pos]:
            self.matches[i_match][i_pos] = filter(
                lambda c: (c[0] not in players) and (c[1] not in players),
                self.matches[i_match][i_pos])

    def _blocked_players(self, players):
        blocked_set = set()
        for player in players:
            blocked_set.update(p for p, count in self.play_counts[player].iteritems() if count >= 2)
        return blocked_set

    def place(self, location, choice):
        placed_match, placed_pos = location
        assert not self.placed[placed_match][placed_pos]

        # First check if we are completing a versus match
        opposing_pos = placed_pos + 1 if placed_pos % 2 == 0 else placed_pos - 1
        completing_versus = self.placed[placed_match][opposing_pos]

        # Place the choice
        self.matches[placed_match][placed_pos] = [choice]
        self.placed[placed_match][placed_pos] = True

        # Remove all other instances of this pairing
        for i_match, match in enumerate(self.matches):
            for i_pos, choices in enumerate(match):
                if not self.placed[i_match][i_pos]:
                    try:
                        choices.remove(choice)
                    except ValueError:
                        pass

        # Remove all choices containing these players from this match
        for i_pos in xrange(len(self.matches[placed_match])):
            self._filter_out_players(placed_match, i_pos, choice)

        # Make sure person only plays against another player maximum twice
        if completing_versus:
            self._update_play_counts(placed_match, placed_pos)
            for i_match, match in enumerate(self.matches):
                for i_pos, choices in enumerate(match):
                    opposing_pos = i_pos + 1 if i_pos % 2 == 0 else i_pos - 1
                    if self.placed[i_match][opposing_pos] and not self.placed[i_match][i_pos]:
                        self._filter_out_players(
                            i_match, i_pos,
                            self._blocked_players(self.matches[i_match][opposing_pos][0]))

        return self

That's pretty much it. A quick little pretty printer and solver function and we're done.

    def solnstr(self):
        lines = []
        for i_match, match in enumerate(self.matches):
            lines.append(('Match %d: ' % (i_match + 1)) + \
                         ', '.join(match[ipos][0] + ' vs ' + match[ipos+1][0]
                                   for ipos in xrange(0, len(match), 2)))
        return '\n'.join(lines)

def solve(n):
    board = PickleBallBoardState(n)
    return chooser(board)

Results

Running

print solve(8).solnstr()

gives

Match 1: AB vs CD, EF vs GH
Match 2: AC vs EG, BD vs FH
Match 3: AD vs EH, BC vs FG
Match 4: AE vs BF, CG vs DH
Match 5: AF vs CH, BE vs DG
Match 6: AG vs DF, BH vs CE
Match 7: AH vs BG, CF vs DE

Great! It worked. All pairings are represented and no player plays another one more than twice.

Trying with n=12 looks to be too slow to finish quickly, and require some more cleverness. Maybe have to do that in a future post. Some quick tests imply that the deepcopy operation is pretty slow, especially with the overly-clever nested datastructures.

You can get the full source here.

Tagged , , ,

Total derivatives in sympy

This is a quick one. I'd like to use sympy to derive the the total derivative of an expression symbolically, not just a partial derivative. sympy has straightforward support for partial derivatives, but I was confused as to total derivatives.

For example, I'd like to say

difftotal(x_dot**2 + cos(x), t)

and get back

-g*x_dot*sin(x) + 2*x_ddot*x_dot

The difficulty is specifying which variables are actually functions of t, and which are constants (g, in this case). Also, you want to specify that dx/dt is the variable x_dot, which also appears in your original equation.

Here's a function to do just that:

from sympy import *

def difftotal(expr, diffby, diffmap):
    """Take the total derivative with respect to a variable.

    Example:

        theta, t, theta_dot = symbols("theta t theta_dot")
        difftotal(cos(theta), t, {theta: theta_dot})

    returns

        -theta_dot*sin(theta)
    """
    # Replace all symbols in the diffmap by a functional form
    fnexpr = expr.subs({s:s(diffby) for s in diffmap})
    # Do the differentiation
    diffexpr = diff(fnexpr, diffby)
    # Replace the Derivatives with the variables in diffmap
    derivmap = {Derivative(v(diffby), diffby):dv 
                for v,dv in diffmap.iteritems()}
    finaldiff = diffexpr.subs(derivmap)
    # Replace the functional forms with their original form
    return finaldiff.subs({s(diffby):s for s in diffmap})

Now you can say:

>>> from sympy import *
>>> x_dot, x_ddot, t, x, g =  symbols("x_dot x_ddot t x g")
>>> diffresult = difftotal(x_dot**2 + g*cos(x), t, {x: x_dot, x_dot: x_ddot})
>>> print diffresult
-g*x_dot*sin(x) + 2*x_ddot*x_dot

And to clean up:

>>> simplify(diffresult)
x_dot*(-g*sin(x) + 2*x_ddot)

A gotcha for new sympy users - the sin and cos above are from sympy! (from sympy import *) You can't use the default sin and cos with sympy.Symbol and have it work.

Tagged , , ,

Serializing python data to JSON - some edge cases

JSON seems like a great way to serialize python data objects - it's a subset of YAML, the built in json library is easy to use, it avoids the security issues of pickle, and there's nearly a one-to-one correspondence between python data and json. Nearly.

Just to be clear, I'm talking about python objects that are basically data (e.g., hierarchies of basic data types). Serializing arbitrary python objects (e.g., classes) is a larger topic.

The point of this post is that (for python 2.7):

data == json.loads(json.dumps(data))

is False a lot of the time, even for basic data types. So, json data serialization isn't as straightforward as you (meaning, I) might think. I'd like this to work with basic python data types such as list, dict, tuple, set, and numpy arrays. Okay, and namedtuple and OrderedDict, too.

Also, my goal is mainly to get back the relevant python types. I still want the exported data to be friendly enough to someone else who wants to read my JSON data, but I'm expecting them to have to do a little work to get data out my particular JSON file format (as opposed to having everything be in vanilla JSON data.)

Let's try a few cases out:

>>> import json
>>> data = [1, 6, 8, 9]
>>> data == json.loads(json.dumps(data))
True

>>> data = {"foo": 2938, "bar": 4.22, "baz": "test1"}
>>> data == json.loads(json.dumps(data))
True

>>> data = {"foo": ["nested", 4.5, "mixed", {"wow": False}], 
            "bar": {"more": ["nesting"]}}
>>> data == json.loads(json.dumps(data))
True

Great. That all worked as expected. Now let's get disappointed.

dicts with non-strings as keys

>>> data = {1: "foo", 2: "baz"}
>>> data == json.loads(json.dumps(data))
False

What? Why didn't that work, when everything was working so nicely before? Because you (meaning, I) didn't read the JSON spec closely enough.

>>> print json.loads(json.dumps(data))
{u"1": u"foo", u"2": u"baz"}

While there is a correspondence between python dicts and JSON objects, JSON objects can only have strings as attributes. So... this don't work.

tuple

>>> data = (1, 2, 3, 4)
>>> data == json.loads(json.dumps(data))
False

Why is that?

>>> print json.loads(json.dumps(data))
[1, 2, 3, 4]

Right. JSON encodes tuples as lists.

namedtuple

>>> from collections import namedtuple
>>> MyTuple = namedtuple("MyTuple", "foo baz")
>>> data = MyTuple(foo=1, baz=2)
>>> data == json.loads(json.dumps(data))
False
>>> print json.loads(json.dumps(data))
[1, 2]

Nuts.

simplejson

At this point I should probably mention simplejson, which is the externally maintained dev version of the standard json library. The main thing is that it's a drop in replacement for json, it has a c extension that adds a speed boost, as well as implements a few features that json does not. Namely something for namedtuple.

You can do a pip install simplejson, but if you are on Windows the c extension build step may not work, so get the binaries from Christoph Gohlke here.

namedtuple again

How does simplejson help us with namedtuple? The option to export namedtuples as json objects:

>>> import simplejson as json
>>> from collections import namedtuple
>>> MyTuple = namedtuple("MyTuple", "foo baz")
>>> data = MyTuple(foo=1, baz=2)
>>> print json.loads(json.dumps(data, namedtuple_as_object=True))
{"foo": 1, "baz": 2}
>>> data == json.loads(json.dumps(data, namedtuple_as_object=True))
False

So that didn't solve our problem. It retains the key/value information, but not the fact that it's a namedtuple.

numpy arrays

While this isn't a basic python data type, it's often the data that I care about.

>>> import numpy as np
>>> data = np.array([[1,2,3], [4,5,6]])
>>> print json.loads(json.dumps(data))
TypeError: array([[1, 2, 3], [4, 5, 6]]) is not JSON serializable

Okay, the standard advice is to use the numpy array .tolist() function.

>>> data = np.array([[1,2,3], [4,5,6]])
>>> print json.loads(json.dumps(data.tolist()))
[[1, 2, 3], [4, 5, 6]]
>>> data == json.loads(json.dumps(data.tolist()))
array([[ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

Wait, what happened there? Oh yeah, numpy thinks that equality check is ambiguous. This is actually a pretty complicated and subtle issue. See here and here for more information. For now, just use numpy's array_equal function (which won't work for nested data structures.)

>>> np.array_equal(data, json.loads(json.dumps(data.tolist())))
True

Okay. Not bad. But the reconstituted array is not a numpy ndarray, but a list. So, not perfect. You'd need to do:

>>> print np.array(json.loads(json.dumps(data.tolist())))

Again, lets not gloss over this equality issue. Storing numpy arrays in nested python structures and then comparing them is non-trivial. We'll need some way of dealing with that.

jsonpickle

Another aside. It looks like jsonpickle is very close to what I want - a clean enough JSON representation of python datatypes. However, the current out-of-the-box version (0.4.0) doesn't handle the non-string-keys dictionary, doesn't handle numpy arrays, doesn't handle namedtuples, and has a warning that it doesn't sanitize the JSON input. So one approach to solving this json data problem would be to add specific handlers to jsonpickle for certain objects. I'm going to take another approach, and write my own version to make sure I understand all of the issues.

Just Give Me Some Code

Gotcha.

Here's a snippet that will make list, set, tuple, namedtuple, OrderedDict, numpy ndarrays, and dicts with non-string (but still data) keys serialize to JSON, unserialize to the right type, and not abuse JSON too much. We'll use something similar to the jsonpickle format (wrapping special cases in an object with a single attribute containing a type; e.g., {"py/set": [...]}), though no guarantees that it is compatible. Heck, no guarantees that any of this works. This approach presumably can be extended to other types as well (datetime, bigint, complex, etc...), not my arbitrary "data" distinction.

from collections import namedtuple, Iterable, OrderedDict
import numpy as np
import simplejson as json

def isnamedtuple(obj):
    """Heuristic check if an object is a namedtuple."""
    return isinstance(obj, tuple) \
           and hasattr(obj, "_fields") \
           and hasattr(obj, "_asdict") \
           and callable(obj._asdict)

def serialize(data):
    if data is None or isinstance(data, (bool, int, long, float, basestring)):
        return data
    if isinstance(data, list):
        return [serialize(val) for val in data]
    if isinstance(data, OrderedDict):
        return {"py/collections.OrderedDict":
                [[serialize(k), serialize(v)] for k, v in data.iteritems()]}
    if isnamedtuple(data):
        return {"py/collections.namedtuple": {
            "type":   type(data).__name__,
            "fields": list(data._fields),
            "values": [serialize(getattr(data, f)) for f in data._fields]}}
    if isinstance(data, dict):
        if all(isinstance(k, basestring) for k in data):
            return {k: serialize(v) for k, v in data.iteritems()}
        return {"py/dict": [[serialize(k), serialize(v)] for k, v in data.iteritems()]}
    if isinstance(data, tuple):
        return {"py/tuple": [serialize(val) for val in data]}
    if isinstance(data, set):
        return {"py/set": [serialize(val) for val in data]}
    if isinstance(data, np.ndarray):
        return {"py/numpy.ndarray": {
            "values": data.tolist(),
            "dtype":  str(data.dtype)}}
    raise TypeError("Type %s not data-serializable" % type(data))

def restore(dct):
    if "py/dict" in dct:
        return dict(dct["py/dict"])
    if "py/tuple" in dct:
        return tuple(dct["py/tuple"])
    if "py/set" in dct:
        return set(dct["py/set"])
    if "py/collections.namedtuple" in dct:
        data = dct["py/collections.namedtuple"]
        return namedtuple(data["type"], data["fields"])(*data["values"])
    if "py/numpy.ndarray" in dct:
        data = dct["py/numpy.ndarray"]
        return np.array(data["values"], dtype=data["dtype"])
    if "py/collections.OrderedDict" in dct:
        return OrderedDict(dct["py/collections.OrderedDict"])
    return dct

def data_to_json(data):
    return json.dumps(serialize(data))

def json_to_data(s):
    return json.loads(s, object_hook=restore)

Use the functions data_to_json and json_to_data to do the actual serialization. Testing this out:

>>> MyTuple = namedtuple("MyTuple", "foo baz")
>>> TEST_DATA = [1, 2, 3,
                23.32987, 478.292222, -0.0002384,
                "testing",
                False,
                [4, 5, 6, [7, 8], 9],
                ("mixed", 5, "tuple"),
                {"str": 1, "str2": 2},
                {1: "str", 2: "str4", (5, 6): "str8"},
                {4, 8, 2, "string", (4, 8, 9)},
                None,
                MyTuple(foo=1, baz=2),
                OrderedDict(
                    [("my", 23), ("order", 55), ("stays", 44), ("fixed", 602)]),
                np.array([[1, 2, 3], [4, 5, 6]]),
                np.array([[1.2398, 2.4848, 3.484884], [4.10, 5.3, 6.999992]]),
               ] 
>>> print data_to_json(TEST_DATA)
[1, 2, 3, 23.32987, 478.292222, -0.0002384, "testing", false, [4, 5, 6, [7, 8], 9], {"py/tuple": ["mixed", 5, "tuple"]}, {"str2": 2, "str": 1}, {"py/dict": [[{"py/tuple": [5, 6]}, "str8"], [1, "str"], [2, "str4"]]}, {"py/set": [{"py/tuple": [4, 8, 9]}, 8, 2, 4, "string"]}, null, {"py/collections.namedtuple": {"values": [1, 2], "fields": ["foo", "baz"], "type": "MyTuple"}}, {"py/collections.OrderedDict": [["my", 23], ["order", 55], ["stays", 44], ["fixed", 602]]}, {"py/numpy.ndarray": {"dtype": "int32", "values": [[1, 2, 3], [4, 5, 6]]}}, {"py/numpy.ndarray": {"dtype": "float64", "values": [[1.2398, 2.4848, 3.484884], [4.1, 5.3, 6.999992]]}}]

That looks like what we'd expect. Now, to verify, we'd like to say

>>> TEST_DATA == json_to_data(data_to_json(TEST_DATA))

but, we run into that nested-numpy-array-equality problem. So, here's a function that walks a complex data structure and does the comparison and handles the numpy array case:

def nested_equal(v1, v2):
    """Compares two complex data structures.

    This handles the case where numpy arrays are leaf nodes.
    """
    if isinstance(v1, basestring) or isinstance(v2, basestring):
        return v1 == v2
    if isinstance(v1, np.ndarray) or isinstance(v2, np.ndarray):
        return np.array_equal(v1, v2)
    if isinstance(v1, dict) and isinstance(v2, dict):
        return nested_equal(v1.items(), v2.items())
    if isinstance(v1, Iterable) and isinstance(v2, Iterable):
        return all(nested_equal(sub1, sub2) for sub1, sub2 in zip(v1, v2))
    return v1 == v2

now we can say

>>> nested_equal(TEST_DATA, json_to_data(data_to_json(TEST_DATA)))
True

Great! You can get a full listing of the code here: serialize_json.py

Limitations

Several. I think the numpy handling code isn't all that complete. There are a range of datatypes and record fields that you can store in numpy. This just handles vanilla nd-arrays. Also, this isn't that efficient in storage (it's a text format), especially for binary data. And, it's not that efficient in memory as well, since we create an entire new structure before passing it to the json converter (as opposed to a generator-style piecemeal solution). Security is not explicitly dealt with either. While we're not allowing any arbitrary class loading (part of the reason for the "data" distinction), we don't do anything explicit for extremely long strings or data, which could bring the system down.

After going through all of this, cPickle looks pretty good, huh? If you don't care about the security or the readability parts - cPickle is very good. If you need efficiency for large numpy arrays, check out their native formats (see savez_compressed, which can even take python dictionaries as keyword arguments, so you can save non-numpy-array data.)

References

Tagged , ,

Elisp to create a pelican blog entry

This is a quick one, mainly just to test out the little bit of elisp code that I wrote to automate the creation of blog posts. Which is what I am doing now. So meta.

Here's the code:

(defvar *blog-root*   "/path/to/pelican/website/content/")
(defvar *blog-author* "Your Name")

(defun strip-nonfile-chars (s)
  (replace-regexp-in-string "[^0-9a-zA-Z_ ]" "" s))

(defun space-to-dash (s)
  (replace-regexp-in-string " " "-" s))

(defun title-to-blog-filename (title)
  (concat (downcase (space-to-dash (strip-nonfile-chars title)))
          ".md"))

(defun start-blog-entry ()
  "Creates a new markdown blog entry for pelican"
  (interactive)
  (let ((title (read-from-minibuffer "Title of new blog entry? ")))
    (find-file (concat *blog-root* (title-to-blog-filename title)))
    (insert (format (concat "Title: %s\n"
                            "Date: %s\n"
                            "Tags:\n"
                            "Category: Blog\n"
                            "Author: %s\n\n")
                    title
                    (format-time-string "%Y-%m-%d %R")
                    *blog-author*))))

Fill in the vars, bind the start-blog-entry function to your favorite key sequence, and off you go.

Tagged , , , ,

The Pickleball Problem

Here's an interesting puzzle that my Dad ran into when trying to organize a mixed doubles Pickleball tournament.

The Problem

Given a group of n women and n men playing in a mixed doubles tournament (each match is always a woman/man team versus another woman/man team), how do you structure the matches so that you find the "best" woman and the "best" man? In other words, if the men are only competing against the men for prizes, and the women only competing against the women, so how do you maximize fairness for each of the matches - not pairing up a man and woman more than once in the tournament, and not playing against another person more than once in the tournament?

We're trying to solve this in the ballpark of n = 8.

How to solve

The interesting observation is that this can be structured as an exact cover problem. If you write out the matches as a grid, where the women are the columns (1, 2, 3...) and the men are the rows (A, B, C...), each grid entry becomes a tournament match of who that team will play. For n=4:

    1  2  3  4
  +------------
A | XX .. .. ..
B | .. XX .. ..
C | .. .. XX ..
D | .. .. .. XX

The problem is to fill in the empty values in the grid with opposing teams (e.g., C3) such that all of the matches are fair. The diagonal entries are X'd out because a team will not play itself. Further, the entries need to be reflexive - if the entry for row C, column 2 is B4, then the entry for row B column 4 should be C2.

So, for the n=4 problem, the goal is to fill in the values A2, A3, A4, B1, B3, B4, C1, C2, C4, D1, D2, D3 such that each row and column only has one 'A' value, one 'B' value, one 'C' value and one 'D' value, (excepting the value of the row that you are on - row A will not have any 'A' values) and only one '1' value, one '2' value, and so on. Once you do that, you can read off all of the tournament matches for each woman (by reading the columns) or by each man (by reading the rows).

This is similar to solving sudoku, since it is also an exact cover problem. Since I already had a sudoku solver (a brute force with constraint propogation solver, inspired by Peter Norvig's python sudoku solver), I adapted it to solve this problem.

You can see the code here.

Solutions

Here are solutions up to n=9. Any more would require additional cleverness, or days of computation time.

4x4:

    1  2  3  4
  +------------
A | XX B3 C4 D2
B | D3 XX A2 C1
C | B4 D1 XX A3
D | C2 A4 B1 XX

5x5:

    1  2  3  4  5 
  + --------------
A | XX B3 C4 D5 E2
B | C5 XX A2 E1 D3
C | D2 E4 XX A3 B1
D | E3 C1 B5 XX A4
E | B4 A5 D1 C2 XX

6x6:

    1  2  3  4  5  6 
  + -----------------
A | XX B3 C4 D5 E6 F2
B | E4 XX A2 C6 F3 D1
C | F5 E1 XX A3 D2 B4
D | B6 C5 F1 XX A4 E3
E | C2 F4 D6 B1 XX A5
F | D3 A6 B5 E2 C1 XX

7x7:

    1  2  3  4  5  6  7 
  + --------------------
A | XX B1 C2 D5 E6 G3 F4
B | A2 XX F5 G1 C7 E4 D3
C | G6 A3 XX E2 F1 D7 B5
D | E3 G5 B7 XX A4 F2 C6
E | F7 C4 D1 B6 XX A5 G2
F | C5 D6 G4 A7 B3 XX E1
G | B4 E7 A6 F3 D2 C1 XX

8x8:

    1  2  3  4  5  6  7  8 
  + -----------------------
A | XX B1 C2 D3 E4 F5 G6 H7
B | A2 XX D6 E7 F8 H4 C5 G3
C | E6 A3 XX G1 B7 D8 H2 F4
D | H5 E8 A4 XX G2 B3 F1 C6
E | G8 H6 F7 A5 XX C1 B4 D2
F | D7 G4 H1 C8 A6 XX E3 B5
G | C4 D5 B8 F2 H3 A7 XX E1
H | F3 C7 G5 B6 D1 E2 A8 XX

9x9:

    1  2  3  4  5  6  7  8  9
  + --------------------------
A | XX B1 C2 D3 E4 F5 G6 H7 I8
B | A2 XX D1 E8 F7 H9 I3 C6 G4
C | G9 A3 XX I6 H2 B8 F4 E1 D5
D | B3 F8 A4 XX C9 I2 H1 G5 E7
E | C8 I7 H6 A5 XX G1 D9 B4 F2
F | I4 E9 G8 C7 A6 XX B5 D2 H3
G | E6 H4 I5 B9 D8 A7 XX F3 C1
H | D7 C5 F9 G2 I1 E3 A8 XX B6
I | H5 D6 B7 F1 G3 C4 E2 A9 XX
Tagged , , ,

Installing make on Windows with MSYS

It's time to see where MSYS and MinGW is again. There's now an installer to take care of all of the versions for you. Get it here. I don't know why this process is always so difficult - some combination of outdated documentation and changes in the packaging system. Let's see how far we get.

  • Choose to get the latest repository catalogues.
  • Get the MSYS basic system to get make.
  • Get the developers toolkit, too.
  • Install to the default location (c:\MinGW)

If you want to use all of these goodies from cmd.exe you need to update the path. (Note this isn't the recommended way of doing things - the MinGW want to avoid modifying the path and prefer you do things on a per process basis. See here for more details.) Otherwise, you can just run everything from a msys shell.

To update the path:

  • Right-click on "My Computer" and select "Properties".
  • Click Advanced -> Environment Variables.
  • I'm going to update the PATH in the User Variables section to not mess around with system-wide settings. So, in the "User Variables" box scroll down to the line that says "PATH", Edit... that value, and append ";C:\MinGW\bin;C:\MinGW\msys\1.0\bin" to the end of the current value.

Now you should be able to use things like make and gcc from the command line. Hey, that was pretty painless.

Tagged , , ,

Transparent favicons with Paint.NET

I've spent longer than I should have figuring out how to do this. Which means: post about it to make it seem like I've done some work. Hopefully this saves someone some time.

To make a transparent favicon.ico file using Paint.NET, download the IcoCur.dll plugin from Evan Olds (thank you!) and save it in the FileTypes folder in the Paint.NET directory.

Then, create a 16x16 pixel file, with a background whose alpha channel is 0. Save the file, and you can now save a .ico. I've included only the 16x16, 32 bit image to make the favicon as small as possible.

Now, the little icon thingy in the top of your browser will look nice independent of the browser chrome.

Tagged , ,

Building a website with Pelican

So you want to get a website up, but would rather mess around with getting your site hosting framework perfect rather than generating actual content? Me too.

If you want to try static website content hosting and you're a python fan, Pelican seems to be a great choice. Supports Markdown, my minimalist text format of choice. I looked at hyde thinking it would be the python alternative to Jekyll, which people seem to love, but it seemed too complicated. With Pelican, you can write a single isolated blog post in markdown and have a website. As an experienced blogger (I've been blogging for all of 10 minutes now), that's what I want.

Overview

You'll generate a site skeleton with pelican-quickstart. You'll put blogposts as separate files in yoursite/content. E.g., yoursite/content/firstpost.md. Sitewide settings are in pelicanconf.py. You'll run

make html

which will generate all the content. You'll upload the content to your host of choice with something like

make rsync_upload

During development, you'll host your website locally with

make devserver

Just go to http://localhost:8000 to see your site.

Getting Started

The pelican documentation is great. Follow it.

Themes

Pelican comes with two themes built in, but there are many more on github. Let's see how easy it is to try them out. Change to a development directory and

git clone --recursive git://github.com/getpelican/pelican-themes.git

Be sure that --recursive is there, because some of the themes are included as submodules. I got some errors on this step because the url used for the submodules are formatted differently (e.g., http vs. https vs. git@github.com) and putty complained that their keys weren't cached. SSH into the offending url to update your cache. Also, it seems that you need a github key to get all of them. So, try leaving out the --recursive if you just want it to work, but some of the themes will be empty folders (e.g., chunk iris neat pelican-mockingbird relapse svbtle).

Now, lets install all of the themes so we can try them out. Just to be quick and dirty about it, lets do this manually. Change to the pelican-themes directory and

pelican-themes -i Just-Read basic bootlex bootstrap bootstrap2 brownstone built-texts cebong dev-random dev-random2 lightweight martyalchin mnmlist notmyidea-cms-fr notmyidea-cms sneakyidea subtle syte tuxlite_tbs waterspill-en waterspill chunk iris neat pelican-mockingbird relapse svbtle

Just to check this worked

pelican-themes --list

should list everything above.

Now, to try out a theme, just add (or change) in pelicanconf.py:

THEME="svbtle"

Great. Now, you can just flick through all of the themes that other people have worked really hard to build and convince yourself you are 'designing' a website.

Deployment

It's really just copying, right? Pelican has some built in options for deploying, using rsync over ssh to save on transfer. But, I use windows because I like things difficult, and getting rsync working on windows is... difficult, especially if you are trying to do it with putty/plink. (I believe an older version of cwRsync is the closest, but then see here and here for workarounds. You can also try installing rsync with the MSYS developer tools, but it has the same plink issue.)

So, winscp has some rsync like functionality. Put this into a deploy.bat:

winscp.com /command "open username@webhost.com" "synchronize remote -delete -mirror -criteria=either -filemask=|*~;*#;*.pyc output/ /home/username/website/" "exit"

replacing all of the relevant fields. You need the 5.x version of winscp to use the filemask option.

Sweet! That's all for now.

Tagged , , ,