28 Aug 2016, 8:57

Word Rectangle

This is a post about a question from Cracking the Coding Interview. It’s boring, so probably go watch Stranger Things instead.


“Given a list of millions of words, design an algorithm to create the largest possible rectangle of letters such that every row forms a word (reading left to right) and every column forms a word (reading top to bottom). The words need not be chosen consecutively from the list, but all rows must be the same length and all columns must be the same height.”

Example:

spears
planet
easily
animal
relate
styles

Occasionally I’ll try a practice problem like this on a whiteboard. Usually I learn something cool or at least feel a little less like I’m going to curl up into a ball the next time I have a programming interview, but this one got super weird and interesting.

Try to come up with one of these rectangles with just the words you know, it’s like a crossword puzzle with no clues or boxes and it’s horrible. I muddled along though, and with just a couple hints from the book, I came up with something, question mark? Let’s go through it.


When you’re building up a rectangle row by row, you tend to get stuck pretty quickly with something like

noticable
potential
classroom

because npc isn’t the start of any word I can think of. If the words are stored in something called a prefix tree (these are also called Tries but I don’t know how to say that and it isn’t very descriptive), you can easily check this for any partial rectangle you’ve built up, and cry because basically nothing works. But if you’re a computer, you wouldn’t cry because that’s ridiculous, and because it means you don’t have to try quite so many rectangles, and you can move on to rectangles that still have potential.

So, let’s just assume we knew all about prefix trees and we just have some object that tells us if something is a prefix for a word in the list.

tree.isPrefix('spe')
True
tree.isPrefix('npc')
False

Now, instead of doing a search on all possible 10 x 10 word rectangles (conservatively, if you had 10 10-letter words in your language that’s already ten billion rectangles), we can “prune off” some of the rectangles before they’re even fully formed, and go right from

spears
planet
planet   # spp? I don't think so

to

spears
planet
easily

skipping right past

spears
planet
planet
planet   # a person would have stopped trying by now

After this, it’s basically brute force. I took the book’s advice and tried areas in decreasing order until finding a size that was possible, so the main loop does that and calls a function sort of like this at each step (I changed a few things to make it clearer what’s going on):

def makeRectangle(length, width, words, rect):
    if rect.width == width:
        if isFullRect(rect):
            return rect
        else:
            return None

    for word in words:
        rect.addWord(word)
        if not isPartialRect(rect):
            # Prune this one! Don't try any of its descendants.
            rect.removeWord()
            continue
        else:
            fullRect = makeRectangle(length, width, words, rect)
            if fullRect:
                # The recursive call got it!
                return fullRect
            # Or it didn't, and we remove the failed word to try the next one.
            rect.removeWord()

    # If all those fail, it's impossible.
    return None

You might recognize this as a depth-first search - it goes deep and tries everything it can with a given first word before trying with the next first word.


There’s a few elements of the solution I’m glossing over - walking down potential areas to see if any of them are possible, splitting the list of words up into the different lengths - check out my full solution here.

Once I debugged it on this very small test set

['dog',
 'emu',
 'fog',
 'def',
 'omo',
 'gug']

(I’m not good at crossword puzzles so I cheated), I put it to work on /usr/share/dict/words, a file with about 230000 words in it, which has words almost as obscure as ‘gug’ but is a good, smallish example of the type of data the question asks about.

My program chugged away for a few minutes without coming up with anything, so I ran a profiler on it to see what was taking the most time. Over the next day, I tried everything I could think of (short of rewriting the damn thing in C) to get this program to run faster (including but not limited to storing rectangles as byte arrays, storing rectangles as columns instead of rows, replacing the prefix tree with a huge hash table of strings that were valid prefixes), but it couldn’t finish the search for words much shorter than 16 letters (it’s easier the more letters you’re trying to use per word, because of how few words there are with 16+ letters).

Finally, frustrated, I turned to the solution in the book. Just reading the code, I couldn’t figure out why it would be much faster than mine, so I thought I’d clone it and try to run it myself. The solution is written in Java, which I know barely anything about, and the author recommends using Eclipse, which I know nothing about. I halfheartedly tried to bully it into running the code for a few minutes, but then I actually looked at the test that was going to run, and saw that the list of words it was going to use was just a couple thousand words long. 🤔 I ran my program on this test case, and it spat out the answer in 1.5 seconds.

Unsatisfied. I grabbed this list of 10000 most common english words, and just under two hours later I had

spears
planet
easily
animal
relate
styles

to show for it.

So, anybody else think about or solve this one? Can any Java person try it and tell me if the book solution can handle the Unix words file? I believe that the question as originally posed isn’t possible, and not just because I couldn’t do it.

26 Aug 2016, 1:04

What's new

Been working on a couple projects this summer. The big one is a research fellowship in theoretical computer science - really interesting experience. Theory research is straight-up math - seeing if you can prove statements about how fast different things can be computed, coming up with algorithms, reading complicated papers. It’s hard to tell if I’m doing a good job, but the work is interesting for sure.

This other small project is a shot-for-shot remake of Shazam or SoundHound. Not an original idea at all, but I’m learning tons. Loosely following this post, but I’m rolling everything from scratch - wrote the FFT and everything. Check it out if you’re interested in that kind of thing - I’ll probably write a technical post about it once it works.

Moved into an apartment at the beginning of the summer, so I’m balancing all the tech stuff with cooking and cleaning, and watching tons of movies. Pretty nice quiet time.

29 Dec 2015, 10:06

Raspberry Pi GPIO Projects in Swift

Warning - this is a technical post

It’s been a super good break for me here in Salt Lake.

I got a Raspberry Pi for Christmas, and I’ve been constantly working on it. Since my other recent projects are in Swift, I felt like using it for a bigger GPIO project: a morse decoder/encoder.

Installing Swift on the RPi

This was, to be honest, the most difficult part of the project. At first when I looked into this, it wasn’t possible, because the Pi has an ARM processor instead of x86. But then I this tweet popped up on my timeline, and I got back to it. The next snag was related to Raspbian. I couldn’t get the package to work on the standard Raspberry Pi linux flavor, so I started over and installed a fresh Ubuntu image. Then the installation went smoothly.

Wrapping wiringPi in Swift

The Raspberry Pi has 40 GPIO pins that can be used for interacting with the outside world - people plug in things like temperature sensors or servo motors. My brother gave me the idea of writing a morse code system with an LED and a push-button. There’s a Python library for interacting with these pins programmatically, but writing Python doesn’t feel like gaining experience anymore. wiringPi is a C library with functions for getting input and output from the pins, which is perfect for wrapping in Swift. Figuring out which .o files I needed to link against was just a matter of adding until I didn’t get anymore compiler errors, and I ended up with this simple Makefile.

Writing the code

The encoder is easy: Just write a dictionary literal of morse code characters, run the input through that, and flash the LED for the right lengths of time. The decoder is tricky, and I still haven’t finished it because of some problems with my pushbutton. I’ve settled on the basic design of writing two functions that get the length of a press or a release, and passing control back and forth from them in a main update loop until a letter or word is finished. My buttons are a little unreliable right now, so I can’t really test this part until I do a little bit more hardware work to get the input right. I’m pretty sure the code will work as soon as the button does, so for now, I’m putting the project up on Github!

03 Oct 2015, 5:23

It's up

Today my first ever iOS app hit the app store!

Shorthand Icon

Shorthand - Teachable Gesture Keyboard

Shorthand is a keyboard that lets you draw your most common phrases. You enter a phrase, then draw whatever symbol you want to represent those words, and the keyboard learns to tell your drawings apart.

For the curious, Shorthand uses an artificial neural network to “learn” the difference between the different gestures you give it. As such, it can be a little finnicky about when it does and doesn’t work. For best results, use gestures that are very different from each other: lower versus uppercase K would be really difficult for it the way it currently works, for example.

I’m excited to keep improving this and to take feedback from anyone who tries it out, this was a really fun project.

18 Sep 2015, 9:28

Debrief

I guess it’s really over now. This summer was really challenging, but I came out of it with a lot more knowledge and experience (working knowledge of three new programming languages, and a fruit fly’s olfactory system).

I think this school year’s going to be really good. Tougher classes, more responsibilities, and a 40” TV in my room for rewatching all of Breaking Bad, because stressful TV is the best stress relief.