Dance With Grenades | Blog
I was trying to help a coworker out with a side home project. He's just learning PHP and SQL and coming up with things that are useful for his home life to develop these skills. This evening he posed a puzzle to me, to add a feature to one of these projects (a table to keep track of games of gin between him and his wife):
<Greg> in php, how can I pull out consecutive wins?
<Greg> like a winning streak
<Greg> or a losing streak for that matter
<Greg> like, "greg has won the last x games"
<Mike> can probably be done almost entirely in SQL
<Mike> lemme put that into my own DB to test
So, I got his db schema, and a couple of rows of actual data, and set about seeing if I could solve this puzzle. I have, and I spent enough time on it that I want to share it with everyone, along with an explanation of how it works. Also, I can come back and reference this later if I forget!
For following-along purposes, here is the db schema, and 12 rows of data (the first 3 are legitimate scores from his table, the other 9 I made up for testing my query):
CREATE TABLE greg_gin_db ( game_id int(11) NOT NULL AUTO_INCREMENT, greg_win int(11) NOT NULL, april_win int(11) NOT NULL, greg_total int(11) NOT NULL, april_total int(11) NOT NULL, PRIMARY KEY (game_id), KEY greg_win (greg_win,april_win) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=13 ; INSERT INTO greg_gin_db (game_id, greg_win, april_win, greg_total, april_total) VALUES (1, 1, 0, 102, 64), (2, 1, 0, 118, 88), (3, 0, 1, 79, 101), (4, 0, 1, 1, 2), (5, 0, 1, 2, 3), (6, 0, 1, 3, 4), (7, 1, 0, 2, 1), (8, 1, 0, 3, 2), (9, 1, 0, 4, 3), (10, 1, 0, 6, 4), (11, 1, 0, 5, 1), (12, 0, 1, 1, 5);
Now, the basic aim is to find streaks, roll them up into groups with a count of each streak's length, and then just grab the most recent one.
I'll do this in steps. First, we want to find the streaks. We do that like so:
SELECT greg_gin_db.*, @r := @r + (COALESCE(@greg_win, greg_win) <> greg_win) AS series, @greg_win := greg_win FROM( SELECT @r := 0, @greg_win := NULL ) vars, greg_gin_db ORDER BY game_id;
Note that you could use april_win instead of greg_win in this case. Either works just as well as the other. Anyway, this query returns the following results:
+---------+----------+-----------+------------+-------------+--------+-----------------------+ | game_id | greg_win | april_win | greg_total | april_total | series | @greg_win := greg_win | +---------+----------+-----------+------------+-------------+--------+-----------------------+ | 1 | 1 | 0 | 102 | 64 | 0 | 1 | | 2 | 1 | 0 | 118 | 88 | 0 | 1 | | 3 | 0 | 1 | 79 | 101 | 1 | 0 | | 4 | 0 | 1 | 1 | 2 | 1 | 0 | | 5 | 0 | 1 | 2 | 3 | 1 | 0 | | 6 | 0 | 1 | 3 | 4 | 1 | 0 | | 7 | 1 | 0 | 2 | 1 | 2 | 1 | | 8 | 1 | 0 | 3 | 2 | 2 | 1 | | 9 | 1 | 0 | 4 | 3 | 2 | 1 | | 10 | 1 | 0 | 6 | 4 | 2 | 1 | | 11 | 1 | 0 | 5 | 1 | 2 | 1 | | 12 | 0 | 1 | 1 | 5 | 3 | 0 | +---------+----------+-----------+------------+-------------+--------+-----------------------+ 12 rows in set (0.00 sec)
As you can see, essentially what is happening is, the query increments the new, generated column "series" by one every time greg_win is not the same as the previous row's greg_win. So from here, all we have to do is narrow the query with another outer query to get the streaks:
SELECT greg_win, april_win, COUNT(*) AS streaklen FROM( SELECT greg_gin_db.*, @r := @r + (COALESCE(@greg_win, greg_win) <> greg_win) AS series, @greg_win := greg_win FROM( SELECT @r := 0, @greg_win := NULL ) vars, greg_gin_db ORDER BY game_id) q GROUP BY series;
For the given set of data, this returns:
+----------+-----------+-----------+ | greg_win | april_win | streaklen | +----------+-----------+-----------+ | 1 | 0 | 2 | | 0 | 1 | 4 | | 1 | 0 | 5 | | 0 | 1 | 1 | +----------+-----------+-----------+ 4 rows in set (0.00 sec)
Now from here we can go two directions. To fulfill Greg's request and get the latest streak only:
SELECT greg_win, april_win, COUNT(*) AS streaklen FROM( SELECT greg_gin_db.*, @r := @r + (COALESCE(@greg_win, greg_win) <> greg_win) AS series, @greg_win := greg_win FROM( SELECT @r := 0, @greg_win := NULL ) vars, greg_gin_db ORDER BY game_id DESC) q GROUP BY series LIMIT 1;
Or, we can add yet a third layer to the puzzle and find the largest streak:
SELECT greg_win, april_win, MAX(streaklen) AS strk FROM( SELECT greg_win, april_win, COUNT(*) AS streaklen FROM( SELECT greg_gin_db.*, @r := @r + (COALESCE(@greg_win, greg_win) <> greg_win) AS series, @greg_win := greg_win FROM( SELECT @r := 0, @greg_win := NULL ) vars, greg_gin_db ORDER BY game_id) q GROUP BY series) q2;
Which gives us:
+----------+-----------+------+ | greg_win | april_win | strk | +----------+-----------+------+ | 1 | 0 | 5 | +----------+-----------+------+ 1 row in set (0.00 sec)
So there you have it. Counting consecutive rows in MySQL.
So, I've been using Subsonic for a little over a week now, so I think I have a good 'first impression' and have given it enough time to get past initial ooh's and ahh's.
Subsonic is a streaming music server. The basic idea is kinda similar to Firefly Media Server (aka mt-daapd), but there are some pretty significant differences. For one, Firefly is focused on providing a DAAP share implementation, and it does a very good job of that. Subsonic, on the other hand, streams over HTTP. It also offers highly customizable, on-the-fly transcoding. It gives you a snappy web interface, not just for administrative purposes, but also for in-browser listening, as well as playlist management. It'll even download and manage podcasts for you. And the icing on the cake is it has an excellent Android app, as well as two iPhone/iTouch apps that I haven't used because I don't have an iPhone/iTouch, which are all powered by a devilishly simple REST API. (They're asking for a 10 GBP donation for the REST API feature, though you get 30 days for free - to get you hooked, much like any good crack dealer)
My home server, HomeOne, is not a workhorse by any means. In fact, it's rather antiquated. HomeOne is currently running an Athlon XP 2200+ (Thoroughbred-B core), and sports a whopping 768MB of RAM. The music drive is 300GB, and houses around 212GB of music across approximately 40,000 tracks. Subsonic is using 169MB of RAM (resident set) and barely causes the load average to spike with 3 or 4 users streaming at once.
Setting up Subsonic is beyond simple. For Debian/Ubuntu, you just download the .deb, and install with dpkg like so:
dpkg -i subsonic-4.1.deb
This sets up Subsonic, which listens on HTTP port 4040, which you can access via your browser. If you want most common transcoders to also work, you'll then execute the following:
apt-get install lame flac faad vorbis-tools ffmpeg
Depending on how much music you have, you may want to edit /etc/defaults/subsonic. I had to increase the Java memory heap to 128MB of RAM for reliable operation with my collection. You may not need to.
At this point, you would log in via HTTP and change the admin password. At that point, the only other setting that absolutely NEEDS to be touched, is to select the folder your music is stored in. It defaults to /var/music. I keep mine in /srv/music. I set the directory, and hit save. Much to my surprise, all of the music was available to be played immediately! I was surprised. It takes WinAMP, Songbird, Foobar, et all upwards of 15 minutes to go through my entire collection on my far more up-to-date desktop and file it into their libraries. (The magic here is that Subsonic just goes by your directory structure, and then once per day updates its search index. By default, the search index is updated at 3am, but there is a setting there)
You can also set your playlist folder (by default, /var/playlists, which I kept. I did have to create it manually, though.)
At this point, you're ready to go.
However, there are other useful options which can be set. For instance, you can set a maximum bitrate that certain players are not to exceed. If a file is played that exceeds that bitrate, it is downsampled with lame (or something else if you prefer; the downsample command is easily customized). This can be useful for phone app-based players if you are worried about exceeding data use limits. If you like, it can also automatically download cover art, though at the time of this writing, there is a problem with the Discogs API key which makes that feature not work.
At this point, the search for a music collection manager is over. I'm not real happy with the subAir desktop app (though it works), and the web interface can be finnicky on Linux (Adobe's Linux Flash Player's fault, not Subsonic's, though), so I'm writing my own plugin-based player that will include the ability to interface with Subsonic. But more details on that later.
Final verdict: It needs a little polish, mainly with the broken Discogs API key, but it's probably the most stable piece of Java software I've ever used. It performs admirably well on extremely low end hardware, and outdoes pretty much every other media player out there.
Installation was a snap. All I had to do after preparing nginx was unzip piwik into its new directory, and then run through a quick 8 step wizard in-browser. This set up the MySQL database, the superuser, a first website (this one, obviously), and then logged me into a very nicely designed front page.
The front page has a number of widgets one can arrange, add, and remove at a whim. You can see visitors by day, by hour, returning visitors, how long most people stay on the site, search engine statistics for multiple search engines (Google, Yahoo!, Alexa), what browsers, operating systems, and screen resolutions people are using, where they're coming from, what continent or country their computer is in, and more. It's very featureful, but at the same time, the features seem to be accessible.
Since it's using the Google Analytics method of tracking statistics, and not scraping log files, statistics are updated in real-time. In fact, I plan to be watching it closely after posting this blog and linking to it on Facebook and Twitter just to see what it looks like. This also allows you to run Piwik on a different server than your websites since it doesn't need to read log files.
Anyway, after using it for 10 minutes, so far I would recommend this software package. Installation is as easy as it gets, it provides a wealth of accessible information, and looks nice to boot. In a week or two I'll come back and write a more in-depth Piwik review.
Alright. The downtime I had for a couple of days earlier this week was entirely my fault. I am writing out what happened so that someone else may read it and not do the same thing. Also, I will go over what I had done well, and what I am doing better this time.
Here's What Happened
Wednesday afternoon. I log into HomeOne (my local server) via SSH. I forget exactly what it was I was doing, but it was most likely just to tail Cherokee's error log since I had PHP error printing turned off. I decided it was time to do an apt upgrade. HomeOne, at this point, was running Debian Squeeze, and it was already pretty messy. As you can imagine, Squeeze is not the best choice for a production server. In retrospect, I was banking on Debian's extremism toward stability to carry me without problems on the testing branch.
Up until this point, the PostgreSQL in the Debian repositories has been 8.1. Suddenly, I see apt upgrading to 8.4. Normally, this should be fine. Unfortunately, and I'm not entirely sure why, but postgres would not upgrade from 8.1 to 8.4. Finally, I removed 8.1 and 8.4 installed just fine. This would have been just peachy, but absentmindedly, I also purged 8.1 from the system, which deleted all of my data and configs for pg 8.1.
Now, at this point, I figured (stupidly, in retrospect), "Oh, I can probably still recover this data with reiserfsck" (HomeOne had been running ReiserFS since before Hans Reiser was convicted of murder. That's how long it'd been since I originally put Debian on there.) So I remount /dev/sda1 readonly (which turned out to be my music drive) and executed the following command:
reiserfsck --rebuild-tree -S -l /root/recovery.log /dev/sda1
This ran for a while, and when it finished, I realized I'd run it on my music drive. Luckily, these static files don't change, so nothing bad seems to have happened.
So I ran it again, but this time on the right drive partition. This one was an LVM partition, so it was /dev/mapper/HomeOne-root
This drive was a lot smaller (80GB as opposed to 320 on the music volume), and was divided into multiple partitions, so HomeOne-root was 10GB. As you can imagine, this went a lot faster. When this was finished, I went to reboot the machine, only to find that the shutdown sequence couldn't complete because /etc/init.d/rcS was no longer fully intact!
At this point, I screamed.
After regaining my composure, I decided the best course of action now would be to get a backup system running on a VM on ColonelRhombus (my desktop) to provide DHCP and DNS cache, which HomeOne was no longer able to provide for the time being. Once that was all set up, I set about finding a working harddrive to put into HomeOne and install Ubuntu Server, from which to recover any recoverable data on HomeOne-root.
I wasted what seemed like several unending hours trying to get HomeOne to boot into a USB installer, since it has no optical drive, to no avail. I got it to boot to a hardy installer, but could not complete installation from there because the installer expected a cdrom device to install from, and none was available. Any attempt to make a bootable lucid usb stick after that resulted in utter failure. At this point, I went to bed.
The next morning, I dug through some of the stuff in the closet here in my office, and found an antiquated Lite-On CDRW drive, with a beige bevel. Manufactured in 2005. This thing still works? Fortunately, yes it does. Once that was plugged in, I installed Lucid to HomeOne's new pair of 30.6GB Seagate Barracuda ATA II drives (after sorting through 4 other dead PATA drives, one of which gave me false hope), shut it down, unplugged the optical drive, plugged in the other two necessary drives (HomeOne LVM and music), and booted back up. I mounted /dev/sdc1 to find that was the music partition. I made a note, and unmounted. I attempted to mount /dev/sdd's partitions, unsuccessfully. I nearly panicked, until I remembered that it was LVM. After installing lvm, they mounted just fine from /dev/mapper. Fortunately, I keep all of my web files in /srv, which I also always keep on a separate partition. They were no problem to recover. I was also able to recover my MySQL databases. One table in `modx_dwg` needed to be recovered, but it went smoothly and suffered no data loss. Unfortunately, as I had feared, none of my postgres data had been recovered.
So I finished configuring everything, finally got around to fixing the file permissions on music (and the samba configs to go along; more on that in a future post), installed nginx, php, spawn-fcgi, mysql, postgresql, etc, and got all of my sites back up and running. I do have to rebuild one important database from PostgreSQL, but luckily, it wasn't terribly complex yet anyway.
What I Am Doing Better:
Well, for one, I'm not running a testing branch of Debian anymore. HomeOne is running Ubuntu 10.04 LTS, and I won't stray from it until the next LTS at least. I'm also using more partitions for better data segregation (rather than just root and srv, this time its root, home, var, srv). Additionally, I'm setting up a backup system to backup database and web files from HomeOne to HomeOne-Backup (the aforementioned VM). With a little luck, this will make future disaster recovery a lot less disastrous. Eventually, I'll move the backup duties to a VM on a Xen host, but for now, I don't have that luxury.
The main lesson to take away from this, kiddies, is to be very careful with dpkg --purge, especially when purging database packages! Make sure you have the data you need out of any packages you're purging before purging them!