Python REST API Toolkits and Search

Posted on: Sun 02 December 2012

Following the Python PyPI Twitter stream reveals a number of toolkits that help you build RESTful APIs. These modules usually get you most of the way mapping models from an ORM into an API. However, where they’ve always fallen short for me is in easing the generation of search endpoints. Maybe I’m doing something wrong, but I need more help in 1) handling incoming query args, 2) turning them into searches against the model(s), and 3) generating a RESTful response, especially the resource URLs.

LazyWeb, make it so!
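To make the three steps concrete, here is a minimal standard-library sketch of what such a search helper might look like. Everything in it is an assumption made for illustration: the `WIDGETS` records, the `SEARCHABLE` whitelist, the `/widgets` URL layout, and the function names are all hypothetical, and an in-memory list stands in for the ORM.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical records standing in for ORM-backed models.
WIDGETS = [
    {"id": 1, "name": "alpha", "kind": "gear", "size": 3},
    {"id": 2, "name": "beta", "kind": "lever", "size": 5},
    {"id": 3, "name": "gamma", "kind": "gear", "size": 5},
]

# Whitelist of searchable fields and the type each query arg is coerced to.
SEARCHABLE = {"kind": str, "size": int}


def parse_search_args(query_string):
    """Step 1: turn a raw query string into typed, validated filters."""
    filters = {}
    for name, values in parse_qs(query_string).items():
        if name not in SEARCHABLE:
            raise ValueError(f"unsupported search field: {name}")
        # Only the first value of a repeated arg is used in this sketch.
        filters[name] = SEARCHABLE[name](values[0])
    return filters


def search(records, filters):
    """Step 2: apply the filters as an exact-match search."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


def build_response(base_url, filters, matches):
    """Step 3: wrap the matches in a response that carries resource URLs."""
    query = urlencode(sorted(filters.items()))
    self_url = f"{base_url}/widgets/search"
    if query:
        self_url += f"?{query}"
    return {
        "self": self_url,
        "count": len(matches),
        "results": [dict(r, url=f"{base_url}/widgets/{r['id']}") for r in matches],
    }


def handle_search(base_url, query_string):
    """Glue the three steps together for one request."""
    filters = parse_search_args(query_string)
    return build_response(base_url, filters, search(WIDGETS, filters))
```

Calling `handle_search("http://example.com/api", "kind=gear&size=5")` returns one match with a `url` pointing at `/widgets/3`. A real toolkit would swap the list comprehension for ORM queries and would need pagination, range operators, and friendlier error reporting on top.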

Geospatial, Time series, and Torque

Posted on: Sat 01 December 2012

Torque is a (relatively) recently announced toolkit built on top of CartoDB 2.0 for performant browser rendering of large geospatially oriented time series data:

Torque allows you to create beautiful visualizations with big temporal datasets by bundling HTML5 browser rendering technologies with a generic and efficient temporal data transfer format created using the CartoDB SQL API. Torque visualisations work on desktop and ipads, and work well on temporal datasets with hundreds of thousands or even millions of datapoints. In anticipation of the Strata Conference starting this week in London, we have prepared some examples to share. Simon Tokumine will be there, so ping us for a private demo there.

Previously, I had found CartoDB a bit daunting for a hosted self-installation, but looking at the requirements now it doesn’t seem that bad. Feels like one of those tools where if you absorb the pain and get in on the ground floor, one can build a distinct competitive advantage. Working on tablets and desktops is a nice tease.

Diggin’ On Supervisord

Posted on: Fri 30 November 2012

Finally got a chance to put supervisord into action at work. It’s very handy for organizing, launching, and managing a bunch of server processes. I’m especially liking the interactive command shell provided by supervisorctl. Plus I think it’s easily launchable from a cron @reboot action which is great for this non-root user.

Pydoop

Posted on: Thu 29 November 2012

How have I not heard of Pydoop until now?! OEmbed Link rot on URL: https://twitter.com/pypi/status/274179982736121856

Welcome to Pydoop. Pydoop is a package that provides a Python API for Hadoop MapReduce and HDFS. Pydoop has several advantages over Hadoop’s built-in solutions for Python programming, i.e., Hadoop Streaming and Jython: being a CPython package, it allows you to access all standard library and third party modules, some of which may not be available for other Python implementations – e.g., NumPy; in addition, Pydoop provides a Python HDFS API which, to the best of our knowledge, is not available in other solutions.

Sans Irony

Posted on: Wed 28 November 2012

Rafe Colburn’s reference to Christy Wampole’s How to Live Without Irony finally pushed me over the edge and I had to read her opinion piece. A good screed but before one gets all self-congratulatory about not falling for hipsterism, please check out David Foster Wallace’s “E Unibus Pluram: Television and U.S. Fiction”.

Admittedly Wampole is probably quite aware of the essay, but I’m not sure how many in the target audience are. This was the most impactful of the essays I read from DFW’s collection “A Supposedly Fun Thing I’ll Never Do Again” and has stuck with me for a very long time. Pretty much convinced me of the corrosiveness of television beyond the silliness of much of the content. Mule Variations has a good look at the importance of “E Unibus Pluram”, but it’s not a bad way to “ease” into DFW, if there is such a thing.

Hipsters are one thing but good luck assailing irony within mass media.

Uptime

Posted on: Tue 27 November 2012

374 days of uptime, a bit over one year, is tricky with any machine and limited sysadmin attention. Pattin’ myself on the back, and a thumbs up for Linode. Now I can get around to doing a distribution upgrade.

P.S. That’s the machine that hosts this here very blog.

Wayne Enterprises Chronicles: Week 12

Posted on: Mon 26 November 2012

Victory!! A week ago, I said there wasn’t much sweeter than going into the Sunday night game, knowing you had a win in hand. Well that feeling is topped by going into Sunday, period, fairly comfortable you’re going to win and clinch a playoff spot.

I didn’t really plan it this way, but I had two fantasy players in each of Thanksgiving Day’s three NFL games:

Robert Griffin III lit up Dallas, and delivered 35 fantasy points.
Miles Austin, thanks for nothing
Arian Foster, got his usual 20+, 26 in fact, befitting the #1 overall pick
Shayne Graham, got the benefit of overtime for 11 points out of the kicker position
Stevan Ridley, picked up some garbage points in New England’s shellacking of the Jets, delivering 15.7 fantasy points
I learned my lesson last week about the New England DEF and they rewarded me this week with 25 in the bank

Rolled into Sunday with 113 points on the board, and 2 players left. Going into Sunday night, Aaron Rodgers would have needed a historic fantasy performance without giving up much to Randall Cobb, who I was starting. Neither did particularly well, which was fine by me.

That’s three wins in a row. And if my math’s correct, I’m back in the fantasy playoffs.

Think Sports Stats

Posted on: Sun 25 November 2012

Wondering how hard it would be to build a good set of autodidact materials for learning probability and statistics similar to Think Stats and Baseball Hacks. The twist would be to use sports repositories à la Retrosheet, cleanly integrated into pandas for analysis (bonus points for IPython HTML notebook usage), and then have a follow up course on building data oriented, interactive web sites.

Might be a market for that.

Curious

Posted on: Sat 24 November 2012

Completely unscientific, coming from my mother-in-law’s basement in Palos Heights, Illinois, using speedtest.net:

iPhone 5, AT&T LTE, 8.5 Mbps down, 0.9 up
iPad 3gen, Verizon LTE, 24.2 Mbps down, 15.8 up
iPhone 5, Comcast XFinity Wi-Fi, 0.5 Mbps down, 2.5 up

The XFinity pipe might be carrying the burden of IPTV on the cable box, but doesn’t seem like it’s living up to its billing. Upside is there’s no explicit bandwidth cap.

Verizon Wow!!

Greg Linden’s Code Maven

Posted on: Sat 24 November 2012

I’ve been a long time fan of Greg Linden, going back to the Findory launch. I don’t have a direct interest in his current projects other than they might open doors to computing for folks who hadn’t considered the potential:

Code Maven lets teens learn a little about how to program, starting with basic concepts such as loops then rapidly getting into fractals, animation, physics, and games. In every lesson, all the code is there — in some cases, a complete physics engine with gravity, frame rate, friction, and other code you can modify — and it is all live Javascript, so the impact of any change is immediate. It’s a fun way to explore what programming can do.

Code Maven is a curious blend of a game and a tutorial. Like a tutorial, it’s step-by-step, and there’s not-too-big, not-too-small challenges at each step. Like a game, it’s fun, addictive, and experimentation can yield exciting (and often very cool) results. I hope you and your friends like it. Please try Code Maven, tell your friends about it, and, if you have suggestions or feedback, please e-mail me at maven@crunchzilla.com

A diversity of hackers makes the world a better place. And who knows? Maybe My Little Guy (TM) will pick up Code Monster or Code Maven someday.

Good luck Greg!

Implementing Machine Learning

Posted on: Fri 23 November 2012

Implementing Machine Learning. Now there’s a title I’d pay for, especially if it focused on modern graphical models. Programming Collective Intelligence goes part of the way on some of the basic approaches, but filling in the details on the stuff coming out of academic papers would be really helpful.

Giving Thanks 2nd Ed.

Posted on: Thu 22 November 2012

I wrote a pretty darn good “giving thanks” post last year. It’s still mostly in effect, so go read it again.

I’ll add two things for this year.

Health and Auto Insurance. You don’t need ’em til’ you need ’em, but I needed both this year. And I’ve been paying into Cadillac plans, which have now paid me back.
Our Veterans and Servicemembers. Working for a defense contractor you get an appreciation, but I’ve noticed a national uptick in public recognition as our wars wind down. I also got the rest of the story on why my dad didn’t make it to the hospital for my birth. Drafted into the Army, he was escorting a casualty home.

Thanks to one and all who have, or do, help defend our country.

mysolr

Posted on: Wed 21 November 2012

Link parkin’ since I’m doing some experimentation with Solr at work: OEmbed Link rot on URL: https://twitter.com/pypi/status/263378764417871874

mysolr was born to be a fast and easy-to-use client for Apache Solr’s API and because existing Python clients didn’t fulfill these conditions.

This Just In

Posted on: Tue 20 November 2012

The Washington Wizards are still truly awful:

Because going to NBA.com or ESPN.com is too exhausting, here is your one-stop-shop for all your Washington Wizards needs. Not like you’ll need to check whether or not they won or lost any given basketball game they play, but this Twitter account is succinct, efficient, and will probably enjoy very little creativity going forward. Its lone tweet so far:
Naw.
— Wiz Win Last Night? (@DidTheWizWin) November 20, 2012

I attended last night’s game against the Indiana Pacers. As the contest started, I noted that no one on the starting five could actually shoot. Washington promptly went down by 17 points. Despite a spirited rally, the Wiz of course fell short, unable to stop David West, and unable to get a good offensive possession down the stretch.

The scary thought is that the team probably won’t be all that much better when John Wall and Nene come back. Lottery here we come!

Wayne Enterprises Chronicles: Week 11

Posted on: Mon 19 November 2012

Victory!! In fantasy football there’s not much sweeter than going into the Sunday night game (much less Monday night) knowing you already have a victory in hand. That’s the situation I found myself in and I did it basically with a QB, a WR, and a kicker.

QB: Robert Griffin III, 39.7 points
WR: Randall Cobb, 19.8 points
K: Shayne Graham, 14 points

No one else scored in double figures, including fantasy #1 Arian Foster, and his team scored 43 points! Not to mention some unfortunate early morning lineup changes. First, playing Denarius Moore instead of Eric Decker or Miles Austin. The latter two only scored 8 points, but that was 8 more than Moore. Second, pulling the Patriots DST for the Rams. -20 on the GM.

New rule. Never play Raiders.

Have to admit I was feeling a little nervous in the fourth quarter of the 4 PM matchups. I was only ahead by about 4 points, and my opponent still had Willis McGahee in his lineup in a close game. Little did I know that McGahee was out with a knee injury or that Stevan Ridley would seal the deal with a cheap garbage time touchdown.

Smooth Operatin’

Posted on: Sun 18 November 2012

Infinitely differentiable (smooth) functions are well-suited to background tasks. Sade is such a smooth operator for data partitioning.
— Joe Hellerstein (@joe_hellerstein) November 14, 2012

Ill-fated Sade submission online at http://t.co/YTibc10A
— Joe Hellerstein (@joe_hellerstein) November 14, 2012

As a longtime Sade fan, all I can say is nicely played Professor Hellerstein (Go Bears!!), nicely played.

Wayne Enterprises Chronicles: Week 10

Posted on: Sun 18 November 2012

Victory!! Back in the win column. My team did it despite a craptaculous performance from my wide receivers. Eric Decker and Miles Austin combined for a whopping 6.7 points.

Thankfully, Tony Gonzalez (Go Bears!!) was in my starting lineup for approximately 30 fantasy points. Running backs Arian Foster and Stevan Ridley showed up as well. Basically half of my team carried the load.

Frankly though, it’s pretty sad when you’re sweating it out on Monday night, hoping the kicker does just enough to keep the spread. Especially since my opponent had the Pittsburgh defense against Kansas City, a situation ripe for defensive scores.

A comfortable 17 point win all told and the league’s second position firmly in hand with four games to go.

Git Conflict Resolution Tutorial

Posted on: Sat 17 November 2012

Link parkin’: Git & GitHub Foundations • Conflict Resolution Editor OEmbed Link rot on URL: http://www.youtube.com/watch?feature=plcp&v=xfh13LcgqIU

This is one aspect of git that I don’t have a good handle on. Hoping this YouTube tutorial from GitHub can clarify things. Also looks like there are a few other YouTube tutorials on the topic.

Souping Up Ye Olde MacBook

Posted on: Fri 16 November 2012

Despite the unveiling of the 13-inch Retina MacBook, I’m still interested in extending the life of my good ’ole white MacBook. Partially just because I’d like to see how far I can push it, partially because I’m not really home hacking enough to justify the purchase, and mostly cause I’m currently in a thrifty phase. An article from ArsTechnica captures where I’m at:

Given that everything else is already maxed out, we wanted to upgrade to an SSD in order to squeeze a bit more life out of this laptop. If you have an older computer that already has the maximum RAM but still uses a hard drive, and you’re not in the market for a new computer anytime soon, you can give it a boost with an SSD upgrade.

Prices have come down significantly since the article posted last year. You can get 512 GB of SSD storage for 400 bucks. On top of that you can replace the optical drive with a 1 TB HDD for a total of $150. So for less than a Mac Mini, or 25% of the dream MacBook, I could probably get at least another year to year and a half out of my MacBook. Nice!

Too bad you can’t processor upgrade these things.

Marco & The Jambox

Posted on: Thu 15 November 2012

When I was in the AT&T store ordering my new phone, I saw an actual Jambox (affiliate link warning) for real. Looked much smaller than I thought, but I wondered if it was any good.

Marco Arment has an answer:

The original Jambox, while it’s a delightful product otherwise, has two major flaws for this use. It vibrates so much with bassy songs at high volumes that it can easily vibrate itself off the edge of whatever it’s sitting on. (Fortunately, it’s also very durable.) And while it’s impressively loud for its size, it can’t get loud enough for spoken podcasts to be heard consistently clearly in a noisy shower. …

I can also recommend the original Jambox, but with hesitation: it’s only the better choice if the price difference or portability are most important to you.

For my particular use cases, that’s actually a pretty good endorsement.

Rob Pike Provocateur

Posted on: Wed 14 November 2012

I’m old enough to remember when Plan 9 was new, shiny, and really interesting. Rob Pike, one of Plan 9’s creators, takes advantage of his profile at The Setup to lob a few hand grenades in the direction of “cloud computing”:

A bunch of Macs at home, Macs and Linux at work, plus of course the Google compute clusters. When I was on Plan 9, everything was connected and uniform. Now everything isn’t connected, just connected to the cloud, which isn’t the same thing. And uniform? Far from it, except in mediocrity. This is 2012 and we’re still stitching together little microcomputers with HTTPS and ssh and calling it revolutionary. I sorely miss the unified system view of the world we had at Bell Labs, and the way things are going that seems unlikely to come back any time soon.

I most enjoyed how this kicked the anthill over at Hacker News.

Good ’Ole SQS

Posted on: Tue 13 November 2012

Jeff Barr writes regarding some new additions to SQS:

We announced the Simple Queue Service (SQS) eight years ago, give or take a day. Although this was our first infrastructure web service, we launched it with little fanfare and gave no hint that this was just the first of many such services on the drawing board. I’m sure that some people looked at it and said “Huh, that’s odd. Why is my online retailer trying to sell me a message queuing service?” Given that we are, as Jeff Bezos has said, “willing to be misunderstood for long periods of time,” we didn’t see the need to say any more.

Eight years?! Why it seems just like yesterday that I was being somewhat prescient and errr, overly effusive about Amazon’s first bit of Infrastructure as a Service.

RethinkDB

Posted on: Mon 12 November 2012

Link parkin’: RethinkDB

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn.

Could this be the document oriented DB that MongoDB could (should?) have been?

NBA GameTime Plus

Posted on: Sun 11 November 2012

After my great experience with streaming audio in Major League Baseball’s AtBat 2012, I started venturing out to see what the other pro sports made available. The National Hockey League has nuked itself with its lockout, but that still leaves the National Football League and National Basketball Association.

On both fronts the news has turned out well. The NFL charged me $24.99 for a full season of access to streaming audio for every game through their NFL 2012 iPad app. Probably a tad overpriced, but NFL Audio Pass is still worth it to me. The crappy thing is that the iPhone version of the app can’t do streaming audio. WTF NFL?! MLB has this down cold, so there can’t really be any technical challenges.

The NBA experience started out poorly, but so far looks promising. If you go to their site, the most obvious product is NBA Audio League Pass. This is Flash based streaming audio, which is free, but basically only works on desktops and Android devices. Not helpful to me.

However, the NBA Game Time 2012 iOS App has an in-app purchase of streaming audio for $9.99. The NBA site doesn’t exactly do a great job of publicizing this, but if it works well it’s a pretty good product. Essentially the same price as AtBat 2012 with just a smidge less polish on the app. I’ve got it running on my Jesusphone 5 streaming the Lakers at the moment.

Sigh. So now it turns out that the NBA audio only works on phones, contra the NFL. What gives guys? It’s the same damn OS and streaming audio is pretty mature.

curlish

Posted on: Sat 10 November 2012

Link parkin’: curlish

curl with flames on top

Ever had to speak to an OAuth 2.0 protected resource for debugging purposes? curl is a nice tool, but it totally lacks helpers for dealing with oauth.

curlish comes for the rescue. It is able to remember access tokens for you and inject it into requests. Facebook comes preconfigured so you can start using it right away.

TIL TMDB

Posted on: Fri 09 November 2012

Today I Learned about The Movie Database:

OEmbed Link rot on URL: https://twitter.com/pypi/statuses/266869622580588544

Looks like they have an extensive API.

Diggin’ On Elementary

Posted on: Fri 09 November 2012

Broke out Boogie Down Production’s Criminal Minded on the Jesusphone 5 today. Forgot how enjoyable the track Elementary was:

DJ Scott LaRock and I: KRS-One
Our mother’s first son and no, we’ll never run
From complex situations like you T-O-Y-S’s
Always talkin junk, yet in jail, you’re rockin dresses
I have arrived for the purpose of joy
Unlike any ordinary Bronx b-boy
I will volunteer my services and launch an attack
On you fake educators with your yakety-yak

Great beat too.

Wayne Enterprises Chronicles: Week 9

Posted on: Thu 08 November 2012

Yet again, the agony of Defeat.

This one stings because I had a great projection of 120+ points for my lineup versus my opponent looking at 83.

Key factors:

RGIII didn’t get in the end zone.
I finally got McFaddened. Run-DMC gave me two points before bowing out in the first half with an injury.
Arian Foster had a workmanlike 17+ points. His counterpart on the opposition, Adrian Peterson, went for 32.
No one else on my roster got in the end zone.

The only bright spot was Eric Decker who way overperformed with 25 points.

Just one of those weeks. Unfortunately, now I feel like my team is listing at 5 and 4 instead of cruising at 6 and 3. Still holding second place in the league though.

Time Zone Hate

Posted on: Wed 07 November 2012

OEmbed Link rot on URL: https://twitter.com/joshsusser/status/257725572275376128

What he said. And JavaScript datetime strings too. Taking cross-browser incompatibility Beyond The Browser.

sphinxit

Posted on: Tue 06 November 2012

OEmbed Link rot on URL: https://twitter.com/pypi/status/265710020476428289

I’m intrigued because I still have a soft spot for the Sphinx search engine, but the 0.2 release of sphinxit is not for me. Hard to kick the tires without documentation. But I’m rooting for the dev. Looks like an SQLAlchemy for Sphinx queries.

Few, Signals, & Noise

Posted on: Mon 05 November 2012

A while ago I finished Nate Silver’s new book (Amazon Affiliate link warning) currently enjoying a bit of attention at the end of this year’s political season. I enjoyed it, although I got a little too much poker, but haven’t had the time or urgency to put together a respectable write up. Thankfully, the estimable Stephen Few has done my work for me. I won’t steal his closing graf which is the summary I wish I would have written. Go read all of his post. But here’s a teaser:

I found a kindred spirit when I recently read Nate Silver’s new book The Signal and the Noise (Penguin Press, 2012). I want to give you a sense of the book and it’s powerful message by sharing a few excerpts from the introduction.

Less Chapped

Posted on: Sun 04 November 2012

After a teardown looks like I shouldn’t be all that irritated about my limited window of smug superiority:

It should come as no surprise that Apple’s fourth-generation iPad, officially released today, is barely discernible from the third-generation iPad released earlier this year. Even the teardown experts at iFixit had a hard time telling the two devices apart after cracking open their aluminum and glass shells. The real takeaway is that third-generation iPad owners shouldn’t feel like they are stuck with yesterday’s tech.

Welcome Back NBA

Posted on: Sat 03 November 2012

With any luck, I will be unleashing this post while sitting in the Verizon Center during the Washington Wizards 2012-13 home opener against the Boston Celtics. After watching bits of a few games during opening week, here are a few observations:

The Wizards are still a bunch of piece parts, and I’m not sure John Wall will make all that much difference. At least they’re a bit more professional and likable than last year.
Based upon how close they played the San Antonio Spurs, the New Orleans Hornets might be quite surprising this year. They’ve got four rookies, Anthony Davis, Al-Farouq Aminu, Austin Rivers, and Darius Miller, that look like they can play. Then there’s a core of good, e.g. ~~Brian~~ Ryan Anderson (Go Bears!) but not great veterans as support. Wouldn’t be surprised to see them challenge for the last playoff spot in the West.
The Los Angeles Clippers, yes Clippers, have a hell of a bench.
James Harden may go off for a while as the #1 guy for the Houston Rockets, but then the Association will get a book on him. Let’s see what happens then.
The Knicks are still stuck in neutral, and somehow they feel even older than last year.
Can’t wait for DRose to get back: OEmbed Link rot on URL: http://www.youtube.com/watch?v=dtj-D8HT9BY&w=400&h=225

Dadgum PostGIS!!

Posted on: Fri 02 November 2012

I’ve been enjoying the power of PostGIS at work, although it confounds me to no end. Given the amount of data I’m trying to query against, typically upwards of tens of millions of rows, I haven’t found writing efficient spatial queries to be straightforward. This week provided an opportunity to develop a hypothesis about why.

On one machine, I have my spatial DB on a traditional spinning hard disk drive. A query I wrote took about 6 hours. I’ve taken up the radical experiment (for our research org, we move hella fast ;-/) of putting the same DB on a consumer grade solid-state drive (SSD) to see what would happen. Query time dropped to about 10 minutes. My back of the envelope calculation shows a 36x improvement. Caveat this with the understanding that I have in no way conducted a scientific comparison. Apples to oranges and all that.
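For the record, the back of the envelope math is just a ratio of the two wall-clock times. A tiny sketch, using the rough "about 6 hours" and "about 10 minutes" figures from above (the `speedup` helper is a made-up name for illustration):

```python
def speedup(before_minutes, after_minutes):
    """Ratio of old run time to new run time; both in the same unit."""
    return before_minutes / after_minutes


hdd_minutes = 6 * 60  # roughly 6 hours on the spinning disk
ssd_minutes = 10      # roughly 10 minutes on the SSD

print(speedup(hdd_minutes, ssd_minutes))  # 36.0
```

Since both inputs are eyeballed, the 36x is an order-of-magnitude figure, not a benchmark.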

Still the query went from doable to damn useful. Why? My guess is that spatial data and indexing are hard to lay out for good sequential access. Random disk seeks wind up being the order of the day. Thus, the advantages of SSDs really start to shine.

Just a hunch, but I really need to conduct some deeper investigation. And maybe attend some local geo meetups to commiserate with fellow travelers.

Wayne Enterprises Chronicles: Week 8

Posted on: Thu 01 November 2012

Victory! Back to back wins to cross over into the second half of the season. Moves me comfortably into second place in the league.

I got a lot of nice contributions from every position except TE. RGIII was bottled up by the Pittsburgh defense but still managed double figure fantasy points. Tony Gonzalez (Go Bears!) had his first stinker fantasy game with only 4.4 points.

The deficit against projection was made up at RB, starting Darren McFadden and Stevan Ridley, along with the WR spots of Miles Austin and Eric Decker. Ridley and Decker were the big wins, both being second tier players but coming in at +6. Add in my kicker, Sebastian Janikowski, at +6 as well and I actually came in +5 against my projection.

My opponent put me on pins and needles Monday night, with the San Francisco DEF as his last player. They would have needed to score 18 points to wreck my night, but it wasn’t out of the realm of possibility given the strength of the 49er D and the crappiness of the Arizona QB.

But it all played out well, and ended in a seven point victory.

On another note, just more evidence of the emergence of computational journalism. Yahoo! has outsourced automating game summaries for fantasy football matches (!!). And the results aren’t all that bad.

Computational journalism cribbed from Irfan Essa.

Real-Time and Big Data

Posted on: Wed 31 October 2012

Check out our new piece: “Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture” http://t.co/KBzzvM2q
— Jimmy Lin (@lintool) October 31, 2012

Preprint paper from the guys at Twitter: “Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture”

We present the architecture behind Twitter’s real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time “twist”: after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of “big data”.

Via @lintool

C’Ya Sandy

Posted on: Tue 30 October 2012

Bye bye Hurricane Sandy! You got our little development here in Leesburg for two hours of power outage, and our townhouse for a little roof leakage, but otherwise you weren’t too bad. You didn’t even get after my Linode up in the Newark datacenter.

I was highly impressed with your wind gusts though.

A Bad Way To Go Out

Posted on: Mon 29 October 2012

I know I’m a bit late to the party, but poor Ozzie Guillen got let go in what may be an MLB first. The Marlins posted his firing on Twitter:

BREAKING NEWS: The Marlins announced today that they have dismissed manager Ozzie Guillen. pic.twitter.com/hJdDhVE8
— Miami Marlins (@Marlins) October 23, 2012

Yikes! Amongst Chicagoans, there will always be a soft spot for Oz though, thanks to that 2005 World Series. But I had a feeling it would end badly, from the day he signed on.

P.S. More evidence that it’s all just media now. No need to use that anachronistic “social media”.

Basic Common Crawl Processing

Posted on: Sun 28 October 2012

Pavel Repin copiously documents his initial foray into processing the Common Crawl data set:

At my company, we are building infrastructure that enables us to perform computations involving large bodies of text data.

To get familiar with the tech involved, I started with a simple experiment: using Common Crawl metadata corpus, count crawled URLs grouped by top level domain (TLD).

…

It’s not a very exciting query. To be blunt, it’s a pretty boring one. But that’s the point: this is a new ground for me, so I start simple, and capture what I’ve learned.

This gist is a fully-fledged Git repo with all the code necessary to run this query, so if you want to play with the code yourself, go ahead clone this thing.

Via Pete Warden
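The query itself is easy to picture at toy scale. The sketch below is not Repin's code and does not touch the Common Crawl corpus or any cluster tooling; it only shows the "count URLs grouped by TLD" idea on an in-memory list, with made-up example URLs and hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Return the last dot-separated label of the URL's host, or None.

    Naive on purpose: a bare IP address would report its last octet,
    and multi-part suffixes such as "co.uk" collapse to "uk".
    """
    host = urlsplit(url).hostname
    if not host or "." not in host:
        return None
    return host.rsplit(".", 1)[-1].lower()


def count_by_tld(urls):
    """Count URLs per top level domain, skipping anything unparseable."""
    counts = Counter()
    for url in urls:
        tld = tld_of(url)
        if tld is not None:
            counts[tld] += 1
    return counts


sample = [
    "http://example.com/a",
    "https://www.example.org/",
    "http://example.com/b",
    "not a url",
]
print(count_by_tld(sample).most_common())
```

At crawl scale the same grouping would run as a distributed job rather than a single `Counter`, which is where the infrastructure in the quoted write-up comes in.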

RESTful or Restless?

Posted on: Sat 27 October 2012

In my REST API expeditions at work, I’ve been using Flask-Restless. Now, via Python Weekly, I find out about Flask-RESTful. Normally I’d just scan and move on, but RESTful is from folks at Twilio and may have a bit more polish. To wit:

While Flask provides easy access to request data (i.e. querysting or POST form encoded data), it’s still a pain to validate form data. Flask-RESTful has built-in support for request data validation using a library similar to argparse.

The only hitch I see is no examples of connecting with ORM based models, admittedly after only 10 minutes with the docs. Restless actually handles this use case pretty well.

Alternative approaches are always good to know about.

Also have to say thumbs up on Python Weekly. Once a week to my Inbox, an easy read, at least one good link.