The Flask Stack

At work, I’ve been giving Flask a whirl, with a focus on building a REST-based API emitting JSON responses. It’s a much different approach from Django, and to my mind much more amenable to building an API. Much less magic, and much lighter weight. And at least for me, it’s a bit easier to understand how to generate RESTful responses.

The biggest part of the learning curve has been SQLAlchemy. From the SQL Core for building DB queries, to the object relational mapper (with two styles of declaring schemas no less), to engines, to sessions, there’s a lot going on. Using PostGIS is exacerbating the issue as 1) the geo part needs extensions like GeoAlchemy, and 2) I take advantage of Postgres’ range types, which aren’t baked into SQLAlchemy and are best used with a non-standard operator (@>). So imagine coming to this Swiss Army chainsaw of an object relational mapping system and immediately having to figure out how to extend the framework. Fun!

Some helpful Flask extensions: Flask-Restless and Flask-SQLAlchemy


Wayne Enterprises Chronicles: Week 5

TRAGEDY!! I went into the Monday night game trailing by 24 points and a skoosh. A tough get, but I had the overall #1 pick, Arian Foster, going. 120 yards and 2 touchdowns was definitely doable.

152 on the ground. 1 touchdown. 1 reception for 16 yards. 23 fantasy points.

And I lose by a lousy half a point!!

Foster did his part. The actual killer was Stevan Ridley of the Patriots giving up a fourth quarter fumble that cost me 2 points.

Given that my quarterback, Robert Griffin III, got knocked out and only supplied 3 points, I probably shouldn’t have been in the game anyway. But Tony Gonzalez had another big day with 24 points. Meanwhile, my kicker and defense provided double digits as well.

Need to get more production out of my wide receivers, but dang did that one sting!


djay Unleashed?

If I read this right, a vexing problem I had with djay as virtual turntables has been solved:

There are actually two points here. You have the ability to support multichannel audio interfaces in djay, previously unavailable, and you also can route headphone output and main output separately – so even without a multichannel interface, you could use, say, both the headphone jack and an HDMI or USB out. That’s a big deal for DJs, because finally, you can pre-cue tracks, but also should mean apps with multichannel recording and playback, other tools with separate cueuing, and, heck, even iPad-based surround rigs if you want them

Will have to rekick the tires.


Another Coder’s Font

Source Code Pro. Currently making the rounds on the Intarwebs:

Source Code Pro is a set of OpenType fonts that have been designed to work well in user interface (UI) environments. In addition to a functional OpenType font, this open source project provides all of the source files that were used to build this OpenType font by using the AFDKO makeotf tool.

Via Mikko Ohtamaa, whose post also has useful commentary and comments.


OSM & PGSQL

Michal Migurski drops knowledge on how to get OpenStreetMap data into PostgreSQL/PostGIS:

At first glance, OSM data and Postgres (specifically PostGIS) seem like a natural, easy fit for one another: OSM is vector data, PostGIS stores vector data. OSM has usernames and dates-modified, PostGIS has columns for storing those things in tables. OSM is a worldwide dataset, PostGIS has fast spatial indexes to get to the part you want. When you get to OSM’s free-form tags, though, the row/column model of Postgres stops making sense and you start to reach for linking tables or advanced features like hstore.


pandas 0.9.0

Not much of a point-zero release fan, but I might make an exception for pandas.


Capability Challenge

Rafe Colburn has been on fire recently, but a recent post on capabilities really hit home for me:

China has deployed its own aircraft carrier. The vessel is a rehabilitated Ukrainian carrier that China purchased in 1998. Unfortunately, China does not have pilots who have practiced landing a plane on an aircraft carrier, nor do they have any planes that are capable of landing on an aircraft carrier.

What’s the point? That it’s a lot more difficult to develop a capability than it is to build something.

I work in an industry where “capability” is a bit of jargon and I work in an organization that likes to bandy the term about willy nilly. The industry is definitely cognizant of the meaning but there are quite a few people I work with who need to carry a printed version of this around.


Jesusphone 5

I ordered an iPhone 5 a couple of weeks ago, it arrived at my local AT&T this past Thursday, and I picked it up yesterday. I’ll admit it. I made a halfass attempt to wait in line on the first day. There was enough of a line to turn me away, even in the hinterlands of Northern Virginia. If I’d stuck it out, given I was going top shelf for a 64 Gb model, I might have actually gotten one to gloat about. But I had other stuff to do.

So here’s my pure, initial impulse reaction based on less than 36 hours of ownership.

I’d have been real $?!& salty if I’d stood in line a couple of hours for this thing. Physically longer and lighter? Big deal. Ear pods still don’t stay in my ears. Yet another cable to carry, woohoo! And I sort of liked the iPhone 4’s backglass, which gave it a nice heft. Maybe you can be too thin.

But. But! BUT!! … The combination of the new A6 processor and 4G networking makes the iPhone 5 fast as hell. Stuff I used to twiddle my thumbs waiting for on the 4? Done!

And the more you use it, the more you notice.

The biggest hurdle is that I transitioned from an early 2000s poor excuse for a “smart phone” to the iPhone 4, a quantum leap. So anything less could feel like falling short.

But I think the 5 might be a keeper.


Cogs Command Line Toolkit

https://twitter.com/pypi/statuses/254276578270400512

I doubt if I’ll switch from cliff, but it’s always good to be aware of your options. If I could just figure out how to use cliff without distribute or a setuptools install I’d be set.


The Cloud Unit of Computation

ZeroVM is to virtualization what SQLite is to DBMS.

Diggin’ in the feed cratez, I ran across a piece by Ben Lorica: “How ZeroVM changes analytics in the cloud”. As Lorica points out, ZeroVM is more akin to the Java Virtual Machine than to virtualization containers. However, there’s an interesting implication:

Converged Storage in the Cloud: The amount of time it takes to transfer data between two specialized clusters has led to storage systems with compute capabilities. A recent example is storage vendor CleverSafe including Hadoop MapReduce into its dispersed storage network. Users of Hadoop MapReduce who have played with cloud computing are familiar with this issue: performing big data analysis in the cloud usually means having to first transfer data from storage systems (S3) to compute resources (EC2). This means that if lowering latency is an issue, bandwidth and data size limits what you can do. In contrast (assuming cloud services providers install it) ZeroVM lets you perform computations on the storage cluster!

Anyone who’s done any significant Big Data or parallel computation has run up against the issue of moving data versus moving computation. A computation container that’s cheap to deploy but isolates like an entire PC could be pretty handy. Throw in modern deployment tools like Chef and Puppet with Amazon Web Service style APIs and things get really interesting. Any chance AWS itself could get commoditized from below?


Tagging Tweets

We’re pleased to announce a new release of the CMU ARK Twitter Part-of-Speech Tagger, version 0.3.

I’ve used the ARK Tagger and can vouch highly for it.


Wayne Enterprises Chronicles: Week 4

Failure. The Enterprises went down to defeat this past fantasy weekend. Run-DMC, Oakland’s Darren McFadden, was the key blow, only scoring 4.3 fantasy points. Arian Foster had a solid 15.9 points, but that’s not enough for a RB1. So my running backs basically did me in.

Not to mention the opponent had Tom Brady storm back for 35+ points in addition to great days for Victor Cruz and Marshawn Lynch.

The real killer is that I lost by about 22 points, but I easily had that much extra left on my bench. As soon as I taunt Cam Newton and sit him down, he goes (fantasy) off against Atlanta. Stevan Ridley joined in with Brady in abusing the Bills and would have been a nice replacement for McFadden.

But that’s fantasy football for you. Still in first, barely on point differential.


Hittin’ The Trifacta

More details from Gigaom.

Although Prof. Hellerstein was after my graduate school time, Go Bears!, and all that.


Postgres Performance Tips

For many application developers their database is a black box. Data goes in, comes back out and in between there developers hope its a pretty short time span. Without becoming a DBA there’s a few pieces of data that most application developers can easily grok which will help them understand if their database is performing adequately. This post will provide some quick tips that allow you to determine whether your database performance is slowing down your app, and if so what you can do about it.

Craig Kerstiens provides some handy Postgres SQL that reveals how well the RDBMS is handling your queries. Optimization is left as an exercise for the reader.
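
For a taste of what that kind of check looks like, here’s a minimal sketch of one common health metric, the table cache hit ratio. This is an illustration under my own assumptions, not a copy of the SQL from the post. The pg_statio_user_tables view and its heap_blks_hit and heap_blks_read columns are standard Postgres statistics; the names CACHE_HIT_SQL and cache_hit_ratio are hypothetical.

```python
# Query against Postgres' built-in statistics view. Run it with whatever
# client you like; it returns one row of two counters.
CACHE_HIT_SQL = """
SELECT sum(heap_blks_hit)  AS heap_hit,
       sum(heap_blks_read) AS heap_read
FROM pg_statio_user_tables;
"""


def cache_hit_ratio(heap_hit, heap_read):
    """Fraction of table block requests served from Postgres' buffer cache.

    Returns None when there has been no table activity at all, since the
    ratio is undefined in that case. NULL sums (no user tables) arrive as
    None and are treated as zero.
    """
    heap_hit = heap_hit or 0
    heap_read = heap_read or 0
    total = heap_hit + heap_read
    if total == 0:
        return None
    return heap_hit / total
```

Feed the two numbers from the query’s single result row into cache_hit_ratio. A ratio that sits well below 1.0 on a busy app suggests the working set isn’t fitting in memory.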


Brendan O’Connor on Powerset

An interesting insider look, by Brendan O’Connor, at Powerset, a failed natural language search startup:

There’s a lot to say about Powerset, the short-lived natural language search company (2005-2008) where I worked after college. AI overhype, flying too close to the sun, the psychology of tech journalism and venture capitalism, etc. A year or two ago I wrote the following bit about Powerset’s technology in response to a question on Quora. I’m posting a revised version here.


Solr vs Elasticsearch

Haven’t had a chance to dig into this comparison of elasticsearch and Solr, but I’m link parkin’ to keep an eye on the alternatives. Have to say I’m currently enjoying elasticsearch at work, but it’s still early in my experimentation.

A good Solr vs. ElasticSearch coverage is long overdue. We make good use of our own Search Analytics and pay attention to what people search for. Not surprisingly, lots of people are wondering when to choose Solr and when ElasticSearch.

As the Apache Lucene 4.0 release approaches and with it Solr 4.0 release as well, we thought it would be beneficial to take a deeper look and compare the two leading open source search engines built on top of Lucene – Apache Solr and ElasticSearch. Because the topic is very wide and can go deep, we are publishing our research as a series of blog posts starting with this post, which provides the general overview of the functionality provided by both search engines.


Useful That

I have it on good reputation that this Domingos guy knows a thing or two about machine learning.

Ha, ha! Only serious.


Ruh-roh Apple?

I upgraded my lil’ ole MacBook to Mac OS X 10.7.5 (not worthy of Mountain Lion support) and now the poor thing has been crashing. And it borked my alpha of Tweetbot for Mac (fixed). May just be age, but I’m hoping there’s a pending OS update to fix some obscure issue.

I upgraded my iPhone 4 to iOS 6 and now I think I’m suffering the battery life issues many others have been seeing. Haven’t done a scientific investigation, and an uptick in audio streaming over 3G may be responsible, but still irritating.

Not really Apple’s fault but …, I went to the AT&T store a day after the iPhone 5 launch to put in my order. Okay, it’s going to take 3-4 weeks, but I’m a patient guy. However, it shouldn’t take me 30 minutes to get a hold of a sales rep and complete the pre-order. Yikes!

And then there’s that maps issue.

Stuff used to “just work”.


More itertools

https://twitter.com/pypi/status/250302968228880384

Here I’ve collected several routines I’ve reached for but not found. Since they are deceptively tricky to get right, I’ve wrapped them up into a library. We’ve also included implementations of the recipes from the itertools documentation. Enjoy! Any additions are welcome; just file a pull request.
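
To give a flavor of “deceptively tricky,” here’s a minimal sketch of one routine in that spirit: breaking any iterable into fixed-size chunks. This is my own illustration, not the library’s code, and the name chunks is hypothetical. The easy mistake is slicing, which only works on sequences; sharing one iterator across islice calls works on generators too.

```python
from itertools import islice


def chunks(iterable, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # One shared iterator, so each islice call resumes where the last stopped.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            # Source exhausted; stop without yielding an empty list.
            return
        yield chunk
```

The final chunk is simply shorter when the input doesn’t divide evenly, which is usually what you want for batching work.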


Manna

I finally gave up on the “coffee shop” options here in greater Leesburg, VA: three Starbucks, a combo cafe and sushi bar, and a retrofitted cobbler’s storefront. The latter’s somewhat charming, but doesn’t quite feed the need.

What’s a former illycaffè barista to do? At home, I could do a standard drip brewer, or a French press, or even a small espresso machine. But all of those are too inconvenient. When I’m at home and feel the urge, I don’t need top of the line java, just something to meet the craving.

So I bought a Keurig B70 Platinum K-Cup coffee brewer this past weekend. The Keurig scratches the itch when I get a few free moments at home to read, or hack, or just stargaze. Doing that in a public place with interesting foot traffic? That’ll have to wait until our next abode.


elasticsearch Cool?

I’ve had an eye on elasticsearch for a while now but didn’t really have a good reason to use the Lucene-based search engine. Until today that is, when at work I was looking at the use cases I was applying MongoDB towards. I’m not a hater, but MongoDB just wasn’t fitting the bill.

Time to give elasticsearch a shot. At least Luca Cavanna of orange11 thinks elasticsearch is cool:

First of all, what you’ll notice as soon as you start up is how easy elasticsearch is to use. You index your JSON documents, then you can make a query and retrieve them, with no configuration needed. One of the reasons is that it’s schema-less, which means it uses some nice defaults to index your data unless you specify your own mapping. For more precision it has an automatic type guessing mechanism which detects the type of the fields you are indexing, and it uses by default the Lucene StandardAnalyzer while indexing the string fields.

If you need something beyond the standard choices you can always define your own mapping, simply using the Put Mapping API. In fact every feature is exposed as a REST API.

Seems auspicious, but we’ll put the engine to the test.


Thirty For Twelve

Milestone. Twelve months straight of 30+ posts per month in this here venue. Sometimes it’s a grind, but routine is good. And I’ve stashed away a heck of a bunch of good information. Still need to get cranking on a project that I can narrate.


eGenix PyRun

eGenix PyRun looks like it might come in handy some day.

Our new eGenix PyRun™ combines a Python interpreter with an almost complete Python standard library into a single easy-to-use executable, that does not require a system wide installation and is fully relocatable.

eGenix PyRun’s executable only needs 11MB, but still supports most Python application and scripts - and it can be further compressed to 3-4MB using gzexe or upx.

Compared to a regular Python installation of typically 100MB on disk, this makes eGenix PyRun ideal for applications and scripts that need to be distributed to many target machines, client installations or customers.


Ask, Answer, Tell, [Suggest?]

Sean J. Taylor interned with the LinkedIn Data Science team and had some cogent observations on what the work can really boil down to:

Most people who describe data science are actually describing what it takes to get the job. (e.g. Take statistics and machine learning courses. Learn such-and-such languages/packages. Hack.). Or they describe how the job is growing in importance. I’m going to describe the actual practice of data science.

The Data Science Loop:

  • Ask a good question.

  • Answer the question while economizing on resources.

  • Communicate your results.

  • (Sometimes) Make recommendations to engineers or managers.

The whole thing is worth a read, but most importantly, the closing graf.


Wayne Enterprises Chronicles: Week 3

Victory!, although Cam Newton had me scared on Thursday with a lousy eight points. That’s 3 and 0 for those counting at home. Leading the league at this moment, although as in all things fantasy, my luck could turn at any minute.

The big surprises this week were Darren McFadden with 18.5 points and Tony Gonzalez with 19.6 points. Great production out of RB2 and the TE position. McFadden scored more than Arian Foster (!), my, and the league’s, overall number one pick. Both my wide receivers were solid with double digit performances and even K Robbie Gould pitched in with 13 points.

I had a comfortable enough lead going into the Sunday night game that I could even leave the Ravens DEF in against the New England offense. Luckily they didn’t go negative or I really would have been sweating. Serious consideration was given to benching the Ravens.

102 points without much production from a key position, QB, just shows the strength of this team. And Cam Newton might get benched for RGIII this weekend!


pyDAWG

Okay, tons of background knowledge is great, until you find yourself having to search 20Gb / 270+ million lines to find anything in that mountain of data. Full text indexing seems like the right choice, but again, who wants to deal with Solr/Lucene, elasticsearch, or Sphinx just to get started?

Enter DAWG

This package provides DAWG-based dictionary-like read-only objects for Python (2.x and 3.x).

String data in a DAWG (Directed Acyclic Word Graph) may take 200x less memory than in a standard Python dict or list and the raw lookup speed is comparable. DAWG may be even faster than built-in dict for some operations. It also provides fast advanced methods like prefix search.

Based on dawgdic C++ library.

I’m going to give it a shot at work and see what happens. The big upside is fast prefix search (hopefully) over and above a key/value store’s fast key lookup. Only obvious downside I can see is constantly hearing Xzibit in my head when I read the docs for this module.
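
To make the prefix search idea concrete, here’s a minimal stdlib sketch. To be clear, this is not a DAWG and not the package’s API; it’s a stand-in that uses a sorted list plus binary search, which gets you the same kind of query without any of the DAWG’s memory savings. The name prefix_search is hypothetical.

```python
from bisect import bisect_left


def prefix_search(sorted_keys, prefix):
    """Return every key in `sorted_keys` that starts with `prefix`.

    `sorted_keys` must already be sorted. In sorted order all keys sharing
    a prefix sit next to each other, so one binary search finds the first
    candidate and a linear walk collects the rest.
    """
    matches = []
    index = bisect_left(sorted_keys, prefix)
    while index < len(sorted_keys) and sorted_keys[index].startswith(prefix):
        matches.append(sorted_keys[index])
        index += 1
    return matches


keys = sorted(["batman", "batcave", "bane", "robin", "bat"])
print(prefix_search(keys, "bat"))
```

A plain dict can answer “is this exact key present?” quickly, but it has no cheap answer to “which keys start with this?”, and that’s the gap a prefix-aware structure fills.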


What A Trip!

https://twitter.com/Dope_Den/status/249882120196079618

Two decades ago, Chicago-native Mark Farina began experimenting at his DJ gigs, dropping slower, deeper tracks along the lines of disco classics, acid-jazz, hip hop, and downtempo. In Chicago, which is primarily known as a house music town, his selections were deemed more appropriate for home listening than for the dance-floor. Despite this, Farina continued to develop his sound, which soon after became coined as Mushroom Jazz. A few mixtapes followed, but his experiment-turned-endeavor finally got off the ground in his new home, San Francisco.

Farina juggled between his two residencies of Chicago and San Francisco until he made a permanent move to SF in 1994. He saw opportunities in the city with its vibrant music scene, which featured a slew of promoters catering to fans across a broad spectrum of music styles including hip hop, jazz, house, and reggae to Wicked-style breaks and techno. A couple years before he made his move, he teamed up with Patty Ryan-Smith to throw his own Mushroom Jazz club night.

I’ve done some dumb things in my life. Not going to a Mushroom Jazz event, when I was essentially a very gradual student at Berkeley, has to rank pretty high on the list.


Diggin’ On Whoosh

I’m starting to observe that when dealing in data exploration, right after summary statistics, keyword style searching is high up on the TODO list. Until you really need them though, pulling out the big boys like Solr/Lucene or Sphinx is sort of a pain. When you’re in iterative exploration mode the tax of dealing with enterprise scalable software is substantial. YAGNI probably applies. However, if you’re of a Pythonic mind then Whoosh is a nice, lightweight starter toolkit.

Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python. Programmers can use it to easily add search functionality to their applications and websites. Every part of how Whoosh works can be extended or replaced to meet your needs exactly.

I’ve been incorporating Whoosh into some data analysis on a tiny data set and it’s been a blast. So much so I’ll soon try it out on a bigger, but not massive, pile of bits. A nice feature of a pure Pythonic search library is that you can stash arbitrary Python data structures in the index. This really increases the utility of dealing with search results as opposed to having to go to another store to retrieve more complex non-indexed objects.

Whoosh is also useful for embedding in Swiss Army command shells built using cliff.


python-omgeo

Link parkin’:

https://twitter.com/pypi/statuses/248871216629313536


Background Knowledge

Diggin’ in the feed cratez, ran across From Words to Concepts and Back: Dictionaries for Linking Text, Entities and Ideas, announced by Google’s Valentin Spitkovsky and Peter Norvig:

Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning — from turning search queries into relevant results to suggesting targeted keywords for advertisers — is also Google’s core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas.

And the data is readily accessible, with a publication capturing the details.


The Data Kalashnikov

Word. Although I will say that today I managed to export two Excel worksheets to tab-separated value text files encoded in UTF-16 and then safely turn them into CSVs using Python’s csvkit. No children were harmed.
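
For the curious, the same conversion can be sketched with nothing but the standard library. This is an illustration assuming the export is tab-delimited with a UTF-16 byte order mark, which is how Excel’s “Unicode Text” export typically comes out. It is not what csvkit does internally, and the function name tsv_utf16_to_csv is hypothetical.

```python
import csv


def tsv_utf16_to_csv(src_path, dst_path):
    """Convert a UTF-16 tab-separated file into a UTF-8 CSV.

    Returns the number of rows written. newline="" lets the csv module
    handle line endings and embedded newlines itself.
    """
    rows_written = 0
    with open(src_path, newline="", encoding="utf-16") as src, \
            open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)
        for row in reader:
            # The writer quotes any field containing commas or quotes.
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

The win over a naive replace of tabs with commas is the quoting: a cell like “hello, world” survives as one field.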


psycopg2 and copy_from

Yesterday I was developing some code to do bulk loading of data into PostgreSQL. The db’s COPY command is the best, if not exactly the most inviting, way to do this, modulo possible extensions, but those haven’t worked for me.

Unfortunately, COPY reads from files local to the server or over standard input. I was dreading having to jerry-rig something out of SSH when I revisited the psycopg2 module. Turns out psycopg2 has a copy_from method that does what you’d expect, whether the DB is local or remote. Python FTW yet again.

Probably should have been using psycopg2 from the get go.
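
Here’s a minimal sketch of the pattern, under some loud assumptions: conn is an already-open psycopg2 connection, the helper names rows_to_copy_buffer and bulk_load are hypothetical, and the values contain no tabs, newlines, or backslashes (those would need escaping for COPY’s text format). The only psycopg2 call used is the cursor’s copy_from, with its default tab separator and \N null marker.

```python
import io


def rows_to_copy_buffer(rows):
    """Serialize rows into the tab-separated text that COPY reads by default.

    None becomes \\N, COPY's default NULL marker. Assumes values contain
    no tabs, newlines, or backslashes.
    """
    buf = io.StringIO()
    for row in rows:
        fields = ("\\N" if value is None else str(value) for value in row)
        buf.write("\t".join(fields) + "\n")
    buf.seek(0)  # rewind so the reader starts from the first row
    return buf


def bulk_load(conn, table, columns, rows):
    """Stream rows into `table` through the client connection via COPY."""
    cur = conn.cursor()
    try:
        # The data travels over the connection, so the server never needs
        # to see a local file.
        cur.copy_from(rows_to_copy_buffer(rows), table, columns=columns)
    finally:
        cur.close()
    conn.commit()
```

Usage would look something like bulk_load(conn, "places", ("name", "population"), rows), where the table and column names are placeholders for your own schema.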


Storm Turns 1

Even though I don’t have an opportunity to use it, Storm feels like goodness.

Storm was open-sourced exactly one year ago today. It’s been an action-packed year for Storm, to say the least. Here’s some of the exciting stuff that’s happened over the past year:

And it looks like momentum is building.


Wayne Enterprises Chronicles: Week 2

Victory! It’s great to wake up on Monday morning, know victory is in hand, and have an extra player to run up the score with. My 43 point margin of victory is the highest in the league this year.

I take back any rude thoughts or comments I might have had about Arian Foster (23 points), Hakeem Nicks (30 points), and Cam Newton (25 points). Tony Gonzalez and the Ravens DEF chipped in with double digits as well.

Marques Colston and Darren McFadden? Not so much. Run-DMC is troubling because he’s manifesting his usual inconsistency, but he’s my number two running back. At least he’s not injured … yet. I may have to work up a trade or an inspired waiver wire pick. Colston probably won’t even be on my week three roster. I know he’s in a high powered offense, but he’s not delivering at the moment.

Last week every team plays before the bye weeks kick in. Then things get really fun. I feel good about my team, but I need to shore up second positions at RB and WR.


TIL @pypi

Today I Learned that PyPI has a Twitter account, @pypi

https://twitter.com/pypi/statuses/247637227658674176

It’s a bit of a firehose, but worth it to pick up packages like fastcluster.


Gibson, Wired, io9ed

William Gibson has been out and about a bit, and giving some interviews. First with io9:

And a three-parter from Wired, 1, 2, 3


XLHost

Earlier, I complained about the cost of purchasing hosted PostGIS. An alternative? Run your own damned installation. If you’ve got a relatively small amount of data, 10s of Gbs, you can straightforwardly do this on a Virtual Private Server. 100s of Gbs approaching Tbs and things get really expensive. VPSes just weren’t designed for gobs of virtualized storage, except maybe for Amazon and even there you run into fun with EBS if you want durability.

There is hope with dedicated servers, which are generally for the pros. Still a bit expensive for us knowledgeable amateurs. But XLHost, out of Columbus, Ohio, seems to have a few discount products within reach for those somewhat serious about stashing a large pile of bits. I’m not going to claim it’s cheap, but an industrious hacker with some disposable income could make a go of it.


Silver, Signals, & Noise

From Nate Silver

An excerpt from my forthcoming book, “The Signal and the Noise,” was published this week in The New York Times Magazine. You can find an online version of the excerpt here.

The book takes a comprehensive look at prediction across 13 fields, ranging from sports betting to earthquake forecasting. Since 2009, I have been traveling the country to meet with experts and practitioners in each of these fields in an effort to uncover common bonds. The book asks an ambitious question: What makes predictions succeed or fail?

Pre-ordering tonight. I had a thought about an interesting sports related challenge on the way home today. There’s enough historical play-by-play data for Major League Baseball, and possibly the National Football League, and straightforward enough grading, that you could build predictive models and test them in real time. Would be interesting to see how quickly and accurately a system could predict a game’s next pitch or play and outcome.

Might be some financial opportunities there as well.

Has it already been 4 years since 538 stormed the world? Time flies.


Wayne Enterprises Chronicles: Week 1

Victory! Week one brought a solid performance for Wayne Enterprises in my office fantasy football league. There was some nervousness when Arian Foster reported a knee injury at the tail of the week and was something of a game time decision. Shades of CJ2K biting me in the rear two years ago. But Foster gave me a solid 20 points with a surprising 18 from Darren McFadden.

My starting receivers, Hakeem Nicks and Marques Colston, were disappointing with less than 10 points each. Cam Newton came up well under his projection as well. In contrast, I had Tony Romo and Miles Austin, who both had big games. Cam probably gets another week, but I’m sorely tempted to plug in Austin.

Whatever. My team won handily, although I had to wait until late Monday night to really feel comfortable.

Yahoo! has these automatically generated game recaps that are actually pretty fun to read.


Retinaed

Apropos today’s announcements.

Okay. I admit it. I have become a legitimate Apple fanboy.

The laptop I’m writing this post on? A 13” 2008 White Plastic MacBook.

The laptop I do my day job on? A kitted out 15” 2011 MacBook Pro provided at my request.

My wife’s 2010 Christmas gift? An iPad, 2nd generation, AT&T 3G.

The handheld I use to distract My Little Guy(™)? A fourth generation 64Gb iPod Touch.

The cellular phone I use to take calls, read Twitter, track my weight, and text? A 2010 16Gb iPhone 4.

The tablet I carry around just about everywhere I go these days? An iPad, 3rd generation, 64Gb, Verizon LTE. Damn if it ain’t a sweet product.

I’ve seriously considered purchasing an Apple TV and an Airport Express.

Keep in mind that I’m a built my own computer, written my own compiler, written assembly code with some effort, ported X Windows to OS/2, used Linux on floppy disks, Scheme embedding sort of hacker. I’m not quite sure if this state of affairs should be distressing.

In any event, note the last two products of mine that I listed are Apple’s first Retina Display products. I have to say, using both of those continuously, then going back to say the White MacBook display for an extended period, really highlights the difference.

So much so that a Retina Display is a must have feature for the next round of devices I’m lusting after, the next iPhone and a power user’s MacBook Air.

Thus have I been Retinaed.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.