
Exit GeoIQ

I’ve mentioned before that in my day job I tried to fire up some collaborations with GeoIQ (née FortiusOne). Never could get anything off the ground but enjoyed my interactions and visits to their Clarendon office space. Always admired their scrappiness from afar and the neogeographer community they built up around GeoCommons.

Yesterday GeoIQ announced their acquisition by Esri, the 900 pound gorilla of GIS systems. On the one hand, it’s a little sad to see the little local guys get gobbled up. On the other hand, I hope it created a reasonable exit for the folks I got to make a personal connection with. Probably wasn’t lifechanging but beats going bankrupt.

And I find it interesting that they’re going to establish a Research and Development center in the DC area. Between the rapid tech changes in massive data analytics, mobile development, and Web mapping, must be an interesting time for GIS folks. I sort of realized that DC was a bit of a geonerd center, but this is just another confirming datapoint. Makes complete sense what with the concentration of gov, mil, spook, sci, campaign, and NGO types in the DMV.

Good luck, Sean and crew!


PyCon 2013

Yeehaw! PyCon 2013 is officially on the docket. Back in sunny Santa Clara again, I’m mentally booking my return trip.

Without the distracting work related activities.

And dying from some virus.

I promise.


Installing Pandas

Recently I went through the process of installing pandas on Mac OS X and had a similar experience to Grig Gheorghiu:

I tried to install the pandas Python library a while ago using easy_install/pip and I hit some roadblocks when it came to installing all the dependencies. So I tried it again, but this time I tried to install most of the required packages from source. Here are my notes, hopefully they’ll be useful to somebody out there.

It wasn’t a truly heinous effort, but a lot less clean than I expected. Like Grig, HDF5 and PyTables were the worst, being the only ones I couldn’t pip my way through. However, I already had gfortran installed.

I’m really looking forward to putting pandas to the test, but this exercise makes using something like the Enthought Python Distribution really attractive.


The Human Division

Loved John Scalzi’s Old Man’s War trilogy, so his upcoming project, The Human Division, looks intriguing:

My next project from Tor is called The Human Division. It takes place in the “Old Man’s War” universe, after the events of The Last Colony and Zoe’s Tale. It is not, strictly speaking, a novel.


Serious Analysis


So far so good on the first two chapters of Philipp Janert’s Data Analysis with Open Source Tools. Actually better than good, it’s been great. A little statistics, a little graphics, a little math, and a little programming. All starting from an expectation that the reader is somewhat experienced and with a matching serious tone. Kernel density estimates were actually new to me. A number of O’Reilly books start off a bit breezy, but not this one. Well worth the money so far.


Streaming Spark

Cool! Scalable streamed data processing on top of Hadoop-like infrastructure, via Discretized Streams (PDF)

The key idea behind D-Streams is to treat a streaming computation as a series of deterministic batch computations on small time intervals. For example, we might place the data received each second into a new interval, and run a MapReduce operation on each interval to compute a count. Similarly, we can perform a running count over several intervals by adding the new counts from each interval to the old result. Two immediate advantages of the D-Stream model are that consistency is well-defined (each record is processed atomically with the interval in which it arrives), and that the processing model is easy to unify with batch systems. In addition, as we shall show, we can use similar recovery mechanisms to batch systems, albeit at a much smaller timescale, to mitigate failures more efficiently than existing streaming systems, i.e., recover data faster at a lower cost

That’s how the Cal CS Division rolls.

Via Ben Lorica
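
To make the quoted idea concrete for myself, here is a minimal sketch in plain Python, not Spark and not the paper's implementation. The function names `discretize` and `running_counts`, the one-second interval, and the sample events are all my own assumptions for illustration. It buckets records into small time intervals, counts each bucket as its own deterministic batch, and folds each batch's counts into the previous result.

```python
from collections import Counter, defaultdict


def discretize(records, interval=1.0):
    """Group (timestamp, key) records into batches, one per time interval.

    Intervals that received no records are simply skipped in this sketch.
    """
    batches = defaultdict(list)
    for timestamp, key in records:
        batches[int(timestamp // interval)].append(key)
    return [batches[i] for i in sorted(batches)]


def running_counts(batches):
    """Yield the cumulative counts after each batch has been processed."""
    total = Counter()
    for batch in batches:
        # Count this interval on its own, then add it to the old result.
        total = total + Counter(batch)
        yield dict(total)


# Hypothetical input: (seconds since start, key) pairs.
events = [(0.1, "a"), (0.7, "b"), (1.2, "a"), (2.5, "a"), (2.9, "b")]
snapshots = list(running_counts(discretize(events)))
```

Each entry in `snapshots` is the state as of the end of one interval, so recovering from a failure would mean recomputing a small batch rather than replaying an unbounded stream.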


Return To Local?

As someone who’s pretty much hated retail shopping since he was old enough to be dragged to a mall by his mother, I’m probably a little too predisposed to Jeff Jordan’s analysis that e-commerce is killing physical retail outlets:

I believe we’re approaching a sea change in retail where physical retail is displaced by e-commerce in a multitude of categories. The argument at a high level:

  • Online retail is relentlessly taking share in many specialty retail categories, resulting in total dollars available to physical retailers stagnating or even declining. This is starting to put intense pressure on their top lines.
  • Physical retailers are very highly leveraged and often have narrow profit margins. Material declines in their top lines make them unprofitable and quickly bankrupt.
  • Online retail will benefit greatly from the elimination of their physical competition and their growth should accelerate.

Need to check out the contra-commentary, and the argument rests a little too much on the singularity of Amazon, but the premise feels sound to me.

If this pans out, my question is what happens to all the physical space and local talent that supports big box outlets and malls? Does Main Street make a comeback? What do you do with vast stretches of strip malls like Rockville Pike in Maryland? How about more open and green spaces as folks shift to even more knowledge work and personal services? More home-based businesses or co-working?

Living in the exurbs of Loudoun County, VA, I’m hard pressed to envision what the region will look like in a generation from a commerce perspective. Leesburg Premium Outlets better look out.

Via Jenn Webb @ O’Reilly Radar


Hangouts Fandom

Recently in my feedflow I’ve noticed quite a few plaudits for Google+ Hangouts, e.g. Lucas Gonze, who I’ve followed for quite a while, is a Hangouts fan. While I’ve enjoyed the television commercials, since I’m not a huge G+ user I haven’t really gotten into it. However, I’m intrigued that there might be something that shows a path to elimination of the execrable Microsoft LiveMeeting in the workplace.

The amount of expensive talent time wasted setting up collaboration with that hideous tool is stunning to me. Feels like Hangouts hits the sweet spot, while being Web native and thus cross platform, which is the other half of my gripe with LiveMeeting, being a Mac user.

Probably not in my career lifetime, but one can hope.


Python for Data Analysis

Got suckered by one of those O’Reilly 50% off daily deals on e-books and had to buy the Early Release of Python for Data Analysis. Looking forward to digging into some pandas on my iPad.

Couldn’t stop at just one though, and had to grab the 2nd edition of SQL and Relational Theory by C. J. Date along with Data Analysis with Open Source Tools. I was pleasantly surprised by the table of contents for the latter. A bit meatier than I anticipated.

If you’re reading this on July 3rd, 2012 you can still jump in on the deal. Clock’s ticking though. Midnight PDT is when it expires. Also of note: it only applies to a particular 25 e-books.


eGenix PyRun

eGenix PyRun looks like it might be useful someday:

Our new eGenix PyRun™ combines a Python interpreter with an almost complete Python standard library into a single easy-to-use executable, that does not require a system wide installation and is fully relocatable.

PyRun’s executable only needs 12MB, but still supports most Python application and scripts - and it can be further compressed to 3-4MB using gzexe or upx.


Tree Style Tabs

I have some serious tab proliferation in both Chrome and Firefox. I’ll have to check out the Tree Style Tab Firefox plug-in.

This provides tree-style tab bar, like a folder tree of Windows Explorer. New tabs opened from links (or etc.) are automatically attached to the current tab. If you often use many many tabs, it will help your web browsing because you can understand relations of tabs.

Via Matt Ryall


Twelve Factors

Link parkin’: The Twelve-Factor App, a manifesto and methodology for building modern, scalable web applications. We’re a long ways away from good ole’ CGI.

Via Rafe Colburn


Ernie Escapes

The 2012 NBA Draft was yesterday, and the Washington Wizards did the straightforward thing of picking Florida’s Bradley Beal. As a nominal Wizards fan it was about the best that could be hoped for given Ernie Grunfeld’s ability to pick losers.

I’m not particularly optimistic though. Have a feeling Beal might turn out a bit like Kirk Hinrich. Hinrich was selected by the Bulls 2 spots after Dwyane Wade, whom the Bulls coveted. Hinrich so far has had a serviceable yet undistinguished career. Meanwhile there were a few guys picked (well) after Kirk that went on to be All-Stars or play on championship caliber teams.

The Wade role this year might be played by Michael Kidd-Gilchrist (MKG), unexpectedly taken by the Bobcats in the second spot. Media here in DC ragged on MKG, but he did way more than Beal in his one college season, becoming an integral part of a championship team. This preview of Kidd-Gilchrist at The Basketball Jones captures my sentiment:

Chad Ford of ESPN sums up MKG nicely, writing “Ask anyone who knows anything about basketball what label should adorn Michael Kidd-Gilchrist as he prepares for the 2012 NBA draft and they all, to a man, say the same thing. Winner.”

My guess is that if MKG was still available at the three spot, Grunfeld would still have gone with Beal. Then I’d be really salty. So he escapes my wrath this time.


pandas 0.8

The Wes McKinney-led pandas project has just hit the 0.8 release point. I’ve been fiending for an excuse to use pandas and now an opportunity at work has popped up to do some timeseries-ish analysis. If nothing else, I’m looking forward to being able to easily generate sequences of timestamps:

New DatetimeIndex class supports both fixed frequency and irregular time series. Replaces now deprecated DateRange class
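
For the record, here is roughly what I have in mind, as a hedged sketch rather than anything lifted from the release notes. It assumes `pd.date_range` and `pd.DatetimeIndex` behave the way they do in the pandas releases I know; the dates and frequencies are made up for illustration.

```python
import pandas as pd

# Fixed frequency: four consecutive days starting July 1st, 2012.
daily = pd.date_range("2012-07-01", periods=4, freq="D")

# Same idea at a finer grain: three timestamps, 15 minutes apart.
quarter_hours = pd.date_range("2012-07-01 09:00", periods=3, freq="15min")

# Irregular timestamps fit in the same index type.
irregular = pd.DatetimeIndex(["2012-07-01", "2012-07-04", "2012-07-09"])

print(list(daily.strftime("%Y-%m-%d")))
```

The appeal is that the fixed-frequency and irregular cases come back as the same kind of index, so downstream code doesn't have to care which one it was handed.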


Crowning Champs

Now that the BCS has ended, with Big Time Football somehow looking slimier, I like what Chris Brown had to say about playoffs:

So what does a playoff give you, and why is it probably a better solution for crowning a National Champion? Let me say first that I think it would be a better system than the current BCS morass. But the advantage the playoff gives you is not anything metaphysically correct. It probably does not crown the best team. And it does not reward the best season (sorry Utah).

It merely gives you relative certitude. It’s not perfect — some clunker teams can be crowned, some historically great teams will get the relative shaft — but, before the season, during the season, and in the playoffs, everyone knows what it takes to be the champion: you must get into the playoffs, and you must win every game once you’re there. The Patriots couldn’t lobby for votes, they couldn’t say that they got jerked around, and they even couldn’t say that they didn’t get their chance. They played and they lost. They were probably better, they might only have had a bad day, but hey, you knew what you were getting into.

Emphasis mine. Everyone knows what’s going to be on the exam. Either you pass or you don’t and your answers are fully visible to everybody. No lyin’, no cryin’. Perfect? No. But it’s better than what we had.

My only suggestion is to go for broke and have a twelve team playoff. Six automatic spots for conference winners. Six at large bids. Top four get a bye. Could be done in a month. You could even have a four game New Year’s Day bowl fest. Everybody wins!!


csvkit

How have I gone so long without knowing about Python’s csvkit?

csvkit is a suite of utilities for converting to and working with CSV, the king of tabular file formats. …

csvkit is to tabular data what the standard Unix text processing suite (grep, sed, cut, sort) is to text. As such, csvkit adheres to the Unix philosophy.

csvkit usefully replaces the built-in Python csv module (pretty useful in and of itself) and also provides a really nice set of command line utilities for creating, slicing, and dicing csv files.
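
For comparison, this is the sort of column slicing I'd otherwise do by hand with the standard library's csv module. It is only a sketch: `cut_columns` is a hypothetical helper of mine, not part of csvkit, and the sample rows are invented. As I understand it, csvkit's `csvcut -c` covers the same ground straight from the shell.

```python
import csv
import io

# Invented sample data standing in for a file on disk.
raw = io.StringIO(
    "name,team,pick\n"
    "Beal,Wizards,3\n"
    "Kidd-Gilchrist,Bobcats,2\n"
)


def cut_columns(handle, columns):
    """Return the rows of a CSV file restricted to the named columns."""
    reader = csv.DictReader(handle)
    return [{name: row[name] for name in columns} for row in reader]


rows = cut_columns(raw, ["name", "pick"])
```

Note that the csv module hands every value back as a string, so `pick` stays `"3"` until you convert it yourself.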


A Book I’d Buy

The Scientific Practice of Large Scale Data Analytics

At work I’m seeing too many people getting a Heap ’O Data (TM), and then not being systematic about how they manage, process, and analyze those precious bits. Yours truly is a culprit, but I’m trying to get better. Even a basic primer on how to document your data sets would be helpful.

I have to imagine the Business Intelligence and Data Warehouse guys must have some recorded literature along with the DevOps, Scientific Computing, and Quantitative Finance communities. Probably where I need to start diggin’. Really, this process can’t be as haphazard as I’m seeing on a daily basis.

Feels like a good opportunity for O’Reilly Media.


MBP Retina Review Revue

Link parkin’: TidBITS has collected a number of reviews of the Retina Display MacBook Pro.


Nice Time Capsule

The film Phone Booth, starring Colin Farrell, has been knocking about HBO. I remember inadvertently seeing it on an airplane flight and being surprised at how enjoyable it was. The film’s stood up pretty well over a decade. In addition to Farrell, it also stars Kiefer Sutherland, Forest Whitaker, Radha Mitchell, Katie Holmes, a bunch of quality character actors, and a somewhat forgotten seamy side of New York City. Joel Schumacher directs for a taut, tight 81 minutes.

Most notable might be that Phone Booth straddled a time when phone booths weren’t quite dead, cell phones weren’t quite dominant, and Manhattan hadn’t been completely scrubbed clean. Yet even though it captures a moment in time, the overall themes, and tension, are actually quite timeless. A nice little gem of a movie, that people will be watching for years to come.


Hollywood Narcissism

Well, Walter Jon Williams’ The Fourth Wall wasn’t exactly what I was expecting. Our intrepid heroine Dagmar is replaced as primary “protagonist” by Sean Makin. I use protagonist loosely as Makin is a stupefyingly self-absorbed former child acting star. The narrative devolves into a murder mystery regarding the Hollywood-based transmedia production Dagmar is helming with Makin in the lead role.

I think Williams was trying to explore the intersection of mass global entertainment and technology, but there wasn’t enough tech for me. And Makin is not a particularly good person but we have to spend 90% of the book locked into his inner dialogue.

Might be time to retire Dagmar and her crew.


Browser Proliferation

Boy, there are a lot of Web browsers embedded in various applications on my iPad. There’s Mobile Safari of course, a straight up Web browser. Then there’s a nice one hiding in 1Password for iPad. Mr. Reader, my RSS aggregator, has one along with the IMDb app. The Google Search app has a browser embedded as well as Tweetbot.

Just an observation. Not sure if it’s good or bad. On the one hand, they’re all relatively consistent. On the other, my Web browsing gets sprawled all over the place.


An Air Review

Not just any MacBook Air review, but an in-depth look from Jacqui Cheng at Ars Technica at Apple’s latest slim and thin laptops. Bottom line, the container is pretty much the same but the engine is vastly improved.


Et Tu Twitter

So now Twitter has joined Facebook in bombarding my inbox with useless notices about people I already track. Frankly, my email has pretty much become a cesspool of solicitation from corporate entities that I’ve had a prior “relationship” with. I’m almost surprised when I receive a human generated message directly for me. It’s just about enough to make an old UNIXhead give up the darn medium.

Time to go on a filtering rampage.


Thanksgiving In June

Been a rough patch on the home front for the past few days, but things are looking up. I highly recommend staying awake when driving up I-95. The guard rail alarm is really unpleasant.

Gotta give thanks though for all the family and friends providing support and well wishes. And thanks that the accident wasn’t worse. Fingers crossed we’ll get over the hump.


A New Contender

Given this, admittedly limited, comparison between the 13” MacBook Pro and 13” MacBook Air, I may have to mix the Pro into my thought process about a new laptop. More horsepower and more memory expansion are tempting to trade off against the Air’s 50% lighter weight and better resolution. Then again, it feels a lot like settling for the middle of the road. But it’s a personal machine I’m not trying to make money with, so why not do something different and work through the challenges? Something to consider.


Father’s Day

If you’ve got kids and you’re positively impacting their lives, good on ’ya. I can verify it’s rewarding, but demanding, work.

Can’t say as how I got to take it easy and relax, due to a number of issues, but wanted to acknowledge all the positive male role models out there.


Titan Graph Store

Link parkin’: Titan

Titan is a distributed graph database optimized for storing and processing large-scale graphs within a multi-machine cluster.

Even better, it comes with a reasonably designed query API.


Build From Source

There will come a time when I need to build NumPy, and other packages on top of it, from source. On Mac OS X, it’s those finicky compiler options that get you. Thankfully, Jeet Sukumaran has written them down for when I need to look them up.


Linden’s Links

Great pile of links from Greg Linden:

Talk about geniuses at Facebook ignores the big problem that no one — not Google, Yahoo, Microsoft, Facebook, or any of the newspapers — knows how to solve this problem of making advertising relevant, effective, and lucrative without immediate purchase intent, despite years of work by thousands of brilliant people ([1] [2])


Memo to Zuck

Zuck, no matter how many times you tell me a friend has done posted a new photo or added a new friend on Facebook, I’m not coming back. And I already hit the opt out link once before, which clearly didn’t take, so I’m not falling for that again. So please, just leave it alone, and stop with the e-mails. I mean you’re averaging close to one a day.

Desperation is unbecoming.

Yours. The Management.


The eyeo Festival

Nelson Minar has some high praise for the eyeo Festival. Back when she was less well known, I had the pleasure of meeting Fernanda Viégas at a session of the Hawaii International Conference on System Sciences. Tough work if you can get it. Fernanda in the house is a sign of a top notch event. eyeo sounds like a confab I should make it a point to attend before it gets overrun.


Air vs Pro

So the announcement that I’d been pining for came today, with Apple unveiling new products in both the MacBook Air and MacBook Pro lines. The new hotness of the top of the line 15” MacBook Pro, with Retina display, is pretty much my dream replacement for Ye Olde White MacBook. Heck, it can even be bumped up to 16 GB of RAM with 768 GB of SSD, for a small fee.

The 13” MacBook Airs deserve serious consideration though. First, while the Pro has slimmed down, the Airs are still 50% lighter. These days I’m extremely desirous of being leaner and meaner. The Airs can have their RAM upgraded to 8 GB. The kicker is a 13” MacBook Air kitted out with more RAM and 512 GB of SSD ($500, yikes!!) is still about $800 less than the equivalent 15” MacBook Pro. Obviously with the Pro you get more cores, more GHz, and more screen real estate, but $800 is 80% of the way to a Cinema Display for the desktop.

Interestingly, Apple claims the same battery life for both lines, roughly 7 hours, despite what must be a significant difference due to the Pro’s advanced display.

So serious top of the line mobile horsepower vs smaller, lighter, capable and significantly cheaper. Tough decision. I’ll wait for the in-depth nerd reviews and hold on until the 4th Macaversary.


Kindle Backlog

A nice side effect of installing the Amazon Kindle app on my iPad is that I was reminded of a number of e-books that I had bought. There was a total of nine, seven of which I hadn’t even started, plus two barely begun. Good works including Ready Player One, The Restoration Game, and You Are Not So Smart. Now I don’t have to spend a bunch of bucks on summer reading material.

I also went back and revisited a bunch of books I had bought from O’Reilly and uploaded as PDFs. O’Reilly also supports Amazon’s .mobi format and they let you redownload if needed. Gotta say that while PDFs look really nice, .mobi books shine in a big way. Good on O’Reilly for quality support as well.


LDA Intro

LDA stands for Latent Dirichlet Allocation, a machine learning approach to extracting hidden structure from document collections. I’ve been touting and applying LDA at work for a while now and think I have a pretty good handle on it. However, it’s always good to review the understanding. Edwin Chen has a nice, approachable introduction to LDA with a worked out example as the highlight. Thanks Edwin!


Easy Peasy EMR

Holy smokes! I just ran my first ever Elastic MapReduce (EMR) job flow a few minutes ago. Surprisingly, it didn’t crash or fail to complete. Ran a bit slow, which had me thinking it was gonna bomb. But nope, my little hashtag extraction script finished processing 100MB of data in about 7 minutes. Most of the time was spent shipping the 100MB up to the EMR Hadoop cluster once it got going.

Key was the usage of Yelp’s mrjob Python package. mrjob exploits Hadoop’s streaming mechanism to fit Python into the Java based processing pipeline. What that costs in performance is more than made up for in flexibility and accessibility. At least for this big data hacking noob.

And I’m waiting to check the charges, but those will probably be on the order of pennies. Gotta love having your own personal on-demand, dirt cheap cluster.
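
For the curious, here is the shape of the streaming idea in plain Python. This is a sketch under my own assumptions, not my actual script and not the mrjob API: the `mapper` and `reducer` names, the regex, and the sample tweets are all invented. A real Hadoop streaming job reads lines on stdin and writes tab-separated key/value lines on stdout, with Hadoop sorting by key in between; here `sorted()` stands in for that shuffle so the whole thing runs locally.

```python
import re
from itertools import groupby

HASHTAG = re.compile(r"#\w+")


def mapper(lines):
    """Emit one 'hashtag<TAB>1' line for every hashtag found in the input."""
    for line in lines:
        for tag in HASHTAG.findall(line):
            yield f"{tag.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.split("\t") for line in sorted_lines)
    for tag, group in groupby(pairs, key=lambda pair: pair[0]):
        yield tag, sum(int(count) for _, count in group)


# Invented sample input standing in for the real data.
tweets = ["Loving #PyCon and #python", "More #Python at work", "no tags here"]
counts = dict(reducer(sorted(mapper(tweets))))
```

The plain-text key/value lines are what let Python sit inside a Java pipeline, and presumably also where the performance cost comes from.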


TIL lxml.objectify

Today I Learned about Python’s lxml.objectify module, which makes navigating XML in-memory tree representations a lot easier. Shame on me, I’ve been using lxml forever.
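
A quick taste of what that looks like, as a sketch with an invented XML document (the `feed`, `entry`, `title`, and `votes` element names are mine). It assumes `objectify.fromstring` and attribute-style child access work as I understand them from the lxml documentation.

```python
from lxml import objectify

# Invented document, just to have something to navigate.
xml = (
    b"<feed>"
    b"<entry><title>pandas 0.8</title><votes>3</votes></entry>"
    b"<entry><title>csvkit</title><votes>5</votes></entry>"
    b"</feed>"
)

root = objectify.fromstring(xml)

# Children are reachable as attributes; the first match wins.
first_title = root.entry.title.text

# Iterating over a child walks all siblings with the same tag.
titles = [entry.title.text for entry in root.entry]
total_votes = sum(int(entry.votes) for entry in root.entry)
```

Compared to chaining `find()` calls and XPath strings, this reads a lot more like walking ordinary Python objects.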

Thanks to Mike Driscoll


Labeling Axes

Label Your Axes

Ha, ha, only serious, about one of my pet peeves.

Via fluff


0.5 Billion Rows

Typically I don’t find Hacker News discussion threads enlightening although I frequent the site. But this one on loading half a billion rows into MySQL was actually quite good. Mainly because there were a few links to other interesting technologies such as: Snowplow, Trecul, and crush-tools.

And the original link was useful as well even though I’m not a MySQL fan.


OK! OKC

Wow! Someone gave the Oklahoma City Thunder a defensive transplant in their victory over the San Antonio Spurs last night. They needed to hit some clutch shots to close out the win, but there were some periods of the game where the Spurs looked inept offensively. And the Spurs led the league in team scoring average!

A few folks have already, prematurely, anointed them as series victor, but it definitely feels like a changing of the guard in the Western Conference.


Further Intrigue

Link parkin’: Fredrik Håård highly recommends bpython as a Python REPL. Color me intrigued.

I’ve never quite cottoned to IPython, although the new HTML notebook is very sexy, because I’ve never gotten it to play well with emacs. Maybe bpython can be a better citizen.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.