AtBat 2012

In an effort to cut way back on the Couch Potato action, I’m trying for more listening than watching. The iPad makes for a good music media player, but sports programming is the critical factor. I can cut back on the sports “news” and mindless chatter through self-discipline, but live games are my Kryptonite. At least when I’m listening to a live broadcast I can multitask on things like writing code. TV just pins me slug-like on the couch.

So I’m giving MLB AtBat 12 a run (even as I post this!) as it allows you to listen to every Major League Baseball radio broadcast as a digital stream. Pair it with some Bluetooth speakers and you’ve got a modern day Rube Goldberg radio, complete with Retina Display!

So far the experience has been great, and the price right at $2.99 for my trial month. While MLB seems to have their act together, we’ll see how things fare during football, football, basketball, and hockey seasons. My impression is that the NFL, NCAA, EPL, NBA, and NHL aren’t quite at the same level of sophistication. But at least I can build up the anti-TV habit.


Just Works

I had to take my wife to a doctor’s appointment today. As is typical, there was a lot of hurry up and wait. I had both my work laptop and iPad with me. The iPad is a Verizon LTE model, which means they throw in the mobile hotspot capability for no extra charge. Hadn’t used it yet so I figured I’d give it a whirl to see how painful the process could be, just for future reference.

Turned the hotspot on on the iPad. Connected the laptop to the appropriate network name. And it just worked! I was even pleasantly surprised by the responsiveness of the network. Didn’t feel cellular at all.

Good to know, especially when routing around ridiculous hotel rates for Wi-Fi.


Teach Data Science

Link parkin’: Teach Data Science

This is the companion site to the electronic textbook, Introduction to Data Science, by Jeffrey Stanton. This book provides non-technical readers with a gentle introduction to essential concepts and activities of data science. For more technical readers, the book provides explanations and code for a range of interesting applications using the open source R language for statistical computing and graphics.


Prismatic, +1, -1

I’m getting a lot out of Prismatic, yet another in a venerable line of personalized “Daily Me” news services. I’m a sucker for these things but the technology landscape is literally strewn with their corporate wreckage. I would like to love it, but there are some issues:

  • Using short links in e-mail sent from the site
  • Putting links to Python documentation in my stream
  • No help, or way to get to my profile while reading my stream
  • Too much space allocated to “Share this story with your friends…”
  • Occasionally crashes Mobile Safari on my iPad

The biggest downside is that Prismatic is Yet Another Place To Read News (YAPTRN), and right now it’s last on my list of stops. We’ll see how long it can survive there.

The big upside is that I typically run across at least a few items of interest whenever I have a Prismatic session. It’s much better than predecessors of its ilk. The good bits:

  • Clever trick to heavily use facial images, social outlinks, and embedded Tweets. Feels more human and engaging.
  • Although completely opaque, the big images nicely break up all the text.
  • Inertial scrolling on the iPad fits the Prismatic style well, or they’ve really optimized for the device.

Makin’ Maps

This is part one of a five-part series about our recent explorations making choropleth maps using PostGIS, TileMill, Mapnik and Google Maps.

Turns out to be a total of six posts, but it’s still a handy, not too deep, dive into making maps. Be forewarned, there’s a fair bit of command line tweakage and assumed familiarity with open source tech. (Python Inside!) Not for those used to a lot of desktop app handholding.

From The Chicago Tribune’s News Apps Team.


Leanin’

If I indeed upgrade my personal Apple machine, in celebration of my fourth Macaversary, I’m leaning towards a kitted-out 13” MacBook Air. You can do a lot of damage with an ultralight portable boasting a half terabyte of SSD. And it’s at a relatively affordable price.


Enduring Geotech

I was all set to get some deep insight from The Atlantic’s article “The Future of the Map Isn’t a Map at All—It’s Information”, but it turned out to be pretty shallow. Even the attendant video wasn’t much more than a promotional for some new Google tech.

But between the provocative title, and my noodling into GIS technologies, it got me thinking that geohacking is a great business for a tech oriented person, especially with today’s advances. Interface applications, Web and native? Check! Massive data processing? Check! Real time data processing? Mobile applications? Check! Relevance to problems that matter? Check! Open, gentle slope, avenues to learn and hack? Check!

Important enduring organizations and institutions, with big checkbooks, care about understanding and using geospatial data. And more technologies are making more of that data available, at lower cost, more regularly.

If I were still in my past life, I would urge every Computer Science student to take an intro Geospatial Information Systems course. Right up there with Compilers, Operating Systems, and Relational Database Systems. In milspeak, GIS techniques and issues are that enduring a computational capability.


cliff

In the past I’ve written my own Python command line processing module to emulate what I call command shell frameworks à la git, Mercurial, and Subversion. Sucked.

I tried the pyCLI module but it didn’t quite work for me.

After a few hitches, Doug Hellmann’s cliff module did the trick. Need a longer test drive, but so far it’s been highly useful. I don’t quite love the use of distribute hooks but I can live with it until I find a better solution. The baked in command REPL is a nice to have.

Using cliff has been a good way to paper over some fairly complex processing with a power user grade UI. Also quite easy to add new features with quick turnaround.
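For a feel of what a git-style command shell looks like in code, here’s a minimal sketch. Big caveat: it uses the standard library’s argparse as a stand-in and is not cliff’s API. The tool name, the subcommands (fetch, report), and their arguments are all made up for illustration.

```python
import argparse


def build_parser():
    """Build a parser with git-style subcommands (all names hypothetical)."""
    parser = argparse.ArgumentParser(prog="mytool")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # Each subcommand gets its own arguments and its own handler function.
    fetch = subcommands.add_parser("fetch", help="pull down some data")
    fetch.add_argument("source")
    fetch.set_defaults(handler=lambda args: f"fetching from {args.source}")

    report = subcommands.add_parser("report", help="summarize results")
    report.add_argument("--limit", type=int, default=10)
    report.set_defaults(handler=lambda args: f"reporting top {args.limit}")

    return parser


def run(argv):
    """Parse argv and dispatch to the chosen subcommand's handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Adding a feature is just another add_parser call plus a handler, which is roughly the quick turnaround I’m after.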


Common Crawl Contest

The Common Crawl folks put together a little video to better explain their purpose, aims, and goals. Very well executed and it also includes an announcement of their first hacking contest. Tempted to do some sideline hacking on that dataset just for the Big Data experience. Winning would just be serendipity.


Postgres.app

Link parkin’: Heroku’s nicely packaged for Mac OS X, Postgres.app

Postgres.app is the easiest way to get started with PostgreSQL on the Mac. Open the app, and you have a PostgreSQL server ready and awaiting new connections. Close the app, and the server shuts down.

Postgres.app will be distributed through the Mac App Store, with a separate build containing the latest PostgreSQL beta available for direct download from the website.

I love the fact that PostGIS 2.0 is baked in. But I’m also a little wary of how well it supports the building and installation of extensions. I needz my plpythonu.


The Blatche Era

Andray Blatche’s stint as a Washington Wizard has ended. The Basketball Jones, along with some mildly amusing commenters, makes appropriate fun of the occasion:

As you can see, the Andray Blatche market is pretty dried up. There are even rumors that the Bucks’ original bid would have included an additional three pounds of mozzarella string cheese, but they pulled that because they didn’t think he was worth it. Tough break, but I’m sure the $23 million will cheer him up.

Looking back I’m less disappointed with Blatche than with the last remaining piece of failure, Ernie Grunfeld. Andray was just a 49th pick, straight out of high school, a project that hasn’t panned out. It was Grunfeld who made the decision to ridiculously overpay him.

And let’s be clear, I was pretty disappointed with Andray.


Udvar-Hazy Rises

The Dark Knight Rises opens this week, including a run at the Udvar-Hazy Center. Longtime followers will know that this makes me really happy.

Real IMAX FTW.


Common Crawl 2012

Mmmmmmm, fresh, hot data! With instructions to boot:

I am very happy to announce that Common Crawl has released 2012 crawl data as well as a number of significant enhancements to our example library and help pages.

Along with this release, we’ve published an Amazon Machine Image (AMI) to help both new and experienced users get up and running quickly. The AMI includes a copy of our Common Crawl User Library, our Common Crawl Example Library, and launch scripts to show users how to analyze the Common Crawl corpus using either a local Hadoop cluster or Amazon Elastic MapReduce.


Social Networks Unprofitable?

Good thought experiment by Derek Powazek: “What if Social Networks Just Aren’t Profitable?”

Here’s the short version: Every community-based site in the history of the web has essentially been a stab at creating a social network. Most of them fail as businesses, with the rare exception of small, lucky communities that become self-sufficient but not exactly prosperous. What if that’s just the way it is?

I’d say there’s a lot of evidence for the thesis and not much agin’ it. There’s also “profitable” and “PROFITABLE”. Maybe the profits in social networks just don’t scale to the level of publicly traded corporations. Doesn’t mean a nice living can’t be made at the Main Street or regional level.


Goodbye and Good Riddance

Friday, July 13th, glad to see you go. Seems like every single complex system (natural or man made) decided to spend the day frustrating me. From organizational BS, to proposal rejections, to IT failures (special honors to MS Outlook 2007), to traffic, to kids just doing what they do, frustration abounded.

Well, your worst has been done and no catastrophes occurred. Sayonara!


s3tools

Link parkin’: s3tools

S3cmd is a command line tool for uploading, retrieving and managing data in Amazon S3. It is best suited for power users who don’t fear command line. It is also ideal for scripts, automated backups triggered from cron, etc.

Just what the doctor ordered for the continuous Tweet collectin’, S3 storin’, time constrained home hacker. On Ubuntu, cloning the git repo, running a config, and accessing my S3 bucket works right out of the box.


eero

As someone who wrote a dissertation on programming language design, it’s rare when I see an announcement for a new language and go “Wow! That looks cool!” I just did that for the first time in a long time when I read about eero:

Eero is a fully binary- and header-compatible dialect of Objective-C, implemented with a modified version of the Apple-sponsored LLVM/clang open-source compiler. It features a streamlined syntax, Python-like indentation, and other features that improve readability and code safety. It is inspired by languages such as Smalltalk, Python, and Ruby.

“Eero” is pronounced [ˈe-rō], and is similar to the English word “aero”.

After a surface read, I actually thought programming for the Mac OS X and/or iOS system APIs might be fun.

Still a long slog to success and popularity, but a kid’s gotta start somewhere.


Exit GeoIQ

I’ve mentioned before that in my day job I tried to fire up some collaborations with GeoIQ (née FortiusOne). Never could get anything off the ground but enjoyed my interactions and visits to their Clarendon office space. Always admired their scrappiness from afar and the neogeographer community they built up around GeoCommons.

Yesterday GeoIQ announced their acquisition by Esri, the 900 pound gorilla of GIS systems. On the one hand, it’s a little sad to see the little local guys get gobbled up. On the other hand, I hope it created a reasonable exit for the folks I got to make a personal connection with. Probably wasn’t lifechanging but beats going bankrupt.

And I find it interesting that they’re going to establish a Research and Development center in the DC area. Between the rapid tech changes in massive data analytics, mobile development, and Web mapping, must be an interesting time for GIS folks. I sort of realized that DC was a bit of a geonerd center, but this is just another confirming datapoint. Makes complete sense what with the concentration of gov, mil, spook, sci, campaign, and NGO types in the DMV.

Good luck, Sean and crew!


PyCon 2013

Yeehaw! PyCon 2013 is officially on the docket. Back in sunny Santa Clara again, I’m mentally booking my return trip.

Without the distracting work related activities.

And dying from some virus.

I promise.


Installing Pandas

Recently I went through the process of installing pandas on Mac OS X and had a similar experience to Grig Gheorghiu:

I tried to install the pandas Python library a while ago using easy_install/pip and I hit some roadblocks when it came to installing all the dependencies. So I tried it again, but this time I tried to install most of the required packages from source. Here are my notes, hopefully they’ll be useful to somebody out there.

It wasn’t a truly heinous effort, but a lot less clean than I expected. Like Grig, HDF5 and PyTables were the worst, being the only ones I couldn’t pip my way through. However, I already had gfortran installed.

I’m really looking forward to putting pandas to the test, but this exercise makes using something like the Enthought Python Distribution really attractive.


The Human Division

Loved John Scalzi’s Old Man’s War trilogy, so his upcoming project, The Human Division, looks intriguing:

My next project from Tor is called The Human Division. It takes place in the “Old Man’s War” universe, after the events of The Last Colony and Zoe’s Tale. It is not, strictly speaking, a novel.


Serious Analysis

So far so good on the first two chapters of Philipp Janert’s Data Analysis with Open Source Tools. Actually better than good, it’s been great. A little statistics, a little graphics, a little math, and a little programming. All starting from an expectation that the reader is somewhat experienced and with a matching serious tone. Kernel density estimates were actually new to me. A number of O’Reilly books start off a bit breezy, but not this one. Well worth the money so far.


Streaming Spark

Cool! Scalable streamed data processing on top of Hadoop-like infrastructure, via Discretized Streams (PDF)

The key idea behind D-Streams is to treat a streaming computation as a series of deterministic batch computations on small time intervals. For example, we might place the data received each second into a new interval, and run a MapReduce operation on each interval to compute a count. Similarly, we can perform a running count over several intervals by adding the new counts from each interval to the old result. Two immediate advantages of the D-Stream model are that consistency is well-defined (each record is processed atomically with the interval in which it arrives), and that the processing model is easy to unify with batch systems. In addition, as we shall show, we can use similar recovery mechanisms to batch systems, albeit at a much smaller timescale, to mitigate failures more efficiently than existing streaming systems, i.e., recover data faster at a lower cost
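The running count described there is easy to sketch in plain Python. This is a toy, single-machine illustration of the idea only, not Spark code or the paper’s implementation. The function names and the (timestamp, key) record shape are my own assumptions, and intervals with no records are simply skipped.

```python
from collections import Counter, defaultdict


def discretize(records, interval=1.0):
    """Group (timestamp, key) records into fixed-width time intervals."""
    batches = defaultdict(list)
    for timestamp, key in records:
        batches[int(timestamp // interval)].append(key)
    # Return the batches in time order; empty intervals are not represented.
    return [batches[i] for i in sorted(batches)]


def running_counts(batches):
    """Count each interval as a small batch job, then add it to the old result."""
    total = Counter()
    snapshots = []
    for batch in batches:
        total = total + Counter(batch)  # new counts folded into the running total
        snapshots.append(dict(total))
    return snapshots
```

Each interval’s count depends only on that interval’s records plus the previous total, so rerunning a batch gives the same answer, which is the property the recovery story leans on.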

That’s how the Cal CS Division rolls.

Via Ben Lorica


Return To Local?

As someone who’s pretty much hated retail shopping since he was old enough to be dragged to a mall by his mother, I’m probably a little too predisposed to Jeff Jordan’s analysis that e-commerce is killing physical retail outlets:

I believe we’re approaching a sea change in retail where physical retail is displaced by e-commerce in a multitude of categories. The argument at a high level:

  • Online retail is relentlessly taking share in many specialty retail categories, resulting in total dollars available to physical retailers stagnating or even declining. This is starting to put intense pressure on their top lines.
  • Physical retailers are very highly leveraged and often have narrow profit margins. Material declines in their top lines make them unprofitable and quickly bankrupt.
  • Online retail will benefit greatly from the elimination of their physical competition and their growth should accelerate.

Need to check out the contra-commentary, and the argument rests a little too much on the singularity of Amazon, but the premise feels sound to me.

If this pans out, my question is what happens to all the physical space and local talent that supports big box outlets and malls? Does Main Street make a comeback? What do you do with vast stretches of strip malls like Rockville Pike in Maryland? How about more open and green spaces as folks shift to even more knowledge work and personal services? More home-based businesses or co-working?

Living in the exurbs of Loudoun County, VA, I’m hard pressed to envision what the region will look like in a generation from a commerce perspective. Leesburg Premium Outlets better look out.

Via Jenn Webb @ O’Reilly Radar


Hangouts Fandom

Recently in my feedflow I’ve noticed quite a few plaudits for Google+ Hangouts, e.g. Lucas Gonze, who I’ve followed for quite a while, is a Hangouts fan. While I’ve enjoyed the television commercials, since I’m not a huge G+ user I haven’t really gotten into it. However, I’m intrigued that there might be something that shows a path to elimination of the execrable Microsoft LiveMeeting in the workplace.

The amount of expensive talent time wasted setting up collaboration with that hideous tool is stunning to me. Feels like Hangouts hits the sweet spot, while being Web native and thus cross platform, which is the other half of my gripe with LiveMeeting, being a Mac user.

Probably not in my career lifetime, but one can hope.


Python for Data Analysis

Got suckered by one of those O’Reilly 50% off daily deals on e-books and had to buy the Early Release of Python for Data Analysis. Looking forward to digging into some pandas on my iPad.

Couldn’t stop at just one though, and had to grab the 2nd edition of SQL and Relational Theory by C. J. Date along with Data Analysis with Open Source Tools. I was pleasantly surprised by the table of contents for the latter. A bit meatier than I anticipated.

If you’re reading this on July 3rd, 2012 you can still jump in on the deal. Clock’s ticking though. Midnight PDT is when it expires. Also of note, it only applies to a particular 25 e-books.


eGenix PyRun

eGenix PyRun looks like it might be useful someday

Our new eGenix PyRun™ combines a Python interpreter with an almost complete Python standard library into a single easy-to-use executable, that does not require a system wide installation and is fully relocatable.

PyRun’s executable only needs 12MB, but still supports most Python application and scripts - and it can be further compressed to 3-4MB using gzexe or upx.


Tree Style Tabs

I have some serious tab proliferation in both Chrome and Firefox. I’ll have to check out the Tree Style Tab Firefox plug-in.

This provides tree-style tab bar, like a folder tree of Windows Explorer. New tabs opened from links (or etc.) are automatically attached to the current tab. If you often use many many tabs, it will help your web browsing because you can understand relations of tabs.

Via Matt Ryall


Twelve Factors

Link parkin’: The Twelve-Factor App, a manifesto and methodology for building modern, scalable web applications. We’re a long ways away from good ole’ CGI.

Via Rafe Colburn


Ernie Escapes

The 2012 NBA Draft was yesterday, and the Washington Wizards did the straightforward thing of picking Florida’s Bradley Beal. As a nominal Wizards fan it was about the best that could be hoped for given Ernie Grunfeld’s ability to pick losers.

I’m not particularly optimistic though. Have a feeling Beal might turn out a bit like Kirk Hinrich. Hinrich was selected by the Bulls two spots after Dwyane Wade, who the Bulls coveted. Hinrich so far has had a serviceable yet undistinguished career. Meanwhile there were a few guys picked (well) after Kirk that went on to be All-Stars or play on championship caliber teams.

The Wade role this year might be played by Michael Kidd-Gilchrist (MKG), unexpectedly taken by the Bobcats in the second spot. Media here in DC ragged on MKG, but he did way more than Beal in his one college season, becoming an integral part of a championship team. This preview of Kidd-Gilchrist at The Basketball Jones captures my sentiment:

Chad Ford of ESPN sums up MKG nicely, writing “Ask anyone who knows anything about basketball what label should adorn Michael Kidd-Gilchrist as he prepares for the 2012 NBA draft and they all, to a man, say the same thing. Winner.”

My guess is that if MKG was still available at the three spot, Grunfeld would still have gone with Beal. Then I’d be really salty. So he escapes my wrath this time.


pandas 0.8

The Wes McKinney led pandas project has just hit the 0.8 release point. I’ve been fiending for an excuse to use pandas and now an opportunity at work has popped up to do some timeseries-ish analysis. If for nothing else I’m looking forward to being able to easily generate sequences of timestamps:

New DatetimeIndex class supports both fixed frequency and irregular time series. Replaces now deprecated DateRange class
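Here’s a minimal sketch of the kind of timestamp sequence I have in mind. It assumes a reasonably recent pandas install, so the exact behavior under 0.8 may differ, and the start date and daily frequency are just picked for illustration.

```python
import pandas as pd

# Five daily timestamps starting July 1st, 2012 (dates chosen for illustration).
stamps = pd.date_range(start="2012-07-01", periods=5, freq="D")

# The result is a DatetimeIndex, usable directly as the index of a Series.
series = pd.Series(range(len(stamps)), index=stamps)
```

From there, lookups by timestamp go through the index, e.g. series.loc[pd.Timestamp("2012-07-03")].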


Crowning Champs

Now that the BCS has ended, with Big Time Football somehow looking slimier, I like what Chris Brown had to say about playoffs:

So what does a playoff give you, and why is it probably a better solution for crowning a National Champion? Let me say first that I think it would be a better system than the current BCS morass. But the advantage the playoff gives you is not anything metaphysically correct. It probably does not crown the best team. And it does not reward the best season (sorry Utah).

It merely gives you relative certitude. It’s not perfect — some clunker teams can be crowned, some historically great teams will get the relative shaft — but, before the season, during the season, and in the playoffs, everyone knows what it takes to be the champion: you must get into the playoffs, and you must win every game once you’re there. The Patriots couldn’t lobby for votes, they couldn’t say that they got jerked around, and they even couldn’t say that they didn’t get their chance. They played and they lost. They were probably better, they might only have had a bad day, but hey, you knew what you were getting into.

Emphasis mine. Everyone knows what’s going to be on the exam. Either you pass or you don’t and your answers are fully visible to everybody. No lyin’, no cryin’. Perfect? No. But it’s better than what we had.

My only suggestion is to go for broke and have a twelve team playoff. Six automatic spots for conference winners. Six at large bids. Top four get a bye. Could be done in a month. You could even have a four game New Year’s Day bowl fest. Everybody wins!!


csvkit

How have I gone so long without knowing about Python’s csvkit?

csvkit is a suite of utilities for converting to and working with CSV, the king of tabular file formats. …

csvkit is to tabular data what the standard Unix text processing suite (grep, sed, cut, sort) is to text. As such, csvkit adheres to the Unix philosophy.

csvkit usefully replaces the built-in Python csv module (pretty useful in and of itself) and also provides a really nice set of command line utilities for creating, slicing, and dicing csv files.
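To make “slicing” concrete, here’s a rough sketch of column cutting done with nothing but the built-in csv module. This is not csvkit’s code or API, and cut_columns is a hypothetical helper name of my own. It just shows the sort of chore the command line tools save you from writing.

```python
import csv
import io


def cut_columns(csv_text, columns):
    """Return CSV text containing only the named columns, in the given order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops any column not listed in `columns`.
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Feed it a CSV string and a list of header names and you get back the trimmed CSV as a string.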


A Book I’d Buy

The Scientific Practice of Large Scale Data Analytics

At work I’m seeing too many people getting a Heap O’ Data (TM), and then not being systematic about how they manage, process, and analyze those precious bits. Yours truly is a culprit, but I’m trying to get better. Even a basic primer on how to document your data sets would be helpful.

I have to imagine the Business Intelligence and Data Warehouse guys must have some recorded literature along with the DevOps, Scientific Computing, and Quantitative Finance communities. Probably where I need to start diggin’. Really, this process can’t be as haphazard as I’m seeing on a daily basis.

Feels like a good opportunity for O’Reilly Media.


MBP Retina Review Revue

Link parkin’: TidBITS has collected a number of reviews of the Retina Display MacBook Pro.


Nice Time Capsule

The film Phone Booth, starring Colin Farrell, has been knocking about HBO. I remember inadvertently seeing it on an airplane flight and being surprised at how enjoyable it was. The film’s stood up pretty well over a decade. In addition to Farrell, it also stars Kiefer Sutherland, Forest Whitaker, Radha Mitchell, Katie Holmes, a bunch of quality character actors, and a somewhat forgotten seamy side of New York City. Joel Schumacher directs for a taut, tight 81 minutes.

Most notable might be that Phone Booth straddled a time when phone booths weren’t quite dead, cell phones weren’t quite dominant, and Manhattan hadn’t been completely scrubbed clean. Yet even though it captures a moment in time, the overall themes, and tension, are actually quite timeless. A nice little gem of a movie, that people will be watching for years to come.


Hollywood Narcissism

Well, Walter Jon Williams’ The Fourth Wall wasn’t exactly what I was expecting. Our intrepid heroine Dagmar is replaced as primary “protagonist” by Sean Makin. I use protagonist loosely as Makin is a stupefyingly self-absorbed former child acting star. The narrative devolves into a murder mystery regarding the Hollywood-based transmedia production Dagmar is helming with Makin in the lead role.

I think Williams was trying to explore the intersection of mass global entertainment and technology, but there wasn’t enough tech for me. And Makin is not a particularly good person but we have to spend 90% of the book locked into his inner dialogue.

Might be time to retire Dagmar and her crew.


Browser Proliferation

Boy, there are a lot of Web browsers embedded in various applications on my iPad. There’s Mobile Safari of course, a straight up Web browser. Then there’s a nice one hiding in 1Password for iPad. Mr. Reader, my RSS aggregator has one along with the IMDb app. The Google Search app has a browser embedded as well as Tweetbot.

Just an observation. Not sure if it’s good or bad. On the one hand, they’re all relatively consistent. On the other, my Web browsing gets sprawled all over the place.


An Air Review

Not just any MacBook Air review, but an in-depth Jacqui Cheng, Ars Technica look at Apple’s latest slim and thin laptops. Bottom line, the container is pretty much the same but the engine is vastly improved.


Et Tu Twitter

So now Twitter has joined Facebook in bombarding my inbox with useless notices about people I already track. Frankly, my email has pretty much become a cesspool of solicitation from corporate entities that I’ve had a prior “relationship” with. I’m almost surprised when I receive a human generated message directly for me. It’s just about enough to make an old UNIXhead give up the darn medium.

Time to go on a filtering rampage.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.