Category: Uncategorized

Color Me Intrigued

PostgreSQL recently got some JSON object support baked in. It’s just a beginning though, with not much querying into JSON objects.

Jerry Sievert is attempting to replicate the features of MongoDB in Postgres. His first post just outlines how to implement the basic document insert and lookup features. If he can pull off the document query API of MongoDB, or even a reasonable subset, then things get interesting.

Still reserving judgement.


That’s No Toy

That’s no moon. It’s a space station.

Obi-Wan Kenobi

Earlier I called my 2012 birthday present a toy. Well, after a week and a day, I can calmly declare that the 3rd Generation iPad is definitely not a toy.

For me at least, it doesn’t feel like a desktop replacement, but there are definitely some great features:

  • There’s enough screen real estate that Web apps are comfortable, although you may have to do a bit of zooming to make buttons big enough.

  • Mr. Reader, an iPad app for Google Reader, is delightful. Reasonably priced, a bit more functional, and definitely fresher, Mr. Reader is kicking NetNewsWire to the curb on the iPad.

  • Reading books, either through Amazon’s Kindle App or Apple’s iBooks, is gorgeous. Perusing PDFs is enjoyable, although I have to give iBooks bonus points for preserving URLs, whereas Kindle doesn’t.

  • Speaking of Kindle, the “Send to Kindle” app is very handy.

  • The device is a great size for watching videos.

  • Bonus: I have the Verizon LTE edition, which allows you to use the iPad as a mobile hotspot for no extra charge.

I’m definitely starting to love this thing, although DJay turned out to be a bit of a disappointment, largely because you need monitor speakers and a cable to handle pre-cueing. Sort of kills the “mobile” in mobile DJ. And authoring posts will probably stay on the desktop, at least until I get to a higher level of performance with the keyboard. The iPad hasn’t found a creative niche for me, but it’s currently a great media consumption device. Of course that’s only after one week, so there’s plenty of room to grow.


Tab Colorization

Many of my favorite tools support multiple tabs, including iTerm2. On Firefox, there’s a plug-in, Colorful Tabs, that automatically colorizes the tabs making for nice visual distinction. Not sure it’s that big a deal cognitively, but at least it’s pleasant.

Mikko Ohtamaa has put together some Python code that implements tab colorization on ANSI-escape compatible terminal applications such as iTerm2:

OSX’s iTerm 2, and maybe some other terminal applications, support ANSI control sequence extensions which allow shell to set the color of the terminal tab.

Below is a Python script which

  • Randomizes a color based on the server host name. The same hostname always results to the same color.
  • The color is randomized in HSL color space, so that only the hue component varies and saturation and lightness are locked. This prevents the creation of ugly color combinations like black text on black tab background.

Very handy!
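
Here’s a minimal sketch of the idea, not Ohtamaa’s actual script. It swaps in a hash of the hostname where his description says “randomizes” (my substitution, to keep the same-host-same-color property obvious), locks saturation and lightness, and varies only the hue. The function names and the default saturation and lightness values are made up for illustration, and the escape sequences are iTerm2’s proprietary tab color codes as I understand them, so treat those as an assumption too.

```python
import colorsys
import hashlib


def host_to_rgb(hostname, saturation=0.5, lightness=0.6):
    """Map a hostname to a stable (r, g, b) tuple with values in 0..255.

    Only the hue depends on the hostname; saturation and lightness are
    fixed (hypothetical defaults), so no tab ends up near-black or washed out.
    """
    digest = hashlib.md5(hostname.encode("utf-8")).digest()
    # Use the first two digest bytes as a hue in [0, 1).
    hue = int.from_bytes(digest[:2], "big") / 65536.0
    # Note the argument order: colorsys uses HLS, not HSL.
    r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
    return tuple(round(channel * 255) for channel in (r, g, b))


def iterm2_tab_sequences(rgb):
    """Build the escape sequences that set the tab color, one per channel.

    Assumes iTerm2's "ESC ] 6 ; 1 ; bg ; <channel> ; brightness ; <n> BEL" form.
    """
    return "".join(
        "\033]6;1;bg;{};brightness;{}\a".format(name, value)
        for name, value in zip(("red", "green", "blue"), rgb)
    )
```

Writing `iterm2_tab_sequences(host_to_rgb(socket.gethostname()))` to stdout from a shell login script would be one way to wire it up; terminals that don’t understand the sequence should simply ignore it.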


May Gone

What the hell?!? May’s already over? We’re into June, the sixth month of the year? Time is flying in 2012.


Prismatic GReading

As a long time ponderer of “The Daily Me” concept, I’ve found Prismatic née Woven very attractive. I’ve always been curious about how automated, personalized systems might play out, and I enjoyed Bradford Cross’ blogging when he had the time. But when they first launched, they needed access to one’s Twitter account to make recommendations. That was a non-starter for privacy reasons and because I wasn’t really following anyone then. I do most of my infotrapping in Google Reader.

Fast forward a bit and now Prismatic can connect to your GReader account. So I’m on the bandwagon with yet another personalized news service. Maybe this one will work out better than the last such service did.


Clipboard Decloaks

If it weren’t for the fact that Gary Flake is leading the team, I’d think Clipboard was a derivative, me-too product. His Going Public post makes it seem really interesting though. I’ll have to check it out, especially if they make an API available (mentioned but not discussed by Flake).


Force Multiplying

Once upon a time, I called MapReduce and Sawzall “major force multipliers”. At work, I’m learning the hard way about the Sawzall part of that combination. It’s great to have a scalable distributed programming model and massive data storage engines, but data querying and manipulation are the secret sauce where the magic happens. Trying to do querying type stuff at the Java API level is “teh suck”.

According to Wikipedia, Sawzall never hit the open source big time, but at least Pig, Hive, and Cascalog came to be. Pinin’ for a good open source graph query language and runtime baked into the Hadoop ecosystem though.


TIL resile

Today I Learned the word resile:

Verb: resile

  1. To start back; to recoil; to recede from a purpose.  

  2. To spring back; rebound; resume the original form or position, as an elastic body.

As in “Twitter Resiles From API-Driven Site”.

Via Nat Torkington.


@bigdata @ #hbasecon

Ben Lorica dropped in on HBaseCon and had a few cogent observations. Sounded like the tribal gathering of a smart, growing community around a core component of the Hadoop ecosystem:

HBase and HDFS: Past, Present, and Future: In a conference centered around a piece of technology, you need an overview of new and upcoming features. HBase and HDFS committer Todd Lipcon gave a good survey centered around reliability, availability and durability. I particularly appreciated the summary table at the end of his talk (I’ll post that table once Todd’s slides become available).


Rising

Previously I had threatened a post on the prologue trailer for The Dark Knight Rises but never followed through. What with the promotional push now kicking up (cue irritating intersplices with NBA playoff footage), this is a good time to call shots on the themes of the movie.

My interpretation of Christopher Nolan’s Batman Begins is that of one man rising up against despair and corruption (and the arrogance of Ras Al-Ghul). The Dark Knight was one man fighting against a force of chaos. In both cases though, it was one man as savior.

In the prologue trailer, all about Bane, the central villain of The Dark Knight Rises, it’s clear that Bane has built a movement. One of his henchmen actually sacrifices himself for the cause at Bane’s behest. As he goes to his death, the henchman is clearly reveling in his martyrdom. Yeah, Bane will be a physical and mental badass, but where he’ll have one-upped Batman is in motivating a disenfranchised underclass. And that’s something a loner cannot defeat, no matter how powerful a symbol.

So picking up the tail end of the comic The Dark Knight Returns, Bruce Wayne will learn that enduring change is brought by building sustainable collective action. In so doing, after being broken by Bane, he will rise up from the darkness, defeat his enemy, bring lasting peace to Gotham, and ride off into the night.

And the opening of The Dark Knight Rises will punk The Avengers. Can’t wait until July 20th.


5 Skills

I’ve been meaning to give Matthew Hurst a shout out as one of the original social media hackers. His posts over at Data Mining regarding d8taplex and other data hacking exploits definitely provide an interesting perspective.

As an apprentice mad data scientist, I find that Hurst’s 5 Hidden Skills for Big Data Scientists resonates. To wit:

3. Invest in Interactive Analytics, not Reporting. When you construct reports about your data products, you are answering a fixed set of questions. This is useful for monitoring, but it doesn’t provide a way to get at the unknown unknowns. It is only through interactions with data (often called slicing and dicing) that pockets of interest (problems and opportunities) are discovered. Rich, interactive tools may be perceived as a low priority and never quite got to. Avoid this peril!


Intro to Data Science

Link parkin’: CS194-16: Introduction to Data Science. Openly available from my alma mater. Taught by Jeff Hammerbacher and Mike Franklin. Looks like a fairly complete collection of slides and videos is present, including some top notch guest lectures.

Autodidacts unite!


The New Toy

Coming to you live from the new toy my wife got me for my birthday, a new, 3rd generation, iPad. Not quite sure what I’ll be primarily doing with it, but there is one must have …

Algoriddim’s DJay


An Engineering Blogroll

Link parkin’: Nice passel of software engineering blogs collected by Rafe Colburn. Will have to add ’em all to the aggregator.


Scalzi’s Stuff

John Scalzi is one of my favorite authors. Recently visiting the DC metro area, he had a mental slip and lost track of his computer bag.

I’m glad to say the DC area came through, returning the bag, and full contents, to him. Good on ya’ DMV, there’s gotta be a karma bonus in there somewhere.


UDub CS On Fire

Back in 1988 when I was applying to graduate schools, one of my undergrad advisers suggested I check out the University of Washington. He called their Computer Science and Engineering program up and coming. Took his advice and sent in an application.

I got into 2 of the 3 places I applied, UDub and UC Berkeley, and visited both. I came away pretty impressed, especially because of some discussion time I had with Tom Anderson, who was “just” a doctoral student at the time. He gave me some great advice on my decision and career, some of which I managed to follow.

Cal was my eventual choice, primarily on rep and the fact that I had a number of friends on campus and in the area, and I’ll never regret that decision. From afar though, I’ve admired the UW CSE program as it’s closed in on the MIT/Berkeley/CMU/Stanford summit.

I haven’t kept up with graduate rankings since I exited my former life, but Jeff Heer and Daniela Rosner joining the UW faculty, not to mention attracting Carlos Guestrin, indicates their further push to reach the top. A key has been building productive relationships with Microsoft, Amazon, Google, and other locally based tech juggernauts.

Occasionally (but not often) makes me wonder “what if?”


Warden and Kelcey

Thanks to stumbling onto the Common Crawl blog, I’ve run across sites for Pete Warden and Mat Kelcey. Both look like interesting data hacking venues which update at a reasonable frequency. Particularly like Warden’s Five Short Links series. Subscribed. May even have to follow them on Twitter.


Twitter Platform Objects

Link parkin’: A field guide to Twitter Platform objects

Like any ecosystem, the Twitter platform has a variety of flora and fauna. Use this field guide to better understand the most frequently observed wild objects.

Don’t know how long this has been in place, but now I don’t have to guess about various details of the Twitter API. At least as long as the docs stay in sync with the code.


How Web Maps Work

How Web Maps Work: Does what it says on the tin. Nice overview.

Via Nelson Minar’s linkblog


Drogba

I didn’t give Chelsea much of a chance to beat Bayern Munich in the UEFA Champions League Final, but they somehow managed to pull off the victory. Oddly, they’ve captured a funky double by also winning the FA Cup. Meanwhile, 6th place is the best The Blues could do in the Premiership.

Didier Drogba figured large, scoring an 88th minute goal to keep Chelsea alive, giving up a penalty kick that was eventually saved by Cech, and hitting the final and clinching PK. That’s a long way from the horrible injury Drogba suffered back in August.


Pandas Pre-Print

Wes McKinney’s Python for Data Analysis is in Early Release from O’Reilly. If I could carve out an extended period of time to get some initial experience with pandas I’d grab it. Might have to do so anyway.


Large-Scale In-Memory

Amund Tveit attended Accel’s Big Data Conference and came away with some interesting takeaways. This spurred him to do a mental exercise on what it would take to store a year’s worth of Twitter’s tweets in RAM:

Keeping 1 year worth of tweets (including metadata) and (a crude) index of them in-memory is costly, but not too bad. I.e. 1.36 Million USD to keep 1 years worth of tweets (124 billion tweets) for 1 year in an (distributed) in-memory hashtable (or the same amount of tweets stored in the same hashtable for one day costs approximately 3732 USD).

Q: So, is it time to reconsider using hard drives and SSDs and consider going for RAM instead

A: yes, at least consider it and combine with Hadoop. …

Obviously $1.36 million is a heap o’ kale, but there are probably businesses with that scale of data and a competitive need for that speed of processing. It’s extreme now, but 15 years ago buying terabytes off the Costco shelf was unimaginable to most people.

Other nuggets that struck me between the two posts include the fact that Twitter sees 340 million tweets per day and that realtime processing is a hot topic. I personally had the feeling that “realtime at scale” is the new frontier but this is a shred of confirming evidence.

Don’t know about that “combine with Hadoop” comment though. The more I find out about Hadoop the less impressed I am.
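
As a quick sanity check on Tveit’s numbers, here’s the arithmetic spelled out. All the inputs come from the quoted posts (340 million tweets per day, roughly 3732 USD per day); the 365-day year and the per-million-tweets figure are my own assumptions and derivation, not his.

```python
# Figures quoted from Tveit's posts.
TWEETS_PER_DAY = 340_000_000
COST_PER_DAY_USD = 3732

# Assumption: a plain 365-day year.
DAYS_PER_YEAR = 365

# 340 million a day works out to about 124 billion tweets a year.
tweets_per_year = TWEETS_PER_DAY * DAYS_PER_YEAR

# 3732 USD a day works out to about 1.36 million USD a year.
cost_per_year_usd = COST_PER_DAY_USD * DAYS_PER_YEAR

# Derived here, not in the source: yearly cost per million tweets held in RAM.
usd_per_million_tweets = cost_per_year_usd / (tweets_per_year / 1_000_000)
```

So the daily and yearly figures in the quote line up with each other, and the whole thing comes to roughly eleven dollars per million tweets per year.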


5 Billion Pages

Ever since I stopped my personal Twitter data collection project, I’ve been mentally casting about for a new dataset to build my Mad Data Skillz ™ (Boyeeeee!). Obviously, just restarting the tweet inflow is an option, but something involving more scale with less work would be nice.

Enter Common Crawl, a non-profit making a large — 5 Billion Web page — crawl publicly and freely available on Amazon EC2. How juicy! A big dataset conveniently located within the premier, openly available, utility computing infrastructure in the world. Definitely has potential to put the Skillz to the test. Common Crawl even has a convenient series of blog posts instructing one on how to process their page repository.


Hackers vs Action

Machine Learning for Hackers vs Machine Learning in Action to be precise. Two books. One topic. Different languages. John D. Cook compares and contrasts:

Both books are about the same size and many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, ML in Action uses Python.

I was somewhat interested in ML for Hackers since I’m familiar with and admire Drew Conway’s online writings. The use of Python better aligns ML in Action with my interests though.


Django API Frameworks

Daniel Greenfeld compares and contrasts Django toolkits for creating REST APIs. I’m interested in other perspectives on this topic as I’ve talked about tastypie before and actually put it in practice at work.

On site comments for the post and over at HackerNews are useful as well. In particular, I have to agree with a few folks that tastypie is pretty good for fairly standard Django models, but gets a little tricky for non-ORM or search based resources. In particular, dealing with object dehydration and response URIs was a bit opaque. I may be hallucinating, but when I first got started with tastypie, this documentation node on the request/response cycle didn’t exist. Maybe it’ll clear up my confusion.

I’d still recommend tastypie, but for advanced uses prepare to spend some time digging into module source code and doing a lot of experimentation to get the right results. As you’d expect!


The 94th Minute

Gee, guess I called that one. If I didn’t know better I would have thought Manchester City executed their highwire snatching of defeat from the jaws of victory then victory from the jaws of defeat, just to explicitly taunt their Mancunian neighbors. Of course they might have killed a fan or two of their own in the process. A more fitting cherry on top of the Premiership season would have been harder to script, what with The Citizens going into stoppage time to score two goals and rescue their campaign from ignominy. Time had literally run out on the season before they came back from the dead yet again.

Unfortunately, I didn’t get to see the match live as I was out and about for Mother’s Day. Driving through DC and constantly checking the iPhone for updates is not particularly safe, I can confirm. I’m hoping ESPN goes Instant Classic with the broadcast or there’s an on-demand recording available.


Nova Makers

Link parkin’: the Nova Makers group is right around the corner in Reston, Virginia. Looks like they even have a hacker space, Nova Labs, to support a wide variety of activities.

The NOVA Makers meetup is dedicated to creating and supporting a community of makers in Northern Virginia.


datavisualization.ch Selected Tools

Link parkin’: Datavisualization.ch selected tools

Datavisualization.ch Selected Tools is a collection of tools that we, the people behind Datavisualization.ch, work with on a daily basis and recommend warmly. This is not a list of everything out there, but instead a thoughtfully curated selection of our favourite tools that will make your life easier creating meaningful and beautiful data visualizations.

Hat tip Chris Diehl via Twitter.


Premier’s End

So the end of the 2011-2012 Barclays Premier League campaign arrives tomorrow. There are pretty much three races. Man City vs Man U for the title. Arsenal, Tottenham, and Newcastle United for the third and fourth positions, with at least one making the next Champions League. And finally Bolton Wanderers and Queens Park Rangers are trying to avoid relegation.

I like how the Premiership schedules their last weekend. Everyone plays and every game starts at the same time. No team sits in their locker room, rooting for an outcome. If you need a result, all you can do is your part.

Still got it in my gut that something wacky is going to happen at the top. Maybe Rangers pull a draw against The Citizens or the Red Devils actually lose to clinch it for Man City anyway. It would only be fitting if all three races were still in doubt going into the second half of the matches.


Apps4VA

Cool! The Commonwealth of Virginia is going to be running an apps competition later this year. Longitudinal data regarding education will be the source fuel. The competition window, about a month starting in early August, leaves enough time for a part time hacker to crank out something interesting, even if they’re not interested in launching a startup.


I Concur

With a bit of a break in the work storm last night, I tapped into two first round NBA Eastern Conference playoff series: ’76ers vs Bulls and Hawks vs Celtics. Both were elimination games so I was hoping for some high drama at the end.

As The Basketball Jones points out, both series ended in horrible thuds.

The Bulls, having been a smart, consistently hard working, high executing team, wasted great defensive effort on bone-headedness. Sorry C. J. Watson, but that was just the wrong play.

The Hawks were just robbed. The officials messed up on both Joe Johnson’s drive and the inbounds play. Then the Hawks went all Hawks on us, acting stupid and choking on the line. They couldn’t even get off a long range heave for three to try and tie it. Typical Hawks.

High drama indeed.

Ob Wiz. Please Lakers. Knock Javale McGee out of the playoffs. However, I do admit it might be fun to continue the playoffs with the Lakers out and the Clippers in.


PyCon 2013

PyCon 2013 is going to be right back where it was in 2012, Santa Clara, California. I’m assuming the Santa Clara Convention Center again. Yeah!!

Sign me up! And I promise this time not to get sick, to deny any work requests, and to be a more active participant.


Diggin’ On: The Symphony

On my iPhone, took one of my infrequent trips into the “Random Hip-Hop” playlist for some listening pleasure. Shuffle landed me on Marley Marl’s “The Symphony, Part 1”. The beat is legendary but jeez, Big Daddy Kane brings it as the last Cold Chillin’ rapper with arguably one of the greatest raps of all time.

… And battlin’ me is hazardous to your health
So put a quarter in your ass, cause ya played yourself
Like a game in the arcade. You need a far aid
I'm walkin’ the path that Allah made
I’ll attend and then begin to send a speech to reach and teach
So just say when
So I can let lyrics blast like a bullet
My mouth is the gun; on suckers I pull it
The trigger, ya figure, my pockets gettin’ bigger
Cause when it comes to money, yo, Grant's my nigga! ...

And words on the screen can’t even begin to do justice to Kane’s enunciation and delivery. Classic.

Ob moment of silence for MCA (Adam Yauch). After odes to Heavy D, Guru, and Malcolm McLaren, I’m giving up obits in this space though.


Clarity

“Scarcity brings clarity.” Boy is that ringing true for me with a big crunch at work, a couple of holidays coming up, and my wife soon out of town for a week. When time is scarce you really start to prioritize.

I’ll sleep when I’m dead.


TileMill 0.9.1

I recently tapped into the MapBox blog and they announced TileMill 0.9.1:

We just released TileMill 0.9.1, which adds support for PostGIS 2.0, runs on the latest Node.js 0.6.17 release, and provides packages for the latest Ubuntu Long Term Support (LTS) distribution: 12.04 (Precise Pangolin). TileMill 0.9.1 is the culmination of a several month sprint on stability, with over 80 tickets closed. The full list of fixes and advances for this release can be found in the changelog. Here are a few highlights.

TileMill sounds so cool but I really have no idea what you do with it other than “make maps”. Ah, here we go:

TileMill is an application for making beautiful maps. Whether you’re a journalist, web designer, researcher, or seasoned cartographer, TileMill is the design studio you need to create compelling, interactive maps.

My only question is whether it also eases the effort to serve your maps to web clients. After you’ve made your maps, can you just point a browser at an obvious server and go to town?

Seems like something to learn. Could be another personal project.

Shout out to MapBox as a DC area concern.


Why Postgres 2

Craig Kerstiens continues to catalog reasons to use PostgreSQL. Like the additions, although a quick and dirty test-drive of Multicorn failed miserably for me. Trying to build the sample application actually crashed the postgres db server, which is pretty tough to do. Probably some embedded Python and dynamic library badness, but still. Maybe I just need to go back and be a little more careful about my build.


Panning Out

Well I guess that the next generation MacBook Pro announcement I was hoping for didn’t really pan out. Haven’t heard a peep out of Apple about anything MacBook related recently, even though the Intel Ivy Bridge announcement happened a few weeks ago. Although as MacWorld points out, the ultrabook version of the processor, slated for release later in the year, would be more appropriate for MacBooks. And Apple typically doesn’t pre-announce stuff so the timing would be in line. If they announce it, you can buy it.

Still waiting patiently


Footballin’

So get this. Liverpool has already won the Carling Cup. Chelsea just beat Liverpool for the FA Cup. And The Blues could double through the Champions League finals, although I don’t give them much of a chance against Bayern Munich in Munich. Meanwhile, Manchester City is basically one game away, in which they are heavily favored, from beating out Manchester United for the Premier League title.

The weird thing is that Chelsea and Liverpool are definitely also-rans in the Premiership this campaign. Liverpool is in ninth place in the tables as I write this. If Chelsea doesn’t win the UEFA championship, they might not be in at all next year. Man City stunk it up in knockout play, European play, and were left for Premiership dead a month ago. One of their top players acts like a spoiled child and another took an extended golf vacation in the middle of the season.

I don’t quite know what to make of “underachievers” taking home so many trophies, but methinks they might be giving out a bit too much hardware in international football.


That Chicago House Groove

The premise of Michaelangelo Matos’ “How Chicago house got its groove back” might be a bit flawed, but I found it worth a read. Feels like Matos at least did quite a bit of interviewing and background research, including talking to folks like DJ Sneak, Derrick Carter, and Cajmere in depth. If accurate, it fills in some details of mid-90’s House music I wasn’t aware of.

The comments are somewhat illuminating as well, with Carter himself chiming in with some corrections and lamentations. Writing such a piece is always fraught, partially due to the obscurity of what’s trying to be covered making it hard to get the story right, partially because space limitations mean leaving out part of the story, and partially because there are always irate fans who know better.

Ob. disclosure. When Matos mentions Curtis A. Jones forsaking graduate school in Chemical Engineering, I was literally there with Cajmere at UC Berkeley. Part of a small cohort of black engineering students, we met at a College of Engineering function and started hitting the SF scene for parties. There are brushes with greatness, but I can definitely say, “I knew him when…”.

Hat tip, @CajualRecords

P.S. As predicted, DJ Sneak’s Fabric 62 didn’t do a whole lot for me. This is why I have a bit of a problem with the notion that Sneak somehow led a revival in Chicago House Music.


Rendering The World

Interesting post by Young Hahn of MapBox on “Rendering The World”. The problem Hahn discusses is the rendering of map tiles at high zoom levels for the entire world. The obvious and straightforward way quickly becomes unscalable for the zoom levels MapBox wants to achieve due to exponential, recursive explosion.

Turns out the actual space of unique tiles, by content, is orders of magnitude smaller than the number of tiles needed; in other words, there’s a high level of redundancy. For example, many tiles at any zoom level simply represent all-blue patches of water. Capturing and exploiting this redundancy is the key to getting scalable performance.
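
To make both halves of that concrete, here’s a rough sketch. The tile counts assume the usual web map pyramid where each tile splits into four at the next zoom level. The `DedupTileStore` class is purely hypothetical, my own illustration of content-based deduplication rather than anything from Hahn’s post or MapBox’s code; it stores each distinct tile image once and lets every coordinate point at it by hash.

```python
import hashlib


def tiles_at_zoom(zoom):
    """Number of tiles covering the world at one zoom level (4 ** zoom)."""
    return 4 ** zoom


def total_tiles(max_zoom):
    """Number of tiles across all zoom levels from 0 through max_zoom."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


class DedupTileStore:
    """Hypothetical store that keeps one copy of each distinct tile image."""

    def __init__(self):
        self.blobs = {}  # content hash -> tile bytes, stored once
        self.index = {}  # (zoom, x, y) -> content hash

    def put(self, zoom, x, y, data):
        key = hashlib.sha1(data).hexdigest()
        # Identical tiles (e.g. open ocean) share a single stored blob.
        self.blobs.setdefault(key, data)
        self.index[(zoom, x, y)] = key

    def get(self, zoom, x, y):
        return self.blobs[self.index[(zoom, x, y)]]
```

Fill a zoom level with identical all-water tiles plus a single coastline tile and `index` grows with every coordinate while `blobs` holds just two entries, which is the redundancy being exploited.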

This page had been sitting in my Chrome tabs for quite some time, but it was well worth the read once I got around to it.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.