1 Billion Tweets

Once upon a time, I viewed capturing 1 million tweets as a challenge. Now that that’s in the rearview mirror, I’m pondering new achievements. This one is really audacious.

1 BILLION TWEETS! (curling lip ala Doctor Evil and chuckling diabolically)

That would actually be something outstanding for a single individual. Consider that to do it in a calendar year, you’d have to average just a little south of 2.75 million tweets per day, 114K per hour, 31 per second. I’m not even sure that’s possible with Twitter’s streaming API. And then there are the bandwidth and storage issues. Not to mention maintaining pretty high annual uptime to stay on target.
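For anyone who wants to check the arithmetic, here is a quick back-of-the-envelope sketch. The per-tweet size is purely my assumption for illustration (it is not a measured figure), so plug in your own number before trusting the storage estimate.

```python
# Back-of-the-envelope rates for capturing 1 billion tweets in one calendar year.
TARGET_TWEETS = 1_000_000_000
DAYS_PER_YEAR = 365  # assumption: a non-leap year

per_day = TARGET_TWEETS / DAYS_PER_YEAR
per_hour = per_day / 24
per_second = per_hour / 3600

# ASSUMPTION: average stored size of one tweet's JSON, in bytes.
# This is a placeholder value, not a measurement.
ASSUMED_BYTES_PER_TWEET = 2_500
storage_tb = TARGET_TWEETS * ASSUMED_BYTES_PER_TWEET / 1e12  # decimal terabytes

print(f"{per_day:,.0f} tweets/day")
print(f"{per_hour:,.0f} tweets/hour")
print(f"{per_second:.1f} tweets/second")
print(f"{storage_tb:.1f} TB/year at {ASSUMED_BYTES_PER_TWEET} bytes per tweet")
```

The rates only hold with perfect uptime. Any outage means the remaining days have to run hotter to stay on target.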

Now that would just be the capture. What the heck would you then do with all that data? Imagine the possibilities.

I think a reasonably well funded individual could pull it off with careful exploitation of Amazon Web Services, but it would definitely be non-trivial. I can dream can’t I!


pyCLI Redux

Previously, I had posted about the pyCLI package, mostly just stashing the link away. Recently I’ve had occasion to put the module to use for real and started digging into the docs.

Better. Than. Remembered.

The argument parsing is really easy and convenient. I am now salivating about the logging and daemonizing features. pyCLI doesn’t do anything spectacular, but it makes a couple of good-to-do things easy to do.


Spotlight Keywords

At work, I moved to a brand spanking new Macbook Pro. Yeehaw! But in the migration of my old data, Spotlight search over my Thunderbird e-mails no longer worked. This was a big hindrance as I archive e-mail by stashing it in Thunderbird local folders. Thunderbird search is getting better, but I’ve got LaunchBar muscle memory for shooting off Spotlight searches. In effect, I underwent partial institutional memory loss.

After an epic hunt across the Interwebs, I finally managed to find a solution to enable Thunderbird Spotlight indexing thanks to GetSatisfaction. Suffice it to say it involves gnarly file permission tweaking.

Then I got to thinking, wouldn’t it be great to limit Spotlight searches to just e-mail? Gotta be a way to do that right? Turns out Spotlight supports a number of keywords to match various file types. kind:email gets both Thunderbird and Outlook messages. Handy!


Enough Technology

I was diggin’ through the notes crates, looking for something good to post about, when I ran across this oldie but goodie from Paul Buchheit of GMail fame:

“I don’t believe that’s true though. There is an optimistic way of understanding my first point, and that’s my second point: Even if you aren’t the smartest person around, and your product is kind of ugly and broken, you can still be very successful, if you just build the right product. YouTube and MySpace are both fine examples of this.”

…“When Google acquired YouTube, many people inside the company were flabbergasted, “But they have no technology!?” They didn’t understand that you only need enough technology to make the product work.”

I’ve been trying to bring a more entrepreneurial mentality to work. Unfortunately, the tendency is towards trying to game the customer, rather than building “products” they want. Now we’re in science and technology applied research for the global security market, so product is somewhat ill-defined. But I know one thing, more slideware doesn’t cut it these days. Unfortunately, here’s the pervasive tendency within my org. 1) Think up some cool idea, 2) put it in PowerPoint, 3) shop it to program managers, 4) PROFIT (or not).

And I should really know what Buchheit is saying, given what I’ve seen recently, but it’s always a struggle. Enough technology, or data, or analysis, can make the product, our intellectual services, work.


I Heart Sphinx

It’s ornery and has sharp pointy teeth, but I’m coming to appreciate the Sphinx full text indexing and search engine. Might not have the greatest documentation or APIs but damn does it index like a bat out of hell.

I’ve personally seen it rip through approximately 4 GB of data on a 5-year-old server with only 8 GB of RAM, said data on a suboptimal Linux ext3 filesystem, on top of an untuned kernel, and with no thought given to the IO and HD subsystems. Grand total of 21 minutes.
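To put that in throughput terms, here is the rough arithmetic, taking the data size as about 4 gigabytes and counting 1024 megabytes to the gigabyte. Both are my reading of the numbers above, not a separate benchmark.

```python
# Rough indexing throughput implied by the figures above.
data_mb = 4 * 1024      # assumption: ~4 gigabytes, 1 GB = 1024 MB
elapsed_s = 21 * 60     # 21 minutes, in seconds

throughput_mb_per_s = data_mb / elapsed_s
print(f"{throughput_mb_per_s:.2f} MB/s sustained, parsing and indexing included")
```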

That is a nice capability to have.

I know Lucene with Solr on top is sort of the default open source choice for full text indexing, but if you’re in the market Sphinx is worth a tire kick.


Hadoop World 2011 Presentations

Link parkin’: A fairly comprehensive collection of slides and videos from the Hadoop World 2011 program presentations has been posted directly to the Hadoop World site. Also available as a straightforward list of materials on Cloudera’s site, with registration required.

Hadoop World 2012 in the Fall is fairly attractive.

Hat tip: Jon Zaunich


Another Earlybird

Link parkin’: PDF Warning Earlybird: Real-Time Search at Twitter

Rapidly updated text indexing from folks who really have to care about it.


Lockout Upside

The NBA Lockout has resulted in a highly compressed schedule. This has the nice upside that between TNT, ESPN, and NBA-TV, practically every night there’s a game on the flat panel. Many of these are interesting match ups.

Of course the downside of the lockout and compressed schedule is that often these games wind up really sucking. Teams had no real training camp, have no real practice time, and are often playing tired. I’m not sure I’ve ever seen this many 20+ point blowouts. And it’s only the first month or so of the season.

Speaking of which, at this very moment the Orlando Magic are beating the Boston Celtics by 21 points, 58 to 37, at the half. A few days ago the Magic only scored 56 points in an entire game against the Celtics.

Makes for good blog writing background noise at least.

Update. Don’t know if this confirms or disconfirms my assertion, but the Celtics came back from 27 points down to win by 8. Watching Orlando collapse was sort of painful. If not good basketball at least it was entertaining.


Seismic Data Science

I couldn’t extract a really good money quote, but found Josh Wills’s post on Seismic Data Science: Reflection Seismology and Hadoop well worth reading. First Wills delves into the squishy term “data science” and usefully adds some definition. Then he looks at how the company he’s with, Cloudera, built some interesting infrastructure to adapt Hadoop to the seismic data processing domain.

One observation Wills made is really on point. The original core Hadoop infrastructure is starting to look like basic plumbing. Meanwhile there’s a coming (ongoing?) explosion of domain specific programming models, tools, and applications being built on top of Hadoop as a platform. Sort of like how Lisp macros enable the proliferation of domain specific languages.

Except not quite as elegant. But I quibble.


One Down, Four to Go

The easiest piece of the Wizards’ horrible Jenga puzzle got yanked today. Flip Saunders was relieved of his coaching duties and probably restored to his sanity. Godspeed kind sir. You deserved better than the dreck you were served.

The scary thought is that this move could buy Ernie Grunfeld more time to damage the team. As Jason Reid points out, he needs to go quickly. Especially because there are some player personnel moves that need to be quickly executed in the hopes that John Wall and others don’t permanently catch the losing bug.

Basketball Prospectus had the scariest headline I’ve seen today: “What if the Wizards Don’t Get Better?”.


Earlybirdin’

I’m locked and loaded for PyCon 2012. Took advantage of the early bird rate to save a few shekels and I’m at least guaranteed a spot for the talks. Still need to sign up for a couple of tutorials, but I will be back in Da Bay for the first time in well over a year, probably starting late on March 6th.

I definitely want the promised cookie. If on the off chance there are any other Pythonistas reading this blog planning to attend, feel free to shoot b m d at crossjam dot net an e-mail ahead of the proceedings.


Diggin’ On: The Basketball Jones

Have to say, recently I’ve been quite enjoying The Basketball Jones in general, and Trey Kerby in particular. Mostly because Kerby’s been entertainingly skewering the hometown team, a.k.a. the Washington Wizards. But the overall NBA coverage is a hoot as well.

Seriously, I actually laugh out loud on a daily basis thanks to TBJ!


Arsenal v ManU

While I haven’t been watching a lot of English Premier League matches recently, I have been keeping an eye on the tables. The usual suspects, Manchester City, Manchester United, Tottenham, Chelsea, and Arsenal, are at the top, with the Blues a bit disappointing, the Gunners surprisingly resilient, and the Spurs a welcome new face. After the 8-2 beatdown ManU gave Arsenal earlier in the year, and the acrimony heaped on Arsene Wenger, the Gunners are within striking distance of Chelsea, the fourth spot, and Champions League next year.

We’ll see how far they’ve really come though, what with a visit from Manchester United this afternoon. Add to that Tottenham traveling to Manchester City and it’s a pretty important day in the Premier League.

Good timing for a great match day with the NFL Conference Championships late in the day and no games between ranked teams in NCAA Men’s Basketball.

Update. Both matches were quite entertaining, although I found the Man City / Tottenham game much more compelling. Gotta give the Spurs credit for coming back from two goals down to level. Thought the Gunners were gonna cave, but they fought back too. Not a top move by Wenger though, bringing in Arshavin.


Social Futurism

No not futurism done socially, but futurism that focuses on social developments. Noted futurist Jamais Cascio discoursed on how futurism has been pretty good in predicting technological developments, but pretty poor in foreseeing the grand social changes of our times:

And on and on. If futurists have become almost too good at technological foresight, we remain woefully primitive in our abilities to examine and forecast changes to cultural, political, and social dynamics.

Enjoyed the piece and am quite sympathetic. Definitely worth your time. Being employed by the American military-industrial complex I’ve sort of been at the frontline of this overriding preoccupation with technology and inability to deal with large scale social dynamics. Increasingly I’m coming to the conclusion that understanding and shaping social dynamics are central to solving the world’s problems. Technological means will be a part, maybe even the key catalyst, but they’re not a sufficient condition unto themselves.

We’ve demonstrated we can come up with a tech doodad to solve many complex issues and systematic processes for generating those doodads. But thanks to globalization and hyper-connectivity, maximizing the impact is always a challenge of collective action bordering on a wicked problem.

Maybe this is an indicator of why I enjoy William Gibson so much.

P. S. Even though this is my first link, Cascio’s Open The Future is a good thought provoking read on a continuous basis.


Bob Wyman and FrackIM

A long time ago there was an interesting little startup named PubSub that provided prospective search. Back in my past life, I definitely had a thing for PubSub. And I especially enjoyed Bob Wyman’s blogging before he finally tailed off after PubSub crashed.

I had heard that he’d migrated to Google, but wasn’t really sure what he was up to. Can’t say as I do know now, but Wyman popped up recently in my feeds, experimenting of course with prospective search. Apparently Google’s AppEngine now supports a Prospective Search Service. Wyman has glued in some external real-time streaming services as sources, the Prospective Search Service for matching/filtering (which he knows a lot about), and instant messaging through XMPP to deliver matches to clients. The result is FrackIM.

Neat! And the possibility of following Wyman might be enough to get me to actually use Google+.

A lot has changed technically on the Web since PubSub went off the air. Wonder if they were just ahead of their time, which means the core ideas could make a comeback. Then again, there never turned out to be much of a business model for these services.


Maui and Wikipedia Miner

Link parkin’:

maui:

Maui automatically identifies main topics in text documents. Depending on the task, topics are tags, keywords, keyphrases, vocabulary terms, descriptors, index terms or titles of Wikipedia articles.

Wikipedia Miner:

WikipediaMiner is a toolkit for tapping the rich semantics encoded within Wikipedia.

It makes it easy to integrate Wikipedia’s knowledge into your own applications, by:

  • providing simplified, object-oriented access to Wikipedia’s structure and content.
  • measuring how terms and concepts in Wikipedia are connected to each other.
  • detecting and disambiguating Wikipedia topics when they are mentioned in documents.

Bonus, Chromium Compact Language Detector:

Wonderfully, Google has open-sourced most of Chrome’s source code, including the embedded CLD (Compact Language Detector) library that’s used to detect the language of any UTF-8 encoded content. It looks like CLD was extracted from the language detection library used in Google’s toolbar.

It turns out the CLD part of the Chromium source tree is nicely standalone, so I pulled it out into a new separate Google code project, making it possible to use CLD directly from any C++ code.


SOPA Dope

As someone who’s been on the Internet for closing in on 25 years, I am remiss in not seriously tracking the Stop Online Piracy Act, or SOPA (and PIPA), drama. Especially so since I’ve seen and personally benefited from the explosive expansion of the global network of networks. Unfortunately, I just didn’t have time to contemplate blacking out Mass Programming Resistance.

So as penance, I plan to call my Congress-critters offices and give ‘em 2 cents of opposition to SOPA. Senator Warner and Congressman Wolf seem to be on the right side of things, but a little reinforcement is a good thing. Don’t know about Senator Webb though. Given the amount of Internet and Web infrastructure and business based in Northern Virginia, I’d think they’d all be against something that has very clear potential to kill jobs in Virginia.

I like Andy Baio’s explanation of how SOPA would damage his longstanding, and highly entertaining, site. Hat tip also to Baio for this Khan Academy link explaining the downsides of SOPA/PIPA for the more visually inclined. I also like Tim O’Reilly’s reasoning against SOPA as captured by Colleen Taylor.


Wizards Worst? Whoas!

Over at The Basketball Jones, they’re coming to the same realization as me that the 2011-2012 Washington Wizards have a distinct chance of being the worst team ever in the history of the NBA. If it happens there will be an asterisk because of the lockout shortened season, but the Wizards are on pace for the worst season winning percentage by a lot!

And TBJ posted this conclusion before the pathetic MLK Day performance by the Wiz, featuring JaVale McGee embarrassing himself not once, but twice for a 1 and 11 team.

I don’t know what it was like on other historically awful teams, but the chemistry here in DC is just pathetic. With the local beat writers essentially saying everyone in the organization has quit, including the coach, herewith are the folks who need to go with a quickness:

  • Andray Blatche. In the words of Mike Singletary, I need winners. Blatche may be the antithesis of a winner. The poster child for what’s wrong with the Wiz.

  • Ernie Grunfeld. The GM who’s just made too many spectacularly bad moves. Drafts, trades, free agent signings, re-signings, discipline. You name it, Grunfeld has blown it. Every day he’s with the organization, I honestly fear that he can make this team worse.

  • Flip Saunders. I know Flip got a raw deal and is a better coach than this, but you can’t fire all the players. The players have tuned him out and he may have tuned them out. There’s a rookie or two who might be salvageable if they aren’t too corrupted by this season and they need a coach who can insulate them.

  • Nick Young. Young seems to get a bit of a pass because of his buffoonish, laid back, SoCal attitude. On occasion he can fill it, but somebody’s gotta get the points on a bad team.

  • JaVale McGee. I was about to cut him some slack for improved effort and execution this year, until that last performance. Please sweep away the last remnants of the Gilbert Arenas era.

The only player that’s a must keep is John Wall, and this season will definitely retard his progress. Hopefully, Chris Singleton can struggle through and Jan Vesely show some development. One serious challenge is that there is effectively no cheap Plan B for GM and/or Coach. You’d have to overpay, in dollars and power, for someone high profile and a stopgap would be pointless. Meanwhile, owner Ted Leonsis is thrifty with his money, read doesn’t like paying for fired staff.

Other than that, looking forward to a top 3 pick again this year. And maybe a choice deal on partial season ticket plans for next season.


Drum Major Instinct

MLK Image In case anyone wanders by this blog and is curious, let me make it completely obvious. I’m Black or African-American if you prefer. In the United States, we have a federal holiday for Dr. Martin Luther King, Jr. who was instrumental in the American Civil Rights movement, which led to many of the liberties that I personally enjoyed this very day. I typically haven’t mentioned it when I’m blogging, but his holiday is an important day in my book.

I was all set to post about the copious first world, 1%er, global elite, advantages that have been available to me thanks to Dr. King’s effort (along with a whole bunch of other people, including my parents). Heck, when I was born, a black President of the United States was unthinkable. There’s a lot wrong with the US of A, but we’ve come a long way, and lit quite a path for other nations, even in my short existence.

Then Hendrik Hertzberg’s excellent, in-depth post on the inaccuracy of the quote inscribed on the King monument came across my feed. Even Hertzberg’s little taste of King’s sermon to Ebenezer Baptist Church, the Drum Major Instinct sermon, set me straight:

The drum major instinct is real. (Yes!) And you know what else it causes to happen? It often causes us to live above our means. (Make it plain!) It’s nothing but the drum major instinct. Do you ever see people buy cars that they can’t even begin to buy in terms of their income? (Amen!) [laughter] You’ve seen people riding around in Cadillacs and Chryslers who don’t earn enough to have a good T-Model Ford. (Make it plain!) But it feeds a repressed ego. …

The drum major instinct can lead to exclusivism in one’s thinking and can lead one to feel that because he has some training, he’s a little better than that person who doesn’t have it. Or because he has some economic security, that he’s a little better than that person who doesn’t have it. And that’s the uncontrolled, perverted use of the drum major instinct.

Those snippets can’t do the whole 5000 words justice, so make sure to read it yourself. But suffice it to say, while I am very thankful for all I have, and all those who sacrificed before me, I need to stay humble and think of ways I can personally do justice to the conclusion of his sermon.

P.S. The King family aggressively (and legally) protects copyright held on Dr. King’s image. The above is freely available according to Wikipedia.


That Particular Flavor

Link parkin’: William Gibson collected some of his non-fiction into a book, Distrust That Particular Flavor.


Some Book Reviews

… in areas of recent interest to me.

John D. Cook on: R in Action and The Art of R Programming.

In short, The Art of R Programming is for programmers, R in Action is for data analysts, and most other R books I’ve seen are for statisticians. Of course a typical R user is to some extent a programmer, an analyst, and a statistician. But this comparison gives you some idea which book you might want to reach for depending on which hat you’re wearing at the moment.

Gaël Varoquaux on: NumPy 1.5 Beginner’s Guide

The book actually covers more than only numpy: it is a full introduction to numerical computing with Python.


@handles #everywhere

I’ve become mildly amused by the penetration of Twitter usage throughout mainstream media. To my eye there’s been an especially urgent burst in the sports world. ESPN has gone overboard with @ handles for every on-air personality, a # hashtag for every show, and constant urges for viewers to participate.

Then there’s Shaq on TNT.

Oddly in social networking (Twitter, Facebook, Google+) it feels like search circa 1995. There’s a solid technology baseline but disruption seems right around the corner.


Moments in Noise

Link parkin’: MeFite hippybear unloads a comprehensive history of The Art of Noise. Even though I was there, nothing left to add, other than BeatBox (Diversion One)? You better recognize!


Progressive Enhancement Subtlety

Previously I had posted on how Bagcheck uses the CLI to practice progressive enhancement. In my post I implied that the CLI was used to build a system’s API. That was incorrect.

The CLI is not part of the API. The CLI is the first, minimalist interface. It does help with automation, testing, designing the API, and prototyping. But it’s still just one interface step on the path to the whizzy web UI.
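To make the distinction concrete, here is a minimal sketch in Python. It is not Bagcheck’s code. The `add_item` function and the `bag` command name are hypothetical stand-ins I made up for illustration. The point is only that the core operation lives in its own layer, and the CLI is a thin first interface on top that a web UI could later sit beside.

```python
import argparse


def add_item(bag, item):
    """Hypothetical core operation (the "API" layer). It knows nothing about the CLI."""
    return bag + [item]  # return a new list rather than mutating the input


def build_parser():
    """The first, minimalist interface: just enough to drive the core by hand."""
    parser = argparse.ArgumentParser(prog="bag", description="Put items in a bag")
    parser.add_argument("items", nargs="*", help="items to put in the bag")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    bag = []
    for item in args.items:
        bag = add_item(bag, item)
    print(", ".join(bag) if bag else "(empty bag)")
    return bag


# Drive the CLI with an explicit argument list, the way a test or script would.
main(["tent", "stove"])
```

Because `main` takes an explicit argument list, the same entry point works for automation and testing, which is most of what the CLI buys you on the way to the web UI.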

A subtle but important point.


Wiz Win! Wiz Win!

Our long Nation’s Capital nightmare is over as the Washington Wizards have defeated the Toronto Raptors 93 to 78. The Wiz were the last NBA team to get a win this season, and ended an eight game losing streak.

As bad as the Wizards are, I’m perversely jazzed at potentially getting in on the ground floor of something that will eventually be on the way up. This may arguably be the worst pro team I’ve ever tracked in my 30+ years of following the NBA. Even worse than the 1998 post-Jordan Chicago Bulls. Give Washington a few more lottery picks, dump the riff raff, and playoffs “here we come again”. The Wiz are my little side fling, to complement my mainstay this year the 2012 now-Derrick Rose Bulls.

The Tebowcracy, however, continues.


Congratulations BCS!

You’ve successfully managed to make me not care about the NCAA College Football post-season. To be honest I was way more excited about the regular season Alabama-LSU matchup than the BCS Championship Game. The former was appointment TV. The latter was something I got around to after reading to My Little Guy (TM), putting him to bed, and getting a bite to eat. Missed one field goal. Oh well, there were plenty more to go around.

Might be time to put old Brent Musburger out to pasture. Between Honey Badger! utterances and mocking the LSU offense getting across the fifty yard line, I was close to done.

As to any other bowl fare, I mostly snoozed my way through Cal’s defeat in the Holiday Bowl. When I was awake, there was a lot of Angry Birds Rio playing. The Rose Bowl got cursory coverage as I was out at a Chicago bar visiting a friend. Sound down and reminiscing about the old neighborhood.

At this point, I’m not even sure a playoff could bring back my enthusiasm. That’s how jaded I am about college sports. Oh well, there’s always The Big Game.

And apparently quite a few other folks are voting with their clickers. Take that Self-Proclaimed Worldwide Leader in Sports™!

Big Game in October 2012?! WTF!


iOS Dock Folders

Didn’t know you could put folders of apps in your iOS dock. Very handy. Learn something new every day.


The Losers, Guilty Pleasure

I’m always suspicious of comic book conversions to movies, but The Losers was surprisingly entertaining. Of course it’s not great, otherwise I wouldn’t feel guilty, but I find myself stopping to watch it whenever it’s on HBO. I even dialed it up through on-demand (free) over Christmas when there was nothing else on.

There’s a number of good elements, to wit:

  • First and foremost, Jason Patric as Max, a pretty good villain. Appropriately bombastic for a B movie bad guy, and clearly on the wrong side in a big way. However, he’s quite funny hamming it up. “Clay and his unit? That sounds like a porno, Wade.”

  • Wade, as Max’s put upon sidekick, provides a pretty good straight man.

  • Overall, The Losers has a wry sense of humor and doesn’t take itself too seriously. It’s a B-movie with an A- budget, and revels in its niche. Witness The Petunia’s soccer match at the end of the game.

  • Don’t know why, but I actually bought the camaraderie amongst Clay’s team. Pooch, Jensen, and Cougar seem like real buds.

  • Idris Elba, acting the hell out of Roque.

  • Zoë Saldana is easy on the eyes, although a bit thin for my tastes.

  • The flick moves along at a good clip, with copious bursts of action. A nice, tight 97 minutes makes for a quick fast ride.

My only real demerit is that The Losers is a tad mean spirited. Blowing up kids is pretty low. The shooting of the umbrella assistant was completely gratuitous. And there’s no excuse for accidentally throwing scientists off of tall buildings.


iOS WP Markdown

I really wish the WordPress iOS app supported Markdown a lot better. I’m starting to enjoy posting from the couch, but the slight impedance mismatch is fairly irritating.


An MIT Farewell

From Hacker News, it was with a little trepidation that I followed the link to Evan Jones’s Farewell to MIT. I often find personal stories highlighted by the HNers to reflect outlier experiences with little insight. I’m always curious about varied perspectives on my alma mater, but I wasn’t holding out high hopes.

Turns out Jones had a nicely written and thought out look back on his time at the ‘Tute. First, he did graduate with a Doctorate, no small feat. Second, he learned that he wasn’t cut out to be a professor. From personal experience, way better to learn this earlier rather than later. Finally, he had this really poignant observation about graduate life:

I also learned how to do academic research. While my publication record is not stellar, it is also not terrible. Many people have written better graduate school advice than I could, so I won’t try. However, there is one important piece of advice I wish I could give to my past self, when I started my PhD: Actively collaborate with others.

Wish I would have learned that lesson earlier myself. I’ll add that this is also quite applicable to faculty members and industrial researchers. For academics it’s hard because of credit assignment and promotion. Still that was the most rewarding part of my former professorial life. Industrial researchers can be forced by management to “work together” but it’s actually pretty easy to run off in a corner and hide.

That’s why tribal development is one of my 2012 resolutions. Now if it were only easy to find top notch collaborators!!


Data Agility

In working with some big piles of data at work, I’m learning some lessons that veterans like Matthew Hurst have already internalized:

Data agility is achieved when the system is designed to either be independent of certain types of qualities of the input data, or when there are well defined processes, tests and analytical tools that radically reduce the time from identifying a new data source to shipping it in production.

System agility is not data agility, and aiming for data agility requires an upfront investment in tools specifically for that purpose.

I came to a similar mind about data agility when I realized that much of my recent programming was well nigh trivial. How to organize bits of simple code into a robust, manageable pipeline lit up the old brain cells though. Also, data gardening and grooming is a key part of that pipeline.


Foursquare Execution

Link parkin’: Foursquare Today’s Best Executing Startup

“I’m all for it.”

Tampa Bay Buccaneers (National Football League) Head Coach John McKay on his team’s offensive execution.

In The Next Data Challenge I mentioned that there’s some interesting data collection experiments to be run against newer social media systems like Foursquare. Conveniently, Anil Dash did a deep dive into why he thinks the NYC based startup is at the top of the tech game. I don’t use Foursquare at all, so can’t speak to the piece’s veracity, but it’s a good read. And points to why I think Foursquare is worth studying closely.


For The Birds

I’d held out for quite a while against the Angry Birds world domination. My wife succumbed last Christmas when I got her an iPad. For whatever reason, Angry Birds Rio was one of the few games she downloaded. Of course My Little Guy (™) took to it like a fish, even though he wasn’t all that proficient. He’s got fine motor skill issues, but he’s young and highly enthusiastic.

So now I’m trying to turn my dormant iPod Touch into My Little Guy’s (™) iOS device. That was the disease vector leading to my infection. I just had to try the game out.

And my old game obsession tendencies from way back in the undergrad days came back to life. I basically started playing Angry Birds Rio late last Wednesday, Dec 28th. I was done with all the levels by Sunday afternoon, January 1st. I struggled with a few screens, had to go to the Interwebs to cheat on one (it was 3AM and I crumbled in a moment of weakness), and surfaced a gaming intensity I hadn’t seen in a long time.

Angry Birds Rio is addictive in that it’s a series of rich micro-challenges. Just complex enough to put the braincells to work, but completion is always tantalizingly close. “Just one more stage,” becomes the zombielike mantra. With a healthy dose of, “I’ll finish it in one shot!” Plus the overall game design, mechanics, and user experience are brilliant from a cognitive perspective. While subtly addictive, the game is really fun to play.

I will be staying very far away from the original Angry Birds.


The Next Data Challenge

Now that I’ve honed my Twitter data collection skills a bit, a couple of new ones are coming to mind. Interestingly, I like starting them as home hacking projects and then transferring the experience to work as needed.

First, collecting a million Tweets per day using the Streaming API doesn’t seem completely unreasonable. Now I don’t have enough home storage to handle that amount of data but I do have Amazon S3. I was getting hung up on having continuous query and analysis capabilities available. That would require an expensive VPS in the cloud or another machine to worry about in the basement. But simply storing a small window of the data on a cheap VPS, pushing the data into S3, and then batch processing with Elastic MapReduce is eminently feasible. Probably good for the resume too. And with a little automation this can operate while I’m sleeping and run for days at a time. That quickly means tens to hundreds of millions of tweets. You’re talking real Big Data at that point.
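Here is a rough sketch of the "small window on a cheap VPS" part. It is not my actual collector. The `RollingBatchWriter` class and the `upload_then_delete` hook are hypothetical names for illustration, and the hook only pretends to upload. A real version would push each file to S3 with whatever client library you prefer before deleting it.

```python
import json
import os
import tempfile


class RollingBatchWriter:
    """Buffer tweets, write each full batch to a JSON-lines file, then hand it off."""

    def __init__(self, directory, batch_size, on_batch_ready):
        self.directory = directory
        self.batch_size = batch_size
        self.on_batch_ready = on_batch_ready  # hypothetical hook, e.g. an S3 upload
        self._buffer = []
        self._batch_number = 0

    def add(self, tweet):
        self._buffer.append(tweet)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write out whatever is buffered, even a partial batch."""
        if not self._buffer:
            return None
        name = f"tweets-{self._batch_number:06d}.jsonl"
        path = os.path.join(self.directory, name)
        with open(path, "w", encoding="utf-8") as handle:
            for tweet in self._buffer:
                handle.write(json.dumps(tweet) + "\n")
        self._buffer = []
        self._batch_number += 1
        self.on_batch_ready(path)
        return path


def upload_then_delete(path, uploaded):
    """Stand-in for a real upload: record the file name, then free local disk."""
    uploaded.append(os.path.basename(path))
    os.remove(path)


uploaded = []
with tempfile.TemporaryDirectory() as workdir:
    writer = RollingBatchWriter(
        workdir,
        batch_size=3,
        on_batch_ready=lambda path: upload_then_delete(path, uploaded),
    )
    for i in range(7):  # fake tweets standing in for the stream
        writer.add({"id": i, "text": f"tweet {i}"})
    writer.flush()  # push the final partial batch
    leftover = os.listdir(workdir)

print(uploaded, leftover)
```

Deleting each file right after the hand-off is what keeps the local window small. The batch files then become the input for whatever batch processing runs later.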

Second, I’m still not seeing any interesting data collection experiments from systems like Instagram or Foursquare. Maybe I’m looking in the wrong places, but seems like an opportunity to me.

Third, adaptive query specification for the Streaming API. Currently all my collection just sets up a bunch of geo boundaries and leaves them alone. Two issues here. Dynamic determination of the queries and dynamic update of the query spec. The latter isn’t too hard but the former is open territory.
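A sketch of the easier half, the query spec update. The Streaming API’s filter takes a `locations` parameter made of comma-separated longitude/latitude pairs, southwest corner first. The helper names below are mine, and the reconnect check assumes (as I understand it) that changing the filter means opening a new connection. How to pick the new boxes is the open part, and this sketch does not attempt it.

```python
def locations_param(boxes):
    """Format (west, south, east, north) boxes as a `locations` value.

    Each box becomes lon,lat of the southwest corner followed by
    lon,lat of the northeast corner.
    """
    parts = []
    for west, south, east, north in boxes:
        if west >= east or south >= north:
            raise ValueError(f"box corners out of order: {(west, south, east, north)}")
        parts.extend([west, south, east, north])
    return ",".join(str(value) for value in parts)


def needs_reconnect(current_boxes, desired_boxes):
    """True when the set of boxes changed, ignoring the order they are listed in."""
    return set(current_boxes) != set(desired_boxes)


# Illustrative boxes only: roughly San Francisco and roughly New York City.
san_francisco = (-122.75, 36.8, -121.75, 37.8)
new_york = (-74.0, 40.0, -73.0, 41.0)

current = [san_francisco]
desired = [new_york, san_francisco]

print(locations_param(desired))
print(needs_reconnect(current, desired))
```

Comparing sets rather than strings avoids a pointless reconnect when the same boxes merely come back in a different order.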


Be It Resolved

I’m generally not big on announcing resolutions, although I have done so in the past. Let’s give it a shot for 2012.

  • Improve Physical Health. The holidays definitely have me backsliding but at least I’ve got the basics of exercise and weight loss going. Main goals are to lose about 20 pounds and play some pickup ultimate without embarrassing myself. No injuries. Stretch goal is to play organized ultimate like a league or hat tournament. Also, improve the nutritional intake. More veggies.

  • Maintain Financial Health. A little over 3 years ago, I was barely above water on a condo I wasn’t living in. I also had a pretty big pile of personal debt. That dispiriting conversation with a parent about a loan was looming. Luckily I closed on a sale just before John McCain announced he was suspending his campaign and wrecked the economy. Fiscal management took over at my household and now I’m in pretty good shape. No going back in 2012. Avoid lifestyle creep. Pay everything on time, no fees or penalties. 10% more of take home pay stashed into retirement instruments. Double my charitable contributions.

  • Shine The Skillz. The Mad Data Scientist exercises at home and work have exercised old programming muscles. In 2012, I’m going to work hard on regular hacking activity and getting some more professional interaction and training. I’m looking to attend three programmer events where I get in a workshop, class, or coding sprint.

  • Expand The Network. Once a month either get back in touch with an old professional contact or make a new one. Special focus on people local to the DC area. Join the MIT and UC Berkeley alumni clubs of DC and attend 6 events total. Update resume. Flesh out LinkedIn profile.

  • Get Rid of Stuff. I’ve got stuff in a storage site that I haven’t visited in over 3 years. It can’t be that important so time to go. Also, the wardrobe needs a major flush. Clearing half of that space is the target.

  • Build A Tribe. Over the last few years I’ve been leaning back and trying to be a good teammate. This past year I started to see technical challenges and opportunities that really excite me. But they all involve pulling people together to build something bigger than any one individual. This actually goes against my general nature but it’s the next meaningful life step. A high functioning team of three to five people would be an achievement.

  • Less Watch, More Do. Cut down on the number of weekend days I completely lose lying on the couch watching sports. More days with more hacking. More time spent doing real activities with My Little Guy (TM).

And of course keep on posting!!


Tapbots Love

Upon John Gruber’s recommendation and subsequent eulogy for the Twitter client, I switched over to Tweetbot on my iPhone. Definitely enjoying it and wondering what took me so long.

Since I’m also trying to shed a few pounds, I took a peek at Weightbot and liked what I saw. Added to the iPhone as well. Tapbots appears to be a high quality outfit.


Riak Tradeoffs

What with the piles of data I have to process at work, I keep an eye on the various storage, indexing, and query technologies. One product, Riak, looks good but hasn’t quite fit my use cases. There’s a nice overview on InfoQ, with Andy Gross and Mark Phillips of Basho Technologies, on the tradeoffs that Riak provides.

The big downside for me has been the need for relatively sophisticated ad hoc querying. The Basho team points out that Riak isn’t particularly good for that, being more of a building block towards that capability. The high availability, horizontal scalability, and good performance on greater than main memory working sets are attractive features though.

May have to run some experiments at work just to baseline the Riak potential.


Backlog

Since I’m not a deep or prolific blogger, I have to work to maintain continuous output. I don’t know how other bloggers do it, but I’m learning that maintaining a backlog really comes in handy to get over those spots where you can’t squeeze out enough time to compose a post. With a good backlog, a great blogging CMS with a decent mobile interface or app, and a smartphone, one can keep that streak alive.

The only issue I’m having with the WordPress iOS App is getting the publication time right. There needs to be a “set time to now and publish” button.


Deep Gibson

The interview of William Gibson, in The Paris Review, was much deeper about his personal life than I expected. There’s a lot of depth on his time in Wytheville, Virginia and how it influenced his conceptions of science fiction and writing. Also his draft-era Vietnam angst and transition to Canada. Didn’t know he had a wife and son.

In addition, I’ve always thought of the nominal Bigend Trilogy (Pattern Recognition, Spook Country, Zero History) as a science fiction series, but the books are really contemporary thrillers. As Gibson puts it, there’s just enough to make them work like science fiction.

And the story of his first paid publication is priceless. A long read but well worth the time invested.

With the start of a new year, and the anniversary of my Kindle ownership, I’m thinking of rereading the Bigend trilogy in its entirety. I’d like to do it on the Kindle, but the prices of the electronic editions are mildly daunting.

Apropos of nothing, according to Wikipedia, Bigend was born the same year I was.


Ghosts and Shadows

As promised, I managed to take a little bit of holiday time and catch a few movies.

Sherlock Holmes: A Game of Shadows was pretty much as anticipated. A fun little romp, historically set in Victorian Europe. The big delta over the first Sherlock Holmes is the full introduction of Professor Moriarty, a worthwhile adversary and quality villain. I found the plot engaging and it moved along at a quick pace, never dragging. Occasionally the film gets a little too caught up in bullet time slow motion, but it’s not a major flaw.

An upside of seeing it in the theater is that, as opposed to HDTV at home, Robert Downey Jr. doesn’t come across like he’s mumbling. Although he’s soft spoken, you can clearly make out what Holmes is saying. More Kelly Reilly please.

The major downside is that I think I got jacked by a 3D bulb on a 2D projection. Roger Ebert clearly outlines the issues but the dang film looked like it was completely shot at midnight. I like the Cobb 12 Theater, but if it happens again I’ll have to ask for my money back.

On a brighter note, Mission: Impossible Ghost Protocol showed at my nearby IMAX (real IMAX) theater. Memo to self, get to the theater earlier so you don’t have to sit in the second row, leading to a crick in the neck. Other than that, paying some extra shekels for the mega-screen was worth it. There are some scenes involving the Burj Khalifa where my heart literally leapt into my throat. Could have done without the sandstorm chase scene, but that was made up for by the beautifully gigantic Paula Patton.

This Mission: Impossible was noticeably lighter than the previous two and more human. In somewhat of a return to the television roots, the film relies much less on gadgetry, and more on social engineering. Although I still love Philip Seymour Hoffman’s villain from the last edition. Heck, the iconic rubber masks from the first film are pretty much put out to pasture. And a huge dose of Simon Pegg added plenty of comedic touches.

Bonus. Since my IMAX is real IMAX, we didn’t get any trailers. But that was made up for by six minutes of The Dark Knight Rises prologue. Said prologue deserves a post of its own, but suffice it to say I’m now really looking forward to this film next year.

I can unreservedly recommend both Sherlock Holmes and Mission: Impossible for at least a matinee screening. If you have to pay full fare, you won’t feel ripped off, although you may feel a bit of buyer’s remorse for overpaying. They’ll both make great movies on the home theater.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.