
Blogaversary, Macaversary 4.0

Not much to say here. Trucking along at full speed into year five of this blog adventure. I have grown comfortable without a grand plan and am enjoying just keeping the ride going.

Happiness is still a warm MacBook, maybe not so new. Think I’ll hold off on a laptop upgrade until something comes along a little closer to the sweet spot. Plus I’ve got some other clear cut need purchases this year, and I definitely want an iPhone 5. Besides, the new toy has been plenty of upgrade for one year.


Farewell Pushcart Queen

Breaking my embargo on eulogies, I’m shamelessly ripping off my title from the MetaFilter post that put me on notice of Jean Merrill’s passing:

The Pushcart War was one of those childhood books which changed my life forever, teaching me the importance of activism and civic participation. Merrill’s stories also show how much fun organising can be. We have lost a great and creative soul, someone capable of introducing young people to issues that matter with passion and a twinkle in her eye.

Along with The Phantom Tollbooth, The Pushcart War marked my transition to “serious” reader. Yeah, I proceeded to still read a bunch of junk. But I forever knew there would be stuff out there with more heft and style. There did exist a higher standard to seek out and aspire to. I’m still managing to work towards that standard, bit by bit.

Godspeed, kind lady.


The Twitter Comedy Factory

Here’s a study I’d be interested in seeing, just out of general interest: the usage of Twitter as a comedy platform. Seems like there’s a @FakeX account for any value of X and half a million “@!*$ My Whatever Says”. Is this a new comedic venue? How big is it in terms of creators, participants, and dollars? And put it all in a comedy historic context for me.

Bonus points if you’re funny.

Sigh. So many ideas. So little time.


SportsNerd, A Sports Tech Vertical

This is just off-the-cuff daydreaming with about zero background research. It’s inspired by the recent purchase of Bleacher Report for just a little south of $200 million and The Art of Manliness’ two-part guide on …Starting a Side Hustle.

Assertion: there is no part of the modern worldwide sports industry that is not being heavily influenced by technology. Athletic performance? Check! Financials (including gambling)? Check! Media? Check! Analysis? Check!

Seems like a Web publication that was hyper-focused on the varied intersections of sports and tech might have a chance. Read as Gizmodo or Engadget but always coming with a sports angle. Broad situational awareness of tech impact on sports, augmented with deep dives, interviews, and even opinion pieces. You could throw in a little humor and still not fall into the T and A tarpit.

The neat trick is that you could probably get something rolling on the real cheap as a simple linkblog. For bonus points algorithmically trawl the intarwebs to decrease/scale human involvement. The side hustle is to grow it into something on the order of Daring Fireball or kottke.org. Individual voice, somewhat broad taste, but with a clear content focus. If there’s any traction, grab a buddy or two, or find some enthusiastic hobbyists to build out in the direction of a serious web publishing business. Of course the really hard part is marketing to a) build out the audience, and b) generate revenue probably through advertising, but maybe some other creative financing.

Definitely not an unheard of story, but it’s hard to believe that someone hasn’t already tried this and failed miserably, or already been swallowed up by one of the big boys.

Anyway, yet another Side Project ™ to take on in my Copious Spare Time ™.


GPS: A Graph Processing System

Link parkin’: GPS: A Graph Processing System

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon’s EC2.

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges.

Okay, it’s from Stanford, which raises suspicion (Go Bears!), but 10 billion edges is no joke.


iOS XScreenSaver

Link parkin’: XScreenSaver for iOS. From the Wikipedia entry for the UNIX classic:

XScreenSaver is a collection of about two hundred free screensavers for Unix, Mac OS X and iOS. It was created by Jamie Zawinski in 1992 and is still maintained by him.

Amazing that such an old piece of kit can migrate to such modern platforms.


WordPress iOS 3.1

There’s a new release of the WordPress App for iOS.

This is a big one! The WordPress for iOS dev team is excited to announce that version 3.1 is now available in the App Store. In this update you’ll find a redesign for both iPhone and iPad, along with loads of new features!

Thank goodness. The current editing UI on the iPad is driving me up a wall. Too many switches out of editing mode to do simple things. Hope the new version is a bit more efficient.


Titan Graph Performance

I’ve mentioned the Titan graph store before. Aurelius, the company building Titan, has posted a performance study of the DB that heavily relied on Amazon Web Services:

The presentation to follow discusses the simulation’s social graph structure, the types of processes executed on that structure, and the various runtime analyses of those processes under normal and peak load. The presentation concludes with a discussion of the Amazon EC2 cluster architecture used and the associated costs of running that architecture in a production environment. In short summary, Titan performs well under substantial load with a relatively inexpensive cluster and as such, is capable of backing online services requiring real-time Big Graph Data.

The kicker is that their analysis of one year’s AWS costs to support a half-million users is well within an early startup’s wheelhouse. Also, their modeling and sim approach is pretty interesting. They’ve probably simplified the discussion approach for presentation, but there’s a bunch of approaches in the CS literature that are also applicable.


Getting Jacked

Mat Honan got jacked. Kiddies wrecked his online presence for the lulz, exposing some severe deficiencies in Amazon and Apple security procedures. It’s a bit of a cause célèbre, prompting a lot of suggestions on how to safeguard oneself.

Like Tim Bray, I heart 1Password but need to do more, especially since I’m a heavy Amazon user. Following Matt Cutts’ advice, I’ll be turning on 2 Factor Authentication. Wired also has a number of solid suggestions, along with lots of other info kicking across the Web. Do yourself a favor and partake of the wisdom.

And I always thought that remote wipe capability was a bit of a hinky idea.


The CS CEO

I’m really surprised to hear this, almost to the point of fisking it out, but I trust Greg Linden. Marissa Mayer is the first person with an advanced CS degree hired into a Fortune 500 CEO position.

Marissa Mayer as CEO of Yahoo may be a test of a new style of executive leadership, the optimizing CEO.

She is not the first computer scientist to lead a major company, but she is the first computer scientist (MSCS or higher) hired in as CEO to a Fortune 500 company. Many computer scientists view everything as an optimization problem. People, work, politics, life, everything is a search (often of a dynamic space) to find a maximum near the global maximum.

On the other hand, I’m not that surprised in that there are only so many of those spots, they haven’t been open to everyone for much of history, and they typically skew towards MBAs. The qualifier, “hired in”, also makes a big difference. There have been a number of advanced CS degree CEOs that have grown companies into Fortune 500 members. This may be the first one that has to inject (reinject?) a CS culture into such a large corporation.

And I agree with Greg that a systems optimization perspective in a Fortune 500 CEO will be an interesting experiment.

Lastly, never really thought of Yahoo! as a Fortune 500 company.


Trident, Realtime Big Data

Nathan Marz is still a badass, despite being swallowed up by Twitter:

Trident is a new high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.0 (released today). It allows you to seamlessly mix high throughput (millions of messages per second), stateful stream processing with low latency distributed querying. If you’re familiar with high level batch processing tools like Pig or Cascading, the concepts of Trident will be very familiar - Trident has joins, aggregations, grouping, functions, and filters. In addition to these, Trident adds primitives for doing stateful, incremental processing on top of any database or persistence store. Trident has consistent, exactly-once semantics, so it is easy to reason about Trident topologies.

We’re really excited about Trident and believe it is a major step forward in Big Data processing. It builds upon Storm’s foundation to make realtime computation as easy as batch computation.

I’m sure Trident has some sharp pointy edges, but it also looks like a bag of fun.


EPL ’12-13 In View

Holy smokes! The new English Premier League campaign kicks off two weeks from yesterday. It feels like Manchester City just got done pulling victory from the jaws of defeat.

I’ll have to check the summer transfers and evaluate who will win my rooting interest for the season. Betting folks will note that wagering against my allegiance may be a good idea. Arsenal and Chelsea have already rued my attention. Can’t really fancy rooting for any Mancunians. Other than the Big Four and the Spurs, who might realistically challenge for the title?


Why HDFS

Charles Zedlewski makes an interesting analogy between HDFS, the Hadoop Distributed File System, and the Linux operating system. Stealing the punchline:

It’s rare when you get to see history repeat itself so completely as it is with HDFS. Today HDFS may not be the best filesystem for content addressable storage or nearline archive. But then 15 years ago who would have thought Linux would find its way into laptops, routers, mobile phones and airport kiosks?

Linux drew us the map. The smart money is already following it.

We’ll see how it plays out, but given the entrenched nature of HDFS he might be right. HDFS’ open source nature, and maybe more importantly community, means just about any good distributed file system idea can be quickly embraced and extended.

There is probably one area where HDFS could be radically updated or face displacement. Real-time streaming datasets don’t fit the HDFS model particularly well. Doesn’t mean someone smart can’t come along and fix it up.

Zedlewski also heavily invokes the nice support for Map/Reduce processing that HDFS provides. Map/Reduce is clearly successful, but these other processing demands may eventually lead to other programming models that fit less well with HDFS.

But I’m of a mind that Zedlewski is mostly right, and that HDFS is a nice solid foundation to build on going forward.


Building REST APIs with Python

In my Copious Spare Time ™, I’m thinking about how one can rapidly build lightweight, fast REST APIs using Python. I may revisit Django approaches again, but the heavy dependency on an RDBMS and a model-based Object-Relational Mapper (ORM) makes it feel like I’m fighting against the toolkit. The key issue is that to get performance for some aspects, I’m anticipating the need for other data stores such as a full-text index or a NoSQL repository. Thinking one of the Python web micro-frameworks, possibly Flask, might be the right fit.

Wow! It’s been a long time since I talked about Redis. It’s up to version 2.4.x now.
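
To make the "don't fight the toolkit" idea concrete, here is a minimal sketch of a REST-ish endpoint whose handler only talks to a small store interface, so the backend could be a full-text index or a NoSQL repository instead of an RDBMS. It deliberately uses the standard library's WSGI calling convention rather than Flask itself, and the names MemoryStore, make_app, and call are all hypothetical, made up for illustration.

```python
import io
import json


class MemoryStore:
    """Hypothetical stand-in for any backend (NoSQL store, full-text index)."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        self._items[key] = value


def make_app(store):
    """Build a WSGI app that maps GET/PUT on a path to the given store."""

    def app(environ, start_response):
        key = environ.get("PATH_INFO", "/").lstrip("/")
        method = environ.get("REQUEST_METHOD", "GET")
        if method == "PUT":
            length = int(environ.get("CONTENT_LENGTH") or 0)
            store.put(key, json.loads(environ["wsgi.input"].read(length)))
            status, payload = "204 No Content", b""
        elif method == "GET":
            item = store.get(key)
            if item is None:
                status = "404 Not Found"
                payload = json.dumps({"error": "not found"}).encode("utf-8")
            else:
                status = "200 OK"
                payload = json.dumps(item).encode("utf-8")
        else:
            status, payload = "405 Method Not Allowed", b""
        start_response(status, [("Content-Type", "application/json"),
                                ("Content-Length", str(len(payload)))])
        return [payload]

    return app


def call(app, method, path, body=b""):
    """Invoke the app directly (no server); return (status, parsed JSON or None)."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    raw = b"".join(app(environ, start_response))
    return captured["status"], (json.loads(raw) if raw else None)
```

Swapping MemoryStore for a class with the same get/put methods is the whole point of the exercise. A micro-framework would replace the hand-rolled routing and add error handling, but the store boundary would presumably look about the same.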


Mac OS X Tweetbot

Tapbots released an alpha version of Tweetbot for Mac OS X. Figured I’d give it a whirl since I use Tweetbot so much on the iPhone and iPad. Like the familiar look and haven’t run across any bugs so far. Here’s hoping the Tapbots have success with this product.


Django Project Start

Okay, I learned a lot from this post, Starting a Django Project the Right Way:

One of the things I wish I had known when starting my Django project for IllestRhyme was “How do I start a real Django project”. As in, one that’s actually going to be used and developed more, not the toy project from the (admittedly execellent) Django documentation.

Having just gone through this process again for my new site, I wanted to share the knowledge I’ve gained about how to properly start a project in Django.

In addition, the comments are well worth reading.

Via Nat Torkington


Roles of a Data Science Team

A Gnip Data Story, a.k.a. interview, with bitly’s Hilary Mason provides the most substantive detail I’ve heard about what her team actually does:

My team plays a few roles within the company. We handle the business analytics, which can be answering very simple questions like, “How many new URLs did we see yesterday?” to complex questions like, “How do we value a URL being clicked from platform X vs platform Y over time?”.

… In summary, my team is responsible for pushing the boundaries of where bitly can go. It’s fun.

There’s plenty more where that came from.

Good stuff. Just always been curious.


Don’t Tempt Me

It’s a sad statement of mega-corporation life that while the tumblr for Today’s Corporate Meeting Challenge is really funny, I am actually tempted to try and rise to the challenge.

Even sadder is that I’d be lucky to actually be called out in any of the numerous meetings I’m asked, nay required, to attend. Half of these would just go over people’s heads and they’d just forge on.

There should be a separate category for teleconferences. I’m undecided if these monstrosities should garner bonus points for laugh multiplication, or be considered the putting green. Seriously, you can always say the most profoundly stupid things in a telecon if you follow it up with “Sorry. Thought I was on mute.”


Python Modules of Unusual Usage

Getting a lot of mileage out of:

  • requests. Wasn’t quite ready to believe the hype, but it does make executing HTTP requests a lot more humane
  • ConfigObj. Makes parsing Windows INI files about as convenient as possible
  • argparse. Once you internalize the Tao of this module, combine with cliff, to make command-line Swiss army chainsaws.
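
For a taste of the argparse half of that combination, here is a minimal sketch of git-style subcommands using only the standard library (cliff itself is not used here). The program name tweetstore, its subcommands, and their options are all hypothetical.

```python
import argparse


def build_parser():
    """Build a parser with two hypothetical subcommands, git-style."""
    parser = argparse.ArgumentParser(prog="tweetstore")
    subparsers = parser.add_subparsers(dest="command")

    # "collect" takes a positional query and an optional integer limit.
    collect = subparsers.add_parser("collect", help="collect matching items")
    collect.add_argument("query")
    collect.add_argument("--limit", type=int, default=100)

    # "upload" takes a positional bucket name and a boolean flag.
    upload = subparsers.add_parser("upload", help="upload collected items")
    upload.add_argument("bucket")
    upload.add_argument("--dry-run", action="store_true")

    return parser


# Passing an explicit list keeps the example independent of sys.argv.
args = build_parser().parse_args(["collect", "--limit", "5", "python"])
print(args.command, args.query, args.limit)
```

Each subcommand gets its own parser and its own options, which is the part that takes a while to internalize.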

Computational Politics

Choice quote from Nat Torkington:

If von Clausewitz were around today, he’d say drones are the computation of politics by other means.


+1 AtBat12

MLB’s AtBat 2012 is an iPad and iPhone app. I’ve been enjoying it on my iPad but that device is currently charging. So I fired up the iPhone version, hit the “Restore” button, logged into iTunes, and boom! Streaming ’Nats audio.

Nice work MLB!


Peet! Say It Ain’t So

As someone whose first good cup of coffee was brewed from Peet’s beans, I’m sad to see the Berkeley-based roaster absorbed into a massive corporate maw. They held out much longer than I thought. Fond memories of the eclectic customer stream going through the tiny ur-Peet’s at Shattuck and Rose.


Scary Formats

You know you’ve been hacking when you start to both hate and appreciate the intricacies of csv file field escaping because you’ve come to both hate and appreciate the intricacies of JSON’s UTF-8 encoding. Kids, don’t try to put a data format that’s hard to escape by itself inside another format that has, at best, loosely defined escaping mechanisms.

Two great tastes that taste horrible together.

But I did it all for the data, the data.
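
For the record, a minimal sketch of the least painful route I know of: let the json and csv modules in the standard library each handle their own escaping, and never glue the two together by hand. The record contents are made up for illustration.

```python
import csv
import io
import json

# Hypothetical record: the text holds a comma, double quotes, a newline,
# and non-ASCII characters, all of which stress both formats.
record = {"user": "crossjam", "text": 'He said "héllo, wörld"\nthen left'}

# Step 1: JSON does its own escaping (quotes and the newline get backslashes).
payload = json.dumps(record, ensure_ascii=False)

# The tempting shortcut: join by hand and split on commas. The commas inside
# the JSON payload break the row into too many fields.
naive_fields = ("42," + payload).split(",")

# Step 2: let the csv module quote the JSON string as a single field.
buffer = io.StringIO(newline="")
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerow(["42", payload])

# Step 3: read it back. csv undoes its quoting, then json undoes its own.
buffer.seek(0)
row = next(csv.reader(buffer))
restored = json.loads(row[1])
```

The round trip only works if the reader and writer agree on the CSV dialect, which is exactly the loosely defined part.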


AtBat 2012

In an effort to cut way back on the Couch Potato action, I’m trying for more listening than watching. The iPad makes for a good music media player, but sports programming is the critical factor. I can cut back on the sports “news” and mindless chatter through self-discipline, but live games are my Kryptonite. At least when I’m listening to a live broadcast I can multitask on things like writing code. TV just pins me slug-like on the couch.

So I’m giving MLB AtBat 12 a run (even as I post this!) as it allows you to listen to every Major League Baseball radio broadcast as a digital stream. Pair it with some Bluetooth speakers and you’ve got a modern day Rube Goldberg radio, complete with Retina Display!

So far the experience has been great, and the price right at $2.99 for my trial month. While MLB seems to have their act together, we’ll see how things fare during football, football, basketball, and hockey seasons. My impression is that the NFL, NCAA, EPL, NBA, and NHL aren’t quite at the same level of sophistication. But at least I can build up the anti-TV habit.


Just Works

I had to take my wife to a doctor’s appointment today. As is typical, there was a lot of hurry up and wait. I had both my work laptop and iPad with me. The iPad is a Verizon LTE model, which means they throw in the mobile hotspot capability for no extra charge. Hadn’t used it yet so I figured I’d give it a whirl to see how painful the process could be, just for future reference.

Turned the hotspot on on the iPad. Connected the laptop to the appropriate network name. And it just worked! I was even pleasantly surprised by the responsiveness of the network. Didn’t feel cellular at all.

Good to know, especially when routing around ridiculous hotel rates for Wi-Fi.


Teach Data Science

Link parkin’: Teach Data Science

This is the companion site to the electronic textbook, Introduction to Data Science, by Jeffrey Stanton. This book provides non-technical readers with a gentle introduction to essential concepts and activities of data science. For more technical readers, the book provides explanations and code for a range of interesting applications using the open source R language for statistical computing and graphics.


Prismatic, +1, -1

I’m getting a lot out of Prismatic, yet another in a venerable line of personalized “Daily Me” news services. I’m a sucker for these things but the technology landscape is literally strewn with their corporate wreckage. I would like to love it, but there are some issues:

  • Using short links in e-mail sent from the site
  • Putting links to Python documentation in my stream
  • No help, or way to get to my profile while reading my stream
  • Too much space allocated to “Share this story with your friends…”
  • Occasionally crashes Mobile Safari on my iPad

The biggest downside is that Prismatic is Yet Another Place To Read News (YAPTRN), and right now it’s last on my list of stops. We’ll see how long it can survive there.

The big upside is that I typically run across at least a few items of interest whenever I have a Prismatic session. It’s much better than predecessors of its ilk. The good bits:

  • Clever trick to heavily use facial images, social outlinks, and embedded Tweets. Feels more human and engaging.
  • Although completely opaque, the big images nicely break up all the text.
  • Inertial scrolling on the iPad fits the Prismatic style well, or they’ve really optimized for the device.

Makin’ Maps

This is part one of a five-part series about our recent explorations making choropleth maps using PostGIS, TileMill, Mapnik and Google Maps.

Turns out to be a total of six posts, but it’s still a handy, not too deep, dive into making maps. Forewarned, there’s a fair bit of command line tweakage and assumed familiarity with open source tech. (Python Inside!) Not for those used to a lot of desktop app handholding.

From The Chicago Tribune’s News Apps Team.


Leanin’

If I indeed upgrade my personal Apple machine, in celebration of my fourth Macaversary, I’m leaning towards a kitted-out 13” MacBook Air. You can do a lot of damage with an ultralight portable boasting a half terabyte of SSD. And it’s at a relatively affordable price.


Enduring Geotech

I was all set to get some deep insight from The Atlantic’s article “The Future of the Map Isn’t a Map at All—It’s Information”, but it turned out to be pretty shallow. Even the attendant video wasn’t much more than a promotional for some new Google tech.

But between the provocative title, and my noodling into GIS technologies, it got me thinking that geohacking is a great business for a tech oriented person, especially with today’s advances. Interface applications, Web and native? Check! Massive data processing? Check! Real time data processing? Mobile applications? Check! Relevance to problems that matter? Check! Open, gentle slope, avenues to learn and hack? Check!

Important enduring organizations and institutions, with big checkbooks, care about understanding and using geospatial data. And more technologies are making more of that data available, at lower cost, more regularly.

If I was still in my past life, I would urge every Computer Science student to take an intro Geospatial Information System course. Right up there with Compilers, Operating Systems, and Relational Database Systems. GIS techniques and issues are that enduring of a computational capability in milspeak.


cliff

In the past I’ve written my own Python command line processing module to emulate what I call command shell frameworks ala git, Mercurial, and Subversion. Sucked.

I tried the pyCLI module but it didn’t quite work for me.

After a few hitches, Doug Hellmann’s cliff module did the trick. Need a longer test drive, but so far it’s been highly useful. I don’t quite love the use of distribute hooks but I can live with it until I find a better solution. The baked in command REPL is a nice to have.

Using cliff has been a good way to paper over some fairly complex processing with a power user grade UI. Also quite easy to add new features with quick turnaround.


Common Crawl Contest

The Common Crawl folks put together a little video to better explain their purpose, aims, and goals. Very well executed and it also includes an announcement of their first hacking contest. Tempted to do some sideline hacking on that dataset just for the Big Data experience. Winning would just be serendipity.


Postgres.app

Link parkin’: Heroku’s nicely packaged for Mac OS X, Postgres.app

Postgres.app is the easiest way to get started with PostgreSQL on the Mac. Open the app, and you have a PostgreSQL server ready and awaiting new connections. Close the app, and the server shuts down.

Postgres.app will be distributed through the Mac App Store, with a separate build containing the latest PostgreSQL beta available for direct download from the website.

I love the fact that PostGIS 2.0 is baked in. But I’m also a little wary of how well it supports the building and installation of extensions. I needz my plpythonu.


The Blatche Era

Andray Blatche’s stint as a Washington Wizard has ended. The Basketball Jones, along with some mildly amusing commenters, makes appropriate fun of the occasion:

As you can see, the Andray Blatche market is pretty dried up. There are even rumors that the Bucks’ original bid would have included an additional three pounds of mozzarella string cheese, but they pulled that because they didn’t think he was worth it. Tough break, but I’m sure the $23 million will cheer him up.

Looking back I’m less disappointed with Blatche than the last remaining piece of failure, Ernie Grunfeld. Andray was just a 29th pick, a straight-out-of-high-school project that hasn’t panned out. It was Grunfeld who made the decision to ridiculously overpay him.

And let’s be clear, I was pretty disappointed with Andray.


Udvar-Hazy Rises

The Dark Knight Rises opens this week, including a run at the Udvar-Hazy Center. Longtime followers will know that this makes me really happy.

Real IMAX FTW.


Common Crawl 2012

Mmmmmmm, fresh, hot data! With instructions to boot:

I am very happy to announce that Common Crawl has released 2012 crawl data as well as a number of significant enhancements to our example library and help pages.

Along with this release, we’ve published an Amazon Machine Image (AMI) to help both new and experienced users get up and running quickly. The AMI includes a copy of our Common Crawl User Library, our Common Crawl Example Library, and launch scripts to show users how to analyze the Common Crawl corpus using either a local Hadoop cluster or Amazon Elastic MapReduce.


Social Networks Unprofitable?

Good thought experiment by Derek Powazek: “What if Social Networks Just Aren’t Profitable?”

Here’s the short version: Every community-based site in the history of the web has essentially been a stab at creating a social network. Most of them fail as businesses, with the rare exception of small, lucky communities that become self-sufficient but not exactly prosperous. What if that’s just the way it is?

I’d say there’s a lot of evidence for the thesis and not much agin’ it. There’s also “profitable” and “PROFITABLE”. Maybe the profits in social networks just don’t scale to the level of publicly traded corporations. Doesn’t mean a nice living can’t be made at the Main Street or regional level.


Goodbye and Good Riddance

Friday, July 13th, glad to see you go. Seems like every single complex system (natural or man made) decided to spend the day frustrating me. From organizational BS, to proposal rejections, to IT failures (special honors to MS Outlook 2007), to traffic, to kids just doing what they do, frustration abounded.

Well, your worst has been done and no catastrophes occurred. Sayonara!


s3tools

Link parkin’: s3tools

S3cmd is a command line tool for uploading, retrieving and managing data in Amazon S3. It is best suited for power users who don’t fear command line. It is also ideal for scripts, automated backups triggered from cron, etc.

Just what the doctor ordered for the continuous Tweet collectin’, S3 storin’, time constrained home hacker. On Ubuntu, cloning the git repo, running a config, and accessing my S3 bucket works right out the box.


eero

As someone who wrote a dissertation on programming language design, it’s rare when I see an announcement for a new language and go “Wow! That looks cool!” I just did that for the first time in a long time when I read about eero:

Eero is a fully binary- and header-compatible dialect of Objective-C, implemented with a modified version of the Apple-sponsored LLVM/clang open-source compiler. It features a streamlined syntax, Python-like indentation, and other features that improve readability and code safety. It is inspired by languages such as Smalltalk, Python, and Ruby.

“Eero” is pronounced [ˈe-rō]‚ and is similar to the English word “aero”.

After a surface read, I actually thought programming for the Mac OS X and/or iOS system APIs might be fun.

Still a long slog to success and popularity, but a kid’s gotta start somewhere.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.