
Posts

Link parkin’: Posts The Blogging App for iPad. If it’s decent at Markdown, which I doubt, it could be a contender to replace WordPress for iOS. $9.99 is a steep price though.


French Tweets

Between casual personal hacking and work responsibilities, I’m going on close to a year of serious Twitter data collection activities. So I’m always interested to see other folks’ case studies of their Twitter data hacking. Laurent Luce describes, in exquisite detail, a recent effort to track French politician mentions in the Twittersphere during their 2012 election.

This post describes how Pytolab was designed to process Tweets related to the 2012 French presidential election, in real-time. This post also goes over some of the statistics computed over a period of 9 months.

I must be a vet now since I saw their number of 8 million tweets and thought “is that all?” But even at that scale there’s something to be learned from their processing architecture, although I was surprised that they ran into peak performance issues with an Amazon EC2 micro instance. A lot of the work I do is on nominally equivalent hardware and it seems up to the task. Maybe it’s the virtualization tax.


marisa-trie

Link parkin’: A trie (rhymes with pie) library for Python. Presumably high performance, since it’s a thin wrapper around C++ code.

Static memory-efficient & fast Trie structures for Python (based on marisa-trie C++ library)

Home page
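For anyone who hasn’t bumped into tries before, here’s a minimal pure-Python sketch of the idea: words are stored character by character, so shared prefixes share nodes and prefix lookups are cheap. The `Trie` class and its method names are made up for illustration. This is not the marisa-trie API, and a dict-of-dicts like this gives up the memory efficiency that makes the C++ library interesting.

```python
class Trie:
    """A plain dict-of-dicts trie; illustrative only, not the marisa-trie API."""

    _END = object()  # sentinel key marking the end of a stored word

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word, creating one nested dict per new character."""
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None."""
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._find(word)
        return node is not None and self._END in node

    def keys(self, prefix=""):
        """Return every stored word that starts with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        found = []
        stack = [(start, prefix)]
        while stack:
            node, word = stack.pop()
            for key, child in node.items():
                if key is self._END:
                    found.append(word)
                else:
                    stack.append((child, word + key))
        return sorted(found)


trie = Trie(["pie", "pier", "pint", "trie", "tried"])
print("pie" in trie)    # True
print("pi" in trie)     # False: a prefix, not a stored word
print(trie.keys("pi"))  # ['pie', 'pier', 'pint']
```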


Nice Work Twitter

Glad I got a hold of my Tweetbot for Mac Alpha before it got pulled:

As some of you may have already noticed the download link for the Tweetbot for Mac alpha no longer works. Twitter’s latest API Changes means now we have a large but finite limit on the number of user tokens we can get for Tweetbot for Mac. We’ve been working with Twitter over the last few days to try to work around this limit for the duration of the beta but have been unable to come up with a solution that was acceptable to them. Because of this we’ve decided its best for us to pull the alpha.

I hope I’m using it enough for the Tapbots to get something useful out of my usage.


FiOS 802.11n FAQ

Link parkin’: Verizon Online FiOS FAQ » 3.0 Networking

The following information is for users who have seen the various bridging threads and/or the FAQs about using their own router and wondered which of these various configurations they should choose.

I was thinking of buying an Apple AirPort Express to extend my Wi-Fi range and performance. Unfortunately, I need to keep my somewhat outdated Verizon FiOS router in place to keep certain set-top box features working. The above link comprehensively lists all the potential options, with pros and cons.


AtBat12 Verdict


Previously, I had put Major League Baseball’s AtBat 2012 on a probationary test drive for one month. The result?

AtBat 2012 is so useful to me that I waited until my subscription expired so I could make the full $9.99 in-app season purchase. I probably could have made it with another $2.99 monthly renewal or two. However, I wanted to fork over a few extra shekels out of admiration for something that truly delights me.

The streaming is top-notch and I easily put on one to two games a night. Typically ’Nats and then a West Coast game. On the iPad the scoreboard is beautiful and comprehensive. Full details of any of the day’s games are within a tap or two. The real-time Gameday displays are neat if a bit out of sync with the streamed audio.

And I haven’t even taken advantage of any of the video features. Bravo MLB!


Pocket Prismatic

So despite the chops of the Prismatic crew, the web version was sort of languishing on my iPad. Prismatic was the info feed of last resort behind my RSS feeds in Mr. Reader, Tweetbot, Hacker News in Mobile Safari, and even the TechCrunch iPad app. Not a good position to be in.

Enter the Prismatic iPhone App. On my iPhone, there’s a little less competition, with Tweetbot out front, and NetNewsWire trailing somewhat in second. I can see iPhone Prismatic filling a nice role of serendipitous discovery beyond my RSS, but not as micro-nugget as Twitter.

The app is very elegantly designed, and has a great unboxing, which is a new (at least to me) trend in iOS apps. The only nit I have out of the gate is that it forces alignment of stories on discrete boundaries, which gives inertial scrolling a slightly clunky feel. And the refresh when you hit the bottom of available stories has no indicator, so a fetching pause seems like the app is stuck. In total, not as smooth as my favorite inertial scroller Tweetbot.

But if their content curation is as good as they claim it should be, I think I can wait for a bit more polish.


Coming To You From Blogsy

This post was created with Blogsy. If nothing else the app unboxing user experience is a notch above WordPress for iOS. However, we’re not looking very Markdown friendly.

Addendum (written in WordPress for iOS, natch): Generating Markdown with Writing Kit, and then transferring to another app, might be more workable.


iOS WP Disappointment

In case you haven’t noticed, I’ve been using the iPad much more to post to this here blog. The lack of images is a pretty good tell. I was hoping the WordPress for iOS application would improve in its most recent release. The actual text editing is seriously irritating, requiring too many modal changes. And the new version has borked my post preview.

Time to start looking for alternatives. An older review of Blogsy is encouraging. Looks like they’re a bit more image friendly.


Lorica’s 7 For Spark

@bigdata, a.k.a. Ben Lorica, might be my best Twitter follow. He’s been attending Cal Berkeley’s (Go Bears) AMP Camp and is now sold on Spark, an in-memory big data processing framework:

A large portion of this week’s Amp Camp at UC Berkeley, is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data toolkit. Here’s why:

Read on for more details, but that’s a solid upvote for Spark in my book.


MCFC Analytics

Link parkin’: Manchester City Football Club Analytics. An open invitation to hack with football data from every game of the 2011-2012 Barclays Premier League season, not just Man City’s, or at least that’s how I read the announcement.

You have a unique opportunity to be part of this community. Register your interest in this project and we will send you our OPTA data set for every ‘on the ball’ event for every Premier League player in every match in the entire 2011-12 Premier League season FOR FREE.

This data is designed to be broken down, analysed, graphed and visualised however you see fit. We have made it available to encourage and inspire the next level of analytics. Our hope is that the data is used to create new performance measures, tools for player/team comparison and profiling, season-long analysis for benchmarking players’ performance and contextualising these performances based on playing position and opposition.

The Noisy Neighbors just made my 2012-2013 Premier League rooting decision for me. I’m all on City’s bandwagon for this campaign. Hope they can pull off the back-to-back. And do a little better in the Champions League.

At the very least, this squad has a bit of … “personality”.


Google 2-Factor Auth and App Specific Passwords

Public service announcement. If you turn on Google’s two-factor authentication, you’ll probably bust a bunch of your non-Web apps. John Martellaro helpfully explains how to deal with RSS readers.

After you turn on Google’s 2-Step authentication for your account, your RSS readers on your iPhone and iPad will fail to connect. Here’s how to quickly fix that.

Generalizing to other apps is left as an exercise to the weblog reader.


EPD and virtualenv

Parking this nugget in case it can help someone else discover it with Google.

EPD, the Enthought Python Distribution, is a nice packaging of a lot of scientific and high performance computing modules. In particular, it takes away the hassle of building from source on persnickety platforms like Mac OS X.

virtualenv is an indispensable Python tool for creating isolated, customizable, and easily disposable Python environments. You can initialize these environments from a specific Python installation allowing experimentation with multiple Python versions in parallel, e.g. EPD and the stock install on your machine.

EPD and virtualenv weren’t working well together for me on my MacBook. Here’s what worked for me:

  • Make sure you have a recent version of virtualenv, 1.7.2 seems good.
  • Use the --system-site-packages argument to make sure you pick up the EPD extensions. You can also hack up postactivate and postdeactivate to fix up PYTHONPATH.
  • Here’s an example of creating a new virtual environment:

mkvirtualenv -p /Library/Frameworks/EPD64.framework/Versions/Current/bin/python --system-site-packages epd64

So far so good. You’ll need the virtualenvwrapper package to use mkvirtualenv.
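Once the environment is activated, a quick sanity check from inside Python can confirm which interpreter is running and whether the EPD packages are visible. This is a minimal sketch, not part of EPD or virtualenv: the helper names are made up, and checking for `numpy` is just one convenient proxy since EPD bundles it.

```python
import importlib.util
import sys


def describe_interpreter():
    """Report which Python is running and whether it looks like a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; newer tooling sets
    # sys.base_prefix to the parent install. Check both.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "base_prefix": base,
        "in_virtualenv": base != sys.prefix,
    }


def can_import(module_name):
    """Return True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


info = describe_interpreter()
for key in sorted(info):
    print("%s: %s" % (key, info[key]))

# numpy ships with EPD; finding it here suggests --system-site-packages took effect.
print("numpy importable:", can_import("numpy"))
```

If `base_prefix` points at the EPD framework directory and `numpy` is importable, the environment was most likely built the way the `mkvirtualenv` line above intended.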


JPL Infographics

The U.S. Department of State’s Bureau of International Information Programs site describes the new JPL Infographics site:

JPL Infographics, a newly launched website and resource database from NASA’s Jet Propulsion Laboratory in Pasadena, California, is inviting space aficionados and graphic wizards to take on a visual challenge by grabbing NASA data and transforming them into a scientific work of art.

The website provides extensive collections of NASA science and mission data, graphics, and space images that members of the public can download and use to create their own infographics — creative illustrations of complex data. Users can upload their infographics, have them reviewed by JPL experts, and share their creations in a public gallery on the JPL website.

Free data, a task, and a place for feedback. Nice work JPL.


Social Media Event Diffusion

Link parkin’: “Event Diffusion Patterns in Social Media”, by Kim, Xie, and Christen.

In this paper, we propose a method for identifying real-world events on social media, and present observations about event diffusion patterns across diverse media types such as news, blogs, and social networking sites. We first construct an event registry based on the Wikipedia portal of global news events, and we represent each real-world event with entities that embody the 5W1H (e.g., organization, person name, place) used in news coverage. We then label each web document with the list of identified events based on entity similarity between them.

Mostly interesting in how it uses Wikipedia events pages, and basic entity extraction, as a reference registry for events.
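To make the registry idea concrete, here’s a rough sketch of labeling a document by entity overlap. The abstract doesn’t say which similarity measure the authors use, so Jaccard similarity is an assumption here, and the registry contents, event names, and the 0.3 threshold are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two entity collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def label_document(doc_entities, registry, threshold=0.3):
    """Return names of registry events whose entities overlap the document's.

    Events are returned best match first; the threshold is an arbitrary
    illustrative cutoff, not a value from the paper.
    """
    scored = [(jaccard(doc_entities, entities), name)
              for name, entities in registry.items()]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical event registry: event name -> entities describing the event.
registry = {
    "london_olympics_2012": {"London", "IOC", "Usain Bolt", "Olympics"},
    "french_election_2012": {"France", "Hollande", "Sarkozy", "election"},
}

# Entities extracted from one (made-up) web document.
doc = {"Usain Bolt", "London", "Olympics", "NBC"}

print(label_document(doc, registry))  # ['london_olympics_2012']
```

The document shares three of five distinct entities with the Olympics entry (similarity 0.6) and none with the election entry, so only the first label survives the cutoff.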


Olympics, Same and Different

I’ve grown into the same sentiment as Chris Bellavita:

I’ve had the fortune to participate in or observe security operations for six Olympiads. Based mostly on that experience, I’m not a fan of the Olympic Games.

The Olympics are a circus that comes to town for a little while, selling promises and dreams. It helps separate people from their money, then packs up the tents and moves on to the next stop.

Now he has quite a different personal perspective, being directly involved in security for many Olympiads. But just as someone interested in great competition, the games for me have devolved into an addendum to spectacle, celebrity, and corporatism.

This year, NBC’s misguided, but highly successful, broadcast strategy caused me nary a concern, victim of my newfound television withdrawal. Never felt better. There was a minor reflexive twitch when I heard about the US Women’s Soccer National Team going into overtime against Canada, but I got over it.


Onward Adrian

In my past life, I had a somewhat unnatural, and prescient if I do say so myself, interest in the intersection of technology and journalism. I would even hang out with ink stained wretches on a routine basis and help train up a few on that ’thar new bloggin’ thang.

I had enough pull to get invitations extended to various people doing cool things at the Lawrence Journal-World in Lawrence, Kansas. This included one Adrian Holovaty, principal instigator of the Django web development platform, née content management system. Of course when he came to Northwestern I enjoyed having a needy nerdy chat. Curse you autocorrect!

We never got to sync up on any interesting projects, but he went on to do some nice work with The Washington Post and launch the highly impactful EveryBlock. Now Adrian’s moving on to pastures TBD. EveryBlock and Django, not bad for a developer’s career and I’m sure he’s got plenty left in the tank.

Good luck buddy!


Blogaversary, Macaversary 4.0

Not much to say here. Trucking along at full speed into year five of this blog adventure. I have grown comfortable without a grand plan and am enjoying just keeping the ride going.

Happiness is still a warm MacBook, maybe not so new. Think I’ll hold off on a laptop upgrade until something comes along a little closer to the sweet spot. Plus I’ve got some other clear cut need purchases this year, and I definitely want an iPhone 5. Besides, the new toy has been plenty of upgrade for one year.


Farewell Pushcart Queen

Breaking my embargo on eulogies, I’m shamelessly ripping off my title from the MetaFilter post that put me on notice of Jean Merrill’s passing:

The Pushcart War was one of those childhood books which changed my life forever, teaching me the importance of activism and civic participation. Merrill’s stories also show how much fun organising can be. We have lost a great and creative soul, someone capable of introducing young people to issues that matter with passion and a twinkle in her eye.

Along with The Phantom Tollbooth, The Pushcart War marked my transition to “serious” reader. Yeah, I proceeded to still read a bunch of junk. But I forever knew there would be stuff out there with more heft and style. There did exist a higher standard to seek out and aspire to. I’m still managing to work towards that standard, bit by bit.

Godspeed, kind lady.


The Twitter Comedy Factory

Here’s a study I’d be interested in seeing purely out of general interest: the usage of Twitter as a comedy platform. Seems like there’s a @FakeX account for any value of X and half a million “@!*$ My Whatever Says”. Is this a new comedic venue? How big is it in terms of creators, participants, and dollars? And put it all in a comedy historic context for me.

Bonus points if you’re funny.

Sigh. So many ideas. So little time.


SportsNerd, A Sports Tech Vertical

This is just off the cuff daydreaming with about zero background research. It’s inspired by the recent purchase of Bleacher Report for just a little south of $200 million and The Art of Manliness’ two part guide on …Starting a Side Hustle.

Assertion: there is no part of the modern worldwide sports industry that is not being heavily influenced by technology. Athletic performance? Check! Financials (including gambling)? Check! Media? Check! Analysis? Check!

Seems like a Web publication that was hyper-focused on the varied intersections of sports and tech might have a chance. Read as Gizmodo or Engadget but always coming with a sports angle. Broad situational awareness of tech impact on sports, augmented with deep dives, interviews, and even opinion pieces. You could throw in a little humor and still not fall into the T and A tarpit.

The neat trick is that you could probably get something rolling on the real cheap as a simple linkblog. For bonus points algorithmically trawl the intarwebs to decrease/scale human involvement. The side hustle is to grow it into something on the order of Daring Fireball or kottke.org. Individual voice, somewhat broad taste, but with a clear content focus. If there’s any traction, grab a buddy or two, or find some enthusiastic hobbyists to build out in the direction of a serious web publishing business. Of course the really hard part is marketing to a) build out the audience, and b) generate revenue probably through advertising, but maybe some other creative financing.

Definitely not an unheard of story, but it’s hard to believe that someone hasn’t already tried this and failed miserably, or already been swallowed up by one of the big boys.

Anyway, yet another Side Project ™ to take on in my Copious Spare Time ™.


GPS: A Graph Processing System

Link parkin’: GPS: A Graph Processing System

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon’s EC2.

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges.

Okay, it’s from Stanford, which raises suspicion (Go Bears!), but 10 billion edges is no joke.


iOS XScreenSaver

Link parkin’: XScreenSaver for iOS. From the Wikipedia entry for the UNIX classic:

XScreenSaver is a collection of about two hundred free screensavers for Unix, Mac OS X and iOS. It was created by Jamie Zawinski in 1992 and is still maintained by him.

Amazing that such an old piece of kit can migrate to such modern platforms.


WordPress iOS 3.1

There’s a new release of the WordPress App for iOS.

This is a big one! The WordPress for iOS dev team is excited to announce that version 3.1 is now available in the App Store. In this update you’ll find a redesign for both iPhone and iPad, along with loads of new features!

Thank goodness. The current editing UI on the iPad is driving me up a wall. Too many switches out of editing mode to do simple things. Hope the new version is a bit more efficient.


Titan Graph Performance

I’ve mentioned the Titan graph store before. Aurelius, the company building Titan, has posted a performance study of the DB that heavily relied on Amazon Web Services:

The presentation to follow discusses the simulation’s social graph structure, the types of processes executed on that structure, and the various runtime analyses of those processes under normal and peak load. The presentation concludes with a discussion of the Amazon EC2 cluster architecture used and the associated costs of running that architecture in a production environment. In short summary, Titan performs well under substantial load with a relatively inexpensive cluster and as such, is capable of backing online services requiring real-time Big Graph Data.

The kicker is that their analysis of one year’s AWS costs to support a half-million users is well within an early startup’s wheelhouse. Also, their modeling and sim approach is pretty interesting. They’ve probably simplified the discussion approach for presentation, but there’s a bunch of approaches in the CS literature that are also applicable.


Getting Jacked

Mat Honan got jacked. Kiddies wrecked his online presence for the lulz, exposing some severe deficiencies in Amazon and Apple security procedures. It’s a bit of a cause célèbre, prompting a lot of suggestions on how to safeguard oneself.

Like Tim Bray, I heart 1Password but need to do more, especially since I’m a heavy Amazon user. Following Matt Cutts’ advice, I’ll be turning on 2 Factor Authentication. Wired also has a number of solid suggestions, along with lots of other info kicking across the Web. Do yourself a favor and partake of the wisdom.

And I always thought that remote wipe capability was a bit of a hinky idea.


The CS CEO

I’m really surprised to hear this, almost to the point of fisking it out, but I trust Greg Linden. Marissa Mayer is the first person with an advanced CS degree hired into a Fortune 500 CEO position.

Marissa Mayer as CEO of Yahoo may be a test of a new style of executive leadership, the optimizing CEO.

She is not the first computer scientist to lead a major company, but she is the first computer scientist (MSCS or higher) hired in as CEO to a Fortune 500 company. Many computer scientists view everything as an optimization problem. People, work, politics, life, everything is a search (often of a dynamic space) to find a maximum near the global maximum.

On the other hand, I’m not that surprised in that there are only so many of those spots, they haven’t been open to everyone for much of history, and they typically skew towards MBAs. The qualifier, “hired in”, also makes a big difference. There have been a number of advanced CS degree CEOs that have grown companies into Fortune 500 members. This may be the first one that has to inject (reinject?) a CS culture into such a large corporation.

And I agree with Greg that a systems optimization perspective in a Fortune 500 CEO will be an interesting experiment.

Lastly, never really thought of Yahoo! as a Fortune 500 company.


Trident, Realtime Big Data

Nathan Marz is still a badass, despite being swallowed up by Twitter:

Trident is a new high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.0 (released today). It allows you to seamlessly mix high throughput (millions of messages per second), stateful stream processing with low latency distributed querying. If you’re familiar with high level batch processing tools like Pig or Cascading, the concepts of Trident will be very familiar - Trident has joins, aggregations, grouping, functions, and filters. In addition to these, Trident adds primitives for doing stateful, incremental processing on top of any database or persistence store. Trident has consistent, exactly-once semantics, so it is easy to reason about Trident topologies.

We’re really excited about Trident and believe it is a major step forward in Big Data processing. It builds upon Storm’s foundation to make realtime computation as easy as batch computation.

I’m sure Trident has some sharp pointy edges, but it also looks like a bag of fun.
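The “exactly-once” bit is the part that tends to sound like magic, so here’s a toy illustration of one general way to get that effect: store the id of the last batch applied next to each value, and skip the update when a batch is replayed. This is a sketch of the idempotent-update idea only. The class and method names are hypothetical, it is not Trident’s API or implementation, and it assumes a replayed batch carries the same id and the same contents as the original.

```python
class TransactionalCounts:
    """In-memory stand-in for a database that stores a txid next to each count."""

    def __init__(self):
        self._rows = {}  # key -> (last_txid, count)

    def apply_batch(self, txid, keys):
        """Add one batch of keys; skip any key already updated by this txid."""
        increments = {}
        for key in keys:
            increments[key] = increments.get(key, 0) + 1
        for key, delta in increments.items():
            last_txid, count = self._rows.get(key, (None, 0))
            if last_txid == txid:
                continue  # this batch was already applied to this key
            self._rows[key] = (txid, count + delta)

    def count(self, key):
        """Return the current count for key (0 if never seen)."""
        return self._rows.get(key, (None, 0))[1]


store = TransactionalCounts()
batch = ["hollande", "sarkozy", "hollande"]
store.apply_batch(1, batch)
store.apply_batch(1, batch)        # replay after a simulated failure: no effect
store.apply_batch(2, ["sarkozy"])

print(store.count("hollande"))  # 2
print(store.count("sarkozy"))   # 2
```

Because the check is per key, a batch that died halfway through can be replayed and only the keys that missed the first attempt get updated.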


EPL ’12-13 In View

Holy smokes! The new English Premier League campaign kicks off two weeks from yesterday. It feels like Manchester City just got done pulling victory from the jaws of defeat.

I’ll have to check the summer transfers and evaluate who will win my rooting interest for the season. Betting folks will note that wagering against my allegiance may be a good idea. Arsenal and Chelsea have already rued my attention. Can’t really fancy rooting for any Mancunians. Other than the Big Four and Spurs, who might realistically challenge for the title?


Why HDFS

Charles Zedlewski makes an interesting analogy between HDFS, the Hadoop Distributed File System, and the Linux operating system. Stealing the punchline:

It’s rare when you get to see history repeat itself so completely as it is with HDFS. Today HDFS may not be the best filesystem for content addressable storage or nearline archive. But then 15 years ago who would have thought Linux would find its way into laptops, routers, mobile phones and airport kiosks?

Linux drew us the map. The smart money is already following it.

We’ll see how it plays out, but given the entrenched nature of HDFS he might be right. HDFS’ open source nature, and maybe more importantly community, means just about any good distributed file system idea can be quickly embraced and extended.

There is probably one area where HDFS could be radically updated or face displacement. Real-time streaming datasets don’t fit the HDFS model particularly well. Doesn’t mean someone smart can’t come along and fix it up.

Zedlewski also heavily invokes the nice support for Map/Reduce processing that HDFS provides. Map/Reduce is clearly successful, but these other processing demands may eventually lead to other programming models that fit less well with HDFS.

But I’m of a mind that Zedlewski is mostly right, and that HDFS is a nice solid foundation to build on going forward.


Building REST APIs with Python

In my Copious Spare Time ™, I’m thinking about how one can rapidly build lightweight, fast REST APIs using Python. I may revisit Django approaches again, but the heavy dependency on an RDBMS and a model-based Object-Relational Mapper (ORM) makes it feel like I’m fighting against the toolkit. The key issue is that to get performance for some aspects, I’m anticipating the need for other data stores such as a full-text index or a NoSQL repository. Thinking one of the Python web micro-frameworks, possibly Flask, might be the right fit.
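To give a feel for how little is needed once the ORM is out of the picture, here’s a minimal read-only JSON API written against plain WSGI with only the standard library. It is a stand-in sketch, not Flask: the `ITEMS` dict is a hypothetical placeholder for whatever store (full-text index, NoSQL, etc.) ends up behind the API, and the routes are made up.

```python
import json
from wsgiref.util import setup_testing_defaults

# Hypothetical in-memory store standing in for a real backend.
ITEMS = {"1": {"name": "redis"}, "2": {"name": "flask"}}


def json_response(start_response, status, payload):
    """Serialize payload as JSON and send it with the given status line."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI app exposing GET /items and GET /items/<id>."""
    method = environ["REQUEST_METHOD"]
    parts = [p for p in environ.get("PATH_INFO", "").split("/") if p]
    if method != "GET":
        return json_response(start_response, "405 Method Not Allowed",
                             {"error": "GET only"})
    if parts == ["items"]:
        return json_response(start_response, "200 OK", {"items": sorted(ITEMS)})
    if len(parts) == 2 and parts[0] == "items" and parts[1] in ITEMS:
        return json_response(start_response, "200 OK", ITEMS[parts[1]])
    return json_response(start_response, "404 Not Found", {"error": "not found"})


def call(path, method="GET"):
    """Invoke the app in-process, without a network socket."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body.decode("utf-8"))


print(call("/items"))
print(call("/items/1"))
print(call("/items/9"))
```

To poke at it from a browser or curl, `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` will serve the same `app` object. A micro-framework mostly replaces the hand-rolled routing and response plumbing.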

Wow! It’s been a long time since I talked about Redis. It’s up to version 2.4.x now.


Mac OS X Tweetbot

Tapbots released an alpha version of Tweetbot for Mac OS X. Figured I’d give it a whirl since I use Tweetbot so much on the iPhone and iPad. Like the familiar look and haven’t run across any bugs so far. Here’s hoping the Tapbots have success with this product.


Django Project Start

Okay, I learned a lot from this post, Starting a Django Project the Right Way:

One of the things I wish I had known when starting my Django project for IllestRhyme was “How do I start a real Django project”. As in, one that’s actually going to be used and developed more, not the toy project from the (admittedly execellent) Django documentation.

Having just gone through this process again for my new site, I wanted to share the knowledge I’ve gained about how to properly start a project in Django.

In addition, the comments are well worth reading.

Via Nat Torkington


Roles of a Data Science Team

A Gnip Data Story, a.k.a interview, with bitly’s Hilary Mason provides the most substantive detail I’ve heard about what her team actually does:

My team plays a few roles within the company. We handle the business analytics, which can be answering very simple questions like, “How many new URLs did we see yesterday?” to complex questions like, “How do we value a URL being clicked from platform X vs platform Y over time?”.

… In summary, my team is responsible for pushing the boundaries of where bitly can go. It’s fun.

There’s plenty more where that came from.

Good stuff. Just always been curious.


Don’t Tempt Me

It’s a sad statement of mega-corporation life that while the tumblr for Today’s Corporate Meeting Challenge is really funny, I am actually tempted to try and rise to the challenge.

Even sadder is that I’d be lucky to actually be called out in any of the numerous meetings I’m asked, nay required, to attend. Half of these would just go over people’s heads and they’d just forge on.

There should be a separate category for teleconferences. I’m undecided if these monstrosities should garner bonus points for laugh multiplication, or be considered the putting green. Seriously, you can always say the most profoundly stupid things in a telecon if you follow it up with “Sorry. Thought I was on mute.”


Python Modules of Unusual Usage

Getting a lot of mileage out of:

  • requests. Wasn’t quite ready to believe the hype, but it does make executing HTTP requests a lot more humane
  • ConfigObj. Makes parsing Windows INI files about as convenient as possible
  • argparse. Once you internalize the Tao of this module, combine with cliff, to make command-line Swiss army chainsaws.
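For a taste of the argparse side of that, here’s a small sketch of the subcommand pattern using only the standard library. The `chainsaw` program name, its two subcommands, and the handler functions are all hypothetical, and cliff is not used here.

```python
import argparse


def do_fetch(args):
    """Handler for the hypothetical 'fetch' subcommand."""
    return "would fetch %s (timeout %.1fs)" % (args.url, args.timeout)


def do_count(args):
    """Handler for the hypothetical 'count' subcommand."""
    return "%d words" % len(args.words)


def build_parser():
    """Build a parser with a global flag and two subcommands."""
    parser = argparse.ArgumentParser(
        prog="chainsaw", description="Hypothetical multi-command tool.")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="chatty output")
    subparsers = parser.add_subparsers(dest="command")
    subparsers.required = True

    fetch = subparsers.add_parser("fetch", help="pretend to fetch a URL")
    fetch.add_argument("url")
    fetch.add_argument("--timeout", type=float, default=10.0)
    fetch.set_defaults(handler=do_fetch)

    count = subparsers.add_parser("count", help="count the words given")
    count.add_argument("words", nargs="*")
    count.set_defaults(handler=do_count)
    return parser


def main(argv):
    """Parse argv (a list of strings) and dispatch to the chosen handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)


print(main(["fetch", "http://example.com", "--timeout", "2.5"]))
print(main(["-v", "count", "a", "b", "c"]))
```

The `set_defaults(handler=...)` trick keeps dispatch out of a long if/elif chain: each subparser carries its own function, and `main` just calls whatever got attached.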

Computational Politics

Choice quote from Nat Torkington:

If von Clauswitz were around today, he’d say drones are the computation of politics by other means.


+1 AtBat12

MLB’s AtBat 2012 is an iPad and iPhone app. I’ve been enjoying it on my iPad but that device is currently charging. So I fired up the iPhone version, hit the “Restore” button, logged into iTunes, and boom! Streaming ’Nats audio.

Nice work MLB!


Peet! Say It Ain’t So

As someone whose first good cup of coffee was brewed from Peet’s beans, I’m sad to see the Berkeley-based roaster absorbed into a massive corporate maw. They held out much longer than I thought. Fond memories of the eclectic customer stream going through the tiny ur-Peet’s at Shattuck and Rose.


Scary Formats

You know you’ve been hacking when you start to both hate and appreciate the intricacies of CSV file field escaping because you’ve come to both hate and appreciate the intricacies of JSON’s UTF-8 encoding. Kids, don’t try to put a data format that’s hard to escape by itself inside another format that has, at best, loosely defined escaping mechanisms.

Two great tastes that taste horrible together.

But I did it all for the data, the data.
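If you do have to nest the two, one way to keep your sanity is to let the library writers do all the escaping on both layers and never build the rows by hand. Here’s a minimal round-trip sketch using Python’s `csv` and `json` modules. The sample records are made up, and quoting every field with `QUOTE_ALL` is just one defensive choice, not the only workable one.

```python
import csv
import io
import json

# Made-up records with the usual troublemakers: quotes, commas,
# non-ASCII characters, and an embedded newline.
records = [
    {"id": 1, "payload": {"text": 'He said "hi", then left', "lang": "en"}},
    {"id": 2, "payload": {"text": "caf\u00e9, na\u00efve\nsecond line", "lang": "fr"}},
]


def to_csv(rows):
    """Write rows as CSV text, with each payload serialized as a JSON string."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerow(["id", "payload"])
    for row in rows:
        writer.writerow([row["id"], json.dumps(row["payload"], ensure_ascii=False)])
    return buf.getvalue()


def from_csv(text):
    """Parse CSV text produced by to_csv back into the original structure."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [{"id": int(r["id"]), "payload": json.loads(r["payload"])}
            for r in reader]


text = to_csv(records)
print(text)
print(from_csv(text) == records)  # True
```

JSON escapes the newline and the inner quotes first, then the CSV writer doubles whatever quote characters are left, so the reader can peel the layers off in reverse order.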

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.