Slow Copyright Progress

An interesting, contrarian perspective from Lucas Gonze:

News stories about copyright often pop up in the technology press. The subtext is usually like: “here’s how deeply broken copyright is.”

Don’t believe the subtext. Copyright is not broken in any sense that matters.

Copyright is in the process of adapting to changes in information technology. The process of adapting will take a longish time. The process started in the 1980s, which led to passage of landmark legislation such as the DMCA in the 1990s. In the years 2000-2010 the DMCA safe harbors were tested and defined in detail, and internet sites went from flat defiance of the law to grudging compliance. That’s 25 years of progress. I imagine 25 more will complete the job.

The notion is also applicable to other domains that typically frustrate technologists. I’m looking at you, healthcare, even though you can make a good argument that health payment is broken in many senses that matter.


Green-Marl Graph DSL

Let’s try out this here newfangled tweet embedding in WordPress:

Just the link on a line by itself isn’t quite working for me. However, the embed shortcode seems to do the trick.

This is fabulous!! Definitely handy for spicing up this space visually.

And a quote for old time’s sake:

The increasing importance of graph-data based applications is fueling the need for highly efficient and parallel implementations of graph analysis software. In this paper we describe Green-Marl, a domain-specific language (DSL) whose high level language constructs allow developers to describe their graph analysis algorithms intuitively, but expose the data-level parallelism inherent in the algorithms.

Released to the public from The New Toy (TM) via Verizon LTE. Yeah me!


Fast Python Data Structures

Fast as in CPU performance, not as in …, well you know:

Python provides great built-in types like dict, list, tuple and set; there are also array, collections, heapq modules in the standard library; this article is an overview of external lesser known packages with fast C/C++ based data structures usable from Python.


Kids Don’t Do This

A lot of my data hacking involves mugging over multiples of biggish files stashed in a standard UNIX filesystem. On any UNIXen worth its salt, the find and xargs utilities are your friends here. Use ’em.

But don’t make the same mistake that’s been biting me in the rear recently, embedding invocations of these tools within my programs. So tempting, especially in languages like Python that make shell invocation so easy, even though Python has the nice os.walk function. Works great until you move to another system and one or the other of find or xargs doesn’t work the way you assumed. Now you have to extract the calls and use the language’s built-in filesystem walking capabilities or provide a command line option to specify the specific tool binaries if you build local custom versions.

Instead devise your program to read filenames from its command line args. Bonus points if you have an option to read from stdin. Then wrap find and xargs around your program. And if in a specific situation your args to either utility get too hairy, stuff the invocation into a shell script.
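A minimal sketch of that shape, assuming Python 3. The names iter_filenames and process, and the convention that a lone - argument means "read filenames from stdin, one per line", are illustrative assumptions rather than anything from a particular tool.

```python
import os
import sys


def iter_filenames(args, stream):
    """Yield filenames from args; a "-" argument pulls names from stream."""
    for arg in args:
        if arg == "-":
            # One filename per line, which is how find prints them by default.
            for line in stream:
                name = line.rstrip("\n")
                if name:
                    yield name
        else:
            yield arg


def process(path):
    """Hypothetical per-file work: report the file's size in bytes."""
    return os.path.getsize(path)


def main(args, stream):
    for path in iter_filenames(args, stream):
        print(path, process(path))


if __name__ == "__main__":
    main(sys.argv[1:], sys.stdin)
```

Saved as, say, sizes.py (a made-up name), this runs as find data -name '*.json' | xargs python sizes.py, or as find data -name '*.json' | python sizes.py - when you would rather skip xargs. Either way the walking stays outside the program.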

You’ll thank me in the long run.


Hosted PostGIS Pricing

The one issue with going bigger and better with my Tweet collection is the actual storage, especially if I want to continuously run a spatial DB, like PostGIS. I was just poking around on the Intarwebs and the pricing seems prohibitive for a beginner spatial hacker with significant data.

CartoDB can get you started for $29 a month, but you only have a piddly 50 MB of storage.

According to Heroku, PostGIS is available as an extension to their hosted Postgres product. This includes a whopping 1 TB of storage. However, the smallest package is $200 a month!

Man, I didn’t know hosting PostGIS was so difficult.


Fear of an OAuth Planet

For the final third of the year, I’m resurrecting my Twitter data collection project on a grander scale. More cities, more data, more processing, more analysis. The only major delta is that Twitter is seriously threatening to apply OAuth to all of its API endpoints. Since my little project isn’t really a Web application per se, the thought of having to do a 3-Legged OAuth handshake seemed daunting especially for the little ’ole Streaming API.

Fortunately, Peter Hoffman teed up an easy authentication workaround by just generating the tokens for the consumer and app through Twitter. Then you simply stash them away and appropriately sign your initial connection request.
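Signing with pre-generated tokens comes down to OAuth 1.0a's HMAC-SHA1 recipe. Here is a rough sketch using only the Python standard library. It is not Hoffman's code, and the endpoint URL, keys, nonce, and timestamp are placeholders; a real client would generate a fresh nonce and timestamp for every request.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def pct(value):
    """Percent-encode everything outside the RFC 3986 unreserved set."""
    return quote(str(value), safe="-._~")


def sign_request(method, url, params, consumer_secret, token_secret):
    """Return the base64 HMAC-SHA1 signature for an OAuth 1.0a request."""
    # Encode, sort, and join the parameters, then build the signature base string.
    pairs = sorted((pct(k), pct(v)) for k, v in params.items())
    param_string = "&".join(f"{k}={v}" for k, v in pairs)
    base_string = "&".join([method.upper(), pct(url), pct(param_string)])
    # The signing key is both secrets, each encoded, joined by "&".
    signing_key = f"{pct(consumer_secret)}&{pct(token_secret)}"
    digest = hmac.new(signing_key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


def authorization_header(oauth_params, signature):
    """Build the Authorization header value from the OAuth fields plus signature."""
    fields = dict(oauth_params, oauth_signature=signature)
    body = ", ".join(f'{pct(k)}="{pct(v)}"' for k, v in sorted(fields.items()))
    return "OAuth " + body


if __name__ == "__main__":
    # Placeholder endpoint and credentials, for illustration only.
    url = "https://stream.example.com/statuses/sample.json"
    oauth_params = {
        "oauth_consumer_key": "CONSUMER_KEY",
        "oauth_nonce": "abc123",
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": "1346000000",
        "oauth_token": "ACCESS_TOKEN",
        "oauth_version": "1.0",
    }
    signature = sign_request("GET", url, oauth_params, "CONSUMER_SECRET", "TOKEN_SECRET")
    print(authorization_header(oauth_params, signature))
```

Any query parameters on the request belong in the signed params too, alongside the oauth_ fields. The stashed consumer and token secrets never go over the wire; they only key the HMAC.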

Meanwhile, Greg Roodt at The Mosh Pit goes the full Monty, explaining how even with the complete 3-Legged handshake, authenticating streaming with OAuth isn’t really that bad. Bonus props for pointing to a GitHub repository of sample code that can be conveniently forked and extended.


Wayne Enterprises Chronicles: Prologue

You either die a hero or you live long enough to see yourself become the villain. — Harvey Dent

I know no one cares about my fantasy team, but forewarned is forearmed. This year’s fantasy football office league edition has been named Wayne Enterprises in honor of the conclusion of The Dark Knight trilogy.

Besides, it makes for a catchy logo.

After the break, my draft and some thoughts on the team’s potential.

8 team league. Snake draft with random position assignments. Overall draft position in parens. I lucked into the number 1 pick for the 2nd time in three (four?) years. And of course, this is a year I do no preparation. The last time I drew first overall though ended in tears, with CJ2K not doing very well, and dragging my team down with him.

  1. (1) Arian Foster RB
  2. (16) Cam Newton QB
  3. (17) Darren McFadden RB
  4. (32) Marques Colston WR
  5. (33) Hakeem Nicks WR
  6. (48) Miles Austin WR
  7. (49) Jason Witten TE
  8. (64) Steven Jackson RB
  9. (65) Tony Romo QB
  10. (80) Baltimore DEF
  11. (81) Dwayne Bowe WR
  12. (96) Robbie Gould K

I might have gone a little early with Cam Newton, but given I could get two top RBs in my first three picks, I thought he was worth the reach. My receivers are a boom or bust set. I expect two of the four to go down with injuries by week three. I had to dump Jason Witten already for Tony Gonzalez.

Relative to other teams in the league, I feel like I’m in pretty good shape. There are one, maybe two, other teams that had similar quality drafts. Of course things will evolve rapidly as real games are played, but I’m looking forward to my chances this year.


New Media Hacking

A lie gets halfway around the world before the truth has a chance to get its pants on. — Winston Churchill

Jamais Cascio has been doing some informed thinking about the intersection of Twitter, botnets, misinformation, and systems hacking.

Two of my rules for constructing useful and interesting scenarios are to (a) think about what happens when seemingly disparate changes smash together, and (b) imagine how new developments might be misused. In both cases, the goal is to uncover something unexpected, but (upon reflection) disturbingly plausible. I’d like to lay out for you the chain of connections that lead me to believe that we’re on the verge of something big. …

A black hat hacker could, with ease, create a network of Twitter bots set to retweet each other on command, send @messages to important information hubs (a few of which would retweet stories further), and drive up the visibility of certain hashtags and keywords. Done with the right target and message, and at the right time, such a network could potentially trigger sudden swings in value of targeted shares. The drop in value need not last for long; trading systems that know the stories to be false could swiftly snap up the briefly-undervalued stock. Conversely, the attack could be done in a way to cripple a particular company or stock market, or even to distract journalists from another story.

Similarly, a Twitter bot network, retweeting/spreading misinformation, could potentially cause a media firestorm if the target was a politician. Even if the misinformation was corrected within the hour, the spread would be impossible to fully contain. Could something like this even swing an election?

I can assure you, as a heavily invested Twitter researcher, what Cascio describes is eminently plausible.


So Much For Posts

Thanks to Cult of Mac it looks like Posts isn’t what I’m really looking for in an iPad blogging app.

Also missing is Markdown support, John Gruber’s human-friendly markup language. This would certainly be a handy third pane editor in addition to HTML and Rich Text, but again any writers making heavy use of Markdown will likely be using another app to do it. It would, though, let writers add things like bulleted lists and similar.

Blogsy is in a similar boat, although I made the mistake of rushing out and buying it without due diligence. The current popular operating mode seems to be to use a nice text/Markdown editor like Writing Kit, generate the HTML in that app, then ship it to an HTML-oriented app like Posts.

Feels like a kludge. On the other hand, I’m pretty intrigued by Writing Kit.


Linden’s Links Redux

Greg Linden is back, with another pile of interesting links. A sampling:

Nice example of how better hardware in your database can be faster and cheaper than expanding your caching layer (link)

Google Research and their hybrid research model blends research and engineering (to maximize impact and avoid the problematic tech transfer from research) and keeps projects short (but still do long-term research by iterating). (link)

Why read research papers? “These papers often foreshadow where the rest of the world is going.” (link)


September Already?!

Wow! It’s already September. Roughly two thirds of 2012 has gone by. Where did it all go?

It’s been a topsy turvy year for me so far, mostly good with some noticeable dips. But with four months left, still seems like plenty of time to make 2012 top notch all the way around.


NFLgame

Uh oh! Andrew Gallant has put together two of my favorite tastes, fantasy football (really sports stats) and Python:

As a programmer and a fantasy football addict, I am embarrassed by the means through which we must expend ourselves to get data in a machine readable form. This lack of open source software cripples the community with sub-standard tools, and most importantly, detracts from some really cool and fun things that could be done with easily available statistics. Many tools are either out-dated or broken, or if they work, they are closed source and often cost money.

Yesterday I started work on a new library package that I hope will start to improve this sorry state of affairs.

NFLGame is a Python package that provides convenient access to NFL statistics. This includes games that are currently being played, or games as far back as the 2009 season.

Looks like a definite install. The question is how much of my time it eventually sucks up. Thanks Andrew!

And I know nobody cares about my fantasy football team, but yup, I will be competing in my office league yet again and of course blogging about the season. Super! Thanks for asking!


Posts

Link parkin’: Posts The Blogging App for iPad. If it’s decent at Markdown, which I doubt, it could be a contender to replace WordPress for iOS. $9.99 is a steep price though.


French Tweets

Between casual personal hacking and work responsibilities, I’m going on close to a year of serious Twitter data collection activities. So I’m always interested to see other folks’ case studies of their Twitter data hacking. Laurent Luce describes, in exquisite detail, a recent effort to track French politician mentions in the Twittersphere during their 2012 election.

This post describes how Pytolab was designed to process Tweets related to the 2012 French presidential election, in real-time. This post also goes over some of the statistics computed over a period of 9 months.

I must be a vet now since I saw their number of 8 million tweets and thought “is that all?” But even at that scale there’s something to be learned from their processing architecture although I was surprised that they ran into peak performance issues with an Amazon EC2 micro instance. A lot of the work I do is on nominally equivalent hardware and it seems up to the task. Maybe it’s the virtualization tax.


marisa-trie

Link parkin’: A trie (rhymes with pie) library for Python. Presumably high performance, since it’s a thin wrapper around C++ code.

Static memory-efficient & fast Trie structures for Python (based on marisa-trie C++ library)

Home page
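For anyone who hasn’t met the structure: a trie stores strings character by character, so keys with a common prefix share nodes and prefix lookups are cheap. Below is a toy pure-Python sketch of the idea. It is not the marisa-trie API, and it has none of that library’s memory efficiency; the class and method names are made up for illustration.

```python
_END = object()  # sentinel key marking "a complete word ends at this node"


class Trie:
    """Toy dict-of-dicts trie supporting membership and prefix queries."""

    def __init__(self, words=()):
        self._root = {}
        for word in words:
            self.add(word)

    def add(self, word):
        node = self._root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True

    def _find(self, prefix):
        """Return the node reached by following prefix, or None if absent."""
        node = self._root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def __contains__(self, word):
        node = self._find(word)
        return node is not None and _END in node

    def keys(self, prefix=""):
        """Return every stored word starting with prefix, sorted."""
        start = self._find(prefix)
        if start is None:
            return []
        results = []
        stack = [(prefix, start)]
        while stack:
            text, node = stack.pop()
            for key, child in node.items():
                if key is _END:
                    results.append(text)
                else:
                    stack.append((text + key, child))
        return sorted(results)


if __name__ == "__main__":
    trie = Trie(["key", "keys", "kettle", "pie"])
    print("key" in trie, "ke" in trie)
    print(trie.keys("ke"))
```

A library like marisa-trie packs the same idea into a compact static structure in C++, which is presumably where the memory and speed claims come from.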


Nice Work Twitter

Glad I got a hold of my Tweetbot for Mac Alpha before it got pulled:

As some of you may have already noticed the download link for the Tweetbot for Mac alpha no longer works. Twitter’s latest API Changes means now we have a large but finite limit on the number of user tokens we can get for Tweetbot for Mac. We’ve been working with Twitter over the last few days to try to work around this limit for the duration of the beta but have been unable to come up with a solution that was acceptable to them. Because of this we’ve decided its best for us to pull the alpha.

I hope I’m using it enough for the Tapbots to get something useful out of my usage.


FiOS 802.11n FAQ

Link parkin’: Verizon Online FiOS FAQ » 3.0 Networking

The following information is for users who have seen the various bridging threads and/or the FAQs about using their own router and wondered which of these various configurations they should choose.

I was thinking of buying an Apple AirPort Express to extend my Wi-Fi range and performance. Unfortunately, I need to keep my somewhat outdated Verizon FiOS router in place to keep certain set-top box features working. The above link comprehensively lists out all the potential options, pros, and cons.


AtBat12 Verdict

Previously, I had put Major League Baseball’s AtBat 2012 on a probationary test drive for one month. The result?

AtBat 2012 is so useful to me that I waited until my subscription expired so I could make the $9.99 in-app full-season purchase. I probably could have made it with another $2.99 monthly renewal or two. However, I wanted to fork over a few extra shekels in admiration for something that truly delights me.

The streaming is top-notch and I easily put on one to two games a night. Typically ‘Nats and then a West Coast game. On the iPad the scoreboard is beautiful and comprehensive. Full details of any of the day’s games are within a tap or two. The real-time Gameday displays are neat if a bit out of sync with the streamed audio.

And I haven’t even taken advantage of any of the video features. Bravo MLB!


Pocket Prismatic

So despite the chops of the Prismatic crew, the web version was sort of languishing on my iPad. Prismatic was the info feed of last resort behind my RSS feeds in Mr. Reader, Tweetbot, Hacker News in Mobile Safari, and even the Techcrunch iPad app. Not a good position to be in.

Enter the Prismatic iPhone App. On my iPhone, there’s a little less competition, with Tweetbot out front, and NetNewsWire trailing somewhat in second. I can see iPhone Prismatic filling a nice role of serendipitous discovery beyond my RSS, but not as micro-nugget as Twitter.

The app is very elegantly designed, and has a great unboxing, which is a new (at least to me) trend in iOS apps. The only nit I have out of the gate is that it forces alignment of stories on discrete boundaries, which gives inertial scrolling a slightly clunky feel. And the refresh when you hit the bottom of available stories has no indicator, so a fetching pause seems like the app is stuck. In total, not as smooth as my favorite inertial scroller Tweetbot.

But if their content curation is as good as they claim it should be, I think I can wait for a bit more polish.


Coming To You From Blogsy

This post was created with Blogsy. If nothing else the app unboxing user experience is a notch above WordPress for iOS. However, we’re not looking very Markdown friendly.

Addendum (written in WordPress for iOS, natch): Generating Markdown with Writing Kit, and then transferring to another app, might be more workable.


iOS WP Disappointment

In case you haven’t noticed, I’ve been using the iPad much more to post to this here blog. The lack of images is a pretty good tell. I was hoping the WordPress for iOS application would improve in its most recent release. The actual text editing is seriously irritating, requiring too many modal changes. And the new version has borked my post preview.

Time to start looking for alternatives. An older review of Blogsy is encouraging. Looks like they’re a bit more image friendly.


Lorica’s 7 For Spark

@bigdata, a.k.a. Ben Lorica, might be my best Twitter follow. He’s been attending Cal Berkeley’s (Go Bears) AMP Camp and is now sold on Spark, an in-memory big data processing framework:

A large portion of this week’s Amp Camp at UC Berkeley, is devoted to an introduction to Spark – an open source, in-memory, cluster computing framework. After playing with Spark over the last month, I’ve come to consider it a key part of my big data toolkit. Here’s why:

Read on for more details, but that’s a solid upvote for Spark in my book.


MCFC Analytics

Link parkin’: Manchester City Football Club Analytics. An open invitation to hack with football data from every game of the 2011-2012 Barclays Premier League season, not just Man City, or at least that’s how I read the announcement.

You have a unique opportunity to be part of this community. Register your interest in this project and we will send you our OPTA data set for every ‘on the ball’ event for every Premier League player in every match in the entire 2011-12 Premier League season FOR FREE.

This data is designed to be broken down, analysed, graphed and visualised however you see fit. We have made it available to encourage and inspire the next level of analytics. Our hope is that the data is used to create new performance measures, tools for player/team comparison and profiling, season-long analysis for benchmarking players’ performance and contextualising these performances based on playing position and opposition.

The Noisy Neighbors just made my 2012-2013 Premier League rooting decision for me. I’m all on City’s bandwagon for this campaign. Hope they can pull off the back-to-back. And do a little better in the Champions League.

At the very least, this squad has a bit of … “personality”.


Google 2-Factor Auth and App Specific Passwords

Public service announcement. If you turn on Google’s two-factor authentication, you’ll probably bust a bunch of your non-Web apps. John Martellaro helpfully explains how to deal with RSS readers.

After you turn on Google’s 2-Step authentication for your account, your RSS readers on your iPhone and iPad will fail to connect. Here’s how to quickly fix that.

Generalizing to other apps is left as an exercise to the weblog reader.


EPD and virtualenv

Parking this nugget in case it can help someone else discover it with Google.

EPD, the Enthought Python Distribution, is a nice packaging of a lot of scientific and high performance computing modules. In particular, it takes away the hassle of building from source on persnickety platforms like Mac OS X.

virtualenv is an indispensable Python tool for creating isolated, customizable, and easily disposable Python environments. You can initialize these environments from a specific Python installation allowing experimentation with multiple Python versions in parallel, e.g. EPD and the stock install on your machine.

EPD and virtualenv weren’t working well together for me on my MacBook. Here’s what worked for me:

  • Make sure you have a recent version of virtualenv, 1.7.2 seems good.
  • Use the --system-site-packages argument to make sure you pick up the EPD extensions. You can also hack up postactivate and postdeactivate to fix up PYTHONPATH.
  • Here’s an example of creating a new virtual environment:

mkvirtualenv -p /Library/Frameworks/EPD64.framework/Versions/Current/bin/python --system-site-packages epd64

So far so good. You’ll need the virtualenvwrapper package to use mkvirtualenv.


JPL Infographics

The U.S. Department of State’s Bureau of International Information Programs site describes the new JPL Infographics site:

JPL Infographics, a newly launched website and resource database from NASA’s Jet Propulsion Laboratory in Pasadena, California, is inviting space aficionados and graphic wizards to take on a visual challenge by grabbing NASA data and transforming them into a scientific work of art.

The website provides extensive collections of NASA science and mission data, graphics, and space images that members of the public can download and use to create their own infographics — creative illustrations of complex data. Users can upload their infographics, have them reviewed by JPL experts, and share their creations in a public gallery on the JPL website.

Free data, a task, and a place for feedback. Nice work JPL.


Social Media Event Diffusion

Link parkin’: “Event Diffusion Patterns in Social Media”, by Kim, Xie, and Christen.

In this paper, we propose a method for identifying real-world events on social media, and present observations about event diffusion patterns across diverse media types such as news, blogs, and social networking sites. We first construct an event registry based on the Wikipedia portal of global news events, and we represent each real-world event with entities that embody the 5W1H (e.g., organization, person name, place) used in news coverage. We then label each web document with the list of identified events based on entity similarity between them.

Mostly interesting in how it uses Wikipedia events pages, and basic entity extraction, as a reference registry for events.


Olympics, Same and Different

I’ve grown into the same sentiment as Chris Bellavita:

I’ve had the fortune to participate in or observe security operations for six Olympiads. Based mostly on that experience, I’m not a fan of the Olympic Games.

The Olympics are a circus that comes to town for a little while, selling promises and dreams. It helps separate people from their money, then packs up the tents and moves on to the next stop.

Now he has quite a different personal perspective, being directly involved in security for many Olympiads. But just as someone interested in great competition the games for me have devolved into an addendum to spectacle, celebrity, and corporatism.

This year, NBC’s misguided, but highly successful, broadcast strategy caused me nary a concern, victim of my newfound television withdrawal. Never felt better. There was a minor reflexive twitch when I heard about the US Women’s Soccer National Team going into overtime against Canada, but I got over it.


Onward Adrian

In my past life, I had a somewhat unnatural, and prescient if I do say so myself, interest in the intersection of technology and journalism. I would even hang out with ink stained wretches on a routine basis and help train up a few on that ’thar new bloggin’ thang.

I had enough pull to get invitations extended to various people doing cool things at the Lawrence Journal-World in Lawrence, Kansas. This included one Adrian Holovaty, principal instigator of the Django web development platform, née content management system. Of course when he came to Northwestern I enjoyed having a needy nerdy chat. Curse you autocorrect!

We never got to sync up on any interesting projects, but he went on to do some nice work with The Washington Post and launch the highly impactful EveryBlock. Now Adrian’s moving on to pastures TBD. EveryBlock and Django, not bad for a developer’s career and I’m sure he’s got plenty left in the tank.

Good luck buddy!


Blogaversary, Macaversary 4.0

Not much to say here. Trucking along at full speed into year five of this blog adventure. I have grown comfortable without a grand plan and am enjoying just keeping the ride going.

Happiness is still a warm MacBook, maybe not so new. Think I’ll hold off on a laptop upgrade until something comes along a little closer to the sweet spot. Plus I’ve got some other clear cut need purchases this year, and I definitely want an iPhone 5. Besides, the new toy has been plenty of upgrade for one year.


Farewell Pushcart Queen

Breaking my embargo on eulogies, I’m shamelessly ripping off my title from the MetaFilter post that put me on notice of Jean Merrill’s passing:

The Pushcart War was one of those childhood books which changed my life forever, teaching me the importance of activism and civic participation. Merrill’s stories also show how much fun organising can be. We have lost a great and creative soul, someone capable of introducing young people to issues that matter with passion and a twinkle in her eye.

Along with The Phantom Tollbooth, The Pushcart War marked my transition to “serious” reader. Yeah, I proceeded to still read a bunch of junk. But I forever knew there would be stuff out there with more heft and style. There did exist a higher standard to seek out and aspire to. I’m still managing to work towards that standard, bit by bit.

Godspeed, kind lady.


The Twitter Comedy Factory

Here’s a study I’d be interested in seeing just on general interest. The usage of Twitter as a comedy platform. Seems like there’s a @FakeX account for any value of X and half a million “@!*$ My Whatever Says”. Is this a new comedic venue? How big is it in terms of creators, participants, and dollars? And put it all in a comedy historic context for me.

Bonus points if you’re funny.

Sigh. So many ideas. So little time.


SportsNerd, A Sports Tech Vertical

This is just off the cuff daydreaming with about zero background research. It’s inspired by the recent purchase of Bleacher Report for just a little south of $200 million and The Art of Manliness’ two part guide on …Starting a Side Hustle.

Assertion: there is no part of the modern worldwide sports industry that is not being heavily influenced by technology. Athletic performance? Check! Financials (including gambling)? Check! Media? Check! Analysis? Check!

Seems like a Web publication that was hyper-focused on the varied intersections of sports and tech might have a chance. Read as Gizmodo or Engadget but always coming with a sports angle. Broad situational awareness of tech impact on sports, augmented with deep dives, interviews, and even opinion pieces. You could throw in a little humor and still not fall into the T and A tarpit.

The neat trick is that you could probably get something rolling on the real cheap as a simple linkblog. For bonus points algorithmically trawl the intarwebs to decrease/scale human involvement. The side hustle is to grow it into something on the order of Daring Fireball or kottke.org. Individual voice, somewhat broad taste, but with a clear content focus. If there’s any traction, grab a buddy or two, or find some enthusiastic hobbyists to build out in the direction of a serious web publishing business. Of course the really hard part is marketing to a) build out the audience, and b) generate revenue probably through advertising, but maybe some other creative financing.

Definitely not an unheard of story, but it’s hard to believe that someone hasn’t already tried this and failed miserably, or already been swallowed up by one of the big boys.

Anyway, yet another Side Project ™ to take on in my Copious Spare Time ™.


GPS: A Graph Processing System

Link parkin’: GPS: A Graph Processing System

GPS is an open-source system for scalable, fault-tolerant, and easy-to-program execution of algorithms on extremely large graphs. GPS is similar to Google’s proprietary Pregel system, and Apache Giraph. GPS is a distributed system designed to run on a cluster of machines, such as Amazon’s EC2.

We have completed an initial version of GPS, which is available to download. We have run GPS on up to 100 Amazon EC2 large instances and on graphs of up to 250 million vertices and 10 billion edges.

Okay, it’s from Stanford, which raises suspicion (Go Bears!), but 10 billion edges is no joke.


iOS XScreenSaver

Link parkin’: XScreenSaver for iOS. From the Wikipedia entry for the UNIX classic

XScreenSaver is a collection of about two hundred free screensavers for Unix, Mac OS X and iOS. It was created by Jamie Zawinski in 1992 and is still maintained by him.

Amazing that such an old piece of kit can migrate to such modern platforms.


WordPress iOS 3.1

There’s a new release of the WordPress App for iOS.

This is a big one! The WordPress for iOS dev team is excited to announce that version 3.1 is now available in the App Store. In this update you’ll find a redesign for both iPhone and iPad, along with loads of new features!

Thank goodness. The current editing UI on the iPad is driving me up a wall. Too many switches out of editing mode to do simple things. Hope the new version is a bit more efficient.


Titan Graph Performance

I’ve mentioned the Titan graph store before. Aurelius, the company building Titan, has posted a performance study of the DB that heavily relied on Amazon Web Services:

The presentation to follow discusses the simulation’s social graph structure, the types of processes executed on that structure, and the various runtime analyses of those processes under normal and peak load. The presentation concludes with a discussion of the Amazon EC2 cluster architecture used and the associated costs of running that architecture in a production environment. In short summary, Titan performs well under substantial load with a relatively inexpensive cluster and as such, is capable of backing online services requiring real-time Big Graph Data.

The kicker is that their analysis of one year’s AWS costs to support a half-million users is well within an early startup’s wheelhouse. Also, their modeling and sim approach is pretty interesting. They’ve probably simplified the discussion approach for presentation, but there’s a bunch of approaches in the CS literature that are also applicable.


Getting Jacked

Mat Honan got jacked. Kiddies wrecked his online presence for the lulz, exposing some severe deficiencies in Amazon and Apple security procedures. It’s a bit of a cause célèbre, invoking a lot of suggestions on how to safeguard one’s self.

Like Tim Bray, I heart 1Password but need to do more, especially since I’m a heavy Amazon user. Following Matt Cutts’ advice, I’ll be turning on 2 Factor Authentication. Wired also has a number of solid suggestions, along with lots of other info kicking across the Web. Do yourself a favor and partake of the wisdom.

And I always thought that remote wipe capability was a bit of a hinky idea.


The CS CEO

I’m really surprised to hear this, almost to the point of fisking it out, but I trust Greg Linden. Marissa Mayer is the first person with an advanced CS degree hired into a Fortune 500 CEO position.

Marissa Mayer as CEO of Yahoo may be a test of a new style of executive leadership, the optimizing CEO.

She is not the first computer scientist to lead a major company, but she is the first computer scientist (MSCS or higher) hired in as CEO to a Fortune 500 company. Many computer scientists view everything as an optimization problem. People, work, politics, life, everything is a search (often of a dynamic space) to find a maximum near the global maximum.

On the other hand, I’m not that surprised in that there are only so many of those spots, they haven’t been open to everyone for much of history, and they typically skew towards MBAs. The qualifier, “hired in”, also makes a big difference. There have been a number of advanced CS degree CEOs that have grown companies into Fortune 500 members. This may be the first one that has to inject (reinject?) a CS culture into such a large corporation.

And I agree with Greg that a systems optimization perspective in a Fortune 500 CEO will be an interesting experiment.

Lastly, never really thought of Yahoo! as a Fortune 500 company.


Trident, Realtime Big Data

Nathan Marz is still a badass, despite being swallowed up by Twitter:

Trident is a new high-level abstraction for doing realtime computing on top of Twitter Storm, available in Storm 0.8.0 (released today). It allows you to seamlessly mix high throughput (millions of messages per second), stateful stream processing with low latency distributed querying. If you’re familiar with high level batch processing tools like Pig or Cascading, the concepts of Trident will be very familiar - Trident has joins, aggregations, grouping, functions, and filters. In addition to these, Trident adds primitives for doing stateful, incremental processing on top of any database or persistence store. Trident has consistent, exactly-once semantics, so it is easy to reason about Trident topologies.

We’re really excited about Trident and believe it is a major step forward in Big Data processing. It builds upon Storm’s foundation to make realtime computation as easy as batch computation.

I’m sure Trident has some sharp pointy edges, but it also looks like a bag of fun.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.