
Git Conflict Resolution Tutorial

Link parkin’: Git & GitHub Foundations • Conflict Resolution Editor OEmbed Link rot on URL: http://www.youtube.com/watch?feature=plcp&v=xfh13LcgqIU

This is one aspect of git that I don’t have a good handle on. Hoping this YouTube tutorial from GitHub can clarify things. Also looks like there are a few other YouTube tutorials on the topic.


Souping Up Ye Olde MacBook

Despite the unveiling of the 13-inch Retina MacBook, I’m still interested in extending the life of my good ’ole white MacBook. Partially just because I’d like to see how far I can push it, partially because I’m not really home hacking enough to justify the purchase, and mostly cause I’m currently in a thrifty phase. An article from Ars Technica captures where I’m at:

Given that everything else is already maxed out, we wanted to upgrade to an SSD in order to squeeze a bit more life out of this laptop. If you have an older computer that already has the maximum RAM but still uses a hard drive, and you’re not in the market for a new computer anytime soon, you can give it a boost with an SSD upgrade.

Prices have come down significantly since the article was posted last year. You can get 512 GB of SSD storage for 400 bucks. On top of that, you can replace the optical drive with a 1 TB HDD for a total of $150. So for less than a Mac Mini, or 25% of the dream MacBook, I could probably get at least another year to year and a half out of my MacBook. Nice!

Too bad you can’t upgrade the processor in these things.


Marco & The Jambox

When I was in the AT&T store ordering my new phone, I saw an actual Jambox (affiliate link warning) in person. It looked much smaller than I thought, and I wondered if it was any good.

Marco Arment has an answer:

The original Jambox, while it’s a delightful product otherwise, has two major flaws for this use. It vibrates so much with bassy songs at high volumes that it can easily vibrate itself off the edge of whatever it’s sitting on. (Fortunately, it’s also very durable.) And while it’s impressively loud for its size, it can’t get loud enough for spoken podcasts to be heard consistently clearly in a noisy shower. …

I can also recommend the original Jambox, but with hesitation: it’s only the better choice if the price difference or portability are most important to you.

For my particular use cases, that’s actually a pretty good endorsement.


Rob Pike Provocateur

I’m old enough to remember when Plan 9 was new, shiny, and really interesting. Rob Pike, one of Plan 9’s creators, takes advantage of his profile at The Setup to lob a few hand grenades in the direction of “cloud computing”:

A bunch of Macs at home, Macs and Linux at work, plus of course the Google compute clusters. When I was on Plan 9, everything was connected and uniform. Now everything isn’t connected, just connected to the cloud, which isn’t the same thing. And uniform? Far from it, except in mediocrity. This is 2012 and we’re still stitching together little microcomputers with HTTPS and ssh and calling it revolutionary. I sorely miss the unified system view of the world we had at Bell Labs, and the way things are going that seems unlikely to come back any time soon.

I most enjoyed how this kicked the anthill over at Hacker News.


Good ’Ole SQS

Jeff Barr writes regarding some new additions to SQS:

We announced the Simple Queue Service (SQS) eight years ago, give or take a day. Although this was our first infrastructure web service, we launched it with little fanfare and gave no hint that this was just the first of many such services on the drawing board. I’m sure that some people looked at it and said “Huh, that’s odd. Why is my online retailer trying to sell me a message queuing service?” Given that we are, as Jeff Bezos has said, “willing to be misunderstood for long periods of time,” we didn’t see the need to say any more.

Eight years?! Why, it seems like just yesterday that I was being somewhat prescient and, errr, overly effusive about Amazon’s first bit of Infrastructure as a Service.


RethinkDB

Link parkin’: RethinkDB

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn.

Could this be the document oriented DB that MongoDB could (should?) have been?


NBA GameTime Plus

After my great experience with streaming audio in Major League Baseball’s AtBat 2012, I started venturing out to see what the other pro sports made available. The National Hockey League has nuked itself with its lockout, but that still leaves the National Football League and National Basketball Association.

On both fronts the news has turned out well. The NFL charged me $24.99 for a full season of access to streaming audio for every game through their NFL 2012 iPad app. Probably a tad overpriced, but NFL Audio Pass is still worth it to me. The crappy thing is that the iPhone version of the app can’t do streaming audio. WTF NFL?! MLB has this down cold, so there can’t really be any technical challenges.

The NBA experience started out poorly, but so far looks promising. If you go to their site, the most obvious product is NBA Audio League Pass. This is Flash-based streaming audio, which is free, but basically only works on desktops and Android devices. Not helpful to me.

However, the NBA Game Time 2012 iOS App has an in-app purchase of streaming audio for $9.99. The NBA site doesn’t exactly do a great job of publicizing this, but if it works well it’s a pretty good product. Essentially the same price as AtBat 2012 with just a smidge less polish on the app. I’ve got it running on my Jesusphone 5 streaming the Lakers at the moment.

Sigh. So now it turns out that the NBA audio only works on phones, contra the NFL. What gives guys? It’s the same damn OS and streaming audio is pretty mature.


curlish

Link parkin’: curlish

curl with flames on top

Ever had to speak to an OAuth 2.0 protected resource for debugging purposes? curl is a nice tool, but it totally lacks helpers for dealing with oauth.

curlish comes for the rescue. It is able to remember access tokens for you and inject it into requests. Facebook comes preconfigured so you can start using it right away.
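The core trick, as I understand it, is remembering a token and stuffing it into the Authorization header of each request. Here’s a minimal standard-library Python sketch of that idea. It is not curlish’s code: the in-memory TOKENS dict and the remember_token/authorized_request helpers are hypothetical, and the sketch only builds the request without sending it.

```python
import urllib.request

# Hypothetical in-memory token store, keyed by site name. This sketch does
# not reproduce how curlish itself stores tokens.
TOKENS = {}


def remember_token(site, access_token):
    """Remember an OAuth 2.0 access token for a site."""
    TOKENS[site] = access_token


def authorized_request(site, url):
    """Build (but do not send) a request with the remembered token injected."""
    req = urllib.request.Request(url)
    token = TOKENS.get(site)
    if token is not None:
        # OAuth 2.0 bearer tokens travel in the Authorization header (RFC 6750).
        req.add_header("Authorization", "Bearer " + token)
    return req


# Usage: remember once, then every request built for that site carries the token.
remember_token("facebook", "abc123")
req = authorized_request("facebook", "https://graph.facebook.com/me")
print(req.get_header("Authorization"))  # Bearer abc123
```

Passing req to urllib.request.urlopen would actually hit the endpoint; curlish saves you from writing even this much.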


TIL TMDB

Today I Learned about The Movie Database:

OEmbed Link rot on URL: https://twitter.com/pypi/statuses/266869622580588544

Looks like they have an extensive API.


Diggin’ On Elementary

Broke out Boogie Down Productions’ Criminal Minded on the Jesusphone 5 today. Forgot how enjoyable the track Elementary was:

DJ Scott LaRock and I: KRS-One
Our mother’s first son and no, we’ll never run
From complex situations like you T-O-Y-S’s
Always talkin junk, yet in jail, you’re rockin dresses
I have arrived for the purpose of joy
Unlike any ordinary Bronx b-boy
I will volunteer my services and launch an attack
On you fake educators with your yakety-yak

Great beat too.


Wayne Enterprises Chronicles: Week 9

Yet again, the agony of defeat.

This one stings because I had a great projection of 120+ points for my lineup versus my opponent’s 83.

Key factors:

  • RGIII didn’t get in the end zone.
  • I finally got McFaddened. Run-DMC gave me two points before bowing out in the first half with an injury.
  • Arian Foster had a workmanlike 17+ points. His counterpart on the opposition, Adrian Peterson, went for 32.
  • No one else on my roster got in the end zone.

The only bright spot was Eric Decker who way overperformed with 25 points.

Just one of those weeks. Unfortunately, now I feel like my team is listing at 5 and 4 instead of cruising at 6 and 3. Still holding second place in the league though.


Time Zone Hate

OEmbed Link rot on URL: https://twitter.com/joshsusser/status/257725572275376128

What he said. And JavaScript datetime strings too. Taking cross-browser incompatibility Beyond The Browser.


sphinxit

OEmbed Link rot on URL: https://twitter.com/pypi/status/265710020476428289

I’m intrigued because I still have a soft spot for the Sphinx search engine, but the 0.2 release of sphinxit is not for me. Hard to kick the tires without documentation. But I’m rooting for the dev. Looks like an SQLAlchemy for Sphinx queries.


Few, Signals, & Noise

A while ago I finished Nate Silver’s new book (Amazon affiliate link warning), currently enjoying a bit of attention at the end of this year’s political season. I enjoyed it, although there was a little too much poker for my taste, but I haven’t had the time or urgency to put together a respectable write-up. Thankfully, the estimable Stephen Few has done my work for me. I won’t steal his closing graf, which is the summary I wish I had written. Go read all of his post. But here’s a teaser:

I found a kindred spirit when I recently read Nate Silver’s new book The Signal and the Noise (Penguin Press, 2012). I want to give you a sense of the book and it’s powerful message by sharing a few excerpts from the introduction.


Less Chapped

After a teardown, it looks like I shouldn’t be all that irritated about my limited window of smug superiority:

It should come as no surprise that Apple’s fourth-generation iPad, officially released today, is barely discernible from the third-generation iPad released earlier this year. Even the teardown experts at iFixit had a hard time telling the two devices apart after cracking open their aluminum and glass shells. The real takeaway is that third-generation iPad owners shouldn’t feel like they are stuck with yesterday’s tech.


Welcome Back NBA

With any luck, I will be unleashing this post while sitting in the Verizon Center during the Washington Wizards 2012-13 home opener against the Boston Celtics. After watching bits of a few games during opening week, here are a few observations:

  • The Wizards are still a bunch of piece parts, and I’m not sure John Wall will make all that much difference. At least they’re a bit more professional and likable than last year.
  • Based upon how close they played the San Antonio Spurs, the New Orleans Hornets might be quite surprising this year. They’ve got four rookies, Anthony Davis, Al-Farouq Aminu, Austin Rivers, and Darius Miller, who look like they can play. Then there’s a core of good but not great veterans as support, e.g. Ryan Anderson (Go Bears!). Wouldn’t be surprised to see them challenge for the last playoff spot in the West.
  • The Los Angeles Clippers, yes Clippers, have a hell of a bench.
  • James Harden may go off for a while as the #1 guy for the Houston Rockets, but then the Association will get a book on him. Let’s see what happens then.
  • The Knicks are still stuck in neutral, and somehow they feel even older than last year.
  • Can’t wait for DRose to get back: OEmbed Link rot on URL: http://www.youtube.com/watch?v=dtj-D8HT9BY&w=400&h=225

Dadgum PostGIS!!

I’ve been enjoying the power of PostGIS at work, although it confounds me to no end. Given the amount of data I’m trying to query against, typically upwards of tens of millions of rows, I haven’t found writing efficient spatial queries to be straightforward. This week provided an opportunity to develop a hypothesis about why.

On one machine, I have my spatial DB on a traditional spinning hard drive. A query I wrote took about 6 hours. I undertook the radical experiment (for our research org; we move hella fast ;-/) of putting the same DB on a consumer-grade solid-state drive to see what would happen. Query time dropped to about 10 minutes. My back-of-the-envelope calculation (360 minutes down to 10) shows a 36x improvement. Caveat this with the understanding that I have in no way conducted a scientific comparison. Apples to oranges and all that.

Still, the query went from doable to damn useful. Why? My guess is that spatial data and indexing are hard to lay out for good sequential access. Random disk seeks wind up being the order of the day, and that is exactly where SSDs really start to shine.

Just a hunch, but I really need to conduct some deeper investigation. And maybe attend some local geo meetups to commiserate with fellow travelers.


Wayne Enterprises Chronicles: Week 8

Victory! Back to back wins to cross over into the second half of the season. Moves me comfortably into second place in the league.

I got a lot of nice contributions from every position except TE. RGIII was bottled up by the Pittsburgh defense but still managed double figure fantasy points. Tony Gonzalez (Go Bears!) had his first stinker fantasy game with only 4.4 points.

The deficit against projection was made up at RB, starting Darren McFadden and Stevan Ridley, along with the WR spots of Miles Austin and Eric Decker. Ridley and Decker were the big wins, both being second-tier players but coming in at +6. Add in my kicker, Sebastian Janikowski, at +6 as well, and I actually came in +5 against my projection.

My opponent put me on pins and needles Monday night, with the San Francisco DEF as his last player. They would have needed to score 18 points to wreck my night, but it wasn’t out of the realm of possibility given the strength of the 49er D and the crappiness of the Arizona QB.

But it all played out well, and ended in a seven point victory.

On another note, just more evidence of the emergence of computational journalism. Yahoo! has outsourced automating game summaries for fantasy football matches (!!). And the results aren’t all that bad.

Computational journalism cribbed from Irfan Essa.


Real-Time and Big Data

Preprint paper from the guys at Twitter: “Fast Data in the Era of Big Data: Twitter’s Real-Time Related Query Suggestion Architecture”

We present the architecture behind Twitter’s real-time related query suggestion and spelling correction service. Although these tasks have received much attention in the web search literature, the Twitter context introduces a real-time “twist”: after significant breaking news events, we aim to provide relevant results within minutes. This paper provides a case study illustrating the challenges of real-time data processing in the era of “big data”.

Via @lintool


C’Ya Sandy

Bye bye Hurricane Sandy! You got our little development here in Leesburg for two hours of power outage, and our townhouse for a little roof leakage, but otherwise you weren’t too bad. You didn’t even get after my Linode up in the Newark datacenter.

I was highly impressed with your wind gusts though.


A Bad Way To Go Out

I know I’m a bit late to the party, but poor Ozzie Guillen got let go in what may be an MLB first. The Marlins posted his firing on Twitter:

Yikes! Amongst Chicagoans, there will always be a soft spot for Oz though, thanks to that 2005 World Series. But I had a feeling it would end badly, from the day he signed on.

P.S. More evidence that it’s all just media now. No need to use that anachronistic “social media”.


Basic Common Crawl Processing

Pavel Repin copiously documents his initial foray into processing the Common Crawl data set:

At my company, we are building infrastructure that enables us to perform computations involving large bodies of text data.

To get familiar with the tech involved, I started with a simple experiment: using Common Crawl metadata corpus, count crawled URLs grouped by top level domain (TLD).

It’s not a very exciting query. To be blunt, it’s a pretty boring one. But that’s the point: this is a new ground for me, so I start simple, and capture what I’ve learned.

This gist is a fully-fledged Git repo with all the code necessary to run this query, so if you want to play with the code yourself, go ahead clone this thing.

Via Pete Warden
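For the curious, the query itself boils down to a group-and-count. Below is a single-machine Python sketch, assuming you already have an iterable of crawled URLs in hand. It is not Pavel’s code (that lives in his gist), and the tld_of helper is my own naive, hypothetical stand-in: it takes the last label of the hostname, so bbc.co.uk counts as “uk”, and IP addresses or trailing-dot hosts are not handled.

```python
from collections import Counter
from urllib.parse import urlsplit


def tld_of(url):
    """Naive TLD: the last dot-separated label of the URL's hostname."""
    host = urlsplit(url).hostname or ""  # hostname is lowercased, port stripped
    return host.rsplit(".", 1)[-1]


def count_by_tld(urls):
    """Count URLs grouped by (naive) top level domain."""
    return Counter(tld_of(u) for u in urls)


urls = [
    "http://example.com/a",
    "https://www.example.org/",
    "http://foo.bar.com/x",
    "http://www.bbc.co.uk/news",
]
for tld, n in count_by_tld(urls).most_common():
    print(tld, n)
```

The interesting part of the real exercise is running this same shape of computation over the whole crawl, which is what the write-up documents.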


RESTful or Restless?

In my REST API expeditions at work, I’ve been using Flask-Restless. Now, via Python Weekly, I find out about Flask-RESTful. Normally I’d just scan and move on, but RESTful is from folks at Twilio and may have a bit more polish. To wit:

While Flask provides easy access to request data (i.e. querysting or POST form encoded data), it’s still a pain to validate form data. Flask-RESTful has built-in support for request data validation using a library similar to argparse.

The only hitch I see is the lack of examples connecting to ORM-based models, admittedly after only 10 minutes with the docs. Restless actually handles this use case pretty well.

Alternative approaches are always good to know about.

Also have to say thumbs up on Python Weekly. Once a week to my Inbox, an easy read, at least one good link.


jq JSON Processing

Link parkin’: jq: a lightweight and flexible command-line JSON processor.

jq is like sed for JSON data - you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text
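To make the sed analogy concrete: a filter like `jq -r '.[] | select(.stars > 100) | .name'` walks an array, keeps the objects whose stars field exceeds 100, and prints their names as raw text. Here’s a rough Python equivalent with made-up sample data, mostly to show how much plumbing jq saves you on the command line.

```python
import json

# Made-up sample input; imagine it arriving on stdin from curl.
RAW = '[{"name": "jq", "stars": 150}, {"name": "curlish", "stars": 40}]'


def names_with_many_stars(raw, threshold=100):
    """Python analogue of: jq -r '.[] | select(.stars > 100) | .name'"""
    return [item["name"] for item in json.loads(raw) if item["stars"] > threshold]


for name in names_with_many_stars(RAW):
    print(name)
```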


bitmapist

Link parkin’: bitmapist

OEmbed Link rot on URL: https://twitter.com/pypi/statuses/261516455638597632

Implements a powerful analytics library using Redis bitmaps.

This library makes it possible to implement real-time, highly scalable analytics that can answer following questions:

  • Has user 123 been online today? This week? This month?
  • Has user 123 performed action “X”?
  • How many users have been active have this month? This hour?
  • How many unique users have performed action “X” this week?
  • How many % of users that were active last week are still active?
  • How many % of users that were active last month are still active this month?
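The underlying trick is neat: one bitmap per event per time period, with bit N standing for user N. “Was user 123 active this week?” becomes a bit lookup, “how many uniques?” a population count, and retention a bitwise AND plus a count. Redis provides SETBIT, GETBIT, BITCOUNT, and BITOP for exactly this. The sketch below fakes it with Python integers so it runs without a Redis server; the key format and helper names are hypothetical, and this is not bitmapist’s API.

```python
from collections import defaultdict

# Hypothetical stand-in for Redis bitmaps: key "event:period" -> Python int
# used as a bitset, where bit N set means "user N did this in that period".
# With Redis you'd use SETBIT / GETBIT / BITCOUNT / BITOP instead.
bitmaps = defaultdict(int)


def mark(event, period, user_id):
    """Record that user_id performed event during period (like SETBIT)."""
    bitmaps[f"{event}:{period}"] |= 1 << user_id


def did(event, period, user_id):
    """Has user_id performed event during period? (like GETBIT)"""
    return bool((bitmaps[f"{event}:{period}"] >> user_id) & 1)


def unique_count(event, period):
    """How many distinct users performed event? (like BITCOUNT)"""
    return bin(bitmaps[f"{event}:{period}"]).count("1")


def retention(event, earlier, later):
    """Share of users active in `earlier` still active in `later` (BITOP AND + BITCOUNT)."""
    before = bitmaps[f"{event}:{earlier}"]
    still = before & bitmaps[f"{event}:{later}"]
    total = bin(before).count("1")
    return bin(still).count("1") / total if total else 0.0


mark("active", "2012-W43", 123)
mark("active", "2012-W43", 7)
mark("active", "2012-W44", 123)
print(did("active", "2012-W43", 123))               # True
print(unique_count("active", "2012-W43"))           # 2
print(retention("active", "2012-W43", "2012-W44"))  # 0.5
```

Memory is one bit per user per period, which is why this scales so well for dense integer user ids.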

Wayne Enterprises Chronicles: Week 7

Victory!! It was a trouncing, with an 80+ point margin of victory. The other guy mailed it in, starting three players that were on bye this week. Then he got nothing out of his tight end, leading to a grand total of 31 points.

Not much to dig into, but I’m glad to break the losing streak.


Mac Tweetbot

The Tapbots finally released their official for-pay version of Tweetbot for the Mac. The $20 price, driven by some of Twitter’s token-limiting practices, induced some sticker shock. However, I didn’t think long before pulling the trigger.

Lex Friedman has a pretty good review of Tweetbot explaining why it’s worth the price:

Of course, this is a review of the excellent new Tweetbot for Mac (Mac App Store link), not a review of Twitter’s business practices. But I bring up the latter here because one of the effects of Twitter’s new developer restrictions—specifically, the finite limit on how many users a given Twitter client can support—is that developer Tapbots is charging more for the new app than originally planned. Specifically, Tweetbot for Mac will cost you $20, at a time when many similar apps can be found for $10 or less. Which means that for many readers, the question isn’t just whether Tweetbot is good, but whether it’s worth the price.

My answer: Yes.

Like Friedman, I’ve sprung for Tweetbot on MacOS, iPad, and iPhone.


Stoked and Chapped

Forgive me Saint Steven for I have sinned. As a recently realized Apple fanboy I failed to tune in for today’s announcements. My employer sent me to occupy a lonely (but very shiny!) booth in the bowels of a second rate hotel with an exceedingly lackluster conference. But I should have found A Way!

The only in-depth coverage I have been able to read so far is Marco Arment’s assorted thoughts. They have me stoked:

Prior to this, the computer I recommended for nearly everyone was the 13” MacBook Air. But the new 13” Retina MacBook Pro is only about 0.6 pounds heavier, has much higher CPU, RAM, and storage options, and has the much nicer Retina screen. It commands a premium of about $500, which is significant, but you get much more for it.

That’s pretty much the machine I’ve been fiending for for the longest time. I’ll nitnoid about not being able to amp the DRAM to 16 GB, but that’s the only thing I could quibble about. However, between lack of funds and dampened laptop usage due to life and other gear, I may just hang on to Ye Olde MacBook indefinitely. Or at least until there is clear provocation.

But I’m simultaneously chapped:

The timing of the update — just 6 months after the iPad 3, instead of the usual year — will anger a lot of iPad 3 owners. But the previous March releases of the iPad 2 and 3 were more problematic.

You mean The New Toy is already not the new hotness? Ya killin’ me.

At least I have my new Jesusphone 5 to keep me warm at night.


Mushroom Jazz Mixtapes

Just do as the man says.

Four classic Mushroom Jazz recordings, for free. What’s not to like?


Now With More Sparks

@bigdata continues to sing the praises of Spark.

In an earlier post I listed a few reasons why I’ve come to embrace and use Spark. In particular I described why Spark is well-suited for many distributed Big Data Analytics tasks such as iterative computations and interactive queries, where it outperforms Hadoop. With version 0.6, Spark becomes even faster and easier to use. The release notes contain all the detailed changes, but as you’ll see from the highlights below, version 0.6 is a substantial release. Another good sign is the growth in number of contributors, with now over a third of the developers coming from outside the core team in Berkeley.


Wayne Enterprises Chronicles: Week 6

Defeat. Sigh. This three-game losing streak is getting me down. There’s really not much analysis to be done this week. RGIII kept me in the game with a big day, but when four out of eight spots are subpar, pulling out a win is difficult. At least my guys took it into the Monday night game where it basically boiled down to Eric Decker on my side versus Antonio Gates for the opponent. I don’t know what the Broncos were thinking, but they weren’t thinking about guarding Antonio Gates.

Bye week hell this round, but I’m pretty decently set. And the opponent is in a tough spot, with three of his key players on the Falcons who are off this week.


Discogs Doubleplus Good

Partially because it wasn’t obvious to me where to find it, but mostly because it’s cool:

Download Discogs Data

Here you will find monthly dumps of Discogs Release, Artist, and Label data. The data is in XML format and formatted according to the API spec: http://www.discogs.com/developers/

This data is released under the Public Domain license: http://creativecommons.org/licenses/publicdomain/

And funny how we just have to rediscover some things.


Python API Building

Nice four-step tutorial on building a RESTful API using Python, by K. P. Kaiser. The best tip was a pointer to elasticutils, an ElasticSearch library:

It’s always a good idea to see what your options are in a library. Initially, when I was building this integration I saw pyes, a very well written library, but the code to use it seemed a bit ugly for my tastes.

Luckily, after a bit more searching, I found elasticutils, which is, in my opinion, a much cleaner interface to the very simple elasticsearch server. It always pays to take a few minutes to read the introduction, and example code before deciding on a library. Elasticutils actually uses pyes under the covers.


FDWs FTW

Craig Kerstiens on putting Redis in Postgres:

SQL is an expressive language, though people are often okay with accessing Mongo data through its own ORM. The real value is that you could actually query the data from within Postgres then join across your data stores, without having to do some ETL process to move data around.

From experience, here be dragons, but used judiciously Postgres’ Foreign Data Wrappers are a great feature.


Summarizing Streams

From A Framework for Summarizing and Analyzing Twitter Feeds (PDF alert):

In this paper, we present a dynamic pattern driven approach to summarize data produced by Twitter feeds. We develop a novel approach to maintain an in-memory summary while retaining sufficient information to facilitate a range of user-specific and topic-specific temporal analytics. We empirically compare our approach with several state-of-the-art pattern summarization approaches along the axes of storage cost, query accuracy, query flexibility, and efficiency using real data from Twitter. We find that the proposed approach is not only scalable but also outperforms existing approaches by a large margin.

My quick half-ass summary:

  1. use frequent item set mining to come up with a code book
  2. recode your content with the code book
  3. compress, which exploits the redundancy uncovered by the coding
  4. profit!!

But I need to read the paper more closely. And the real-time summarization and topic tracking aspects seem really cool.
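To check my own understanding of steps 1 and 2, here’s a toy Python version: count co-occurring word pairs across tweets, keep the frequent ones as a code book, and rewrite each tweet with codes. This is a simplification under my own assumptions (pairs only, a fixed support threshold, greedy covering) and is not the paper’s algorithm; the function names and sample tweets are made up.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(docs, min_support):
    """Word pairs co-occurring in at least min_support docs, most frequent first."""
    counts = Counter()
    for doc in docs:
        counts.update(combinations(sorted(set(doc)), 2))
    return [pair for pair, n in counts.most_common() if n >= min_support]


def build_codebook(docs, min_support=2):
    """Step 1: give each frequent pair a short code."""
    return {pair: f"C{i}" for i, pair in enumerate(frequent_pairs(docs, min_support))}


def recode(doc, codebook):
    """Step 2: greedily replace covered pairs with codes; keep leftover words."""
    remaining = set(doc)
    coded = []
    for pair, code in codebook.items():
        if set(pair) <= remaining:
            coded.append(code)
            remaining -= set(pair)
    return coded + sorted(remaining)


tweets = [
    ["sandy", "power", "outage"],
    ["sandy", "power", "restored"],
    ["sandy", "leesburg"],
]
book = build_codebook(tweets)
print(book)                               # {('power', 'sandy'): 'C0'}
print([recode(t, book) for t in tweets])  # [['C0', 'outage'], ['C0', 'restored'], ['leesburg', 'sandy']]
```

The “compress” step falls out of the recoding: every tweet sharing a frequent pattern now stores one code instead of several words.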


Discogs oEmbed

Since I’ve been heavily using the WordPress embed feature for tweets, I had a thought that it would be nice if you could do the same thing for referencing musical releases on Discogs.com. But alas, they don’t seem to support oEmbed.

However, Discogs does seem to have a nice client API, with a supporting Python module to boot. Seems like a Discogs oEmbed proxy server might be a nice self-contained hacking project.
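Roughly, such a proxy would take a Discogs release URL and answer with an oEmbed “rich” response, which per the oEmbed spec must carry version, type, html, width, and height. A first-cut Python sketch with the Discogs API lookup stubbed out: the URL pattern, the 400x120 default size, the release id 12345, and the oembed_for helper are all my own hypothetical choices.

```python
import json
import re
from html import escape

# Hypothetical pattern for Discogs release page URLs, e.g.
# https://www.discogs.com/release/12345 or .../Artist-Title/release/12345
RELEASE_URL = re.compile(r"^https?://(?:www\.)?discogs\.com/(?:.+/)?release/(\d+)")


def oembed_for(url, title, artist, width=400, height=120):
    """Return an oEmbed 1.0 'rich' response (as JSON text) for a release URL.

    A real proxy would look up title/artist via the Discogs API using the
    release id; here they are passed in so the sketch runs offline.
    """
    match = RELEASE_URL.match(url)
    if match is None:
        raise ValueError("not a Discogs release URL: " + url)
    release_id = match.group(1)  # would drive the Discogs API lookup
    label = f"{artist} - {title}"
    return json.dumps({
        "version": "1.0",
        "type": "rich",
        "title": label,
        "provider_name": "Discogs",
        "provider_url": "https://www.discogs.com/",
        # html, width and height are required for the 'rich' type.
        "html": f'<a href="{escape(url)}" data-release="{release_id}">{escape(label)}</a>',
        "width": width,
        "height": height,
    })


print(oembed_for("https://www.discogs.com/release/12345",
                 "Criminal Minded", "Boogie Down Productions"))
```

Wrapping this in a small web handler that reads the url query parameter is the remaining bit of the hacking project.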


This Must Happen!

The Little Guy (TM) has been watching a lot of Yo Gabba Gabba!, and it hit me.

Lady Miss Kier must do a dancey dance!! That would be AWESOOOOME!!

OEmbed Link rot on URL: http://www.youtube.com/watch?v=Ibk4Diagkok

If they can find a spot on the show for Metta World Peace, they can put a little groove in the heart.


Atomic News

Circa seems pretty interesting, even though it’s YAPTRN:

Part of the thinking behind Circa comes from ideas that have been described by author and journalism professor Jeff Jarvis, as well as media-startup veteran David Cohn, who is also a founding partner of Circa and acts as its editor-in-chief. The main idea is that the traditional article or story format that newspapers and other news outlets have produced for so many years no longer fits with the way we produce or consume information now. The standard “inverted pyramid”-style article was designed for the days when people might only see one report about a news event, printed on dead trees and without links, so it had to include virtually everything.

The gravity of mobile is counteracting the inertia of print in terms of news distribution, leading to some interesting approaches. The Web in general provided the initial escape velocity, but couldn’t quite get news out of print’s orbit. The confluence of current economics and mobile adoption feels like it could provide enough acceleration to do the trick.

The key is that Circa has pushed back into the newsroom’s production process and if successful in a big way will be a sea change.

Feh. Even though I wrote it, that next to last paragraph was way too weasel wordy, wonky, and jargony.


Network Structure Taxonomies

Link parkin’: Taxonomies of networks from community structure

The study of networks has become a substantial interdisciplinary endeavor that encompasses myriad disciplines in the natural, social, and information sciences. Here we introduce a framework for constructing taxonomies of networks based on their structural similarities.

This paper may set a record for number and word count of author affiliations.


Nerds, Geeks, Dorks

Glyph Lefkowitz wrestles with some definitions:

So I always feel a twinge when I identify myself as a “geek”. I usually prefer to say that I am - or at least aspire to be - a nerd.

Have to say I’m quite sympathetic. Suppressing my inner dork is a daily, even hourly, challenge.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.