
Datastructures With Norman

[embed]https://twitter.com/pypi/status/301268885045379072[/embed]

Speaking of finding interesting things on @PyPi, here’s Norman

Norman is a framework for advanced data structures in Python using a database-like approach. The range of potential applications is wide, for example in-memory databases, multi-keyed dictionaries or node graphs.

For the longest time I’ve been thinking one could transliterate prefuse into Python to enable interactive visualization programming at a high level. The critical hurdle was prefuse’s table-oriented data structures and queries. In-memory SQLite could probably do the trick, but then you’ve got to deal with serialization and deserialization of Python objects.

Norman looks like it might fit the bill better for a prefuse knockoff.
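I haven’t dug into Norman’s API yet, but the “multi-keyed dictionary” idea is easy to sketch with plain Python. Everything below (class name, records) is my own invention, not anything from Norman:

```python
# A toy "multi-keyed dictionary": records findable by any declared key.
# This is my own sketch of the concept, not Norman's API.
class MultiKeyTable:
    def __init__(self, *keys):
        self.keys = keys
        self.indexes = {k: {} for k in keys}  # key name -> value -> records

    def insert(self, record):
        for k in self.keys:
            self.indexes[k].setdefault(record[k], []).append(record)

    def lookup(self, key, value):
        return self.indexes[key].get(value, [])


# Illustrative records only: the same rows, indexed by artist and by year.
albums = MultiKeyTable("artist", "year")
albums.insert({"artist": "Mark Farina", "year": 2003, "title": "Air Farina"})
albums.insert({"artist": "Miguel Migs", "year": 2003, "title": "Colorful You"})
```

One insert, several lookup paths; that’s the database-like trick a plain dict doesn’t give you.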


So That Explains It

I follow @PyPi on Twitter, which just streams Python package announcements. It’s a cheap way to get exposure to new and interesting modules. But every day it seems like there are a couple of newly minted 0.1 packages for “printing nested lists”. Curious but not worth investigating.

Curiosity satisfied thanks to the Python Reddit, another useful place for Python developments:

They’re generated by people following along an example in the book Head First Python.

The book’s author has amended the lesson (through errata and next edition I guess) to point learners at testpypi.python.org (which didn’t exist at the time the book was written).

I run a cleanup script that deletes them every now and then. I haven’t run it for a while… I’ll put it on my looong TODO list…


IPython mini-Book

Will definitely have to shell out for Cyrille Rossant’s Learning IPython for Interactive Computing and Data Visualization.

This book is a beginner-level introduction to IPython for interactive Python programming, high-performance numerical computing, and data visualization. It assumes nothing more than familiarity with Python. It targets developers, students, teachers, hobbyists who know Python a bit, and who want to learn IPython for the extended console, the Notebook, and for more advanced scientific applications.

Too much good e-book tech material at a good price these days.


CloudPull

From GoldenHill Software

CloudPull seamlessly backs up your Google account to your Mac. It supports Gmail, Google Contacts, Google Calendar, Google Drive (formerly Docs), and Google Reader. By default, the app backs up your accounts every hour and maintains old point-in-time snapshots of your accounts for 90 days.

Emphasis mine. Gonna’ try this out over the weekend.


Top Casts

Although I’ve fallen off the film viewing wagon, I’m always intrigued by movies with “all-star” casts. For example, Pulp Fiction has Travolta, Jackson, Thurman, Willis, Roth, Plummer, Rhames, Walken, Buscemi, Keitel, and of course Tarantino as actors. I’ve never seriously sat down and tried to quantify what this meant, but 10 “big time” stars seems like a reasonable threshold.

Then of course, the question is what’s “big time”? And there is the sticking point.

Today I had the brilliant idea that you could, relatively easily, define “top billing” based upon IMDB movie data. If an actor is listed as, say, one of the top 5 for their gender in the credits (for a few years?), call them an All-Star. Still a little squishy, but firmer. Then you can quantitatively evaluate each film, rank, and decide.
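A sketch of the heuristic, with a hypothetical shape for the IMDB-derived data (a film maps to a list of (actor, billing position, gender) tuples); none of this reflects IMDB’s actual formats:

```python
from collections import defaultdict

def all_stars(credits, top_n=5):
    """Anyone ever billed in a film's top-N for their gender is an All-Star."""
    stars = set()
    for cast in credits.values():
        seen = defaultdict(int)  # billing rank counter per gender
        for actor, billing, gender in sorted(cast, key=lambda c: c[1]):
            seen[gender] += 1
            if seen[gender] <= top_n:
                stars.add(actor)
    return stars

def star_power(cast, stars):
    """Count All-Stars in a single film's cast -- rank films by this."""
    return sum(1 for actor, _, _ in cast if actor in stars)
```

Then a film with `star_power` of 10 or more clears the Pulp Fiction bar.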

Interesting challenge, and I wonder how it could apply to major league sports teams?


Making Data Progress

Slowly making headway downloading the Discogs data dumps. Got 19 complete months in hand. Now into the era of no masters files and release files under 1 GB. Current total storage is roughly 29 GB.

Looking forward to some serious data hacking.


The Next Tachyon

[embed]https://twitter.com/bigdata/status/328905487586299905[/embed]

@bigdata goes deeper into Tachyon:

A release slated for the summer will include features that enable data sharing (users will be able to do memory-speed writes to Tachyon). With Tachyon, Spark users will have for the first time, a high throughput way of reliably sharing files with other users. Moreover, despite being an external storage system Tachyon is comparable to Spark’s internal cache. Throughput tests on a cluster showed that Tachyon can read 200x and write 300x faster than HDFS. (Tachyon can read and write 30x faster than FDS’ reported throughput.)

Similar to the resilient distributed datasets (RDD) fundamental within Spark, fault-tolerance in Tachyon also relies on the concept of lineage – logging the transformations used to build a dataset, and using those logs to rebuild datasets when needed. Additionally, as an external storage system, Tachyon also keeps track of binary programs used to generate datasets, and the input datasets required by those programs.

Terabyte scale analytics at interactive speeds. Coming soon to a laptop near you.
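That lineage trick (log how a dataset was derived, then replay the log instead of replicating bytes) is simple enough to sketch in miniature. This toy Python illustrates the concept only; it bears no resemblance to Tachyon’s actual implementation:

```python
class LineageStore:
    """Toy lineage-based storage: derived datasets are recomputable from logs."""

    def __init__(self):
        self.data = {}     # name -> materialized dataset (may be evicted)
        self.lineage = {}  # name -> (transform_fn, parent_name)

    def put(self, name, dataset):
        self.data[name] = dataset        # a root dataset: loaded, not derived
        self.lineage[name] = (None, None)

    def derive(self, name, fn, parent):
        self.lineage[name] = (fn, parent)
        self.data[name] = fn(self.get(parent))

    def evict(self, name):
        del self.data[name]              # simulate losing the in-memory copy

    def get(self, name):
        if name not in self.data:        # rebuild from lineage on a miss
            fn, parent = self.lineage[name]
            self.data[name] = fn(self.get(parent))
        return self.data[name]
```

Losing the in-memory copy costs a recomputation rather than a replica’s worth of memory and network, which is the bet both Tachyon and Spark’s RDDs make.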


Why Tuples?

Steve Holden, who knows a thing or two about Python, gives his explanation of the existence of the tuple datatype in the language:

And that, best beloved, is what tuples are for: they are ordered collections of objects, and each of the objects has, according to its position, a specific meaning (sometimes referred to as its semantics). If no behaviors are required then a tuple is “about the simplest thing that could work.”

Has some good insights, but I think tuple immutability and hashability are vastly undersold.
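A quick illustration of why that matters: immutability makes tuples hashable, so they can serve as compound dict keys and set members, which lists can’t:

```python
# Tuples are immutable, hence hashable: usable as compound dict keys.
hits = {}
for ip, path in [("10.0.0.1", "/"), ("10.0.0.1", "/"), ("10.0.0.2", "/about")]:
    hits[(ip, path)] = hits.get((ip, path), 0) + 1

# A list with the same contents can't play that role:
try:
    {["10.0.0.1", "/"]: 1}
    unhashable = False
except TypeError:
    unhashable = True
```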


Scoffing

I’m back in Philadelphia. Hotel Wi-Fi, I scoff at you again with your $12 (!!) a night charge. With Verizon and AT&T’s LTE on my side, I surf without fear. Unlike last time, didn’t even have to leave the room.


Scaling Mining Analytics

[embed]https://twitter.com/lintool/status/327838986301222912[/embed]

Just a quick scan of Jimmy Lin’s paper (PDF Warning) hints that there are some useful insights regarding logging at scale, which is currently an interest of mine:

A little about our backgrounds: The first author is an Associate Professor at the University of Maryland who spent an extended sabbatical from 2010 to 2012 at Twitter, primarily working on relevance algorithms and analytics infrastructure. The second author joined Twitter in early 2010 and was first a tech lead, then the engineering manager of the analytics infrastructure team. Together, we hope to provide a blend of the academic and industrial perspectives—a bit of ivory tower musings mixed with “in the trenches” practical advice. Although this paper describes the path we have taken at Twitter and is only one case study, we believe our recommendations align with industry consensus on how to approach a particular set of big data challenges.


Safari To Go

TIL there’s an iPad App for O’Reilly’s Safari Online library of books:

Now available for iOS and Android devices. Safari To Go is available for free and delivers full access to thousands of technology, digital media, business and personal development books and training videos from more than 100 of the world’s most trusted publishers. Search, navigate and organize your content on any WiFi or 3G/4G connection. Plus, cache up to three books to your offline bookbag to read when you can’t connect!

Works great for me since my employer provides Safari accounts!


Vincent Pandas

Two great tastes, that taste great together:

The Pandas Time Series/Date tools and Vega visualizations are a great match; Pandas does the heavy lifting of manipulating the data, and the Vega backend creates nicely formatted axes and plots. Vincent is the glue that makes the two play nice, and provides a number of conveniences for making plot building simple.

Useful examples ensue.
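I haven’t verified Vincent’s API myself, so here’s just the pandas half of the pairing; the date tooling really does do the heavy lifting (the counts below are fabricated):

```python
import pandas as pd

# 90 days of made-up daily counts on a proper DatetimeIndex.
idx = pd.date_range("2013-01-01", periods=90, freq="D")
daily = pd.Series(range(90), index=idx)

# Downsample daily counts to weekly totals in one line.
weekly = daily.resample("W").sum()
```

Handing `weekly` off to Vincent for the Vega plot would be the hypothetical next step.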


Fun With Discogs Data

I’ve decided to try and pick up a “datadidact” habit, by regularly working with a large dataset. Even if it’s doing lowly basic characterization, this should force me to hone various skills and brush up on some basic knowledge.

Having spoken before of the Discogs.com dataset, their repository would appear to be a treasure trove and is completely unrelated to anything at work to boot. Thought siccing wget on http://www.discogs.com/data/ would be a no-brainer and a quick start. Except it’s blocked to crawlers based upon their robots.txt. ’Twould be nice if the HTTP Response for the data URL actually was more informative than a 500 error, but I can understand where Discogs is coming from.

However, I WILL NOT BE DENIED! Just have to do it tediously by hand through my browser. So be it. The longitudinal analysis possibilities are too intriguing.

Already have some initial data in hand. Not looking forward to dealing with 11 GB XML files, though. First item might be to convert the data into a record/line-oriented format.
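For the conversion, a streaming parse seems like the way to avoid holding 11 GB in memory. A sketch using the standard library’s iterparse; the “release” tag and attribute layout are my guesses, not Discogs’ actual schema:

```python
import json
import xml.etree.ElementTree as ET

def xml_to_jsonl(xml_path, out_path, tag="release"):
    """Stream a huge XML dump into one-JSON-record-per-line output."""
    with open(out_path, "w") as out:
        for _, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag == tag:
                record = dict(elem.attrib)  # e.g. the release id
                for child in elem:          # flatten direct children (repeats overwrite)
                    if child.text and child.text.strip():
                        record[child.tag] = child.text.strip()
                out.write(json.dumps(record) + "\n")
                elem.clear()  # free the element: this is what keeps memory flat
```

Once it’s line-oriented, the usual grep/sort/awk toolbox (or a quick pandas load) applies.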


Ascription, Anathema, Enthusiasm

I am definitely glad that Ben Hyde has gotten back to posting at “Ascription is Anathema to Enthusiasm”:

I think the blog is back now. Six months ago it got hacked, in a very minor way; but I was very busy at the time. When my repair attempts stumbled it went onto the back burner. Something weird about mysql character set encodings, or word press evolution’s in that area. In short when I exported the database and then initialized a new one with the exported data things got wonky.

Most noticeably sequences between sentences, but other things as well.

One way to address the very busy problem is to lose your job (yes I’ve checked under the bed – thank God it’s not there!).

Sorry about the job, but glad to see an old grizzled Lisp veteran bringing his perspective again.


Wallowing In Data

Matthew Hurst brings an interesting perspective to the intersection of agile development and data products:

The product of a data team in the context of a product like local search is somewhat specialized within the broader scope of ‘big data’. Data is our product (we create a model of a specific part of the real world - those places where you can perform on-site transactions), and we leverage large scale data assets to make that data product better.

The agile framework uses the limited time horizon (the ‘sprint’ or ‘iteration’) to ensure that unknowns are reduced appropriately and that real work is done in a manner aligned with what the customer wants. …

In a number of projects where we are being agile, we have modified the framework with a couple of new elements.

Love the concept of “the data wallow”, which is scheduled team time to deeply dig into the collected data. I’d be interested to hear about specific activities of the wallow and how to make that time productive.


Emacs Custom Shell Config

Another great tip, that I wasn’t aware of, from Emacs Redux:

Emacs sends the new shell the contents of the file ~/.emacs_shellname as input, if it exists, where shellname is the name of your shell - bash, zsh, etc. For example, if you use bash, the file sent to it is ~/.emacs_bash. If this file is not found, Emacs tries with ~/.emacs.d/init_shellname.sh.


Vega and Vincent

Link parkin’: Vincent

The folks at Trifacta are making it easy to build visualizations on top of D3 with Vega. Vincent makes it easy to build Vega with Python.


venv-0

Apparently, I’ve been bootstrapping virtualenv incorrectly. Eli Bendersky does it quite elegantly:

I had to install some packages (Sphinx and related tools) on a new machine into a virtualenv. But the machine only had a basic Python installation, without setuptools or distribute, and without virtualenv. These aren’t hard to install, but I wondered if there’s an easy way to avoid installing anything. Turns out there is.

The idea is to create a “bootstrap” virtual environment that would have all the required tools to create additional virtual environments. It turns out to be quite easy with the following script (inspired by the answer in this SO discussion):

My shtick was to install setuptools, pip, then virtualenv. Bendersky gets them all in one clean shot.


The Tachyon Filesystem

Go Bears! Link parkin’:

Tachyon

Tachyon is a fault tolerant distributed file system enabling reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It achieves its high performance by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read.

On April 10th, 2013, we put out a soft release of Tachyon 0.2.0 Alpha. The new version is more stable and also features significant performance improvements. It is, however, a soft release and we are working full time towards a full hard release that contains a stable version of the features that we expect will be core to Tachyon. Stay tuned.

And

Compatibility

Hadoop MapReduce and Spark can run with Tachyon without any modifications.


The Impala Inhale

Cloudera’s Impala sounds like an exciting way to query HDFS and HBase data at interactive speeds. But the installation dependencies are sort of painful, basically forcing either the use of Cloudera Manager or Cloudera’s packages for RedHat Enterprise Linux. Checking out the fun-filled requirements even includes this gem:

Impala creates and uses a user and group named impala. Do not delete this account or group and do not modify the account’s or group’s permissions and rights. Ensure no existing systems obstruct the functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in a white-list, add these accounts to the list of permitted accounts.

So now the user account space gets littered, along with a bunch of other config files across the filesystem. Yick!

Makes Shark look a lot more attractive for a small-scale applied research project. I think you can run Shark, and Spark, pretty much from tarballs and at user level. And avoiding that enterprisey inhale is a good way to reduce complexity.


Wiz Finale

[embed]https://twitter.com/DidTheWizWin/status/324711538894196736[/embed]

A bit out of gas tonight, but couldn’t let the end of the Wizards season go past without at least some mention. Maybe a longer post to come.

I don’t know if it was a good season for the Wiz, but it was an interesting one. A horrific start, a bumpy middle with some great highlights, and back to the losing to close out the season. Injuries were the big story of the season, followed by glimpses of potential from a backcourt of Wall and Beal.

Next season looks promising, but there’s some trepidation. The GM? Well he’s still the GM. Made some good moves this past year, but the past choices still glare, read: Jan Vesely.

Props though to Randy Wittman for a good coaching job this year, under tough circumstances.


The EPL on NBC

I got stuck in my car for the commute this afternoon, and wound up catching a few segments with the local sports radio yakker. For one chunk, they had this guy Richard Deitsch talking about some big announcement NBC recently had about the English (err, Barclays) Premier League.

What the? NBC? EPL? How did I miss that?

Turns out NBC Universal scooped Fox and ESPN for the US broadcast rights back in October and I flat out missed it. Then yesterday NBC showcased how they’re going to broadcast all of the games live, albeit across all platforms including the Internet. Hey, I get it. With 380 matches, exactly, you gotta find a place for those QPR-Wigan tilts.

This is interesting from a number of angles. First, as one who follows international football on the downlow, looks like I’ll be getting comprehensive coverage from a more engaged broadcast partner. The Fox/ESPN combo was doing okay, but ESPN never quite seemed committed despite getting some top-notch matches.

Second, someone actually outbid ESPN for a significant sports property. And here I was thinking they’d make a big push into soccer because it seems to me they’re in need of a programming hit. I don’t mean to slight women’s basketball, but the audience for high-school all-star games is probably tiny. And you can only have so many screaming head shows. Plus, ESPN did an admirable job with coverage of the last World Cup. Besides a lot of rich people around the world are really into this sport and they probably haven’t hit the point of ESPN overload we here in the States are experiencing.

Finally, the end result will only be as good as the quality of the announcers, analysts, and production. I’m not too worried as NBC takes the MLS seriously and looks like they’re attracting legitimate football announcers respectable in the UK.

So I’m looking forward to August 2013 and the start of the next Premiership campaign, especially seeing as how Manchester United has all but tied off this one.

Now what’s the deal with the Bundesliga?


“The Human Division”, Done

Over the weekend I finished John Scalzi’s The Human Division in its serial form. In general I enjoyed it, although my complaints about it being quite talky still hold. Actually my biggest complaint is that Amazon couldn’t collapse my one click purchases, so my credit card statement is littered with a lot of noise. There were also quite a few loose threads.

If there’s one standout episode for me, it was This Must Be The Place. For some reason I really empathized with Hart Schmidt even though my family situation in no way resembled his. Maybe I’m misremembering the prior three Old Man’s War books, but there seems to be a much higher level of character development.


Unconscious Therapy

[embed]https://twitter.com/5Magazine/status/323870714539163648[/embed]

From 5 Magazine, Chicago’s own House Music chronicle, comes news of a newly released documentary:

Centuries (okay, a few years) ago, Steven Harnell began recording footage for something tentatively given the very tentative title of An Untitled Documentary About House Music. A few people asked me about it, but the project seemed to be sliding into oblivion – until a couple of weeks ago, the site hadn’t been updated since around 2008.

So it was with some surprise that I realized that the Untitled Doc and Unconscious Therapy, a documentary debuting at the Chicago International Movies & Music Festival this Saturday, were one and the same.

Well gosh darn if dreams sometimes don’t come (mostly) true. Still need to see if it’s any good, and requires some sort of DC release, but the film is past a hurdle 90% of film projects don’t cross. So good on ’ya Steve Harnell.

And today all of us could use a little unconscious therapy. As he goes to put on a Mark Farina mix.


O’Reilly’s Graph Databases

O’Reilly Media has a pre-release version of a book on graph database technology:

Graph Databases, published by O’Reilly Media, discusses the problems that are well aligned with graph databases, with examples drawn from practical, real-world use cases. This book also looks at the ecosystem of complementary technologies, highlighting what differentiates graph databases from other database technologies, both relational and NOSQL.

Graph Databases is written by Ian Robinson, Jim Webber, and Emil Eifrém, graph experts and enthusiasts at Neo Technology, creators of Neo4j, the world’s leading graph database.

I’ll probably pick up a copy eventually, but I’m waiting for it to bake a little bit. Also wondering if it’s mainly going to be promotional for Neo4j.


Sensoring The News

An interesting trend of news orgs exploiting low cost sensors, but I really like the play on words:

When I went to the 2013 SXSW Interactive Festival to host a conversation with NPR’s Javaun Moradi about sensors, society and the media, I thought we would be talking about the future of data journalism. By the time I left the event, I’d learned that sensor journalism had long since arrived and been applied. Today, inexpensive, easy-to-use open source hardware is making it easier for media outlets to create data themselves.


GitX-dev

My git-Fu is getting better, but I’m using it enough to start thinking about using a graphical client:

GitX-dev is a fork (variant) of GitX, a long-defunct GUI for the git version-control system. It has been maintained and enhanced with productivity and friendliness oriented changes, with effort focused on making a first-class, maintainable tool for today’s active developers.

Per usual, anything that helps me deal with merges more efficiently and professionally gets my +1.


Diggin’ On Kafka

Putting Apache Kafka through its paces at the office and I’m starting to like what I see after some initial confusion.

Why we built this

Kafka is a messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn’s activity stream and operational data processing pipeline. It is now used at a variety of different companies for various data pipeline and messaging uses.

Activity stream data is a normal part of any website for reporting on usage of the site. Activity data is things like page views, information about what content was shown, searches, etc. This kind of thing is usually handled by logging the activity out to some kind of file and then periodically aggregating these files for analysis. Operational data is data about the performance of servers (CPU, IO usage, request times, service logs, etc) and a variety of different approaches to aggregating operational data are used.

In recent years, activity and operational data has become a critical part of the production features of websites, and a slightly more sophisticated set of infrastructure is needed.

I like their particular set of design choices. The client consumer side takes a little getting used to if you’re a lazy pub/sub guy like me. Kafka makes the client do a little bit more work and manage its own topic state. The upside is good to great performance, horizontal scaling, and a client can implement the message delivery semantics (at most once, at least once, exactly once) the client needs, without spoiling it for everyone else.
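The delivery-semantics point comes down to when the consumer commits its offset relative to processing. No Kafka client here, just the commit-ordering idea simulated over a plain list standing in for a topic partition:

```python
def consume(log, start_offset, process, commit, order="at_least_once"):
    """Replay messages from start_offset, committing offsets per `order`."""
    offset = start_offset
    while offset < len(log):
        if order == "at_most_once":
            commit(offset + 1)    # commit first: a crash here skips the message
            process(log[offset])
        else:  # at_least_once
            process(log[offset])  # process first: a crash here replays the message
            commit(offset + 1)
        offset += 1
    return offset
```

Exactly-once is then the client’s problem too: make processing idempotent, or commit the offset and the processing result atomically.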

Great to see logical message offsets in the upcoming 0.8.x releases to make life a little easier.


The Glove and The Wall

Gary “The Glove” Payton was recently elected to the Naismith Basketball Hall of Fame. I saw a clip right after the announcement and Payton said a couple of interesting things. First, he said he thought that of the current crop of players, John Wall had the most potential to equal The Glove defensively. Yow!

Second, Payton said he’d actually been doing some work with Wall. Never quite know what that really means, but it’s a good sign. And I think it’s starting to show in Wall’s game.


The 1TB SSD

Ars Technica with the overview, and AnandTech with the details on the latest increment in consumer-grade price and performance for SSDs. 960 GB usable is just a bit short of 1 TB, but give it a year. Then we’ll be well over 1 TB for 30% less cost.

Just in time for my next laptop purchase ;-)

And UltraRAM (PDF warning) is just over the horizon.


Python and vSphere

Link parkin’: Patrick Dunnigan kicks the tires on a number of Python modules that encapsulate the VMWare vSphere API.

I recently had the need to manage VMware vSphere from Python code so I went about looking for examples and open source libraries. The vCenter management server has a SOAP web service that exposes most (if not all) of the administrative capabilities that you can perform on vSphere. At first thought, this seemed like a simple endeavor. However once I got into it I found it not as simple as interacting with a RESTful web service.

Next I went searching for Python client libraries for the vSphere SOAP web service.

Here is a summary of what I found:

Thanks Patrick, ’cuz I’m about to have a close encounter with VMWare’s vSphere. Not looking forward to a reintroduction to SOAP though. Hope these libraries really hide the hurt.


Scalzi on Higher Learning

I swear I was just about to compose a reader request to John Scalzi regarding his thoughts on being from an “elite of the elite” university and advice he’d give to his daughter in these days of MOOCs etc. From my thoughts to Scalzi’s keyboard, although I would have been more entertaining than Steve, throwing in lots of Ivy class bonhomie and all that and dropping in a “where fun goes to die” or two:

Presuming my kid has the chops to get in where she wants to go — which I find a reasonable presumption, all things concerned — what I am likely to tell her is this: I’m willing to pay for an elite private institution (think generally but not exclusively the top 25 colleges and the top 25 universities in the US) because their reputations/networks are worth the additional expense in long run. But outside of those schools, why would I pay $40,000+ for a private school when I can pay $10,000 for Ohio State or Ohio University, or only slightly more for Miami University? The value add — the reputation/network — isn’t there in almost all those cases.

I strongly agree with this assessment and keep in mind I’ve been at just about every stage of the pipe: undergrad, grad, faculty, washed up faculty. Haven’t been an educational administrator at any level but I’m not sure I’d wish that on anyone ;-) (Ha! Ha! Only serious)

And we both believe that college can still have a lot of value for a lot of people in this age. I bring this up because there seems to be this virulent dismissive current in the tech and business communities that “higher learning is completely busted, worthless, and mega-disruptive entrepreneurialism is gonna save the day”. I get that there’s a lot of overpriced product out there, but if I see one more link about how a PhD is meaningless, I’m gonna barf. Even if it’s in Critical Literature, just because you can’t reap some huge financial windfall doesn’t mean the result has no value. If there’s a true research-oriented dissertation, then at least the ball of humanity’s knowledge has been pushed forward the tiniest bit.


A Week With Feedly

I’m starting to keep an eye out for good GReader diaspora stories. Evan Dashevsky spent a week with Feedly:

To that end, I decided to try my luck with Feedly, a service that has quickly grown to prominence as the go-to replacement for Reader. After a week of test-driving the service in its Web and app versions, I have found it to be a serviceable—and in many ways, superior—replacement for my soon-to-expire Reader.

The one thing I’m looking for is the cross-client synching capabilities.


Emacs Redux Redux

Okay, Emacs Redux has been better than I expected, with lots of little tips that can greatly improve one’s Emacs utility, such as:

I’m fond of seeing somewhere the full path to the file I’m currently editing (as opposed to just the file name displayed in modeline). Emacs’s frame title seems like a good place to display such information, since by default it doesn’t show anything interesting. Here’s how we can achieve this:

and:

Emacs does not have a command backward-kill-line (which would kill the text from the point to the beginning of the line), but it doesn’t really need one anyways. Why so? Simple enough - invoking kill-line with a prefix argument 0 does exactly the same thing!

At least one good nugget per day.


I

The situation.

Disk A. 95% full with Postgres data.

Disk B. 75% full with archival data.

What to do about Disk A?

Move data from Disk B to another drive, getting it down to 20% full. Use Postgres’ CREATE TABLESPACE data definition statement to make a new one on Disk B. Run a series of ALTER TABLE and ALTER INDEX statements to safely move half of the Postgres data to Disk B.

Now both drives are at about 60% full and I’m a happy camper with plenty of disk headroom going forward. Bonus, both A and B are SSDs, so with an hour or so of effort, all my data stays fast and my queries might even get a little boost from disk parallelism to boot.
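Spelled out, the move is just a handful of DDL statements. All names below (tablespace name, path, tables, index names) are made up for illustration, not my actual schema:

```python
# Hypothetical tables to relocate; Postgres does the physical file moves.
tables = ["events", "events_archive", "sessions"]

ddl = ["CREATE TABLESPACE diskb LOCATION '/mnt/diskb/pgdata';"]
ddl += [f"ALTER TABLE {t} SET TABLESPACE diskb;" for t in tables]
ddl += [f"ALTER INDEX {t}_pkey SET TABLESPACE diskb;" for t in tables]
```

Worth remembering that each ALTER ... SET TABLESPACE physically rewrites the relation on the new disk and holds an exclusive lock while it runs, so going table by table keeps each outage small.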


JZ Spoiled By SSD

JZ would be one Jeremy Zawodny, engineer at Craigslist.

But this particular task involves slurping ALL the data out of that cluster and onto a cluster of sharded Sphinx servers so I can re-index the roughly 3 billion documents. That’s all well and good, but since our MongoDB cluster isn’t terribly performance sensitive, it is built on old-fashioned (am I allowed to use that phrase?) spinning disks. And you know what that means, right?

Yeah, seek time matters. A lot.

If this was hitting our production MySQL clusters, I wouldn’t care nearly as much. Those all use one flavor or another of flash storage. In fact, we’ve been using SSDs long enough and in enough places that I’m spoiled at this point. I sort of cringe every time I have to deal with disk seeks. That’s so five years ago.

Zawodny does this at real scale for real money. So if Jeremy’s spoiled by SSD performance, the rest of us can just follow the trend.


Artisanal War

In tweets from @GreatDismal

[embed]https://twitter.com/GreatDismal/status/320161070113251328[/embed]

[embed]https://twitter.com/GreatDismal/status/320182239000408064[/embed]

First the digital came for the fighter pilot, then for the sniper. #artisanalwarfighting

Slouching towards Armageddon, you might say.


A MarsEdit Update

Daniel Jalkut has released a new version of MarsEdit, the tool I’m using at this very moment to post with:

MarsEdit 3.5.9 is now available. This is a free update for licensed MarsEdit customers. The update will be submitted to the Mac App Store today and will be available there when Apple approves the update.

This is quite a significant update, in spite of it being entirely composed of “bug fixes.” I’m still working on a major update to MarsEdit that will accommodate the Mac App Store’s sandboxing requirements. Until that is ready, I’ll keep fixing bugs in the app but will not be able to add significant features.

I’m grateful to Red Sweater software for picking up and maintaining MarsEdit. It’s been in my toolbox for years and is coming back into prominence now that I’m using Ye Olde MacBook more. If you’ve noticed more original text here, it’s partially because I’m back to using a keyboard and MarsEdit.


Avoiding Facebook: Part N

Where N is quite large. From Sophos’ NakedSecurity blog

Even if you have rejected particular apps from connecting from your Facebook profile, you have no control over what apps your friends and family have chosen to connect to their profiles.

Your friends and family may not be being as cautious as you are about Facebook apps - and you may not realise that when other Facebook users choose to install apps they can then share the information they can see about you with those apps.

Facebook argues that allowing other people to share your info with third-party apps makes the “experience better and more social”. Your opinion may vary from theirs, however.

Yes, my opinion definitely does. I’d normally just look the other way, but Facebook has become increasingly disturbing. He says with a surly frown, looking askance at Quora and other Facebook login miscreants.


Petrel

At work, I’m being overtaken by the Storm. But I’ll be damned if I get sucked back into Java programming without some kicking and screaming. I’ll definitely be giving Petrel a whirl:

Tools for writing, submitting, debugging, and monitoring Storm topologies in pure Python.

Petrel offers some important improvements over the storm.py module provided with Storm:

  • Topologies are implemented in 100% Python
  • Petrel’s packaging support automatically sets up a Python virtual environment for your topology and makes it easy to install additional Python packages.
  • “petrel.mock” allows testing of single components or single chains of related components.
  • Petrel automatically sets up logging for every spout or bolt and logs a stack trace on unhandled errors.

And just in case you’re looking for an alternative to Storm, don’t forget Yahoo!’s, now Apache’s, S4. While not as fashionable, S4 is definitely comparable.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.