
Stateful Kubernetes Apps

I haven’t dived too far down the Kubernetes rabbit hole yet, but one thing I was trying to tinker with was deploying Kafka within a k8s cluster. The results were … unsatisfying. The folks at Cockroach Labs have observed similar issues and are offering advice on how to deal with stateful k8s apps.

In short: managing state in Kubernetes is difficult because the system’s dynamism is too chaotic for most databases to handle––especially SQL databases that offer strong consistency.

I’ll note that for Kafka, the odd peccadilloes of ZooKeeper configuration make the process “anti-cloud native”. Figuring out how to expose long-lived, stateful connections also works against the Kubernetes grain.

I’m sure someone, somewhere has wrangled through all of these problems but there does seem to be a lot of toil here.


Gainfully Underemployed

A long time ago I vowed to kill my commute. I was a bit off on predicting the closing date, but it’s looking like that time has really arrived. In spades.

I’ll save the details for another post, but last Friday I out-processed at the previous gig. The next position is a 10-to-15-minute drive from my house, with no interstate highways to travel. There’s also a lot of schedule flexibility and plenty of opportunity to work from home.

And to top it all off, I have enough runway and flexibility to not start until after the 4th of July. Looking at a good two months where I completely drive the daily activities, modulo family obligations. Herewith, some things that I plan to at least initiate with the time off and thence establish momentum and habits:

  • Spend more time with family
  • Initiate and maintain an exercise routine
  • Read more books
  • Blog more
  • Move this blogging infrastructure to a more modern hosted VPS
  • Modernize my development skills by, well, actually writing code, using an Integrated Development Environment (IDE), and unlimbering my distributed version control (DVCS, read Git) skills
  • Catch up with the “cloud native” ecosystem including Kubernetes, Docker, NATS, etc.
  • Really get into the Jupyter ecosystem
  • Reintroduce myself to some old big data friends like Spark, Kafka, AWS, and GCP

Definitely ambitious and not exhaustive. Feels like a lot to do, so I better get to it.


Inside Building a Distributed Log

Tyler Treat is publishing a deep dive series on how to implement distributed log infrastructure:

In this series, we’re not going to spend much time discussing why the log is useful. Jay Kreps has already done the legwork on that with The Log: What every software engineer should know about real-time data’s unifying abstraction. There’s even a book on it. Instead, we will focus on what it takes to build something like this using Kafka and NATS Streaming as case studies of sorts—Kafka because of its ubiquity, NATS Streaming because it’s something with which I have personal experience. We’ll look at a few core components like leader election, data replication, log persistence, and message delivery. Part one of this series starts with the storage mechanics. Along the way, we will also discuss some lessons learned while building NATS Streaming, which is a streaming data layer on top of the NATS messaging system. The intended outcome of this series is threefold: to learn a bit about the internals of a log abstraction, to learn how it can achieve the three goals described above, and to learn some applied distributed systems theory.
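
As a toy illustration, the core abstraction boils down to an append-only sequence of records addressed by stable offsets. A deliberately naive Python sketch, nothing like the real storage mechanics the series digs into:

    # Deliberately naive sketch of the log abstraction: an append-only
    # sequence of records, each addressed by a stable offset.
    class ToyLog:
        def __init__(self):
            self._records = []

        def append(self, record: bytes) -> int:
            """Append a record and return its immutable offset."""
            self._records.append(record)
            return len(self._records) - 1

        def read(self, offset: int) -> bytes:
            """Read back the record stored at the given offset."""
            return self._records[offset]

    log = ToyLog()
    offset = log.append(b"first event")
    assert log.read(offset) == b"first event"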

Looking forward to more in this sequence.


MarsEdit 4

First post with the newest version of MarsEdit. Just an initial tire kick, but feels like a great update.


Raspberry k8s Pi Cluster

I am so doing this homebrew project over the holidays.

First, why would you do this? Why not. It’s awesome. It’s a learning experience. It’s cheaper to get 6 pis than six “real computers.” It’s somewhat portable. While you can certainly quickly and easily build a Kubernetes Cluster in the cloud within your browser using a Cloud Shell, there’s something more visceral about learning it this way, IMHO. Additionally, it’s a non-trivial little bit of power you’ve got here. This is also a great little development cluster for experimenting. I’m very happy with the result.

By the end of this blog post you’ll have not just Hello World but you’ll have Cloud Native Distributed Containerized RESTful microservice based on ARMv7 w/ k8s Hello World! as a service.


NATS Streaming

Soon I’m going to start a new side project with streaming data. As an Apache Kafka fan, I love the notion of persistent, append-only logs with logical record identifiers. Plus, the Confluent folks are doing some interesting work with Kafka Streams for writing stream processing code. Unfortunately, setting up a single-node Kafka server seems really pointless to me, and my project initially won’t be cost-effective on an honest-to-gosh distributed Kafka installation.

So I started looking around to see if there was anything similar.

Turns out Redis will be getting a streams extension that looks really promising. Probably won’t be here soon enough for my timeline.

Meanwhile, NATS has a streaming server wrapped around the core messaging service that has many of the same semantics as Kafka. NATS and NATS Streaming are really lightweight and easy to deploy. The trade-off is giving up some of the distribution and reliability guarantees that Kafka provides. That’s fine at this point.
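
For a taste, here’s a publish-and-replay sketch modeled on the asyncio-nats-streaming client examples. Treat the particulars as assumptions (“test-cluster” is just the server’s default cluster id):

    # Sketch of Kafka-like semantics on NATS Streaming, modeled on the
    # asyncio-nats-streaming examples; names here are assumptions.
    import asyncio

    from nats.aio.client import Client as NATS
    from stan.aio.client import Client as STAN

    async def run(loop):
        nc = NATS()
        await nc.connect(io_loop=loop)

        sc = STAN()
        await sc.connect("test-cluster", "toy-client", nats=nc)

        # Persistent, append-only semantics: publish, then replay from
        # the start, each message carrying a logical sequence number.
        await sc.publish("greetings", b"hello, log")

        async def cb(msg):
            print("seq={} data={}".format(msg.seq, msg.data))

        await sc.subscribe("greetings", start_at="first", cb=cb)
        await asyncio.sleep(1)

        await sc.close()
        await nc.close()

    loop = asyncio.get_event_loop()
    loop.run_until_complete(run(loop))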

Need to do some tire kicking this weekend.


Swarm, Mesos, and Kubernetes

The podcast episodes from The New Stack tend to be hit or miss for me. There’s quite a bit of information-free content but every now and then you get a good nugget or two.

Such was a recent edition of The New Stack Analysts, just after Docker announced support for Kubernetes as an orchestration engine. Dell’s Josh Bernstein gave his 10,000-foot view of the production orchestration expertise in the Docker, Kubernetes, and Mesos communities. I’m paraphrasing (you can listen for yourself), but it felt like elementary school (Docker), high school (Kubernetes), and masters (Mesos).

For those working with large-scale data services, Bernstein explained that Mesos is often the developer tool of choice for big data applications and those running the SMACK stack, emphasizing that Kubernetes’ simplistic nature has yet to address these issues.

Bernstein’s key point was that the Mesos community has been deploying big data driven production workloads for quite a while now. The elegance and energy of k8s may not be there, but there’s a battle-hardened experience that leads to much deeper technical depth in public presentations, at least as measured by conferences like Mesoscon.

Feels about right. YMMV.


Welcome Back Simon

Simon Willison up and restarted blogging.

I may not have blogged in seven years, but I did answer a whole bunch of questions on Quora. I’ve imported my Quora answers (extracted using this userscript) and used them to help fill out the intervening years. I plan to do the same with a few other types of content as well.

That’s one of the benefits of leaving your feedroll alone for the most part. Willison’s posts just started showing up again in my aggregator. And he’s been really productive with useful technical content.

Glad you’re back on the case, Simon!


GCP Podcast 100: Vint Cerf

You should listen to the 100th episode of the Google Cloud Podcast:

Google, the Cloud, or podcasts would not exist without the internet, so it’s with an incredible honor that we celebrate our 100th episode with one of its creators: Vint Cerf.

Even if you don’t know who Vint Cerf is, you should listen. It’ll be worth your time.


TuneIn Premium

I’m here to recommend TuneIn Premium as a worthwhile streaming audio product, especially if you like to listen to sporting events.

A few years ago, I subscribed to NFL GameDay Audio. $10.00 or so for the whole season. Love streaming audio; it lets me do work while following a game. I don’t completely blow big chunks of Sunday, Monday, and Thursday being a couch potato. Yet I can still track my fantasy team. Perfect price for a solid product.

GameDay Audio was folded into a comprehensive, video-on-demand, $100-per-season package. And live mobile viewing is only on Verizon, no less. Not for me.

Last year the NBA had an Audio League Pass product. $10 for the entire season, stream any game. A little wonky, since it wouldn’t keep streaming in the background (you had to stay in the app), but I totally enjoyed the product last season.

At the start of this NBA season, audio streaming was folded into a comprehensive, video-on-demand product that costs $200 for the year. Not to mention, live national games and live local games are blacked out. Really for the desperate, out-of-market fanatic. Not at all useful for me.

In an online chat, I asked a customer rep if NBA League Pass Audio was coming back this season. The response was no. So of course now it shows up in the product listings. Jerks.

For some reason, I’ve been ignoring TuneIn Premium’s offerings. This fall I decided to check out the trial offering. Guess what? You get streaming audio access to all of the NBA, NFL, MLB, NHL, the English Premier League, and pretty much any NCAA Division I sporting event. An annual subscription is $99, or about $8 per month. Not quite as cost effective as the old set of passes, but there’s way more content accessible.

To the best of my knowledge, the NHL streams audio for free on their app, MLB streams audio for a reasonable price in a great app, the NBA is highly confused about their audio product, and the NFL just doesn’t care.

So far so good with TuneIn Premium. Like I said, I’m here to recommend TuneIn Premium as a worthwhile streaming audio product, especially if you like to listen to sporting events.


Bitcoin Roots

I’m getting back into really trying to understand Bitcoin and other blockchain-based ecosystems. I really enjoyed Arvind Narayanan and Jeremy Clark’s deep dive into the many technical and academic threads that culminated in Bitcoin:

If you’ve read about bitcoin in the press and have some familiarity with academic research in the field of cryptography, you might reasonably come away with the following impression: Several decades’ worth of research on digital cash, beginning with David Chaum, did not lead to commercial success because it required a centralized, banklike server controlling the system, and no banks wanted to sign on. Along came bitcoin, a radically different proposal for a decentralized cryptocurrency that didn’t need the banks, and digital cash finally succeeded. Its inventor, the mysterious Satoshi Nakamoto, was an academic outsider, and bitcoin bears no resemblance to earlier academic proposals.

This article challenges that view by showing that nearly all of the technical components of bitcoin originated in the academic literature of the 1980s and ‘90s … Bitcoin’s intellectual history also serves as a case study demonstrating the relationships among academia, outside researchers, and practitioners, and offers lessons on how these groups can benefit from one another.

New to me but definitely worth a read.


Controlled Social Media Usage

Cal Newport generally talks about making a great career, but today’s missive is applicable to modern mental and emotional health:

With this in mind, I’m going to stop short of asking you (yet again — I was chagrined to recently learn that I’m the top two results when you google “Quit Social Media”) to consider leaving these services altogether. Instead, let me make a suggestion that the social media industrial complex fears far more: change your relationship with these services to shift from compulsive to controlled use.

Still use social media, if you must: but on a schedule; just a handful of times a week; preferably on a desktop or laptop, which tames the most devastatingly effective psychological exploitations baked into the phone apps.

I will definitely endeavor to follow this advice.


Pulsar Messaging Concepts

I still like to play like a messaging guy on occasion. Today I was digging around the documentation for Apache Pulsar, which is the underlying messaging framework for the Streamlio product.

Pulsar’s key features include:

  • Native support for multiple clusters in a Pulsar instance, with seamless geo-replication of messages across clusters
  • Very low publish and end-to-end latency
  • Seamless scalability out to over a million topics
  • A simple client API with bindings for Java, Python, and C++
  • Multiple subscription modes for topics (exclusive, shared, and failover)
  • Guaranteed message delivery with persistent message storage provided by Apache BookKeeper

Pulsar’s underlying, fundamental concepts are quite interesting compared to NSQ, Kafka, RabbitMQ, et al. The clustering and brokering seem a bit complex, but Pulsar comes from folks who had some really hard problems. The notion of failover subscriptions is an interesting twist though. Might be worth taking Pulsar for a test drive in a few containers.
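
For flavor, here’s a minimal produce/consume sketch with the pulsar-client Python binding; the parameter names are my best recollection, so treat them as assumptions. The failover mode called out above is one of the ConsumerType options:

    # Minimal pulsar-client sketch; assumes a broker at localhost:6650
    # and exercises the failover subscription mode noted above.
    import pulsar

    client = pulsar.Client("pulsar://localhost:6650")

    producer = client.create_producer("my-topic")
    producer.send(b"hello pulsar")

    consumer = client.subscribe(
        "my-topic",
        subscription_name="my-sub",
        consumer_type=pulsar.ConsumerType.Failover,
    )
    msg = consumer.receive()
    print(msg.data())
    consumer.acknowledge(msg)

    client.close()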


Mesoscon 2017

Link parkin’: Mesoscon North America 2017

Really need to check out Ben Hindman’s “The State of the Mesos”.


Open Source Vespa

At first I was set to be disappointed when hearing of a new, open source, big data platform named Vespa. Then I read this graf from the team’s announcement blog post:

Serving often involves more than looking up items by ID or computing a few numbers from a model. Many applications need to compute over large datasets at serving time. Two well-known examples are search and recommendation. To deliver a search result or a list of recommended articles to a user, you need to find all the items matching the query, determine how good each item is for the particular request using a relevance/recommendation model, organize the matches to remove duplicates, add navigation aids, and then return a response to the user. As these computations depend on features of the request, such as the user’s query or interests, it won’t do to compute the result upfront. It must be done at serving time, and since a user is waiting, it has to be done fast. Combining speedy completion of the aforementioned operations with the ability to perform them over large amounts of data requires a lot of infrastructure – distributed algorithms, data distribution and management, efficient data structures and memory management, and more. This is what Vespa provides in a neatly-packaged and easy to use engine.

Check out the blog post. It also details how Vespa actually impacted the bottom line, in a big way, of a number of Oath (née Yahoo!) properties.


Kubernetes Everywhere

Rancher got in early on the Docker trend and offers a container management platform. Today they announced release 2.0 of their product. I was somewhat intrigued to see their full-on embrace of Kubernetes.

In early 2016 I met Joe Beda, who founded the Kubernetes project at Google and would later found the Kubernetes company Heptio. Joe painted a vision of “Kubernetes Everywhere,” where Kubernetes can potentially rival the ubiquity of IaaS.

The popularity of Kubernetes continues to rise in 2017. Its momentum is not slowing. We have little doubt that in the not so distant future, Kubernetes-as-a-Service will be available from all infrastructure providers. When that happens, Kubernetes will become the universal infrastructure standard. DevOps teams will no longer need to operate Kubernetes clusters themselves. The only remaining challenge will be how to manage and utilize Kubernetes clusters available from everywhere.

This is a notable, to me, trend with other companies in the space, like Mesosphere, getting on board with k8s. About a year and a half ago, a project team I was on argued for k8s to replace a homegrown container solution. The choice wasn’t so obvious at the time, and organizational inertia eventually scotched the whole notion, but I wonder how much better off the project would have been.

In any event, Kubernetes will be a good resume bullet for the foreseeable future. Salt liberally with skepticism, à la Derek Collison of Apcera.


Vertically Integrated Deep Learning Stack

Really enjoyed a recent Architecht Show podcast interview about co-design of hardware and software to improve task performance. Feels like computing is getting close to “software defined everything,” which provides lots of opportunity to build optimized cross-layer stacks.

In this episode of the ARCHITECHT AI Show, Hillery Hunter—IBM Fellow and director of the Accelerated Cognitive Infrastructure group at IBM Research—speaks about the state of the art in deep learning systems design. She discusses some record-breaking results that IBM recently achieved in distributed deep learning; ideal use cases for state-of-the-art image recognition; and the pros, cons and advancements in everything from GPUs to cloud-specific hardware such as Google’s TPUs.

Hillery Hunter comes across as a really sharp cookie, cleanly and clearly expressing the impact of her team’s results.


It Takes Two

“C’mon y’all, grab a hold, get off the wall…”

Video (embed lost to link rot): https://youtu.be/CWQaTAocDyA

Definitely of a particular era, but I always loved this specific mix of Seduction’s “Two to Make It Right”. Not quite House, not really good hip-hop, but it worked like hell in the club. Michelle Visage’s attitude and that C&C Music Factory sensibility hold up halfway decently.


Fully Operational

Peyote Substrate

It wasn’t all that difficult, but I got my peyote substrate sketch working again. The only interesting thing I discovered is that the Python cairocffi module doesn’t work particularly well with pygame surfarrays. cairocffi has a weird bit where it stashes data buffers in an internal cache to help finalize garbage collection of external data. Unfortunately, this conflicts with pygame’s locking of Surfaces before blitting to the screen. Switching back to pycairo was the resolution.
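
For the record, the pycairo-to-pygame handoff boils down to something like the following. A minimal sketch, not the actual peyote code, and it assumes a pygame recent enough to accept the “BGRA” buffer format (older versions need a channel swap):

    # Minimal pycairo -> pygame handoff; not the actual peyote code.
    # Assumes pygame accepts the "BGRA" format for little-endian ARGB32.
    import math

    import cairo
    import pygame

    WIDTH, HEIGHT = 400, 400

    pygame.init()
    screen = pygame.display.set_mode((WIDTH, HEIGHT))

    # Draw into an offscreen cairo surface.
    surface = cairo.ImageSurface(cairo.FORMAT_ARGB32, WIDTH, HEIGHT)
    ctx = cairo.Context(surface)
    ctx.set_source_rgb(0.2, 0.6, 0.4)
    ctx.arc(WIDTH / 2, HEIGHT / 2, 100, 0, 2 * math.pi)
    ctx.fill()

    # pycairo hands over the raw pixel buffer directly, sidestepping the
    # internal caching in cairocffi that fought pygame's Surface locking.
    image = pygame.image.frombuffer(surface.get_data(), (WIDTH, HEIGHT), "BGRA")
    screen.blit(image, (0, 0))
    pygame.display.flip()

    pygame.time.wait(2000)  # keep the window up briefly for this sketch
    pygame.quit()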

Anyhoo, debugging the issue forced a reintroduction to peyote. The codebase is surprisingly readable, although a bit hackish. It’ll be fun to clean it up, modernize it, and generate some new sketches.


Never Fear Y’all

Just taking a little breather. Planning to come back strong in September.


Hobo Scratchin’

What we’re gonna do is go way back. Back into time.

Classic proto-Hip Hop via Malcolm McLaren.

We’re on a world tour with Mr. Malcolm McLaren. Goin’ each and every place including Spain…


Good Thoughts on Distributed Computing

Ken Birman is a giant of Systems research.

I’ve really worked in Cloud Computing for most of my career, although it obviously wasn’t called cloud computing in the early days. As a result, our papers in this area date back to 1985. Some examples of mission-critical systems on which my software was used in the past include the New York Stock Exchange and Swiss Exchange, the French Air Traffic Control system, the AEGIS warship and a wide range of applications in settings like factory process control and telephony. In fact, every stock quote or trade on the NYSE from 1995 until early 2006 was reported to the overhead trading consoles through software I personally implemented - a cool (but also scary) image, for me at least! During the ten years this system was running, many computers crashed during the trading day, and many network problems have occurred - but the design we developed and implemented has managed to reconfigure itself automatically and kept the overall system up, without exception. They didn’t have a single trading disruption during the entire period. As far as I know, the other organizations listed above have similar stories to report.

So what mission-critical system has your work ended up in?

Don’t know why I happened to be trawling his Cornell website recently, but it turns out he’s been publishing a series of essays on the Web. First off, Birman writes really well. This is advanced technical material but fairly accessible. Second, if he says RDMA is a big deal, I’ll just get on the bandwagon and buckle my seatbelt. Actually, the more of his essays I read, the more I’m convinced of his position. If the trajectory is right, HPC-grade network interconnects will be commoditized and made accessible to average programmers. Sort of like what Hadoop did for Map/Reduce.

If you are at all interested in Systems research, I strongly encourage you to work your way through Ken Birman’s “…Thoughts on Distributed Computing.”


On Kafka Clients

Link parkin’: A Tale of Two Kafka Clients

We use and love Kafka at Data Syndrome. It enables us to move processing from batch to realtime with minimal pain and complexity. However, during a recent project we learned a hard lesson about the kafka-python package that has me thinking about how to make the choice between open source tools. In this post we reflect on the open source decision making process. We describe two Kafka clients for Python, the issues we encountered, and the solution we’ll be using going forward.

Slightly disappointed that there wasn’t any mention of pykafka, which is not Confluent-controlled but is still pretty robust.
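
For reference, the kafka-python flavor under discussion looks roughly like this. A sketch assuming a broker on localhost:9092 and a topic named “events”, both made up for illustration:

    # Quick kafka-python sketch; the broker address and topic name are
    # assumptions for illustration.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("events", b"hello from kafka-python")
    producer.flush()

    consumer = KafkaConsumer(
        "events",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # bail out instead of blocking forever
    )
    for message in consumer:
        print(message.offset, message.value)
        break  # just peek at one record for this sketch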


Well I’ll Be…

Over seven years ago, I declared victory on building a Processing-style framework in Python. I named it peyote. I even created a GitHub repository for peyote.

Damn if 2 out of 3 example sketches I wrote for peyote don’t still work in Python 2.7 using current versions of modules such as pygame and pycairo. One sketch seems to have suffered from bitrot relative to the numpy module. This seems relatively surmountable though.

Well, if this code can survive seven years of neglect, it probably deserves to be resurrected with some modern Python tooling and my improved development skills. Maybe even throw in some Python 3 compatibility.

Viva El Peyote!


Crazy Postin’

Once upon a time, I posted 80 times in one month. Then followed it up with three straight months of 50 or more posts. Yowza!

Different era though. Link blogging was more in fashion and of course I had way more discretionary time. Plus, the task actually did support getting insights related to my research agenda of the moment.

Link blogging is sort of fun though. Between faving tweets and starring feed items, it feels like I should be able to mash something up into my own hosted Tumblr-style blog.


The Promised Land

Somebody burn the tape of these last seven days.

Video (embed lost to link rot): https://youtu.be/nM7_Ip2zooc

Classic Chicago House music from Joe Smooth Inc., “The Promised Land”.


Stonebraker Musings

Once upon a time, I had the pleasure of taking a graduate database course with ACM Turing Award winner Professor Michael Stonebraker. It was a hoot because Stonebraker was a character, in a good way. He has strong opinions but expresses them with a very dry, non-confrontational wit. Said approach is exemplified by this interview, “How Hardware Drives The Shape Of Databases To Come,” at The Next Platform.

The issue is the changing storage hierarchy and what it has to do with databases. Let’s start with online transaction processing. In my opinion, this is a main memory system right now, and there are a bunch of NewSQL startups that are addressing this market. An OLTP database that is 1 TB in size is a really big one, and 1 TB of main memory is no big deal any more. So I think OLTP will entirely go to main memory for anybody who cares about performance. If you don’t care about performance, then run the database on your wristwatch or whatever.

Definitely give it a read, especially the thoughts on how accelerating innovation in the networking space will affect data management. Go Bears!


Launching Streamlio

Link parkin’: “Announcing Streamlio”

We are excited to announce Streamlio, the world’s leading enterprise-grade, unified, end-to-end real-time solution. Founded by co-creators of best-of-breed open-source technologies proven at Twitter and Yahoo, Streamlio unifies Apache Pulsar (incubating), Heron, and Apache BookKeeper to deliver the next generation of real-time applications. Enterprises across all industries now need to adopt event-driven architecture, characterized by a new era of real-time action with no humans in the loop. That’s why we have founded Streamlio. Our mission is to deliver an easy-to-use, enterprise-grade, real-time solution to the masses.

As Ben Lorica points out, this looks like a play on vertical integration of key technologies for stream processing, machine learning, and I/O. TIL about Apache BookKeeper.

Gettin’ a little crowded in the production-ready stream processing platform/toolkit space.


Unik, A Unikernel Platform

Link parkin’. UniK: The Unikernel Compilation and Deployment Platform

From an introductory blog post on Unik:

UniK (pronounced you-neek) is a tool for compiling application sources into unikernels — lightweight bootable disk images — rather than binaries. UniK runs and manages instances of compiled images across a variety of cloud providers as well as locally on Virtualbox. UniK utilizes a simple docker-like command line interface, making building unikernels as easy as building containers. UniK is built to be easily extensible, allowing - and encouraging - adding support for unikernel compilers and cloud providers.

Integrates with most of the major orchestration engines, deploys to multiple unikernel types (rump, OSv, IncludeOS, MirageOS), and provides polyglot language support including JavaScript (Node.js), Go, Java, C/C++, and Python 3.


Learning New Tech

I found Julia Evans’ take on learning new tech interesting. The short story is, do it on the job!

Right now at work I’m working a bit in Go! That is interesting and I’m happy to be doing it. But it is not so fun that I feel like spending a lot of my personal time on it. And I don’t really think it’s necessary, I learn languages by writing them, reading other people’s code to learn about conventions, and having my code reviewed. I can just do all of those things at work!

To be clear, I don’t think it’s bad to learn programming languages outside of work. I just don’t really do it.

The real key is understanding that conducting mindful, motivated learning is actually a responsible act that provides value to the business. Sure, boning up on skills as a form of moonlighting for your next startup gig is bad form. But taking reasonable time out to read up on and internalize technologies one doesn’t have depth in is perfectly reasonable. At least in any organization worth its salt.

My current gig is, at the moment, diametrically opposed to hers. I don’t have any dev tools on my work laptop, much less do any development. So it’s challenging to learn on the job. And as opportunity allows, I’m quite happy to learn in my off time. Even so, with the plethora of vibrant open source communities out there, there are plenty of places to follow her advice on how to learn.


Blogaversary, Workaversary

Big milestone today on Mass Programming Resistance. According to my WordPress console, this post is number 1500. Also, this blog’s first post was on August 11th, 2008, making today the close of 9 blogging years on MPR. Not to mention 52 straight days of posting.

In addition, this is also the anniversary of my starting employment with Schafer Corporation. That name is pretty mundane, and going away after acquisition, but the job is anything but. Being a Scientific, Engineering, and Technical Advisor (SETA) for the Defense Advanced Research Projects Agency (that DARPA) gives one a solid peek behind the curtain of a high impact, technology innovation organization. Can’t really give much detail but it’s been a good year.

What better punctuation than The Black Science Orchestra’s “New Jersey Deep”.

Yeaaaah! Black Science got it goin’ on.


Squashing Commits

Link parkin’. Oldie, but goodie, on taking a bunch of git commits and mashing them into one:

But one thing occasionally bothers me, and that’s pull requests that come loaded with several temporary commits. I often find myself asking contributors to squash those into a single descriptive commit. Not everyone knows how to do this effectively, hence this quick writeup.
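
For the record, the recipe boils down to an interactive rebase: run git rebase -i HEAD~N to pick up the last N commits, then in the editor keep the first commit as pick and mark the rest squash (or fixup, to discard their messages) before saving.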

I can also attest that magit makes squashing commits, from within the Emacs editor, quite agreeable.


Container-Based Network Emulation

As is my wont, I sometimes prowl around old proceedings of systems conferences. Today I landed on the program for ACM CoNEXT 2012, with a paper interestingly titled “Reproducible Network Experiments Using Container-Based Emulation” (PDF).

In an ideal world, all research papers would be runnable: simply click to replicate all results, using the same setup as the authors. One approach to enable runnable network systems papers is Container-Based Emulation (CBE), where an environment of virtual hosts, switches, and links runs on a modern multicore server, using real application and kernel code with software-emulated network elements. CBE combines many of the best features of software simulators and hardware testbeds, but its performance fidelity is unproven.

In this paper, we put CBE to the test, using our prototype, Mininet-HiFi, to reproduce key results from published network experiments such as DCTCP, Hedera, and router buffer sizing. We report lessons learned from a graduate networking class at Stanford, where 37 students used our platform to replicate 16 published results of their own choosing. Our experiences suggest that CBE makes research results easier to reproduce and build upon.
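
For a sense of what CBE feels like from the keyboard, here’s a minimal sketch against Mininet’s Python API (mainline Mininet absorbed the Mininet-HiFi work). It assumes a Linux box with Mininet installed and root privileges:

    # Minimal container-based emulation via Mininet's Python API;
    # requires Linux, a Mininet install, and root privileges.
    from mininet.net import Mininet
    from mininet.topo import SingleSwitchTopo

    # Two virtual hosts hanging off one software switch.
    net = Mininet(topo=SingleSwitchTopo(k=2))
    net.start()
    net.pingAll()  # verify connectivity across the emulated links
    net.stop()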

Obviously containers, even Linux containers, are fairly old, but I’m just amused that an impactful research paper involving containers was published 5 years ago, with the work essentially done 10 months before the uber-hyped Docker was open sourced. The future is already here, just unevenly distributed and all that jazz.

Ditto for all the computational, reproducible, publication hoohah around web-based scientific notebooks, such as Jupyter.

Bonus entertainment! Check out the accompanying presentation slides (PDF), especially slide 60. I always tell people that at some point in the process, getting a PhD really sucks. No exceptions!


Practice Makes Perfect

Link parkin’. Practice Your Python

No matter where you are in your Python journey, it can always help to practice and refine your skills. This site lists the best resources I’ve found for practicing Python — through exercises and challenges.

If you have found a good resource that helps people to practice and improve their Python skills, send me e-mail at reuven@lerner.co.il, and I’ll add it to this list.

Need to also mention Reuven M. Lerner’s Practice Makes Python.

Lord knows I need to put in some practice time.


An API of Fire and Ice

I have totally missed the boat on Game of Thrones, having watched only bits and pieces of episodes here and there. Maybe one day I’ll have the opportunity to binge my way through, but not in the foreseeable future.

But this API project to provide access to data related to the HBO series and the original George R. R. Martin books is major cool!!

An API of Ice And Fire is the world’s greatest source for quantified and structured data from the universe of Ice and Fire (and the HBO series Game of Thrones). We give you access to data about all the Books, Characters and Houses in an easy to use JSON format. …

Why did you build this?

I decided to build this API because I like to build stuff that other developers can use. A Song of Ice And Fire and Game of Thrones is also a great universe and I find it really interesting.

Besides, this was (and is) a great opportunity to try out new technologies.
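
Poking at it from Python is about as easy as it gets. A minimal sketch, assuming the requests package and the API’s documented /characters endpoint:

    # Minimal sketch against An API of Ice And Fire; assumes the
    # requests package and the documented /characters endpoint.
    import requests

    resp = requests.get("https://anapioficeandfire.com/api/characters/583")
    resp.raise_for_status()

    character = resp.json()
    print(character["name"], character["aliases"])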

This is exactly the type of project I’d like to take on with the Discogs.com Data Dumps. There seem to be lots of fun and interesting developer challenges embedded in what seems like a straightforward task: data cleaning, schema design, API design, app design, access control, performance, scalability, etc. I just need to get off my ass and get to it.


nbtransom

Keeping with the Jupyter theme, you can use the nbtransom library to manipulate notebooks with code:

This is a Python 3 library to read/write cells programmatically in Jupyter notebooks which anticipates upcoming collaborative features in Jupyter.

We use this at O’Reilly Media for notebooks used to manage machine learning pipelines. That is to say, machines and people collaborate on documents, implementing a “human-in-the-loop” design pattern: …
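
nbtransom’s own API aside, here’s a general illustration of manipulating cells programmatically, using the standard nbformat library instead (the notebook filename is made up):

    # General illustration of reading/writing notebook cells with the
    # standard nbformat library (not nbtransom's API).
    import nbformat
    from nbformat.v4 import new_code_cell

    nb = nbformat.read("pipeline.ipynb", as_version=4)

    # Append a cell carrying, say, telemetry from an ML pipeline run.
    nb.cells.append(new_code_cell("accuracy = 0.97  # written by the pipeline"))

    nbformat.write(nb, "pipeline.ipynb")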

nbtransom seems to be a key element for this upcoming JupyterCon talk from Paco Nathan:

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop (and shares the code used). Jupyter gets used in two ways. First, people responsible for managing ML pipelines use notebooks to set the necessary hyperparameters. In that sense, the notebooks serve in place of configuration scripts. Second, the ML pipelines update those notebooks with telemetry, summary analytics, etc., in lieu of merely sending that data out to log files. Analysis is kept contextualized, making it simple for a person to review. This process enhances the feedback loop between people and machines: humans-in-the-loop use Jupyter notebooks to inspect ML pipelines remotely, adjusting them at any point and inserting additional analysis, data visualization, plus their notes into the notebooks; the machine component is mostly automated but available interactively for troubleshooting and adjustment.

I’ll have to wait until the talk gets released to Safari, but I’m interested to see if there’s discussion of how notebooks are organized at a higher level to support these types of manipulations.


GCloud JupyterHub

Link parkin’. The Jupyter team has been making it easier to deploy Jupyter notebook servers in cloud environments. Andrew Odewahn provides some helpful notes.

JupyterHub, a “multi-user server for Jupyter Notebooks,” is an essential tool for teaching and training at scale with Jupyter. …

Their new project, a Helm package for JupyterHub and an accompanying article called Zero to JupyterHub on how to use it, describes the relatively straightforward steps needed to install and run JupyterHub on Google cloud.

In this article, I’ve followed along with the tutorial, adding additional detail on setting up gcloud, preparing a docker image with the content project you want to deploy in it, and provided more background on some of the tools used.

Not trivial at this moment, but expect the edges to get smoothed out over time.


Here It Goes

“…meaning knowledge. Here it goes.”

In a downtempo, chill mode this Friday. Tosca’s “Rondo Acapricio” fits the bill.


TIL GET PUT POST

Over at Hacker News, Today I Learned about GET PUT POST, a potentially interesting newsletter on API development.

Welcome to GET PUT POST, a newsletter all about APIs. Each edition features an interview with a startup about their API and ideas for developers to build on their platform.

Not to mention an interesting post on OpenCage, a tiny business providing a geocoding API.


Free Classic Posters

I have a fascination with image collections, which are great fodder for generative art and wallpapers/screensavers. Shoutout to the Minneapolis College of Art and Design for making a collection of classic lithograph posters available as a complete Zip collection (via Flickr).

Today, more and more cultural institutions have started to offer free art resources to the public through online archives. From high-quality reproductions of antique photographs to museum-approved art books, these complimentary materials promote and exhibit the endless possibilities of Internet learning. Recently, the Minneapolis College of Art and Design (MCAD) has opted to contribute to this downloadable, digitized phenomenon with Art of the Poster 1880-1918, a curated collection of posters from the “Golden Age” of graphic design.

Featuring over 200 printed works, Art of the Poster 1880-1918 presents a look at lithography’s rise in popularity during La Belle Époque. It was during this time that artists like Alphonse Mucha, Jules Chéret, and Henri de Toulouse-Lautrec popularized the art form, which gained public prominence thanks to new methods of production.

The only downer is that the licensing seems a little inconsistent, although most I’ve looked at are Creative Commons 2.0, which is pretty permissive.
