home ¦ Archives ¦ Atom ¦ RSS

Launching Streamlio

Link parkin’: “Announcing Streamlio”

We are excited to announce Streamlio, the world’s leading enterprise-grade, unified, end-to-end real-time solution. Founded by co-creators of best-of-breed open-source technologies proven at Twitter and Yahoo, Streamlio unifies Apache Pulsar (incubating), Heron, and Apache BookKeeper to deliver the next generation of real-time applications. Enterprises across all industries now need to adopt event-driven architecture, characterized by a new era of real-time action with no humans in the loop. That’s why we have founded Streamlio. Our mission is to deliver an easy-to-use, enterprise-grade, real-time solution to the masses.

As Ben Lorica points out, this looks like a play on vertical integration of key technologies for stream processing, machine learning, and I/O. TIL about Apache BookKeeper.

[embed]https://twitter.com/bigdata/status/897879715393839104[/embed]

Gettin’ a little crowded in the production ready, stream processing platform/toolkit space.


Unik, A Unikernel Platform

Link parkin’. UniK: The Unikernel Compilation and Deployment Platform

From an introductory blog post on Unik:

UniK (pronounced you-neek) is a tool for compiling application sources into unikernels — lightweight bootable disk images — rather than binaries. UniK runs and manages instances of compiled images across a variety of cloud providers as well as locally on Virtualbox. UniK utilizes a simple docker-like command line interface, making building unikernels as easy as building containers. UniK is built to be easily extensible, allowing - and encouraging - adding support for unikernel compilers and cloud providers.

Integrates with most of the major orchestration engines, deploys to multiple unikernel types (rump, OSv, IncludeOS, MirageOS) and provides polyglot language support including Javascript (Node.js), Go, Java, C/C++ and Python3.


Learning New Tech

I found Julia Evans’ take on learning new tech interesting. The short story is, do it on the job!

Right now at work I’m working a bit in Go! That is interesting and I’m happy to be doing it. But it is not so fun that I feel like spending a lot of my personal time on it. And I don’t really think it’s necessary, I learn languages by writing them, reading other people’s code to learn about conventions, and having my code reviewed. I can just do all of those things at work!

To be clear, I don’t think it’s bad to learn programming languages outside of work. I just don’t really do it.

The real key is understanding that conducting mindful, motivated, learning is actually a responsible act that provides value to the business. Sure, boning up on skills as a form of moonlighting for your next startup gig is bad form. But taking reasonable time out to read up on and internalize technologies one doesn’t have depth on is perfectly reasonable. At least in any organization worth it’s salt.

My current gig is at the moment diametrically opposed to hers. I don’t have any dev tools on my work laptop, much less do any development. So it’s challenging to learn on the job. And as opportunity allows, I’m quite happy to learn on my off time. Even so, with the plethora of vibrant open source communities out there, there’s plenty of places to follow her advice on how to learn.


Blogaversary, Workaversary

Big milestone today on Mass Programming Resistance. According to my WordPress console, this post is number 1500. Also, this blog’s first post was on August 11th, 2008, making today the close of 9 blogging years on MPR. Not to mention 52 straight days of posting.

In addition, this is also the anniversary of my starting employment with Schafer Corporation. That name is pretty mundane, and going away after acquisition, but the job is anything but. Being a Scientific, Engineering, and Technical Advisor (SETA) for the Defense Advanced Research Projects Agency (that DARPA) gives one a solid peek behind the curtain of a high impact, technology innovation organization. Can’t really give much detail but it’s been a good year.

What better punctuation than The Black Science Orchestra’s, New Jersey Deep.

[embed]https://youtu.be/5fZcZmlasjk[/embed]

Yeaaaah! Black Science got it goin’ on.


Squashing Commits

Link parkin’. Oldie, but goodie, on taking a bunch of git commits and mashing them into one:

But one thing occasionally bothers me, and that’s pull requests that come loaded with several temporary commits. I often find myself asking contributors to squash those into a single descriptive commit. Not everyone knows how to do this effectively, hence this quick writeup.

I can also attest that magit makes squashing commits, from within the Emacs editor, quite agreeable.


Container-Based Network Emulation

As is my wont, I sometimes prowl around old proceedings of systems conferences. Today I landed on the program for ACM CoNEXT 2012, with a paper interestingly titled, “Reproducible Network Experiments Using Container-Based Emulation” (PDF)

In an ideal world, all research papers would be runnable: simply click to replicate all results, using the same setup as the authors. One approach to enable runnable network systems papers is Container-Based Emulation (CBE), where an environment of virtual hosts, switches, and links runs on a modern multicore server, using real application and kernel code with software-emulated network elements. CBE combines many of the best features of software simulators and hardware testbeds, but its performance fidelity is unproven.

In this paper, we put CBE to the test, using our prototype, Mininet-HiFi, to reproduce key results from published network experiments such as DCTCP, Hedera, and router buffer sizing. We report lessons learned from a graduate networking class at Stanford, where 37 students used our platform to replicate 16 published results of their own choosing. Our experiences suggest that CBE makes research results easier to reproduce and build upon.

Obviously containers, even Linux containers, are fairly old, but I’m just amused that an impactful research paper involving containers was published 5 years ago, with the work essentially done 10 months before the uber-hyped Docker was open sourced. The future is already here, just unevenly distributed and all that jazz.

Ditto for all the computational, reproducible, publication hoohah around web-based scientific notebooks, such as Jupyter.

Bonus entertainment! Check out the accompanying presentation slides (PDF), especially slide 60. I always tell people, at some point in the process getting a PhD really sucks. No exceptions!


Practice Makes Perfect

Link parkin’. Practice Your Python

No matter where you are in your Python journey, it can always help to practice and refine your skills. This site lists the best resources I’ve found for practicing Python — through exercises and challenges.

If you have found a good resource that helps people to practice and improve their Python skills, send me e-mail at reuven@lerner.co.il, and I’ll add it to this list.

Need to also mention Reuven M. Lerner’s Practice Makes Python

Lord knows I need to put in some practice time.


An API of Fire and Ice

I have totally missed the boat on Game of Thrones, having watched only bits and pieces of episodes here and there. Maybe one day I’ll have the opportunity to binge my way through, but not in the foreseeable future.

But this API project to provide access to data related to the HBO series and the original George R. R. Martin books is major cool!!

An API of Ice And Fire is the world’s greatest source for quantified and structured data from the universe of Ice and Fire (and the HBO series Game of Thrones). We give you access to data about all the Books, Characters and Houses in an easy to use JSON format. …

Why did you build this?

I decided to build this API because I like to build stuff that other developers can use. A Song of Ice And Fire and Game of Thrones is also a great universe and I find it really interesting.

Besides, this was (and is) a great opportunity to try out new technologies.

This is exactly the type of project I’d like to take on with the Discogs.com Data Dumps. There seem to be lots of fun and interesting developer challenges embedded in to what seems like a straightforward task. Data cleaning, schema design, API design, app design, access control, performance, scalability, etc. etc. I just need to get off my ass and get to it.


nbtransom

Keeping with the Jupyter theme, you can use the nbtransom library to manipulate notebooks with code:

This is a Python 3 library to read/write cells programmatically in Jupyter notebooks which anticipates upcoming collaborative features in Jupyter.

We use this at O’Reilly Media for notebooks used to manage machine learning pipelines. That is to say, machines and people collaborate on documents, implementing a “human-in-the-loop” design pattern: …

nbtransom seems to be a key element for this upcoming JupyterCon talk from Paco Nathan:

Paco Nathan reviews use cases where Jupyter provides a frontend to AI as the means for keeping humans in the loop (and shares the code used). Jupyter gets used in two ways. First, people responsible for managing ML pipelines use notebooks to set the necessary hyperparameters. In that sense, the notebooks serve in place of configuration scripts. Second, the ML pipelines update those notebooks with telemetry, summary analytics, etc., in lieu of merely sending that data out to log files. Analysis is kept contextualized, making it simple for a person to review. This process enhances the feedback loop between people and machines: humans-in-the-loop use Jupyter notebooks to inspect ML pipelines remotely, adjusting them at any point and inserting additional analysis, data visualization, plus their notes into the notebooks; the machine component is mostly automated but available interactively for troubleshooting and adjustment.

I’ll have to wait until the talk gets released to Safari, but I’m interested to see if there’s discussion of how notebooks are organized at a higher level to support these types of manipulations.


GCloud JupyterHub

Link parkin’. The Jupyter team has been making it easier to deploy Jupiter notebook servers in cloud environments. Andrew Odewahn provides some helpful notes.

JupyterHub, a “multi-user server for Jupyter Notebooks,” is an essential tool for teaching and training at scale with Jupyter. …

Their new project, a Helm package for JupyterHub and an accompanying article called Zero to JupyterHub on how to use it, describes the relatively straightforward steps needed to install and run JupyterHub on Google cloud.

In this article, I’ve followed along with the tutorial, adding additional detail on setting up gcloud, preparing a docker image with the content project you want to deploy in it, and provided more background on some of the tools used.

Not trivial at this moment, but expect the edges to get smoothed out over time.


Here It Goes

“…meaning knowledge. Here it goes.”

[embed]https://youtu.be/yspU9tQ8b38[/embed]

In a downtempo, chill mode this Friday. Tosca’s, Rondo Acapricio fits the bill.


TIL GET PUT POST

Over at HackerNews, Today I Learned about a potentially interesting newsletter on API development GET PUT POST.

Welcome to GET PUT POST, a newsletter all about APIs. Each edition features an interview with a startup about their API and ideas for developers to build on their platform.

Not to mention an interesting post on OpenCage, a tiny business providing a geocoding API.


Free Classic Posters

Civilization Calls I have a fascination with image collections, which are great fodder for generative art and wallpaper / screensavers. Shoutout Minneapolis College of Art and Design for making a collection of classic lithograph posters available as a complete Zip collection (via Flickr).

Today, more and more cultural institutions have started to offer free art resources to the public through online archives. From high-quality reproductions of antique photographs to museum-approved art books, these complimentary materials promote and exhibit the endless possibilities of Internet learning. Recently, the Minneapolis College of Art and Design (MCAD) has opted to contribute to this downloadable, digitized phenomenon with Art of the Poster 1880-1918, a curated collection of posters from the “Golden Age” of graphic design.

Featuring over 200 printed works, Art of the Poster 1880-1918 presents a look at lithography’s rise in popularity during La Belle Époque. It was during this time that artists like Alphonse Mucha, Jules Chéret, and Henri de Toulouse-Lautrec popularized the art form, which gained public prominence thanks to new methods of production.

The only downer is that the licensing seems a little inconsistent, although most I’ve looked at are Creative Commons 2.0, which is pretty permissive.


51 for 31

What if I told you a blog author could rise from the dead and become a better poster than ever?

Not to be overly dramatic but the month of July 2017 was my most prolific posting month on Mass Programming Resistance. I logged 51 posts in 31 days besting the previous high of 50 in the month of March 2013. 51 posts is more than my total output for the calendar year 2016, the total output for the calendar year 2015, and the total output for the calendar year 2014. In June and July of 2017 there were a total of 84 entries, more than the combined total for the years of 2016, 2015, and 2014.

I’m not going to declare it my best month though. There were a lot of link outs to interesting technical content, but I’m still leaning a bit heavily on pull quote style posts. A next goal should be to expand these out with some more technical thinking or more work to weave themes together. Not quite sure how I feel about the YouTube and Twitter embeds as convenient as they are. On the positive side, July 2017 featured no sports related content. None of my old style, consumerist, “what piece of gadget junk am I going to buy,” threads either.

Looking at past content, the best eras of this here site typically involved some sort of side software project where I was narrating the development effort. A focus of August will be getting some sort of side hacking going to write about. Also, as a former New Media Hack, I’d like to get back into ocasional media+technology commentary, which is tricky since that seems to inevitably get sucked into the political sphere. Not that I don’t have serious political interests and thoughts, but I’m a firm believer that in the current environment leaving that engagement offline preserves mental health, personal safety, and positive energy.

One thing that has taken a big hit recently is my book reading page count. Consistent book reading also generates a lot of great posting fodder. Need to get back in the groove on that front.

Anyway, too much meta is murder, but I feel reenergized and looking forward to a good, long continued run.


Kleppmann’s “Designing Data-Intensive Applications”

Designing Data Intensive Applications Cover Continuing a theme, I’ve also mentioned Martin Kleppmann previously. He recently completed his O’Reilly book, “Designing Data-Intensive Applications.”

As software engineers, we need to build applications that are reliable, scalable and maintainable in the long run. We need to understand the range of available tools and their trade-offs. For that, we have to dig deeper than buzzwords.

This book will help you navigate the diverse and fast-changing landscape of technologies for storing and processing data. We compare a broad variety of tools and approaches, so that you can see the strengths and weaknesses of each, and decide what’s best for your application.

I read the first couple of chapters from the early release edition, but haven’t been tracking the book to its completion. Based upon the quality of his blog posts (really enjoyed the post on techniques for real-time text search), I’m looking forward to catching up with the final product.

Also interesting to see Kleppmann joined the University of Cambridge as a Research Associate, working on the TRVE DATA project.


1.1 Billion Taxi Rides on BrytlytDB

I’ve mentioned before the fine work that Mark Litwintschik does putting data management systems through their paces using a dataset of 1.1 billion taxi rides. He’s back with another post on BrytlytDB.

BrytlytDB is an in-GPU-memory database built on top of PostgreSQL. It’s operated using many of PostgreSQL’s command line utilities, it’s wire protocol compatible so third-party PostgreSQL clients can connect to BrytlytDB and queries are even parsed, planned and optimised by PostgreSQL’s regular codebase before the execution plan is passed off to GPU-optimised portions of code BrytlytDB offer.

There have been quite a few posts by Litwintschik since I noted his efforts. What caught my eye this time is the mention of the, new to me, BrytlytDB. BrytlytDB apparently leverages a lot of the core capabilities of the PostgreSQL code base and presents a lot of API compatibility. To quote from the homepage, “Brytlyt combines the power of GPUs with patent pending IP and integrates with PostgreSQL.”

I probably have a bit of myopia, but it feels like PostgreSQL essentially defines the baseline for commercial DBMS functionality these days.

And once again, I have to commend Litwintschik on the thoroughness of his reporting on these posts. One of the few technical bloggers who provides enough detail to actually approach “reproducibility.”


RISELab Podcast

Derrick Harris had a recent interview that provided a nice overview of the RISELab and what’s going on at Databricks.

In this episode of the ARCHITECHT Show, In this episode of the ARCHITECHT Show, Ion Stoica talks about the promise of real-time data and machine learning he’s pursuing with the new RISELab project he directs at UC-Berkeley, along with some other big names in big data. Stoica previously was director of the university’s AMPLab, which created and helped to mature technologies such as Apache Spark, Apache Mesos and Alluxio. Stoica is also co-founder and executive chairman of Apache Spark startup Databricks, and he shares some insights into that company’s business and the evolution of the big data ecosystem.

Go Bears!


More Linux Tracing Fun

I find deep dives into eBPF and Linux tracing fascinating. Jean-Tiare Le Bigot has a basic eBPF intro and then an interesting application to single host, L2 level, network packet tracing.

eBPF/bcc enables us to write a new range of tools to deeply troubleshoot, trace and track issues in places previously unreachable without patching the kernel. Tracepoints are also quite handy as they give a good hint on interesting places, removing the need to tediously read the kernel code and can be placed in portions of the code that would otherwise be unreachable from kprobes, like inline or static functions.

Also, I learned about the mtr utility.


Understanding Message Brokers

Link parkin’. A free(ish) e-book comparing and contrasting the current leading frameworks for messaging. Free as in “give us contact info first” free. Haven’t read yet, YMMV.

Author and consultant Jakub Korab describes use cases and design choices that lead developers to very different approaches for developing message-based systems. You’ll come away with a high-level understanding of both ActiveMQ and Kafka, including how they should and should not be used, how they handle concerns such as throughput and high-availability, and what to look out for when considering other messaging technologies in future.

I’ll probably grab it out of message nerd curiosity. Also wondering if the book touches on somewhat divergent frameworks like NATS.


Exactly Once Processing

I promised to revisit the topic of Kafka’s new “exactly once processing.” A while ago, Tyler Treat generated a relatively popular post entitled “You Cannot Have Exactly Once Delivery”. Treat came back and recontextualized the original argument in the face of Confluent’s recent work.

First, let me say what Confluent has accomplished with Kafka is an impressive achievement and one worth celebrating. They made a monumental effort to implement these semantics, and it paid off. The intention of this post is not to minimize any of that work but to try to clarify a few key points and hopefully cut down on some of the misinformation and noise.

The gist is that the Kafka Streams approach is a fairly closed framework that works with the messaging system to ensure a particular semantics correctly with reasonable performance. That’s a good thing. Definitely worth a read if you’re a messaging junkie.


Wishing On A Star

Happy Friday! Straight up Masters at Work ’90s remix classic. The Cover Girls, Wishing On a Star (Magic Sessions Vocal Dub)

[embed]https://youtu.be/v0vXzOWKwMY[/embed]


MarsEdit 4 Beta

MarsEdit, my desktop blogging tool, is finally getting some upgrades.

It’s been over 7 years since MarsEdit 3 was released. Typically I would like to maintain a schedule of releasing major upgrades every two to three years. This time, a variety of unexpected challenges led to a longer and longer delay.

The good news? MarsEdit 4 is finally shaping up. I plan to release the update later this year.


The Failing World Wide Leader

Over seven years ago, I hypothesized about ESPN falling from, what looked like at the time, an unassailable perch. All my speculation turned out to be off base, but ESPN has been taking it in the shorts recently. Witness The Athletic preparing to swoop on newly available talent, according to Bloomberg.

2017 is shaping up to be a rough year for sports journalism. ESPN, Fox Sports, Sports Illustrated, Bleacher Report, and Yahoo Sports have all cut staff positions in the last several months, showing the deep cracks in the predominant business model of online sports news. The founders of the Athletic, an 18-month-old online sports publication, see opportunity in the struggles of the biggest companies. As the news of the cuts kept coming, co-founders Alex Mather and Adam Hansmann, who have no previous journalism experience, hastily pulled together $5.8 million in new capital from investors, in a round they closed last week. The plan is to scoop up laid-off writers, and put them to work building a new kind of sports news operation as the traditional industry leaders are in retreat.

From what I can tell, a combo of mobile / digital (cannibalizing cable subscriptions on volume and price) and demographics (younger folks interested in “edgier” not SportsCenter) are the trends causing fits in Bristol, Conn.


Startup Showcase Winners

Interesting collection of startups that won a share of the showcase at Strata Data, San Jose.

These four companies underscore the unbroken link between on-demand computing, big data, and machine learning. While the ‘90s and “oughties” were about building up the front-end user interface—and in the process, making powerful technology simple enough to find billions of users—more recent years have been about laying the groundwork for adaptive, always-aware organizations.

I wasn’t there to view the startup sales pitchers or do any voting. However, Allistair Croll’s assertion that these are AI startups seems a bit off. The short descriptions make three seem to be more data wrangling / harnessing companies. The fourth is a media analytics platform for the e-sports era. There might be some AI hidden in there, and maybe that’s the point, but they sure don’t feel like Bradford Cross’ vertical AI startups.


A Zulip Chat

I’ve actually been enjoying Podcast.__init__ for a bit now. (The Podcast About Python and the People Who Make It Great .) Recently, Tobias Macey, the host, had an interview with Tim Abbott. Abbott’s the lead developer of the open source Zulip project, which is a “modern group chat” application.

As good as Podcast.__init__ has been, this was a really interesting interview. First, Abbott had successfully exited two startups and spent some time deeply embedded in the Dropbox engineering team. So there was some interesting technical organization discussion. Second, Abbott had some very cogent thoughts on how to create a vibrant open source project. A couple of key things that stood out to me were making onboarding of new contributors as frictionless as possible and systematically externalizing his knowledge into visible documentation, as opposed to invisible e-mails.

I’m also sort of curious if Zephyr, which inspired Zulip, is still used at M.I.T. The community of Zephyr users must be vanishingly small, so I was surprised to hear of Abbott’s fondness for it. He strikes me as a true Engineer.

At my last gig, we routinely had breakouts of bikeshedding arguments regarding the mandatory, organizational discussion, group chat application. A year or two ago, as a gag, I had half a mind to propose Zulip to the company, but wisely thought better of it.


Podcatching With Overcast

Just wanted to mark the fact that I’m a fan of Overcast for subscribing to and playing podcasts. As a technonerd, I might do with a few more bells and whistles, but the overall simplicity makes it a compelling app on the iPhone.

Overcast is good enough that I paid for an annual subscription. YMMV.


Microservices Architectures

Diggin’ in the starred items crates and fell into this post from Camille Fournier about some field-earned wisdom on microservices:

This article is going to have two examples. The first is the rough way “microservices” was deployed in my last gig, and why I made the decisions I made in the architecture. The second is an example of an architecture that is much closer to the “beautiful dream” microservices as I have heard it preached, for architectures that are stream-focused.

Not too deep into the weeds but enough technical insight to be useful. Key takeaway is to not get hyper-aggressive about decentralizing data management.


Are GraphDBs Needed?

Adrian Colyer summarizes an academic paper that gets into that question, or more accurately benchmarks a few graphdbs on a representative workload:

If you want to say “my database is better than your database” then you really also need to specify “for what?”. And if you want to evaluate whether graph databases really do earn their keep as compared to relational databases, you really want to do the comparison on the home turf of the graph databases – the use cases they claim to be good at.

The final outcome is that traditional RDBMS engines, using straight SQL instead of a specialized graph query language, have much better performance. Gremlin takes it on the chin a bit.

Other stores I’d be interested to see benchmarked include: BlazeGraph, OrientDB, Weaver (dead?), ArangoDB, and AgensGraph.


Yellow Shoes

Some might argue for the original, but Calibre’s remix is just as boss

[embed]https://youtu.be/7cjkxOpVh7w[/embed]

Friday music linkage inspired by a recent Calibre BBC Radio 1 Essential Mix and the classic Fabriclive Calibre session.


Curious About Kotlin

I’ve been meaning to link park the Kotlin programming language. In general, I’m just a programming language nerd and when Google promoted Kotlin for official Android programming, the language hit my radar. Via some random Web surfing (people still do that right?) I came across this brief RedMonk overview of why Kotlin is gaining in popularity:

The short version is that Kotlin is a JVM-based language originally released in 2011 by the JetBrains (makers of IntelliJ) team from St Petersburg, Russia. Like Scala, an inspiration for the language, Kotlin is intended to improve on the Java foundations both syntactically and otherwise while trading on that platform’s ubiquity.


StackRox Podcast

I enjoyed Derrick Harris’ interview with the founders of StackRox:

In this episode of the ARCHITECHT Show, StackRox co-founders Sameer Bhalotra and Ali Golshan break down the state of container security and the new technology they have built to solve it. Bhalotra and Golshan have deep histories doing cybersecurity everywhere from startups to Google to the White House, which they draw on to discuss the security threats and opportunities that microservices present, as well as best practices for cybersecurity in general. This week, StackRox emerged from stealth mode after building the product and company for nearly 3 years.

Sameer and Ali had interestingly different backgrounds coming from government and enterprise consulting. From a total nerd perspective, they came across as a skoosh slick in their answers and choreographed handoffs, but I’ll chalk that up to being well-polished founders who’ve been on the fundraising and customer development trail for a while. That’s how you gotta sound to get C-suite types to fork over the cash.

But on the surface there are some neat ideas in the StackRox product. In the same way that networking technology has become disaggregated, microservices architectures have disaggregated applications and allowed for deeper introspection, monitoring, and remediation.

Have to say, I’ve been impressed by the guests that Harris has been able to get for his interviews.


Altered Carbon News

If it happens, I could get into a graphic novel version of Takeshi Kovacs.

[embed]https://twitter.com/Newsarama/status/887706294248984576[/embed]

Author Richard K. Morgan will bring Altered Carbon, the Philip K. Dick Award-winning novel published by Gollancz in the UK and soon to be adapted as a Netflix television series, to Dynamite Entertainment with all-new, in-continuity stories, exclusively available in the comic book and graphic novel formats.

Heck, this might be enough motivation to sign up for Netflix.


NextGen Network Algorithms

Link parkin’ [embed]https://twitter.com/jure/status/849061908615483392[/embed]

Jure Leskovec knows what he’s talking about when it comes to networks.


How Times Have Changed

As I’ve said before, there’s been a bit of gardening going on here behind the scenes. This has made me revisit a number of older posts on this here blog.

Circa 2010, I was seriously investigating ways to get mobile data access for a reasonable price. The number of posts regarding the HTC Evo as a potential phone + hotspot combo is impressive. That’s a cute little time capsule of technology.

Not to mention there used to be some company called Palm back then.

Eventually I wound up just getting an iPhone, which at the time only provided 2GB of 3G connectivity per month. Eight years later, with rollover, I usually have 8GB of LTE for two devices for around the same price. Unlimited text messages to boot. The 8GB isn’t particularly impressive, but the rest of the kit vice price is of note.

I’m still on the iPhone (6S Plus), but becoming really intrigued by a top of the line Google Pixel on Google Fi. A friend of mine speaks highly of the Android experience and iOS isn’t providing any level of excitement to me these days.

Times may have changed but technolust never goes away forever!


GCP Public Datasets

Google Cloud Platform hosts a number of public datasets:

Public Datasets on Google Cloud Platform makes it easy for users to access and analyze data in the cloud. These datasets are freely hosted and accessible using a variety of data warehouse and analytics software, from open source Apache Spark to cutting edge Google technologies like Google BigQuery and Google Cloud Dataflow. From structured genomic or encyclopedic data to unstructured climate data, Public Datasets provide a playground for those new to big data and data analysis and a powerful repository for skilled researchers. You can also integrate with your application to add valuable insights for your users. Whatever your use case, these datasets are freely available on GCP.

The thing I find surprising is that the Common Crawl web archives aren’t on GCP, especially given Google’s web heritage. Apropos the late, lamented Fairness Doctrine, Common Crawl is hosted on AWS. There was a good, recent GCP Podcast episode with the Public Datasets team that had an e-mail contact. Maybe I’ll fire off a question.


Docker Containers and Signals

Here be dragons. I know from personal experience but Hynek Schlawack explains why way better than me.

Proper cleanup when terminating your application isn’t less important when it’s running inside of a Docker container. Although it only comes down to making sure signals reach your application and handling them, there’s a bunch of things that can go wrong.

Really, as Hynek says, “Avoid being PID 1.”


Pacoid Presentations

A few years ago, I had the pleasure of meeting and chit-chatting with Paco Nathan. Back then he was with DataBricks, but now he’s at O’Reilly Media doing interesting things with Jupyter and learning. I enjoyed couple of his recent presentations. The first on AI inside O’Reilly Media.

[slideshare id=77387467&doc=aiinorm-170630013449]

And one on a TextRank rewrite in Python.

[slideshare id=71979383&doc=sfpython-170209212325]

Yowsa! That slideshare shortcode actually worked. We’ll see how it comes out in the RSS feed


Agent Iris

What is Iris?

Iris is designed to help non-expert programmers who understand what kinds of analyses they need to run (for example, creating a logistic regression model, or computing a Mann-Whitney U test) but not how to write the code to accomplish these goals. Iris also allows expert programmers to accomplish data science tasks more quickly.

Iris supports a broad set of functionality available in popular Python scientific libraries such as scipy and scikit-learn, and we intend to open source the system upon release.

And from a deeper explainer:

Iris supports interactive command combination through a conversational model inspired by linguistic theory and programming language interpreters. Our approach allows us to leverage a simple language model to enable complex workflows: for example, allowing you to converse with the system to build a classifier based on a bag-of-words embedding model, or compare the inauguration speeches of Obama and Trump through lexical analyses.

Iris is an academic research project led by Ethan Fast of the Stanford CS department. I’ll be interested to see how far this gets. Conversational agents that are domain specific, vertically integrated with an environment, and targeted at complex activities seem a bit more promising than the low bar tasks industry currently seems to be focusing on (cough, meeting scheduling, cough). Also feels like a “right moment” with Siri, Cortana, Alexa, Slackbots, Twitterbots, Xiaoice, Tay, and friends establishing a beachhead but bigger wins coming down the road.


Paid In Full

Better late than never.

[embed]https://twitter.com/EricBandRakim/status/883496358111477761[/embed]

“You thought I was doughnut. You tried to glaze me.”

The funny thing about the iconic Paid In Full album is that I always found the album version of Eric B. is President ultra irritating. I was lucky enough to purchase the 12” single well before the album came out. The single cut didn’t have that annoying grinding sound all over it. It was just the simple beat, Eric B. scratching, and Rakim’s dynamically unique rap style. That’s the real track to me.

30 years!! Damn time flies!


The SV AI Scene

An informed trip report from Natalino Busa regarding the Global Artificial Intelligence Conference. A few months stale, but still useful.

First of all, let me start by saying that literally everybody is doing (or claiming to do) AI in the bay area. AI has inflamed the spirits of pretty much every single software engineer, data scientist, business developer, talent scout, and VC in the greater San Francisco area.

All tools and services presented at the conference embed some form of machine intelligence, and scientists are the new cool kids on the block. Software engineering has probably reached an all-time low in terms of coolness in the bay area, and regarded almost as the “necessary evil” in order to unleash the next AI interface. This is somewhat counter-intuitive, as actually Machine Learning and AI are more like the raisins in raisin bread, as Peter Norvig and Marcos Sponton say.

I like the raisin bread analogy, which means the data platform engineering aspect of building AI products might be seen as a lucrative “dirty job”.

Via a tweet from @pacoid which has been sitting in my faves pile.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.