
Quamina Event Matching

Tim Bray is an Internet Person of Some Renown (TM) whom I’ve long admired from many angles: as a software developer, as a blogger, and as a public thinker with considered political and cultural views. As a hacker, his urge to code never seems to wane, so he’s open sourced a library, Quamina, for doing event matching at Amazon scale.

Many moons ago, a gentleman named Bob Wyman was the public, web-facing technical brains behind PubSub.com. Wyman advertised the provided service as “prospective search” against blogs, represented mainly by RSS feed content, among other “real-time” things on the ’Net during that period. You could upload matching patterns and then follow streams of content items that matched the patterns. As a business, it didn’t quite work out, but I remember Wyman occasionally dug into the underlying algorithms that made matching against massive numbers of patterns work with low latency. I’ve tried to rediscover his musings, to no good effect.

So Quamina looks like a modern take on the core algorithmic challenge within PubSub: taking extremely large numbers of patterns and matching content in extremely low amounts of time. Patterns are expressed as JSON and match against JSON objects. Given that many popular formats have known, or well understood, mappings into JSON it shouldn’t be hard to test Quamina against a wide range and scale of datasets. Sounds like a benchmarking challenge waiting to happen.
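To make the problem concrete, here is a toy brute-force matcher in Python. It assumes a simplified pattern shape in which each field maps to a list of acceptable values, and it only looks at flat, top-level fields. The function names and sample patterns are made up for illustration. This is emphatically not how Quamina works; the whole point of a library like Quamina is to avoid looping over every pattern for every event.

```python
import json


def matches(pattern: dict, event: dict) -> bool:
    """True if every pattern field is present in the event with an allowed value."""
    return all(
        field in event and event[field] in allowed
        for field, allowed in pattern.items()
    )


def matching_names(patterns: dict, event_json: str) -> list:
    """Brute force: test the event against every registered pattern."""
    event = json.loads(event_json)
    return sorted(name for name, pattern in patterns.items() if matches(pattern, event))


# Hypothetical patterns, keyed by a name chosen by whoever registered them.
patterns = {
    "go-posts": {"type": ["post"], "lang": ["go"]},
    "any-python": {"lang": ["python", "py"]},
}
event = '{"type": "post", "lang": "go", "title": "Quamina"}'
print(matching_names(patterns, event))
```

The cost here grows linearly with the number of patterns, which is exactly what falls over at PubSub or Amazon scale.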

I’m glad Bray released Quamina and I’ll definitely be digging in trying to understand how it works. If I were a halfway decent Go programmer, I would take a quick stab at wrapping some kind of RESTful API around it and seeing how close I could make it look like my memories of PubSub. So many ideas, so little time.

Although, if Quamina is fast enough, a cheap proof of concept could be done by wrapping something like Python’s FastAPI around subprocess invocation of the Go executable. Hmmmmm.


Joined Changelog++

Previously I mentioned The Changelog, run by Jerod Santo and Adam Stacoviak, as one of my current favorite podcasts. I use the term “podcast” loosely here, because they have a small empire of affiliated podcasts, an online community, and a couple of newsletters. There’s a wide variety of great, well produced content in differing formats. And they’ve been at it for over 10 years 😮!!

Anyway, I woke up one morning, was listening to an episode and heard the plug for Changelog++. Figured it was high time I supported these guys directly and pulled the trigger. Jerod promptly sent a nice email the same day asking, “What was the driving factor(s) for you signing up?” Here’s how I responded:

Nothing specific. Been a subscriber to The Changelog for 6 years, Practical AI since it started, and now dipping into GoTime. I’ve just been on a little kick directly supporting small individual content creators where possible. The folks who make podcasting enjoyable for me on a regular basis. Love what you guys are doing and especially the Changelog “house” interview style. Great production values without sounding overly corporate.

Need to give a few ++ episodes a listen before I can provide any concrete criticism, but so far, so good.


Doctorow’s Memex

I’ve occasionally dipped into Cory Doctorow’s publishing oeuvre but never drank from his internet posting firehose. Apparently he left Boing Boing a bit ago and started his own link blogging site Pluralistic. As you would expect of Doctorow, it’s a flood of interesting content, leftish, activist, cryptopunk oriented, no surveillance capitalism to be seen.

Part of his 20 Years Blogging post is a detailed description of a linkblogger’s daily creative process. In no way could I ever emulate that, but I was struck by a comment on his vast trove of previously published content:

My composition is greatly aided both 20 years’ worth of mnemonic slurry of semi-remembered posts and the ability to search memex.craphound.com (the site where I’ve mirrored all my Boing Boing posts) easily.

A huge, searchable database of decades of thoughts really simplifies the process of synthesis.

So for me, throwing as much as possible of what I’ve read, written, or listened to into a personal memex and sprinkling in a bit of search/AI technology seems like an interesting, if not winning, proposition. If nothing else, repeated elements like This Day in History can be semi-automated.
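As a rough idea of how little code the This Day in History piece might take, here’s a minimal Python sketch. It assumes the memex can hand over (title, publication date) pairs; the function name and the sample posts are made up.

```python
from datetime import date


def on_this_day(posts, today):
    """Return (year, title) for posts published on today's month/day in earlier years."""
    hits = [
        (published.year, title)
        for title, published in posts
        if (published.month, published.day) == (today.month, today.day)
        and published.year < today.year
    ]
    return sorted(hits)


# Made-up sample data standing in for a real archive.
posts = [
    ("Newer post", date(2022, 5, 1)),
    ("Old post", date(2009, 5, 1)),
    ("Unrelated post", date(2021, 9, 14)),
]
print(on_this_day(posts, date(2024, 5, 1)))
```

The interesting work is in the “semi” part: picking which of the hits are worth resurfacing.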

And I think I’ll give Pluralistic a try.


2022 Books Completed, Part 3

This is for both March and April of 2022, which is somewhat disappointing. For various reasons, I just hit a wall over those two months and didn’t make as much reading progress as I would have liked. I was pretty consistent with my daily read, but usually once I get going I’ll exceed the allotted time by a significant margin. Not so much these past two months.

Here comes the break …



Emacs and Emojis

So that’s all there was to it?

For the longest time, I had no memorable and easy way to enter and display Unicode text within my Emacs buffers. When jumping back into blogging this is somewhat problematic since the world sort of expects at a minimum the frivolous usage of a few emojis to demonstrate you’re keeping up with the cool kids 💥! After installing Nerd Fonts on my MacBook, I was more determined than ever to solve this conundrum.

Mickey Petersen of Mastering Emacs (buy his book) solved the display challenge. Install a package, do some customization, and voilà! Unicode characters, cut and pasted from the Internet, display.

Petersen (buy his book, really) also had a nifty approach to entering emojis. Emacs has “input methods” which you can rig to expand text sequences into Unicode characters. You have to make a modal switch into the input method, but it’s better than nothing.

So then I started digging around to remind myself how to insert characters with diacriticals. Of course Petersen (you did buy his book right?) had the answer, but within that answer was this nugget:

To insert a code point type C-x 8 RET and enter the Unicode name (type TAB twice to get a complete list).

Well, whaddya know 🤔
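The same look-it-up-by-name idea is available outside Emacs too. Here’s a small Python sketch using the standard library’s `unicodedata` module; the helper name `by_name` is just for illustration, and which names resolve depends on the Unicode version bundled with your Python.

```python
import unicodedata


def by_name(name: str) -> str:
    """Look up a single character by its official Unicode name."""
    return unicodedata.lookup(name)


for name in ("SPIRAL SHELL", "THINKING FACE", "LATIN SMALL LETTER A WITH GRAVE"):
    char = by_name(name)
    print(f"U+{ord(char):04X} {name} {char}")
```

Going the other way, `unicodedata.name(char)` tells you what to type at the `C-x 8 RET` prompt for a character you’ve pasted from somewhere.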


Link Parkin’: Speech to Text

Link parkin’, two services for converting speech to text:

The first via Talk Python, the second from Python Bytes


Prompt Action

Xonsh Terminal Screen Capture

Been having some fun reworking my terminal and shell setup based upon xonsh. I’m quite proud of myself for getting the 🐚 emoji into my prompt, which was decidedly trickier than I expected. Console is iTerm2. Glyphs and icons via Nerd Fonts. The system info was generated by neofetch.


Personal Podscape Memex

The initial salvo on the podcast deep dive thesis was divorced from end users, strictly based upon the content of the podcast. Another aspect of interest to me is the incorporation of personalization.

I listen to a small percentage of the episodes from feeds I’m subscribed to. Each one I do listen to feels like there’s a ton of knowledge, of which I retain just fragments. Could an intelligent podcast app monitor my listening and automatically weave together products that could help my future self? At the very least a personalized search engine could be crafted. But is a memex for my personal podscape in reach?

Overcast has an OPML export feature, including a rich version with episode data. A prototyping foundation seems apparent. Jonathan LaCour cooked up a Python script to automate the download.
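A first prototyping step might look like the Python sketch below, which pulls feed titles and URLs out of an OPML document. It only relies on the conventional OPML `text` and `xmlUrl` attributes; the sample document is hand-written and hypothetical, and whatever extra episode attributes Overcast’s rich export carries aren’t assumed here.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical sample; a real export would be read from a file.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Hypothetical export</title></head>
  <body>
    <outline text="feeds">
      <outline type="rss" text="Example Show" xmlUrl="https://example.com/feed.xml"/>
      <outline type="rss" text="Another Show" xmlUrl="https://example.org/rss"/>
    </outline>
  </body>
</opml>
"""


def feeds_from_opml(opml_text: str) -> list:
    """Return (title, feed URL) pairs for every outline element that has an xmlUrl."""
    root = ET.fromstring(opml_text)
    return [
        (node.get("text", ""), node.get("xmlUrl"))
        for node in root.iter("outline")
        if node.get("xmlUrl")
    ]


for title, url in feeds_from_opml(SAMPLE_OPML):
    print(title, url)
```

Grouping outlines without an `xmlUrl` are skipped, so nested folder structures shouldn’t matter.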


DuckDB and PyArrow

I previously link parked DuckDB as an embedded, high speed, OLAP engine. Gerard Bentley has some nice examples of using DuckDB, Arrow, and Python. Even though the combo isn’t definitively better than Pandas and Arrow, I like how seamless the integration between DuckDB and Arrow data appears. Simply referencing a Python variable from a DuckDB query is a neat trick. Wonder how that’s implemented under the covers.
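My understanding, which I haven’t verified against the source, is that DuckDB’s Python client handles unknown table names with a “replacement scan” that goes looking for a Python variable of the same name in the calling scope. The toy below is not DuckDB code, just a sketch of how any library could pull off that kind of lookup with frame inspection. `resolve_table` and `demo` are made-up names.

```python
import inspect


def resolve_table(name: str):
    """Find a variable called `name` in the caller's locals, then its globals."""
    caller = inspect.currentframe().f_back
    try:
        if name in caller.f_locals:
            return caller.f_locals[name]
        if name in caller.f_globals:
            return caller.f_globals[name]
        raise NameError(f"no table or variable named {name!r}")
    finally:
        del caller  # don't keep the caller's frame alive longer than needed


def demo():
    episodes = [{"show": "A", "minutes": 42}, {"show": "B", "minutes": 61}]
    # A real engine would parse SQL first; here the "query" is just a table name.
    rows = resolve_table("episodes")
    return sum(row["minutes"] for row in rows)


print(demo())
```

The trade-off with this style of magic is that the query’s meaning depends on whatever happens to be in scope at the call site.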

Related: DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB


Deep Diving Podcasts

One of the reasons I’ve become interested in deep analytical dives of podcasts is because of Software Engineering (SE) Radio. It’s not quite at the top of my podcast fave lists because I don’t feel an urge to listen to it regularly. However, the content is typically excellent and I like the no-nonsense interview style. A little topic introduction, a little guest bio, and then right into the technical discussion. No ads either.

The analytic draw is the historical depth of the content. SE Radio launched back in 2006, 16 years ago. There are 511 episodes as of this writing. It’s provided under a Creative Commons 2.5 License. There are so many summarization possibilities to apply and questions to be answered of such a corpus. Think of all the tech topics that have come up over the years. Not to mention that there are some exceedingly distinguished guests making appearances. How have the themes and personalities ebbed and flowed over the years? What programming languages have ascended and waned? Are there any enduring repeat offenders? How does this all connect and intersect with the wider tech topic podcast ecosystem?

And that’s just a single long lived podcast! Between The Python Podcast.__Init__ (363) and the Data Engineering Podcast (288), Tobias Macey has delivered 651 episodes, primarily by himself. For almost any vertical topic you can imagine, there are probably at least a decent handful of well rooted podcasts worth deep diving on.

I’ll need to get a good lawyer to interpret the license, but the only show stopper might be the legality of building publicly available systems that derive from analytics of the podcast episodes. On the one hand, I feel like this might be like podcast search indexing which would seem to not be a derivative work. On the other, you are building on the content sooo…

Now whether all of the episodes can actually be collected is a technical challenge for another day.
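For what it’s worth, the first step of collection is usually just walking a feed’s enclosures. Here’s a minimal Python sketch against a hand-written, hypothetical RSS 2.0 sample; the function name is made up, and nothing here is specific to any particular show’s feed.

```python
import xml.etree.ElementTree as ET

# Hand-written, hypothetical feed; a real one would be fetched over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Hypothetical Show</title>
    <item>
      <title>Episode 2</title>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Episode 1</title>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>Text-only announcement</title>
    </item>
  </channel>
</rss>
"""


def episode_audio(feed_xml: str) -> list:
    """Return (title, audio URL) for each feed item that carries an enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None and enclosure.get("url"):
            episodes.append((item.findtext("title", default=""), enclosure.get("url")))
    return episodes


print(len(episode_audio(SAMPLE_FEED)), "episodes with audio")
```

The catch is that a feed may list only recent episodes, so the feed alone won’t necessarily reach the full back catalog.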


Beyond Pelican

Link parkin’. A couple of other static site generation engines:

I’m probably hobbling Pelican by running page generation on a 2008 White plastic MacBook (!!), but it feels pokey to me. And it always seems to regen from all my source pages, including a ton that haven’t changed in years. Whatever algorithm is being used for change detection doesn’t seem to be working. Now lining up alternatives to explore.


Complexifications

Keeping with the theme of generative art, previously I linked to an old post of mine where I had replicated another artist’s work. The inspiration was Jared S. Tarbell’s piece from complexification.net entitled substrate. Curious, I went to the Web to see what had become of Mr. Tarbell.

Complexification.net is still alive, although frozen in time, featuring Java Applets from an era gone by. What do you expect from 2004-05 though? At least it wasn’t all Flash.

But I also chanced upon this somewhat recent interview (late August 2020) with Tarbell. First off, if I knew Tarbell had started Etsy I had certainly forgotten. That’s a nice feather in the cap, presumably leaving some financial runway to pursue whatever. Second, his backstory was quite interesting. Third, he seems to still be practicing!

What a great gallery. Fuel for replication, emulation, and inspiration.


Revisiting Zenbullets

Another activity I’d like to get back into is generative art. I’ve always admired the Processing ecosystem and community. There was even an extended period where I was a quite serious dabbler in using Python to replicate various works. In particular, Matthew Pearson, a.k.a. zenbullets, released a book entitled Generative Art and published a site Abandoned Art. I recently decided to see what Pearson was up to these days. Apparently he’s abandoned working in this arena, but there are still vestiges of his work online.

Having squatted on the domain adoptedart.org (currently redirects to mpr.crossjam.net) intending to recreate the works in Abandoned Art, I’ve got some Internet real estate to build a portfolio in this space. Just need to get stuck into it.

Interestingly, only screen captures of the abandoned works are available from Pearson’s current site. With a little digging in on the Wayback Machine, maybe the source code can be recovered.


DuckDB

Link parkin’: DuckDB

Does for OLAP what sqlite did for OLTP. Excellent podcast episode on DuckDB with Hannes Mühleisen.


Favorite Podcasts 2022

At least as of May 2022, here is a brief list of podcasts that I regularly enjoy. I’ll likely come back and do individual deeper dives through the course of the month:

Tech heavy. Yup, I’m a nerd.

Special shout out to Tobias Macey, longtime producer of both “The Python Podcast” and the “Data Engineering Podcast”, while maintaining a full time day job and being a dad. Yikes!

Additional shout out to Richard Deitsch’s podcast, recently discovered, for super high quality journalism about journalism and media. Usually I don’t enjoy going meta, but Deitsch is so good he’s worth a regular listen. Also check out his work at The Athletic. The guy is a beast.


MeiliSearch

Link parkin’: MeiliSearch

Might be useful for searching blog archives, especially in a comparison/bakeoff with sqlite’s FTS3, FTS4, and FTS5.
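For the sqlite side of that bakeoff, here’s a minimal FTS5 sketch in Python. It assumes the sqlite3 library bundled with your Python was compiled with FTS5, which is common but not guaranteed. The table layout and function names are made up, and the two sample rows are just snippets from this blog.

```python
import sqlite3


def build_index(posts):
    """Load (title, body) pairs into an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE posts USING fts5(title, body)")
    db.executemany("INSERT INTO posts (title, body) VALUES (?, ?)", posts)
    return db


def search(db, query):
    """Return matching titles, ordered by FTS5's built-in relevance rank."""
    rows = db.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank", (query,)
    )
    return [title for (title,) in rows]


db = build_index([
    ("DuckDB", "Does for OLAP what sqlite did for OLTP."),
    ("MeiliSearch", "Might be useful for searching blog archives."),
])
print(search(db, "archives"))
```

With the default tokenizer, matching is case-insensitive but there’s no stemming, so “archive” won’t find “archives”. That’s the sort of gap a comparison with MeiliSearch would surface.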

Via Mark Litwintschik


Bummer

Well, so much for that. An unfortunate series of events busted my “streak” after only three days. Back on the horse though.

What I would have said yesterday: as the month rolls on I expect podcasts and podcast analytics to be a theme. There’s a metric crapton of episodes out there and some of it is freely licensed. Listen Notes is doing some admirable stuff on basic podcast search, but I’ve got some wacky ideas around demos, tools, and prototypes that do deep dives on collected content. Hacking around on data engineering, app development, and natural language processing should lead to some interesting portfolio results, if nothing else.


Speaking of Overcast

Apropos the title, Overcast went through a redesign, released a few weeks ago. The day it came out I was literally thinking, “is there a way in Overcast to find one’s started, but not completed, podcast episodes?” Voilà! Feature delivered.

For a while I gave Apple Podcasts a run, but once on the Overcast bandwagon, I’ve never looked back. Highly recommended, and well worth paying the annual subscription to support the developer, Marco Arment.


Podbytes Revisited

Well, the following statement didn’t hold up well:

Disclaimer, I don’t listen to podcasts and I’m only peripherally interested in them as another form of sociable media on the Web. This post was inspired by some observations by Ernest Miller on the scannability of podcasts.

I’m actually heavily into podcasts now (four plus years with Overcast? Wow!), but digging through my ancient archives, I came across that one on PodBytes. A heck of a lot has changed in the podcast ecosystem, but I’m sort of interested to know if the ability to do a “pull quote” has actually been enabled.


Streaking 2

In an homage to days gone by, let’s see if I can run the table posting every day for May 2022.

I actually had a good time last week fooling around with the Python based xonsh shell. Way too much time was spent getting the conch emoji (🐚) into the shell prompt but most of the blame can be laid at the feet of the powerline2 plugin. Hooking into the prompt generation isn’t obvious unless you’re already familiar with the powerline model which has been transported from the vi world. Not my jam.

There’s definitely something enticing about having Python closely coupled with the shell CLI. I need to get it installed on a few more machines and used more frequently to have a real assessment. I’m waiting to see where the real warts are but so far, so good.


2022 Books Completed, Part 2

Four completed in the shortest month, February, is nothing to sneeze at. Seven total for the year. Thoughts after the break.



2022 Books Completed, Part 1

Arguably a light haul for January, but still off to a better start than last year at this time.

Thoughts below the fold…



New Coat of Paint

For some reason, this post was scheduled for June of 2020 (!!) but never posted. Letting it loose now to clean out the drafts folder.

For the first time ever intentionally, I rethemed one of my blogs. New Media Hack got a new theme only because I had to apply something during the transfer to WordPress. Mass Programming Resistance hadn’t had a style change from its inception. Now it does.

The theme is pretty minimal. I forked a repository from Giulio Fidente that is spartan black text on white background. Mainly I just decided to go with square instead of rounded corners on the buttons and to tighten up the gutter between the sidebar and the main content.

Along with fixing up the RSS and Atom feeds, the theme change was the last thing to finish before declaring the transition to static publishing complete.


Transplanted

Yet another left behind in the drafts bin from June 2020. Including today for posterity’s sake.

This is my first post using Markdown and Pelican. Currently https://crossjam.net/ is resolving properly and the content has transferred over. There’s a bunch of redirects, which are a little ugly, but they serve the purpose of dealing with the old WordPress path prefixes. Mass Programming Resistance and New Media Hack (still preserved, bless its heart) are separated out into two different domains.

Now need to work on a good workflow for publishing. The upside is that I can now go with Emacs to create posts, which will accelerate my ability to spit out text. But I’ll need some key bindings to smooth out some of the linking work. Also, need to figure out a comfortable flow to update the production server. Pushing to a git repo seems like a little too much work, although netlify has an interesting take on that path. Maybe for a dev site just to get some experience.


Why the Return?

Still(!!) mopping up 2021 leftovers. This one was started in September 2021. Yikes!

So I’ve come back to blogging, maybe not at previous publishing rates, but definitely with some consistency in late(ish) 2021. What gives?

On the day job, I got a big promotion in April (if you can navigate LinkedIn or are decent at Google you can find the details) that’s probably going to steer me to being much less technical at work. I’m gonna need a hacking outlet, thus I’m aiming to do some side projects on my off time. Previously, this blog was more life and media commentary oriented, but I’m hoping to pivot into a better “tech” blogger, whatever that means. At least to me it means more project details and definitely more code.

Specifically, I have five independent personal development goals I’m currently working on:

  1. Mastery of the Python programming language
  2. Proficiency in the Go programming language
  3. Modernizing developer skills and tools
  4. Developing a strong portfolio of public data projects
  5. Creating an online technical presence

My blogging will help with number 5 but it can also be a driver for solving some technical challenges, such as search indexing of this blog’s content. As Simon Willison points out, working in public has a relatively low bar for online achievement that can be extremely useful in the end.

Fast forward, 5ish months…

Haven’t made as much progress as I’d like on all the above, but I have made some progress. My PyBites profile indicates motion in fits and starts. Plus I’m on the cusp of another milestone, a brown belt, just need to do a few more pesky easy bites. I’ve actually started reading The Go Programming Language. Some of the gaps in understanding from when I was just hacking away on nsq are getting closed. Discogs data hacking is happening (more to come…) and I’ve had a personal epiphany about how to go about building data projects and a portfolio of such.

Dropping the ball on modernizing developer skills and tools, although the PyBites work is forcing some routine git interaction. Still, me and Emacs are joined at the hip.

Gettin’ there.


2021 Books Completed, Part 8

Still mopping up some 2021 odds and ends.

December was decidedly better than November. Even though it wasn’t a perfect vacation, managed to take advantage of some holiday downtime. Seven more works completed pushes the year’s total to 42, which isn’t too bad all things considered. Interestingly, this batch didn’t include any self-development or business themed tomes. I need to do a full accounting of the 2021 reading, just for posterity and to have it all in one place.

Thoughts beneath the fold …



Manticore Search

Link parkin’: Manticore Search

Lightweight SQL-first database designed for search applications that suits both big and small data collections

More details: Manticore Search: 3 years after forking from Sphinx


2021 Books Completed, Part 7

Still mopping up some 2021 odds and ends.

Well, November turned out to be a dismal completion month. Thought I was going to make a lot of progress during the Thanksgiving holiday, but that turned out to be a dud. With 35 down for the year, I can at least still get to 40 hopefully. More after the break.



2021 Books Completed, Part 6

Now in the process of mopping up leftover 2021 posts. Don’t know why I didn’t hit publish on this back in October. At least I was smart enough to not bake the month into the post title.

Just a solid month of October. 2021 completed total bumps to 33. Could actually make a stretch goal of 50 this year. More after the break.



Challenging

Over a year ago I mentioned some sites to help refresh the Python coding skills. Going on about a year now, I signed up for the PyBites community and the codechalleng.es platform. Working in fits and starts, I’m now actually getting back into consistently solving the bites. Here’s my PyBites profile. Climbing the ladder.

Turns out there are curated “learning path” collections of bites, progressively more difficult. Trying to complete learning paths seems to be the carrot that gets me going. Now that I’ve chewed up a lot of the easier bites, I’m definitely hitting the more challenging ones, those valued for 3 and 4 points. Even though I’m fairly experienced in Python these take a non-trivial amount of time. Sometimes it’s just understanding the actual problem statement but other times I need to put on my thinking cap and solve an intellectual puzzle, which is great!

Definitely a happy customer and can recommend codechalleng.es if you’re looking to build up your Python muscles.


2021 Books Completed, Part 5

Huge completion haul this month. Some based upon carryover from last but momentum is momentum. Still, there’s a non-negligible number of significant tomes if anyone’s counting. This bumps the year’s count to 28.

Thoughts below the fold.



Discogs and Data

For the longest time, I’ve just been piling up data from Discogs.com and not really doing anything with it. Finally, I have a motivating project.

The fine folks at London’s Fabric nightclub have two great series of DJ mix releases, Fabric and FabricLive. Recently I made a bulk purchase of digital versions of 20 mixes. They arrived as .wav files with no metadata attached (that I can tell). Adding all the track metadata is something a computer should do, not a human. No problem, all of that data should be in the Discogs data. I’d also like to create playlists or a playlist DB to start noodling around with MPD as a playlist shuffling jukebox. (Why does the world hate playlists so much?).

So of course this means lots of data munging, wrangling, and management. Which is totally fine. I need some data side projects to help build a data portfolio.

Link parkin’ discogs-xml2db v2 as the way to get the Discogs data into a PostgreSQL db for querying.


2021 Books Completed, Part 4

Back in the groove for August. The tally will only show four books completed for this month, but there were two more in the hopper that will close this weekend. September should reap five completions if not six.

The Consuming Fire won’t make the top of my literary style list, since I’m a big fan of Ursula K. Le Guin and William Gibson. A lot more exposition than I usually like. But fuck if The Consuming Fire (and The Collapsing Empire as well) aren’t fun, easy reads. And damn if Kiva Lagos ain’t a motherfucking hoot!

Apex exemplifies my distaste for the technothriller genre. A sprawl of plot arcs, coming and going in fits and starts. Named characters constantly being introduced, without enough room for development. Fetishizing of national security apparati. It’s been a while since I read Nexus and Crux, the other books in the trilogy, but the underlying neuroscientific principles were compelling. Glad I closed out the trilogy but probably not revisiting.

Apparently a key element of The Expected Goal Philosophy is constantly moaning about how people who haven’t bought into the philosophy are morons. Slightly offputting. The actual details of the statistical approach turn out to be somewhat pedestrian at the end of the day, but good to know about as football evolves. Might have read better as a pure underdog story, similar to Moneyball.

Pep Confidential turned out to be less of a hagiography than I expected, which was good. As a Manchester City fan, I was intrigued by the insights from his prior stop at Bayern Munich. I’m still a newbie regarding football tactics, so Confidential was a very revealing look behind the scenes. Also, quite entertaining given that many of the names involved are still active in world football, like Thiago Alcântara, David Alaba, Mario Mandžukić, Toni Kroos, and Pierre-Emile Højbjerg, with the lens of history to see their subsequent paths. Looking forward to reading Perarnau’s follow-up Pep Guardiola, The Evolution.

Apropos Elle Driver, “You know, I’ve always liked that word … hagiography … so rarely have an opportunity to use it in a sentence.”


Amphetamine Endorsement

Just wanted to give a quick plug for Amphetamine, a Mac OS X menu bar app that helps control screen saving and more. On any Mac where I create a new account, Amphetamine is in that small collection of “must install” apps. Really well polished with a lot of functionality and free to boot! Thanks William Gustafson.


Moving to Pyenv

Motivated by Justin Mayer’s observations about Homebrew Python I’ve been working on using pyenv and pyenv-virtualenv. Still taking a bit of getting used to, but if my virtualenvs don’t get broken I’ll be a happy camper.


Blogaversary, Macaversary 2021

Holy Smokes! It’s been 4 years since I’ve done one of these.

13 years since the first post and the first MacBook mention. Now I’ve moved up to a 2020 Apple Silicon MacBook Air. That old White MacBook? It’s still providing service as an Ubuntu 20.04 (Yes, Ubuntu) server, building the html for this here blog.


Pandoc Stuff

Link parkin’: panflute

All of this blog’s content is written in Markdown. As a side project, I’m interested in working on a homegrown solution for search. Enter Pandoc for doing some of the heavy lifting on document parsing and generating an Abstract Syntax Tree. Thence to panflute for spitting out into a format that could be indexed using sqlite FTS to get started and then Manticore just for giggles.
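As a rough sketch of the document parsing step, the Python below walks a tiny hand-written sample shaped like pandoc’s JSON AST (what `pandoc -t json` emits) and flattens each top-level block to plain text suitable for indexing. It uses only the standard library; panflute would give a much nicer object model for the same walk. The function names and the sample document, including its version number, are made up.

```python
import json

# Hand-written, hypothetical sample in the shape of pandoc's JSON output.
SAMPLE_AST = json.loads("""
{
  "pandoc-api-version": [1, 22],
  "meta": {},
  "blocks": [
    {"t": "Header", "c": [1, ["hello", [], []], [{"t": "Str", "c": "Hello"}]]},
    {"t": "Para", "c": [
      {"t": "Str", "c": "Link"}, {"t": "Space"},
      {"t": "Emph", "c": [{"t": "Str", "c": "parkin"}]}
    ]}
  ]
}
""")


def block_text(node) -> str:
    """Concatenate the Str and Space/SoftBreak inlines found anywhere under node."""
    if isinstance(node, list):
        return "".join(block_text(child) for child in node)
    if isinstance(node, dict):
        kind = node.get("t")
        if kind == "Str":
            return node["c"]
        if kind in ("Space", "SoftBreak"):
            return " "
        return block_text(node.get("c", []))
    # Ints and bare strings are attributes (or code text, which this sketch skips).
    return ""


def document_text(ast: dict) -> list:
    """One plain-text string per top-level block."""
    return [block_text(block) for block in ast["blocks"]]


print(document_text(SAMPLE_AST))
```

Each string could then become a row in a full-text index, keeping headers and paragraphs separate if per-section hits are wanted.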


2021 Books Completed, Part 3

Slowed down a bit in July, but with four books completed this month plus the honorably included, that’s six more added to the tally. Nineteen down for 2021.

I respect the writing quality of The Turing Test and indeed, quite a few of the stories were thought provoking: The Turing Test (eponymous short story), La Macchina, The Gates of Troy, Piccadilly Circus, Karel’s Prayer. However, Beckett’s style didn’t really resonate with me.

The Game of Our Lives places English football in a larger and wider social context, especially in times before the Premier League.

The Effective Manager is exceedingly prescriptive, which may or may not be what one needs. Definitely useful at this point in my career.

Done Deal was a pretty mundane explainer for me. A few bits of transfer policy detail were enlightening.


Honorable Inclusion

Previously I had noted completing The Grendel Omnibus, Volume 1 and The Infinity Gauntlet via Comixology. Normally I don’t include graphic novels in my book reading tallies due to some weird notion that they’re “lightweight”. Grendel was the equivalent of 500 printed pages and Infinity Gauntlet 250. The judges have subsequently determined that they do qualify in this category.


2021 Books Completed, Part 2

Still making quite good progress on the book reading front. Herewith is the list of completed texts from this June. Just for posterity’s sake.

The Lathe of Heaven pleasantly reminded me of how beautiful Le Guin’s writing can be. Some excellent passages in that book.

Fear and Loathing … is not only a football book but a history book. Learned a lot about Spain.

My only appointment sports viewing is the English Premier League, whose birth and rise is documented in The Club. For a relative newcomer to football like me, Clegg and Robinson filled in a lot of gaps in my knowledge. It’s actually sort of amazing (to me) what bad shape English football was in just 30 years ago. In my mind, the Premier League had just been this historically monstrous behemoth. Not so fast …

Good Strategy, Bad Strategy is an excellent business book. 10 years old now, it’s also interesting from a tech perspective because it uses NVIDIA as a case study before the company’s recent AI/ML fueled surge.

Atomic Habits does what it says on the tin and actually provides concrete actionable techniques for constructing enduring positive habits.

Two football books, two science fiction books, and two personal development(ish) books turned out to be a nice mix. I’m trying to replicate that during July.

Thirteen confirmed completions for the year. Thirty total is within reach.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.