Link parkin’, pyLast
A Python interface to Last.fm and other API-compatible websites such as Libre.fm.
Will come in handy for downloading a year’s worth of scrobbles from my account and conducting some analysis.
First off, I really need to generate some basic discriminative descriptive statistics across all of this Discogs data. Pick off the low-hanging fruit.
Having said that, since I’ve got the data in a PostgreSQL instance, including my own custom views, some kind of HTTP-based API to access the data would make an interesting prototype.
Enter PostgREST
PostgREST is a standalone web server that turns your PostgreSQL database directly into a RESTful API. The structural constraints and permissions in the database determine the API endpoints and operations.
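To get a feel for what that API would look like, here’s a small sketch that builds PostgREST-style query URLs. It leans on PostgREST’s documented URL conventions (`select`, `column=operator.value` filters such as `ilike`, `order=column.asc`, `limit`), but the view name `fabric_releases` and the `released` column are hypothetical stand-ins for my own views, and `localhost:3000` just assumes PostgREST’s default port.

```python
from urllib.parse import urlencode


def postgrest_url(base, table, filters=None, select=None, order=None, limit=None):
    """Build a PostgREST-style GET URL for a table or view.

    filters maps a column name to an "operator.value" string, for example
    {"title": "ilike.*fabric*"}; PostgREST accepts * as the LIKE wildcard.
    """
    params = []
    if select:
        params.append(("select", select))
    for column, expr in (filters or {}).items():
        params.append((column, expr))
    if order:
        params.append(("order", order))
    if limit is not None:
        params.append(("limit", str(limit)))
    # Keep *, commas, and dots readable instead of percent-encoding them.
    query = urlencode(params, safe="*,.")
    return f"{base.rstrip('/')}/{table}" + (f"?{query}" if query else "")
```

For example, `postgrest_url("http://localhost:3000", "fabric_releases", {"title": "ilike.*fabric*"}, select="id,title", limit=10)` yields a URL you could hand straight to curl or a browser. The appeal is that the database, not hand-written route code, defines what is queryable.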
What would oEmbed cards look like for pure textual playlists?
This sat in the drafts folder for a bit, but I finally decided to just hit publish and stop seeking perfection.
Well, that was a mildly annoying adventure, discovering how to get code syntax highlighting working. Pelican’s Markdown support includes CodeHilite by default, but I couldn’t figure out how to actually trigger it. Turns out once I installed the Pygments module, things kicked in.
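For future reference, a sketch of the relevant `pelicanconf.py` setting. As far as I can tell this mirrors the default `MARKDOWN` value in Pelican’s settings documentation, so check it against your Pelican version. CodeHilite only emits Pygments-highlighted markup when Pygments is importable, and the theme still needs a matching stylesheet (for example one generated with `pygmentize -S default -f html -a .highlight`).

```python
# pelicanconf.py (excerpt); spelling out the default makes the
# CodeHilite -> Pygments dependency explicit.
MARKDOWN = {
    "extension_configs": {
        # Wraps highlighted code in elements with the "highlight" CSS class.
        "markdown.extensions.codehilite": {"css_class": "highlight"},
        # "extra" bundles fenced code blocks, tables, footnotes, and more.
        "markdown.extensions.extra": {},
        "markdown.extensions.meta": {},
    },
    "output_format": "html5",
}
```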
Anyway, who knew the language of the first code segment to appear on this blog would be SQL? Using some handy regular expression features of Postgres, I layered some views on top of data imported from discogs-xml2db. The target was an extraction of releases from the Fabric and FabricLive series. A fair amount of data normalization still needs to be done, but at least I’ve got 100% recall without too much extra stuff, and the extra stuff only sneaks in because the titles aren’t quite consistent. Ultimately I had to resort to explicitly blacklisting some rows.
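The actual views are SQL, but the idea (regex match on titles, then an explicit blacklist for the false positives) translates directly. Here’s a rough Python rendering of that approach. The title pattern and the sample titles are my own inventions for illustration, not the real Discogs data or the real view definitions.

```python
import re

# Hypothetical pattern covering "Fabric 12", "FabricLive.12", "fabric live 12", etc.
SERIES_RE = re.compile(r"^fabric\s*(live)?[\s.]*(\d{1,3})\b", re.IGNORECASE)


def extract_series(releases, blacklist=frozenset()):
    """Yield (release_id, series, number) for titles that look like Fabric mixes.

    releases: iterable of (release_id, title) pairs.
    blacklist: release ids to drop explicitly, for false positives
    the regex lets through.
    """
    for release_id, title in releases:
        if release_id in blacklist:
            continue
        m = SERIES_RE.match(title.strip())
        if m:
            series = "FabricLive" if m.group(1) else "Fabric"
            yield release_id, series, int(m.group(2))
```

The regex does the high-recall sweep; the blacklist is the blunt instrument that trims precision back up when titles are too inconsistent to handle with a tighter pattern.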
Code and example output below the fold
So I took the discogs-xml2db tool and ran it against the Discogs Data, May 2022 release. I got back 8.1 GB 😱 of CSV data to ingest into PostgreSQL. I’ve done it for previous months and it’s ingested just fine, but there’s some interesting exploration that can be done with the CSV data, before and after ingest. But I’m gonna need a few tools:
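Whatever else ends up in the toolbox, the first thing worth having for 8.1 GB of CSV is a single streaming pass that never loads a whole file into memory. A minimal sketch using only the standard library; the column name you pass in is up to you, since I’m not assuming any particular discogs-xml2db output schema here.

```python
import csv
from collections import Counter


def profile_csv(fh, column=None):
    """Stream CSV rows from an open text file, counting rows and, optionally,
    how often each distinct value of one column appears.

    Memory use grows with the number of distinct values, not the file size.
    """
    rows = 0
    counts = Counter()
    for record in csv.DictReader(fh):
        rows += 1
        if column is not None:
            counts[record.get(column)] += 1
    return rows, counts
```

Usage would look like `with open(path, newline="", encoding="utf-8") as fh: total, hist = profile_csv(fh, "some_column")`, followed by `hist.most_common(10)` for a quick look at the distribution. Comparing those row counts with `SELECT count(*)` after ingest is a cheap sanity check.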
Some might call me an Emacs tramp (🤣), but I’m referring to the Emacs mode that allows for editing remote files over ssh connections.
TRAMP (Transparent Remote Access, Multiple Protocols) is a package for editing remote files, similar to AngeFtp or efs. Whereas the others use FTP to connect to the remote host and to transfer the files, TRAMP uses a remote shell connection (rlogin, telnet, ssh). It can transfer the files using rcp or a similar program, or it can encode the file contents (using uuencode or base64) and transfer them right through the shell connection.
The overall experience is pretty much seamless, without having to deal with a bunch of warts related to distributed file systems such as SMB, AFS, Dropbox, et al. Works great even with version control tools such as magit.
The fact that TRAMP can operate over telnet (😱) of all protocols should tell you it’s ancient though. An elegant weapon, for a more civilized age.
By my math, I went 29 for 31 on daily posts last month. I was hoping to run the table after a long period of infrequent, intermittent posting. There’s a certain symmetry in how I screwed up right at the beginning of the month and right at the end of the month, but 20+ days in between is not too bad.

Speaking of streaks, 1 year ago I set about to listen to more music. The screen capture above, from the Streaks iOS app on my phone, emphasizes that I’ve managed to do that for 365 days straight. 🎊 🥳 🎉 Listening in this case means a minimum of 3 songs, but almost uniformly I got in a full “CD” if not multiples.
What always gets me about these types of streaks is overcoming those crappy days. Days when you feel wishy-washy and think you might have COVID. Days when work runs long and you feel dead tired. Days when your kids get on your last nerve. Days when you feel like you let everyone down.
But a little music always soothes the soul, even if one falls asleep in the middle of the mix 🤣.
I have a wealth of starred items stashed away in my feed readers, plural. Link parkin’ a few things of note. Only goal is to send you off to some pieces loosely joined. Not too much thought put into it.
Recently learned about reader, a minimalist Python module for managing collections of RSS (and other) feeds.
Mark Litwintschik does his usual thorough data spelunking against The Streets of Monaco
Simon Willison was a guest on architecture notes providing a great overview of datasette
Matt Webb talks about his blog writing approach. I use Feedbin much the same way he does. I wonder how many feeds I’m subscribed to?

Yesterday I lost the plot by yak shaving my shell setup installation process to work better under Ubuntu. The above capture illustrates that lots of progress was made, but then I forgot to actually post. Oh well!
Peter Wang gave a keynote at PyCon US 2022. Apparently it excited a lot of people. The topic was PyScript, a project to put Python in the browser via WebAssembly.
PyScript is a framework that allows users to create rich Python applications in the browser using HTML’s interface and the power of Pyodide, WASM, and modern web technologies. The PyScript framework provides users at every experience level with access to an expressive, easy-to-learn programming language with countless applications.
I’d heard about the tool many times, but I just started using pgcli to interact with PostgreSQL databases from the command line. 🤯
pgcli is really good stuff. Auto-completion, syntax highlighting, Emacs-style key bindings, and SSH tunneling, amongst many other great features. And implemented in open source Python (BSD license) to boot. What’s not to love?
Why did I wait so long? What the hell was I thinking?
Another circumnavigation around the center of the solar system completed.
By many measures, year-over-year I’m quite a bit healthier, wealthier, and wiser.
However…
The Godfather is one of my favorite movies. The first spoken words are:
Gettin’ harder to maintain belief in these times.
We’ll just have to keep on keepin’ on.
Back to improving my Emacs environment. Since I conquered emojis in Emacs, now it’s time to deal with spelling since typos in my blog posts drive me up a wall.
I vaguely recalled some *spell package for Emacs but couldn’t remember the specifics. Turns out it was the built-in flyspell, conveniently covered by Ryan Moore. Especially useful is the keybinding for the mouse click on MacBook trackpads that lets you see variant spellings.
That led me to think about text expansion to help cut down on some of my more frequent lapses (looking at you, occasionally). And what would an MPR Emacs post be without a mention of Mickey Petersen (go buy another copy of Mastering Emacs)? Emacs has a venerable abbrev package, and Petersen ably addresses its usage for spelling correction. Job done, but there’s bonus material on using hippie-expand, which does a whole lot of interesting dynamic abbreviation expansion, a.k.a. replacements based on text already existing in buffers. If you don’t like hippies, he mentions at least three other text expansion modules 😆.
One of the real reasons I need text expansion is that one of my favorite emojis is this one, 😆, which is named, I kid you not:
SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES
Sort of a pain to enter after C-x 8 RET even with tab completion.
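The same name-based lookup that C-x 8 RET performs is available in Python’s `unicodedata` module, which makes for a quick illustration of the abbreviation idea outside Emacs. This is not how Emacs abbrevs work internally; it’s just a sketch, and the short aliases are made up.

```python
import unicodedata

# Hypothetical abbreviation table: short alias -> official Unicode character name.
ABBREVS = {
    ":lol:": "SMILING FACE WITH OPEN MOUTH AND TIGHTLY-CLOSED EYES",
    ":conch:": "SPIRAL SHELL",
    ":thinking:": "THINKING FACE",
}


def expand(text, table=ABBREVS):
    """Replace each alias in text with the character whose Unicode name it maps to."""
    for alias, name in table.items():
        text = text.replace(alias, unicodedata.lookup(name))
    return text
```

`expand("that was :lol:")` yields the emoji, and `unicodedata.name("😆")` goes the other direction, which is a handy way to discover the official name to type after C-x 8 RET in the first place.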
Below are some podcast recommendations for those invested in the Python ecosystem.
The first four were previously listed in my Favorite Podcasts 2022 list. The Python Podcast.__init__, Python Bytes, and Talk Python To Me are my go-tos, in that rough priority order.
Strictly speaking the Data Engineering Podcast isn’t about Python, but there’s enough overlap with the ecosystem that a lot of Python pops up. The PyBites Podcast is mostly specific to participants in the PyBites community and platform. It’s a little more on the personal and professional development side than straight Python ecosystem. Oddly enough, even though I’m recommending it and I’m a paid subscriber on Real Python, I don’t often listen to episodes from The Real Python Podcast. But when I do, I’m highly satisfied.
I just threw in Software Engineering Radio because it hasn’t been mentioned in this blog yet, but it’s really good and has the occasional Python bits 😆.
Actually, Software Engineering Radio was previously mentioned in Deep Diving Podcasts. We regret the error
Link parkin’: SuperFastPython
Jason Brownlee seems super excited about Python concurrency mechanisms (threads/processes) and helping developers better understand them. A cursory scan of a few of his tutorials indicates a high level of quality and coverage.
I’ve been doing a bit of Python thread programming in the day job, so these topics are of interest.
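For flavor, here’s the basic shape of the thread-pool pattern those tutorials spend a lot of time on. It’s a minimal sketch: `fetch` is a hypothetical stand-in for an I/O-bound call, simulated with `time.sleep`, which releases the GIL the way real network or disk waits do.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(item):
    """Hypothetical I/O-bound task; the sleep stands in for waiting on a network."""
    time.sleep(0.1)
    return item * 2


def run_threaded(items, workers=8):
    # executor.map runs calls concurrently on the pool's threads but yields
    # results in input order; the with-block waits for all work to finish.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, items))
```

Eight 0.1-second calls finish in roughly 0.1 seconds instead of 0.8. For CPU-bound work the usual swap is `ProcessPoolExecutor`, since threads in CPython contend for the GIL.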
What does “Finishing Fabric” mean? I have successfully collected digital versions of the 100 releases of Fabric and the 100 releases of Fabric Live. For my purposes, the release count ignores the radio promo mixes. These electronic music mix series were published via the sister label of the Fabric London nightclub.
So why would anyone in their right mind pursue this?
Tim Bray is an Internet Person of Some Renown (TM) whom I’ve long admired from many angles: as a software developer, as a blogger, and as a public thinker with considered political and cultural views. As a hacker, he never seems to lose the urge to code, so he’s open-sourced a library, Quamina, for doing event matching at Amazon scale.
Many moons ago, a gentleman named Bob Wyman was the public, web-facing technical brains behind PubSub.com. Wyman advertised the provided service as “prospective search” against blogs, represented mainly by RSS feed content, among other “real-time” things on the ’Net during that period. You could upload matching patterns and then follow streams of content items that matched the patterns. As a business, it didn’t quite work out, but I remember Wyman occasionally dug into the underlying algorithms that made massive numbers of patterns effectively match with low latency. I’ve tried to rediscover those musings, to no avail.
So Quamina looks like a modern take on the core algorithmic challenge within PubSub: taking extremely large numbers of patterns and matching content in extremely low amounts of time. Patterns are expressed as JSON and match against JSON objects. Given that many popular formats have known, or well understood, mappings into JSON, it shouldn’t be hard to test Quamina against a wide range and scale of datasets. Sounds like a benchmarking challenge waiting to happen.
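To make the matching semantics concrete, here’s a naive Python sketch in the spirit of Quamina-style patterns, as I understand them: pattern leaves are lists of allowed values, nested objects recurse, and every field named in the pattern must be present. This is emphatically not Quamina’s algorithm. It’s the brute-force linear scan over patterns that Quamina exists to avoid, which is exactly why it would make a useful baseline in a benchmark.

```python
import json


def matches(pattern, event):
    """Test whether a decoded JSON event satisfies a pattern.

    Leaves in the pattern are lists of allowed values; nested dicts recurse.
    An array in the event matches if any of its elements is allowed.
    """
    for field, spec in pattern.items():
        if field not in event:
            return False
        value = event[field]
        if isinstance(spec, dict):
            if not isinstance(value, dict) or not matches(spec, value):
                return False
        else:
            candidates = value if isinstance(value, list) else [value]
            if not any(c in spec for c in candidates):
                return False
    return True


def match_all(patterns, event_json):
    """Return the names of all patterns that match one JSON event.

    Cost grows linearly with the number of patterns; that is the scaling
    problem a Quamina-style compiled matcher is built to sidestep.
    """
    event = json.loads(event_json)
    return [name for name, pattern in patterns.items() if matches(pattern, event)]
```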
I’m glad Bray released Quamina and I’ll definitely be digging in, trying to understand how it works. If I were a halfway decent Go programmer, I would take a quick stab at wrapping some kind of RESTful API around it and seeing how close I could make it look like my memories of PubSub. So many ideas, so little time.
Although, if Quamina is fast enough, a cheap proof of concept could be done by wrapping something like Python’s FastAPI around subprocess invocation of the Go executable. Hmmmmm.
Previously I mentioned The Changelog as one of my current favorite podcasts. It’s run by Jerod Santo and Adam Stacoviak, and I use the term “podcast” loosely here because they have a small empire of affiliated podcasts, an online community, and a couple of newsletters. There’s a wide variety of great, well-produced content in differing formats. And they’ve been at it for over 10 years 😮!!
Anyway, I woke up one morning, was listening to an episode and heard the plug for Changelog++. Figured it was high time I supported these guys directly and pulled the trigger. Jerod promptly sent a nice email the same day asking, “What was the driving factor(s) for you signing up?” Here’s how I responded:
Nothing specific. Been a subscriber to The Changelog for 6 years, Practical AI since it started, and now dipping into GoTime. I’ve just been on a little kick directly supporting small individual content creators where possible. The folks who make podcasting enjoyable for me on a regular basis. Love what you guys are doing and especially the Changelog “house” interview style. Great production values without sounding overly corporate.
Need to give a few ++ episodes a listen before I can provide any concrete criticism, but so far, so good.
I’ve occasionally dipped into Cory Doctorow’s publishing oeuvre but never drunk from his internet posting firehose. Apparently he left Boing Boing a bit ago and started his own link blogging site, Pluralistic. As you would expect of Doctorow, it’s a flood of interesting content, leftish, activist, cryptopunk oriented, no surveillance capitalism to be seen.
Part of his 20 Years Blogging post is a detailed description of a linkblogger’s daily creative process. In no way could I ever emulate that, but I was struck by a comment on his vast trove of previously published content:
My composition is greatly aided both 20 years’ worth of mnemonic slurry of semi-remembered posts and the ability to search memex.craphound.com (the site where I’ve mirrored all my Boing Boing posts) easily.
A huge, searchable database of decades of thoughts really simplifies the process of synthesis.
So for me, throwing as much as possible of what you’ve read, written, or listened to into your personal memex, and sprinkling in a bit of search/AI technology, seems like an interesting, if not winning, proposition. If nothing else, repeated elements like This Day in History can be semi-automated.
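As a taste of how little it takes, here’s a sketch of the This Day in History piece, assuming the archive is already available as (publication date, title) pairs. How those pairs get pulled out of a blog archive is left open, and the sample posts in any test are invented.

```python
from datetime import date


def this_day_in_history(posts, today):
    """Return earlier posts published on the same month and day as `today`,
    newest first. `posts` is an iterable of (published_date, title) pairs."""
    hits = [
        (published, title)
        for published, title in posts
        if (published.month, published.day) == (today.month, today.day)
        and published.year < today.year
    ]
    # Tuples sort by date first, so reverse order puts the newest post first.
    return sorted(hits, reverse=True)
```

Calling `this_day_in_history(archive, date.today())` each morning would surface the candidates; the human part is deciding which of them is worth a link.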
And I think I’ll give Pluralistic a try.
This is for both March and April of 2022, which is somewhat disappointing. For various reasons, I just hit a wall over those two months and didn’t make as much reading progress as I would have liked. I was pretty consistent with my daily read, but usually once I get going I’ll exceed the allotted time by a significant margin. Not so much these past two months.
Here comes the break…
So that’s all there was to it?
For the longest time, I had no memorable and easy way to enter and display Unicode text within my Emacs buffers. When jumping back into blogging this is somewhat problematic, since the world sort of expects at a minimum the frivolous usage of a few emojis to demonstrate you’re keeping up with the cool kids 💥! After installing Nerd Fonts on my MacBook, I was more determined than ever to solve this conundrum.
Mickey Petersen of Mastering Emacs (buy his book), solved the display challenge. Install a package, do some customization, and voilà! Unicode characters, cut and paste from the Internet, display.
Petersen (buy his book, really) also had a nifty approach to entering emojis. Emacs has “input methods” which you can rig to expand text sequences into Unicode characters. You have to make a modal switch into the input method, but it’s better than nothing.
So then I started digging around to remind myself how to insert characters with diacriticals. Of course Petersen (you did buy his book, right?) had the answer, but within that answer was this nugget:
To insert a code point type C-x 8 RET and enter the Unicode name (type TAB twice to get a complete list).
Well, whaddya know 🤔
Link parkin’, two services for converting speech to text:
The first via Talk Python, the second from Python Bytes

Been having some fun reworking my terminal and shell setup based upon xonsh. I’m quite proud of myself for getting the 🐚 emoji into my prompt, which was decidedly trickier than I expected. Console is iTerm2. Glyphs and icons via Nerd Fonts. The system info was generated by neofetch
The initial salvo on the podcast deep dive thesis was divorced from end users, strictly based upon the content of the podcast. Another aspect of interest to me is the incorporation of personalization.
I listen to a small percentage of the episodes from feeds I’m subscribed to. Each one I do listen to feels like there’s a ton of knowledge, of which I retain just fragments. Could an intelligent podcast app monitor my listening and automatically weave together products that could help my future self? At the very least a personalized search engine could be crafted. But is a memex for my personal podscape in reach?
Overcast has an OPML export feature, including a rich version with episode data. A prototyping foundation seems apparent. Jonathan LaCour cooked up a Python script to automate the download.
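Once the extended export is downloaded (LaCour’s script or by hand), a first prototype could be as simple as asking which episodes were started but never finished. The sketch below assumes the export nests `<outline type="podcast-episode">` elements under each `<outline type="rss">` feed, with `played` and `progress` attributes. Those element and attribute names are my assumptions about the format, so verify them against a real export before trusting the output.

```python
import xml.etree.ElementTree as ET


def in_progress_episodes(opml_text):
    """List (podcast, episode) pairs that were started but not finished.

    ASSUMPTION: each feed is an <outline type="rss"> whose child outlines of
    type "podcast-episode" carry `progress` (seconds listened) and
    `played` ("1" when finished) attributes.
    """
    root = ET.fromstring(opml_text)
    results = []
    for feed in root.iter("outline"):
        if feed.get("type") != "rss":
            continue
        podcast = feed.get("title") or feed.get("text")
        for ep in feed.findall("outline"):
            if ep.get("type") != "podcast-episode":
                continue
            started = float(ep.get("progress") or 0) > 0
            finished = ep.get("played") == "1"
            if started and not finished:
                results.append((podcast, ep.get("title")))
    return results
```

Swap the filter for "played" and you have the raw material for a personal listening log, which is the first brick of the podscape memex.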
I previously link parked DuckDB as an embedded, high speed, OLAP engine. Gerard Bentley has some nice examples of using DuckDB, Arrow, and Python. Even though the combo isn’t definitively better than Pandas and Arrow, I like how seamless the integration between DuckDB and Arrow data appears. Simply referencing a Python variable from a DuckDB query is a neat trick. Wonder how that’s implemented under the covers.
Related: DuckDB quacks Arrow: A zero-copy data integration between Apache Arrow and DuckDB
One of the reasons I’ve become interested in deep analytical dives of podcasts is Software Engineering (SE) Radio. It’s not quite at the top of my podcast fave lists because I don’t feel an urge to listen to it regularly. However, the content is typically excellent and I like the no-nonsense interview style. A little topic introduction, a little guest bio, and then right into the technical discussion. No ads either.
The analytic draw is the historical depth of the content. SE Radio launched back in 2006, 16 years ago. There are 511 episodes as of this writing. It’s provided under a Creative Commons 2.5 License. There are so many summarization possibilities to apply and questions to be answered of such a corpus. Think of all the tech topics that have come up over the years. Not to mention that there are some exceedingly distinguished guests making appearances. How have the themes and personalities ebbed and flowed over the years? What programming languages have ascended and waned? Are there any enduring repeat offenders? How does this all connect and intersect with the wider tech topic podcast ecosystem?
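The “ascended and waned” question is a good first target because a crude version needs nothing more than episode years and text (titles plus show notes, say). Here’s a sketch under that assumption; the language list is a hypothetical starting point, and keyword matching is a stand-in for real entity extraction.

```python
import re
from collections import Counter, defaultdict

# Hypothetical keyword list; a serious analysis would need better entity matching.
LANGUAGES = ["Python", "Java", "Go", "Rust", "Scala", "Haskell"]


def language_mentions_by_year(episodes, languages=LANGUAGES):
    """Count, per year, how many episodes mention each language.

    episodes: iterable of (year, text) pairs. Matching is case-sensitive and
    whole-word, so "Java" won't match "JavaScript" and "Go" won't match
    "Google", though a sentence that starts with the verb "Go" still counts.
    """
    patterns = {lang: re.compile(rf"\b{re.escape(lang)}\b") for lang in languages}
    by_year = defaultdict(Counter)
    for year, text in episodes:
        for lang, pattern in patterns.items():
            if pattern.search(text):
                by_year[year][lang] += 1
    return dict(by_year)
```

Plot each language’s share per year and the rise-and-fall story mostly tells itself; the hard part is still collecting the episodes.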
And that’s just a single long-lived podcast! Between The Python Podcast.__init__ (363) and the Data Engineering Podcast (288), Tobias Macey has delivered 651 episodes, primarily by himself. For almost any vertical topic you can imagine, there are probably at least a decent handful of well-rooted podcasts worth deep diving on.
I’ll need to get a good lawyer to interpret the license, but the only showstopper might be the legality of building publicly available systems that derive from analytics of the podcast episodes. On the one hand, I feel like this might be like podcast search indexing, which would seem not to be a derivative work. On the other, you are building on the content sooo…
Now whether all of the episodes can actually be collected is a technical challenge for another day.
Link parkin’. A couple of other static site generation engines:
I’m probably hobbling Pelican by running page generation on a 2008 white plastic MacBook (!!), but it feels pokey to me. And it always seems to regenerate from all my source pages, including a ton that haven’t changed in years. Whatever algorithm is being used for change detection doesn’t seem to be working. Now lining up alternatives to explore.
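For comparison, here’s roughly what content-based change detection can look like: hash every source file, compare against a manifest from the previous run, and only hand the changed files to the generator. This is a generic sketch, not Pelican’s mechanism (Pelican has its own opt-in caching settings, `CACHE_CONTENT` and `LOAD_CONTENT_CACHE`, worth checking first). The `*.md` glob and the manifest location are assumptions, and deleted files aren’t reported.

```python
import hashlib
import json
from pathlib import Path


def changed_sources(content_dir, manifest_path):
    """Return source files whose contents changed since the last run, then
    rewrite the manifest (a JSON map of relative path -> SHA-256 digest)."""
    content_dir = Path(content_dir)
    manifest_path = Path(manifest_path)
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new = {}
    changed = []
    for path in sorted(content_dir.rglob("*.md")):
        rel = path.relative_to(content_dir).as_posix()
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        new[rel] = digest
        if old.get(rel) != digest:
            changed.append(rel)
    manifest_path.write_text(json.dumps(new, indent=2, sort_keys=True))
    return changed
```

Hashing contents rather than trusting modification times means a fresh checkout or a copied archive doesn’t trigger a full rebuild.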
Keeping with the theme of generative art, I previously linked to an old post of mine where I had replicated another artist’s work. The inspiration was Jared S. Tarbell’s piece from complexification.net entitled substrate. Curious, I went to the Web to see what had become of Mr. Tarbell.
Complexification.net is still alive, although frozen in time, featuring Java Applets from an era gone by. What do you expect from 2004-05 though. At least it wasn’t all Flash.
But I also chanced upon this somewhat recent interview (late August 2020) with Tarbell. First off, if I knew Tarbell had started Etsy I had certainly forgotten. That’s a nice feather in the cap, presumably leaving some financial runway to pursue whatever. Second, his backstory was quite interesting. Third, he seems to still be practicing!
What a great gallery. Fuel for replication, emulation, and inspiration.
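In that spirit, here’s a tiny text-mode nod to substrate’s core idea: cracks grow in straight lines until they hit something, and new cracks branch perpendicular off existing ones. To be clear, this is my own greatly simplified grid approximation, not Tarbell’s code; the original grows cracks at arbitrary angles and layers on colored shading.

```python
import random

# Direction indices: 0 = right, 1 = down, 2 = left, 3 = up.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def substrate(width=48, height=16, seeds=3, max_cracks=40, seed=7):
    """Grow axis-aligned cracks on a character grid, loosely after substrate.

    Each active crack advances one cell per step until it hits the edge or
    an occupied cell. When a crack stops, a new crack (up to max_cracks in
    total) branches from a random occupied cell, perpendicular to the crack
    that filled it. Returns rows of text: '#' for crack, '.' for empty.
    """
    rng = random.Random(seed)
    # grid[y][x] is None when empty, else the direction of the crack that filled it.
    grid = [[None] * width for _ in range(height)]
    active = []  # each active crack is a mutable [x, y, direction]

    for _ in range(seeds):
        x, y, d = rng.randrange(width), rng.randrange(height), rng.randrange(4)
        grid[y][x] = d
        active.append([x, y, d])
    started = seeds

    while active:
        crack = rng.choice(active)
        x, y, d = crack
        dx, dy = DIRECTIONS[d]
        nx, ny = x + dx, y + dy
        if 0 <= nx < width and 0 <= ny < height and grid[ny][nx] is None:
            grid[ny][nx] = d
            crack[0], crack[1] = nx, ny
            continue
        # Blocked: retire this crack and, budget permitting, branch a new one.
        active.remove(crack)
        if started < max_cracks:
            occupied = [(cx, cy) for cy in range(height) for cx in range(width)
                        if grid[cy][cx] is not None]
            cx, cy = rng.choice(occupied)
            # Adding 1 or 3 (mod 4) turns the parent direction by 90 degrees.
            active.append([cx, cy, (grid[cy][cx] + rng.choice([1, 3])) % 4])
            started += 1

    return ["".join("." if cell is None else "#" for cell in row) for row in grid]
```

`print("\n".join(substrate()))` gives an ASCII rendering; a fixed `seed` makes a given picture reproducible, which is handy when fiddling with parameters.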
Another activity I’d like to get back into is generative art. I’ve always admired the Processing ecosystem and community. There was even an extended period where I was a quite serious dabbler in using Python to replicate various works. In particular, Matthew Pearson, a.k.a. zenbullets, released a book entitled Generative Art and published a site Abandoned Art. I recently decided to see what Pearson was up to these days. Apparently he’s abandoned working in this arena, but there are still vestiges of his work online.
Having squatted on the domain adoptedart.org (currently redirects to mpr.crossjam.net) intending to recreate the works in Abandoned Art, I’ve got some Internet real estate to build a portfolio in this space. Just need to get stuck into it.
Interestingly, only screen captures of the abandoned works are available from Pearson’s current site. With a little digging in on the Wayback Machine, maybe the source code can be recovered.
Link parkin’: DuckDB
Does for OLAP what sqlite did for OLTP. Excellent podcast episode on DuckDB with Hannes Mühleisen.
At least as of May 2022, here is a brief list of podcasts that I regularly enjoy. I’ll likely come back and do individual deeper dives through the course of the month:
Tech heavy. Yup, I’m a nerd.
Special shout out to Tobias Macey, longtime producer of both “The Python Podcast” and the “Data Engineering Podcast,” who does it while maintaining a full-time day job and being a dad. Yikes!
Additional shout out to Richard Deitsch’s podcast, recently discovered, for super high quality journalism about journalism and media. Usually I don’t enjoy going meta, but Deitsch is so good he’s worth a regular listen. Also check out his work at The Athletic. The guy is a beast.
Link parkin’: MeiliSearch
Might be useful for searching blog archives, especially in a comparison/bake-off with SQLite’s FTS3, FTS4, and FTS5.
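The SQLite side of such a bake-off is nearly free from Python. A minimal sketch, assuming your SQLite build includes FTS5 (common, but not guaranteed); the MeiliSearch side isn’t shown, and the (title, body) input shape is just an assumption about how the archive gets loaded.

```python
import sqlite3


def build_index(posts):
    """Load (title, body) pairs into an in-memory SQLite FTS5 table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE posts USING fts5(title, body)")
    conn.executemany("INSERT INTO posts (title, body) VALUES (?, ?)", posts)
    return conn


def search(conn, query, limit=5):
    """Full-text search; FTS5's built-in `rank` column orders hits by BM25."""
    rows = conn.execute(
        "SELECT title FROM posts WHERE posts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [title for (title,) in rows]
```

Running the same query set through both engines and eyeballing the top five results would be a reasonable first round of the comparison.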
Well, so much for that. An unfortunate series of events busted my “streak” after only three days. Back on the horse though.
What I would have said yesterday: as the month rolls on, I expect podcasts and podcast analytics to be a theme. There’s a metric crapton of episodes out there and some of it is freely licensed. Listen Notes is doing some admirable stuff on basic podcast search, but I’ve got some wacky ideas around demos, tools, and prototypes that do deep dives on collected content. Hacking around on data engineering, app development, and natural language processing should lead to some interesting portfolio results, if nothing else.
Apropos the title, Overcast went through a redesign, released a few weeks ago. The day it came out I was literally thinking, “is there a way in Overcast to find one’s started, but not completed, podcast episodes?” Voilà! Feature delivered.
For a while I gave Apple Podcasts a run, but once on the Overcast bandwagon, I’ve never looked back. Highly recommended, and well worth paying the annual subscription to support the developer, Marco Arment.
Well, the following statement didn’t hold up well:
Disclaimer, I don’t listen to podcasts and I’m only peripherally interested in them as another form of sociable media on the Web. This post was inspired by some observations by Ernest Miller on the scannability of podcasts.
I’m actually heavily into podcasts now (four plus years with Overcast? Wow!), but digging through my ancient archives, I came across that one on PodBytes. A heck of a lot has changed in the podcast ecosystem, but I’m sort of interested to know if the ability to do a “pull quote” has actually been enabled.
In an homage to days gone by, let’s see if I can run the table posting every day for May 2022.
I actually had a good time last week fooling around with the Python-based xonsh shell. Way too much time was spent getting the conch emoji (🐚) into the shell prompt, but most of the blame can be laid at the feet of the powerline2 plugin. Hooking into the prompt generation isn’t obvious unless you’re already familiar with the powerline model, which has been ported over from the vi world. Not my jam.
There’s definitely something enticing about having Python closely coupled with the shell CLI. I need to get it installed on a few more machines and used more frequently to have a real assessment. I’m waiting to see where the real warts are but so far, so good.
Four completed in the shortest month, February, is nothing to sneeze at. Seven total for the year. Thoughts after the break.
Arguably a light haul for January, but still off to a better start than last year at this time.
Thoughts below the fold…
For some reason, this post was scheduled for June of 2020 (!!) but never posted. Letting it loose now to clean out the drafts folder.
For the first time ever, I intentionally rethemed one of my blogs. New Media Hack got a new theme only because I had to apply something during the transfer to WordPress. Mass Programming Resistance hadn’t had a style change since its inception. Now it does.
The theme is pretty minimal. I forked a repository from Giulio Fidente that is spartan black text on white background. Mainly I just decided to go with square instead of rounded corners on the buttons and to tighten up the gutter between the sidebar and the main content.
Along with fixing up the RSS and Atom feeds, the theme change was the last thing to finish before declaring the transition to static publishing complete.
Yet another left behind in the drafts bin from June 2020. Including today for posterity’s sake.
This is my first post using Markdown and Pelican. Currently https://crossjam.net/ is resolving properly and the content has transferred over. There’s a bunch of redirects, which are a little ugly, but they serve the purpose of dealing with the old WordPress path prefixes. Mass Programming Resistance and New Media Hack (still preserved, bless its heart) are separated out into two different domains.
Now I need to work on a good workflow for publishing. The upside is that I can now go with Emacs to create posts, which will accelerate my ability to spit out text. But I’ll need some key bindings to smooth out some of the linking work. Also, need to figure out a comfortable flow to update the production server. Pushing to a git repo seems like a little too much work, although Netlify has an interesting take on that path. Maybe for a dev site just to get some experience.
© 2008-2025 C. Ross Jam.
Licensed under CC BY-NC-SA 4.0. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.