
Steampipe `net_http_request`

Link parkin’: Why build an HTTP client into a database? So you can ingest web data directly!

Steampipe tables can do all sorts of surprising things! Did you know, for example, that the Net plugin’s net_http_request encapsulates a full-blown HTTP client?

I wouldn’t call it a “trend” yet, but HTTP clients embedded inside SQL engines keep popping up in my feeds.

Great piece by Jon Udell via his own blog.

Compact Speech Recognition

Link parkin’: whisper.cpp

High-performance inference of OpenAI’s Whisper automatic speech recognition (ASR) model:

Having such a lightweight implementation of the model allows to easily integrate it in different platforms and applications. As an example, here is a video of running the model on an iPhone 13 device - fully offline, on-device:

Really compact C++ version of a production speech-to-text model. If I can get it to build, I’ll try it against some podcasts to see how things come out. If halfway decent it could become a piece of a comprehensive personal knowledge extraction memex.

Chiang and Borges

Having just completed a collection of Borges short stories, The Aleph and Other Stories, I recently had a lightbulb moment. The brilliance of Ted Chiang’s short story Exhalation (thank you Wayback Machine) partially derives from channeling Borges so much. Deeply philosophical, first-person narrative, brutally compact, with a mind-bending twist at the end.

Both authors highly recommended.

Newsletter Knowledge

Speaking of newsletters, a while ago I was trying to recover a web page on password handling in Python (Secure Password Handling in Python) that I had just left in a browser tab. I’ve got a bunch of Chrome profiles, always have multiple Chrome/Firefox/Safari windows open, each with multiple (multiple) tabs open. Due to a couple of reboots and default age limits on browser history it was looking grim on finding that link again. While useful, the page wasn’t search engine optimized enough to come in at the top of Google with the obvious keywords.


Then I remembered which Chrome profile likely had the tab. Popped over to the window and pulled up Gmail for that account. Hit the Gmail search bar with “credentials” and there it was, right in the PyCoder’s Weekly newsletter.

All to say, over time a collection of link newsletters actually represents a significant knowledge base of curated references. The links have been vetted by humans for collection. For many of them, there’s an authored text summary nearby. There’s a timestamp attached, along with other metadata.

What if you grabbed those newsletters out of mailboxes and shoved them into a text and/or semantic search engine? Possibly another component of a personal memex.
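
A minimal sketch of that idea, assuming the newsletters are already in hand as raw email messages and that the local SQLite build includes the FTS5 extension. The sample message, the issues table, and its column names are all made up for illustration.

```python
import sqlite3
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical newsletter message; a real one would come out of a mailbox export.
RAW = """\
From: newsletter@example.com
Subject: Weekly Links #1
Date: Mon, 05 Dec 2022 09:00:00 +0000
Content-Type: text/plain; charset="utf-8"

Secure Password Handling in Python: how to keep credentials out of source code.
https://example.com/passwords
"""


def index_newsletters(raw_messages):
    """Load plain-text newsletter messages into an in-memory FTS5 table."""
    db = sqlite3.connect(":memory:")
    # The sent timestamp is stored alongside the text but not tokenized for search.
    db.execute("CREATE VIRTUAL TABLE issues USING fts5(subject, body, sent UNINDEXED)")
    for raw in raw_messages:
        msg = message_from_string(raw)
        sent = parsedate_to_datetime(msg["Date"]).isoformat()
        body = msg.get_payload()  # fine for single-part text; multipart needs walking
        db.execute("INSERT INTO issues VALUES (?, ?, ?)", (msg["Subject"], body, sent))
    return db


def search(db, term):
    """Return (subject, sent) for issues matching an FTS5 query, best match first."""
    sql = "SELECT subject, sent FROM issues WHERE issues MATCH ? ORDER BY rank"
    return db.execute(sql, (term,)).fetchall()
```

Calling search(index_newsletters([RAW]), "credentials") finds the sample issue along with its timestamp, which is roughly the Gmail search above without the Gmail.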

TIL Cathect

TIL a new word, cathect

(transitive, psychology) To focus one’s emotional energies on someone
or something.

    * 2013, Carroll E. Izard, Human Emotions, page 193:

        Apparently it is possible for an individual to cathect any
        person, object, idea, or image. Of considerable importance to
        a possible analogy between cathexis and the emotion of
        interest, is Freud’s notion that an individual can cathect
        thought or thinking as well as attention and perception.

Love it when my vocabulary is expanded.

Courtesy of Masha Gessen in Surviving Autocracy.

Newsletters and Linkblogging

I subscribe to a decent number of newsletters, like Python Weekly, that are basically curated collections of links. I want to do more linkblogging and these are great resources but for the following problems:

  • There’s not an easy way to pry out and blockquote text for individual links
  • The embedded links flow through tracking redirectors, so it’s hard to know the ultimate destination
  • Linkpost creation needs an easy way to do some type of via attribution
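
A rough sketch of how those three problems might be attacked with nothing but the standard library. It assumes the tracking redirector carries the destination in a query parameter named url, which is purely a guess; plenty of redirectors only reveal the destination once the HTTP redirect is actually followed. All function and class names here are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs from newsletter HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def unwrap(url, param="url"):
    """Return the destination embedded in a redirector URL, else the URL itself."""
    values = parse_qs(urlparse(url).query).get(param)
    return values[0] if values else url


def linkpost(html, via):
    """Build one 'text <destination> (via ...)' line per link."""
    collector = LinkCollector()
    collector.feed(html)
    return [f"{text} <{unwrap(href)}> (via {via})" for href, text in collector.links]
```

The via string is passed in by hand here; pulling it from the newsletter’s own headers would be the obvious next step.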

Sounds like time to get on the hacking bandwagon. One potentially useful hack is that I’m pushing a number of newsletters into a Feedbin mail address and reading them there. I wonder what they look like when grabbed out of the Feedbin API?

2022 Books Completed, Part 5

Thirty-six works completed for the year, by my current math. If I don’t completely flub my holiday time, I’ll notch forty this annum.

read more ...

A pyproject.toml Tutorial

In starting a new Python module for mucking around with the Feedbin API, I wanted to modernize a little and update to the latest Python practices for packaging. In particular, I’d been hearing a bit about pyproject.toml as the way to configure Python build tools. Rogier van der Geer has an excellent overview:

About three years ago I wrote a blog post about using setup.py to set up your python projects. Since then a lot has changed, mostly due to PEP 517, PEP 518 and the introduction of the pyproject.toml file.

The goal of this file is to allow you to define what build tools are needed in order to build your package – no longer assuming it must be Setuptools. This makes it easier to use alternatives to Setuptools, which means that Setuptools does no longer have to be the one build tool that can do everything.

Nevertheless, I like the feature set of Setuptools, and would like to continue to use it in the foreseeable future. But while documentation for using for example Poetry or Flit together with a pyproject.toml is easy to find, it is more difficult to find similar documentation for Setuptools. So let me help you out.


Link parkin’: sqlite-http

sqlite-http is a new SQLite extension that allows you to create HTTP requests in SQLite.

Think of fetch(), requests.get(), or curl, but entirely in SQL!

Also really enjoy how Alex Garcia created a blog post out of a notebook. Looks like he’s building up quite the collection of SQLite extensions.

Cf. Steampipe

Serial Project Hoarding

Wisdom from Simon Willison. Coping strategies for the serial project hoarder

This is the most important tip: avoid side projects with user accounts.

If you build something that people can sign into, that’s not a side-project, it’s an unpaid job. It’s a very big responsibility, avoid at all costs!

Almost all of my projects right now are open source things that people can run on their own machines, because that’s about as far away from user accounts as I can get.

I still have a responsibility for shipping security updates and things like that, but at least I’m not holding onto other people’s data for them.

Let me repeat for emphasis: If you build something that people can sign into, that’s not a side-project, it’s an unpaid job.

Most definitely.

Tailscale First Impressions

Tailscale is a VPN for the rest of us:

A frustratingly simple VPN

Tailscale lets you easily manage access to private resources, quickly SSH into devices on your network, and work securely from anywhere in the world.

Oddly, while I probably could have put Tailscale to gainful use for quite some time, instead I just admired the technical depth of their blog posts. There’s been a lot of hype, criticism, discussion, etc. but the product seems to be a bit of a hit.

With a few spare cycles I finally decided to bite the bullet and give Tailscale a shot. In the last 24 hours, I’ve installed it on six machines, a mix of Linux, macOS, and iOS. Three are desktops/servers in my home, one’s a relatively stationary laptop, another’s a cloud node, and the last is an iPad. All easy, peasy 5-10 minute installs. The UX is to die for.

Haven’t attempted to use while away from homebase which is the next round of testing. So far, so good.

Once upon a time I noodled around with ZeroTier, which still seems to be chugging along. At that moment, the UX for ZeroTier was nowhere near as smooth as Tailscale’s is now, especially the outsourced identity management. I expect ZeroTier has sanded off a lot of the rough edges. The Tailscale folks even consider ZeroTier an admirable competitor.


Link parkin’: requests-cache

requests-cache is a transparent, persistent cache that provides an easy way to get better performance with the python requests library.

Necessary for messing around with the Feedbin API.

Dammit Doctorow!

So I just recently mentioned how I’m trying to fight off some media FOMO.

And then Cory Doctorow posts about all the books he reviewed in 2022.

Every year around this time, I round up all the books I reviewed in the previous 12 months; both as a convenience for readers and to remind myself that I don’t need to feel quite so horribly guilty about all the books I didn’t review (to those authors: rest assured, I still feel horribly guilty).

I’m already in the hole four titles from this list, all in hand and one completed: Chokepoint Capitalism, A Half-Built Garden, and The Persuaders still to read, plus the finished Survival of the Richest. Now I’m getting excellent blurbs on further material for the library, from a favorite blogger. Definitely eyeing a handful of the nonfiction recommendations and it may be time for me to come back around to Neal Stephenson.

I’m going to swipe from another of my faves, Tim Bray, in a post worthy of discussion here, and say Protect Me From What I Want!!.


TIL jsonpath-ng

A final implementation of JSONPath for Python that aims to be standard compliant, including arithmetic and binary comparison operators, as defined in the original JSONPath proposal. …

This library differs from other JSONPath implementations in that it is a full language implementation, meaning the JSONPath expressions are first class objects, easy to analyze, transform, parse, print, and extend.

And a nice, quick tutorial to go with it.

Mentioning Httpie

Link parkin’: HTTPie

Don’t know how it’s possible, but I haven’t mentioned this incredibly useful, Python-implemented CLI for interacting with HTTP(S) endpoints. Sort of a friendlier curl, but with a lot less muscle memory for me.

PSF Supporting Member

Here in the states we celebrate a Thursday holiday with food and entertainment gluttony, followed by a Friday of manufactured physical retail (traditionally) consumer gluttony, followed the next Monday with manufactured online consumer gluttony.

And then we try to cleanse the palate with “Giving Tuesday”.

I’ve pretty much trended to flipping the script on those days, although I do love a bit of food gluttony. This year I achieved nothing purchased on Black Friday and Cyber Monday, nor even over the weekend which used to be my consumer gluttony escape hatch. On Giving Tuesday I batched up a whole bunch of my annual giving which usually gets done the last week of December.

Not going to go into all of my charitable contributions, but just wanted to plug supporting membership in the Python Software Foundation. I’ve gotten a ton out of Python over my career and I really enjoy the programming language, so it’s only right I give back a little. I have no idea what benefits accrue from supporting membership but I’m doing my bit to publicize their availability and encourage other Pythonistas to join up.

Pi-hole Bliss

Some folks at work were waxing ecstatic about how a Pi-hole made such a huge difference in ad blocking. I thought installation would be a little bit more effort than value for my time. Then I found a small window of free time this weekend and thought “what the heck, let’s give it a shot.”

Commonly affiliated with Raspberry Pi devices, Pi-hole is actually quite easy to deploy on an Ubuntu device. After less than an hour’s worth of work, I had the whole stack up and running. And boy does it make a difference! The screen cap below highlights the amount of ad blockage for just a couple of hours and a few devices.

Pi-hole Dashboard Screen Capture

Sorry I waited so long!

Fear of Missing Out 2022

Back in October I had this weird uncharacteristic media buying splurge. Physical or digital it didn’t matter. Novels, novellas, short story collections, graphic novels, pamphlets, didn’t matter. Sports, history, culture, politics, media, tech, didn’t matter. Training sites, news apps, digital long form magazines, didn’t matter.

Some of it was authors I like releasing new books that seem really intriguing. Some of it was recommendations from authors I really like. Some are from authors I know personally. Some of it was second order linkage from references. Some of it was deal sniping and some of it was remainder bin diving. Some of it was stuff I want to revisit. Some of it was material I always thought I should visit.

At this point, I have so much material I’d be totally happy spending every waking hour for the rest of the calendar year reading as much as possible.

Thankfully I avoided any Black Friday urges and didn’t purchase anything. Looking forward to Giving Tuesday.

The Twitter Kerfuffle

kerfuffle | kərˈfəf(ə)l |

noun [in singular] British informal

 a commotion or fuss, especially one caused by conflicting views: there
 was a kerfuffle over the chairmanship.

I abdicated Twitter well over three years ago, but with the recent events regarding the company changing ownership I decided to check in and see how long I’d actually been on the service:

Screenshot of @crossjam Twitter account. Joined February

According to Wikipedia, that “Joined February 2007” means my account was established within the first year of the service and 7 months of its initial public launch 😮.

Abdicating was driven by pursuit of “deep work” and avoidance of the cognitive and emotional drain the site represents. I know there are community pockets within Twitter that are vibrant, useful, and potentially interesting to me. Good on them, but I haven’t missed it for a moment. There is a somewhat juvenile impulse to jump back on for a one tweet dunk on the order of “Look at me! I bailed before all the shit really hit the fan. What are y’all still doing here?!” but I like to think I’ve gotten more mature with age.

There was about 30 seconds where I thought it might be cool to join the fediverse and see where this all might eventually land, but I quickly came to my senses. If I didn’t need what was on offer before why would I need it now?

This adventure was prompted by an article entitled “What happens to sports media if Twitter dies?”. It will likely be my last commentary about Twitter for quite some time.

TIL Gradient Accounts

TIL Twitter Gradient Accounts

Other people on Twitter had noticed them as well and referred to them (usually with irritation) as “gradient accounts,” because many of their profile pictures are not of human faces or anything else, just color gradients. Gradient accounts have usernames that sound like AIM usernames: @f41ryluvrr, @urf41ryg1rl, @moonlouvrr, @newmoonbaby2, @glitteryxhearts. Through their tweets, they identify as overthinkers and dreamers and hot people, and they often profess melancholy and romantic longing. The romantic longing sometimes clashes with casual misanthropy; the all-lowercase disclosures of trauma and malaise are mixed with playful Gossip Girl memes. Their content is more popular than I can possibly explain, and they know it.

Was totally media nerd sniped by this, with thoughts of the Marly Krushkhova arc from Count Zero running through my head. Somehow real people participating in a mass consensual hallucination makes it even more Gibsonian.

DJ Tech

As in technology for hip-hop DJs. Scratch Cyborgs: The Hip-Hop DJ as Technology may be the best thing I’ve ever run across on Hacker News.

Hip-hop DJ culture provides a rich site for exploring how culture and industry can converge and collaborate, as well as how they need each other to move forward.

The references are just a rabbit hole for this technologist, hip-hop lover, and former amateur club DJ.

And as an MIT alum, how did I not know about the MIT Press Reader? 😧

Substrate XScreensaver

Two great tastes that taste great together. Jamie Zawinski’s XScreensaver for macOS and Jared Tarbell’s substrate generative art algorithm. Stunningly beautiful on a 27” iMac.

Seeing as how I’ve actually used XScreensaver on 1990s vintage UNIX workstations, and also mostly reimplemented substrate for my own ends, I’m embarrassed to learn substrate has been in XScreensaver since 2004. I’ll blame it on being distracted by the XMatrix digital rain plugin. That’s the one I default to using via XScreensaver. Love the little Matrix Reloaded “Easter egg”. The randomizer feature is looking better though, now that you can make useful subselections of the available screensavers.

2022 Books Completed, Part 4

It’s been a minute, or two, since we’ve done this. From May to August, 15 titles were closed out. That’ll take this year’s total to 25. 40 finished is conceivable.

Lots to get to. Let’s get stuck in.

2022-10-04 Now with even more commentary.

read more ...

The Peripheral on Amazon Prime

Like many of his other novels, William Gibson’s The Peripheral grew on me after first reading. In late October, Amazon Prime will be airing a TV series adaptation. The trailer looks good, but then again the trailer always looks good.

read more ...

Streak Busted

Yesterday I busted my 434 day music listening streak, dangit! 😢

I hung my listening habit off of my exercise habit, but had a little change up yesterday. Went for a walk in the morning instead of the evening and rushed out the door to meet the daily schedule. Then never got around to picking out a tracklist and firing it up.

Ah well. We go again.

Said the Manchester City fan, heh.

The Million Playlist Dataset

Link parkin’: Spotify Million Playlist Dataset Challenge

The Spotify Million Playlist Dataset Challenge consists of a dataset and evaluation to enable research in music recommendations. It is a continuation of the RecSys Challenge 2018, which ran from January to July 2018. The dataset contains 1,000,000 playlists, including playlist titles and track titles, created by users on the Spotify platform between January 2010 and October 2017. The evaluation task is automatic playlist continuation: given a seed playlist title and/or initial set of tracks in a playlist, to predict the subsequent tracks in that playlist. This is an open-ended challenge intended to encourage research in music recommendations, and no prizes will be awarded (other than bragging rights).

Mmmmm… data. Unfortunately, terms of service limit usage to participation in the challenge contest. So no data munging and redistributing.

Podcasting 2.0

TIL: Podcast Index

The Podcast Index is here to preserve, protect and extend the open, independent podcasting ecosystem.

We do this by enabling developers to have access to an open, categorized index that will always be available for free, for any use.

Has an API. Interestingly, the entire podcast database (guessing the full content of enclosures isn’t included) can be downloaded. Mmmmmm… data.

The team behind the Podcast Index is leading a Podcasting 2.0 “movement” to add some features to RSS to improve the podcast experience for publishers and listeners.

Podcasting 2.0 is a set of forward looking ideas combined with the technology to realize them. It’s a vision for what the podcast listener experience can and should be. That experience has stagnated for over a decade, with almost all of the improvements coming in isolated sections of the ecosystem. There hasn’t been a single, unified vision from the podcasting community acting together with one voice. So, we’ve ended up with fragments of innovation across the podcasting landscape with no central driving goal in mind. Podcasting 2.0 is the expression of what that goal could be.

Bonus project: Sucky pulls down an RSS feed and all of its enclosures.
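
Not Sucky’s actual code, just a minimal sketch of the first half of that job: finding the enclosures in an RSS 2.0 feed with the standard library. The feed below is invented, and actually downloading each URL is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-item feed; only the first item carries an enclosure.
FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example Podcast</title>
  <item><title>Episode 1</title>
    <enclosure url="https://example.com/ep1.mp3" length="123" type="audio/mpeg"/>
  </item>
  <item><title>Show notes only</title></item>
</channel></rss>"""


def enclosures(feed_xml):
    """Return (item title, enclosure URL) pairs for items that carry an enclosure."""
    root = ET.fromstring(feed_xml)
    found = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is not None:
            found.append((item.findtext("title"), enclosure.get("url")))
    return found
```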

Hat tip to a great Changelog podcast episode, natch, discussing RSS with Ben Ubois, the creator of one of my favorite services, Feedbin.

Python CLI Authentication

My default starting point for building a new piece of software is to create a command line interface (CLI) app. I’ve even got my own personalized Python cookiecutter template to generate them quickly. Beyond supporting some personal preferences, like options for logging, the autogenerated tool immediately integrates well with the wider UNIX ecosystem, thanks to the click argument parsing toolkit.

Buuuut, I’ve never gotten a good handle on how to integrate this approach with the modern Web API OAuth+API token methodology. OAuth gives me a headache 😆.

I’m still digesting this article from the folks at notia.ai, entitled “Building an authenticated Python CLI,” but on first read it seems really well done.

When building out the Notia client, we found a real lack of resources around building a persistently authenticated Python library.

To address this, we are going to be building an interactive, authenticated Python CLI that uses the Twitter API to fetch the top Machine Learning tweets of the week! You can see the final result in the video demo above - or you can skip to the final code here.

Building this CLI will let us explore concepts like authenticating a local device between uses, accepting CLI arguments with Click, and displaying our data interactively with Rich.

I’ll definitely be taking their advice in building some future X-to-sqlite applications.
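
One piece of this I can at least picture is keeping a token around between runs. The sketch below is my own guess at the shape of it, not the article’s implementation: the JSON layout and function names are hypothetical, and the 0600 file mode only means something on POSIX systems.

```python
import json
import os
from pathlib import Path


def save_token(token, path):
    """Write the token to a file readable only by the current user."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with 0600 permissions so other local users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        json.dump({"token": token}, handle)


def load_token(path):
    """Return the saved token, or None if nothing usable has been stored yet."""
    try:
        return json.loads(Path(path).read_text())["token"]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return None
```

A CLI command would call load_token first and only kick off the OAuth dance when it comes back None.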

Due credit to Simon Willison for providing the upstream basis of my cookiecutter template.

stfnal, Word of the Day

Chanced across a completely new word to me: stfnal

stfnal (comparative more stfnal, superlative most stfnal)

Of or pertaining to scientifiction or science fiction.

Via a Cory Doctorow review of A Half-Built Garden, which I am running out and buying tomorrow in hardcover.

Xonsh, history, sqlite

xonsh has been growing on me as an interactive shell. One area I haven’t delved into much is the history capabilities. I guess I shouldn’t be too surprised that sqlite makes an appearance:

Xonsh has a second built-in history backend powered by sqlite (other than the JSON version mentioned all above in this tutorial). It shares the same functionality as the JSON version in most ways, except it currently doesn’t support the history diff action and does not store the output of commands, as the json-backend does. E.g. xonsh.history[-1].out will always be None.

The Sqlite history backend can provide a speed advantage in loading history into a just-started xonsh session. The JSON history backend may need to read potentially thousands of json files and the sqlite backend only reads one. Note that this does not affect startup time, but the amount of time before all history is available for searching.

Combine with SQLite’s full-text search capabilities for even more entertainment.

Welcome Back Data Machina!

Data Machina is a weekly link newsletter on AI/ML topics broadly construed. It’s distinctive in the variety of areas it touches on (e.g., specific language sections for Python, Scala, Lisp/Clojure, R, plus segments on datasets and distributed systems), along with the sheer number of links. I was enjoying the content way back when it started out on TinyLetter, but it went on hiatus and thence behind a paywall, so effectively disappeared from my radar.

Why subscribe to Data Machina?

Data Machina brings you a highly curated selection of the best in Machine Learning, AI, Data Science, and Data Engineering every week, 52 weeks per year.

Loaded with useful, unique, and interesting content, Data Machina is read by thousands of AI/ML professionals and researchers around the world.

Data Machina is published in a minimalistic, easy-to-read format, with pure, simple text, and structured in clearly marked sections so you can scan them quickly without being disturbed by ads, banners, icons, images or other annoying stuff.

Now it’s back in a free version and looking as good as ever. Glad to make your acquaintance again!

Also, yet another plug for feedreading email newsletters. The latest versions of Data Machina quietly popped up in my feeds unannounced. No muss, no fuss. Just back to reading the high quality linkfest in its best habitat for me.

Mopidy Mystery

I was hoping to fool around with Mopidy as an audio playback engine because it’s written in Python and supports the MPD protocol according to the documentation. When I went to install it using homebrew on my MacBook Air, the latest version had problems with its plugins, wherein I discovered there was already an outstanding issue on GitHub. Unfortunately, a solution didn’t look promising, but at least I chimed in with my interest.

So off I go, working on other things and forgetting about the problem. Lo and behold, another user reports the real source of the issue and a convenient fix. With an export GST_PLUGIN_PATH=/opt/homebrew/lib/gstreamer-1.0/ now my Mopidy server works perfectly and can play back audio on my MacBook. Score one for just registering interest on GitHub!!

Mystery solved. Onwards to implementing my own database driven, dynamically created playlists.


TIL Steampipe. From the intro announcement:

Steampipe, a new open source project from Turbot, enables cloud pros (e.g. software developers, operations engineers and security teams) to query their favorite cloud services with SQL. It has quickly become one of our favorite tools in-house and we hope it finds a way into your tool box as well.

The heart of Steampipe is an intuitive command line interface (CLI) that solves the challenges encountered when asking questions of cloud resources and services. Traditional tools and custom scripts that provide visibility into these services are cumbersome, inconsistent across providers and painful to maintain. Steampipe provides a consistent, explorable and interactive approach across IaaS, PaaS and SaaS services.

Via an O’Reilly Radar Post by Jon Udell. Glad to see him back!!


TIL pgloader

pgloader has two modes of operation. It can either load data from files, such as CSV or Fixed-File Format; or migrate a whole database to PostgreSQL.

pgloader supports several RDBMS solutions as a migration source, and fetches information from the catalog tables over a connection to then create an equivalent schema in PostgreSQL. This means that you can migrate to PostgreSQL in a single command-line!

Via a Twilio blog post linked from the PyCoders Weekly newsletter, Issue 533

Actually Mutagen

Actually, mutagen is probably the right tool for the MP4 metadata job at hand, especially with the EasyMP4 class available:

Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey’s Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg Theora, Ogg Vorbis, True Audio, WavPack, OptimFROG, and AIFF audio files. All versions of ID3v2 are supported, and all standard ID3v2.4 frames are parsed. It can read Xing headers to accurately calculate the bitrate and length of MP3s. ID3 and APEv2 tags can be edited regardless of audio format. It can also manipulate Ogg streams on an individual packet/page level.


Link parkin’: mp4v2

A C/C++ library to create, modify and read MP4 files

This is the new MP4v2 project, a fork of the abandoned MP4v2 library project now archived at Google Code.

Seems a little more convenient, vice ffmpeg, for working with .m4a files as converted for or ripped by Apple’s Music.app. While primarily a library, there are a few CLI tools such as mp4info and mp4tags.


Per usual, Simon Willison has pushed out yet another impressive SQLite oriented tool: sqlite-comprehend

I built a new tool this week: sqlite-comprehend, which passes text from a SQLite database through the AWS Comprehend entity extraction service and stores the returned entities.

My attention was caught by multiple aspects:

  • The usage of many pieces of his toolkit but especially db-to-sqlite to grab data out of PostgreSQL, since I have some interesting data in guess what … PostgreSQL
  • Outsourcing entity extraction to AWS Comprehend
  • The application of SQLite’s full text search capabilities
  • And of course Simon’s way of writing this all up, which I aspire to emulate. I’m getting there with potential content.

Bottom line, I think it’s eminently possible to take my Discogs tables and Fabric views, export them into a single SQLite / Datasette instance, and have an easily searchable Discogs artifact that’s simple to distribute as one SQLite file on a CDN.

Don’t know if anyone else would use it, but it’s an itch I’d like to scratch.

TIL Bytewax

TIL about Bytewax

Bytewax is an open source Python framework for building highly scalable dataflows in a streaming or batch context.

read more ...

Discogs View Cleanup

This week I learned about PostgreSQL’s conditional expressions in general and the COALESCE expression in particular. A big part of the grunginess of my Discogs Postgres views is dealing with the data’s alternative name variations, or anvs, in the fabric_track_artists view, which are quite often NULL. This propagates into a crappy ad hoc value for the track_artists via abuse of concat_ws. I’ve got a pretty good feeling that can be handled more elegantly with a COALESCE.
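
A toy illustration of the COALESCE idea, run against SQLite from Python’s standard library as a stand-in for PostgreSQL (COALESCE behaves the same way in both). The table, columns, and artist names are invented and are not the real Discogs schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical stand-in for the track artist data: anv is often NULL.
db.execute("CREATE TABLE track_artists (track TEXT, name TEXT, anv TEXT)")
db.executemany(
    "INSERT INTO track_artists VALUES (?, ?, ?)",
    [("A1", "Artist One", None), ("A2", "Artist Two", "A. Two")],
)

# COALESCE returns its first non-NULL argument:
# the name variation when there is one, otherwise the canonical name.
rows = db.execute(
    "SELECT track, COALESCE(anv, name) FROM track_artists ORDER BY track"
).fetchall()
print(rows)
```

The NULL handling lives in one expression instead of being smeared across string concatenation.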

A couple of other things that need investigating:

  1. The regexps for fabric vs fabriclive should be collapsed into one
  2. Rename the fabric_live column to a more general fabric_series and compute it from the title column
  3. Reexamine the UNION statements to see if they can be handled by a more appropriate join

Lots of redundancy that can be cleaned up.

NetNewsWire SQLite Schema

Just for giggles, following up on my pondering regarding the SQLite schema within NetNewsWire, I poked around in the DB and pulled the schemas:

read more ...

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.