
TIL Inspect AI

An open-source framework for large language model evaluations

Looks like a nice piece of open source kit from the UK government’s AI Security Institute

A big part of the day job is LLM evaluation so this is definitely of interest.


soco-scribbler

I forked sonos-lastfm just to noodle around with Sonos monitoring generally. Dubbed the project soco-scribbler as it’ll scribble down locally what’s been played for later usage. Goal one is to record plays to SQLite via sqlite-utils. Then onwards to reconciliation with Last.fm, assuming there’s a parallel scrobbler going, like the hit-or-miss Sonos plugin. Afterwards, who knows.
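
A rough cut of goal one, assuming SoCo’s documented get_current_track_info() return dict and sqlite-utils; the table name, field names, and polling loop are my own invention:

```python
import time

def play_row(info, speaker, ts=None):
    """Flatten a SoCo get_current_track_info() dict into a row for SQLite."""
    return {
        "played_at": ts or int(time.time()),
        "speaker": speaker,
        "artist": info.get("artist", ""),
        "title": info.get("title", ""),
        "album": info.get("album", ""),
    }

# Hypothetical scribbling loop, assuming soco and sqlite-utils are installed:
#
#   import soco, sqlite_utils
#   db = sqlite_utils.Database("scribbles.db")
#   for zone in soco.discover():
#       info = zone.get_current_track_info()
#       db["plays"].insert(play_row(info, zone.player_name))
```

The flat rows should line up nicely with Last.fm scrobbles later, since artist/title/album is exactly what a scrobble carries.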

Forza.


Gemini Chat Link Longevity

Since I just tried out the chat link feature of Gemini, I thought it might be a good idea to ask how long the links last. And who better to query than Gemini!

Public links to Gemini chats generally stay alive as long as the associated chat is saved in your Gemini Apps Activity. …

In summary, the lifespan of a public Gemini chat link is tied to how long you keep the original chat in your Gemini Apps Activity, unless you choose to delete the link specifically.

If I’d figured this out earlier, I’d have used a Gemini link in my discussion of mixes and streaming platforms. Which reminds me to check in and see how to do this with ChatGPT and Claude.


Beats In Space

Speaking of Beats In Space, this is a trove of mix history that I need to dive into:

Beats In Space is a radio show that started in 1999 on WNYU 89.1FM in New York City and is now broadcasting every week on Apple Music.

Beats In Space is brought to you by New York DJ Tim Sweeney.

Not only does each show feature a guest mix, often Sweeney also does a mix of his own, plus an interview with the guest DJ:

I’m excited to share that all Beats In Space interviews are now available for FREE on Apple Podcasts! Whether you’re a long-time listener or just discovering the show, you can now dive into in-depth conversations with some of the most influential DJs and producers in electronic music—no paywall, just pure music talk. Subscribe and leave us a comment!

Great backgrounder at the fifteen-year mark on Sweeney and the show. Now at twenty-five-plus years and counting 😮, with only a single hiatus!

Previously I have spoken about the notion of a retrocast for spoken word podcasts, but the same could be done for music podcasts / mix series, tracing their evolution over time. I chatted up Gemini about this during some background research on Carlita and asked “Are there any such tools that look at longitudinal track trends, DJ relations and history, and/or track genealogy”. There’s a whole bunch more to the thread but this kicks off a lot of intellectual provocation.

You’re asking about a fascinating and cutting-edge area of AI in music, moving beyond simple track identification to deeper, more interconnected analysis. While the precise, all-in-one tools for “longitudinal track trends, DJ relations and history, and/or track genealogy” aren’t yet widely available as off-the-shelf consumer products, the underlying AI and research are certainly advancing in these directions.

Mixes, episodes, data integration and AI 🤔. Feels like a great rabbit hole to get stuck into.


The Platform for DJ Mixes

I’ve recently fallen down a rabbit hole of adding DJ mix sets to my Apple Music library. Boiler Room and Fabric have been on the radar now for a year or two, with solid presences in the Apple Music service. For whatever reason, one day recently I started chasing the “More By …” and “You Might Also Like…” connections underneath a set’s tracklist and discovered:

  • The Warehouse Project and
  • Glitterbox and
  • Tomorrowland and
  • Defected Broadcasting and
  • The Lost Village and
  • Movement and
  • you get the picture …

The nice thing is that I’m coming across new-to-me DJs like Carlita, TSHA, Seth Troxler, Skream, Chase & Status, Sonny Fodera, Denis Sulta, and Kilimanjaro. Meantime there’s a heaping helping of old friends with mix sets I’ve never heard before like Josh Wink, Todd Terry, MK, Roger Sanchez, Armand van Helden, Lil’ Louie Vega, Masters at Work, Marshall Jefferson and Basement Jaxx.

Some of these sets are bananas in terms of length, including a 5 hour ride with Seth Troxler B2B Skream.

Out of curiosity, I asked the major AI bots to compare and contrast Spotify with Apple Music on this front. Surely Spotify must be in the arena here? Apparently that’s not really the case. Here’s what Claude had to say:

Bottom Line

Apple Music is the clear winner for electronic DJ mixes, offering:

  • Professional-grade content: Thousands of expertly crafted continuous DJ mixes
  • Industry partnerships: Direct integration with major DJ software and hardware
  • Regular updates: Monthly curated mixes and exclusive content from top DJs
  • Variety: Comprehensive coverage of electronic music genres

Spotify falls short with mainly user-generated playlist collections rather than true DJ mixes, limited professional content, and no current support for DJ software integration.

For serious electronic music fans and DJs, Apple Music provides a significantly richer and more varied selection of actual DJ mixes compared to Spotify’s more basic offerings.

I need to do some further work to verify they’re not all hallucinating, but this matches my general vibe. Full disclosure: I ditched a Spotify premium subscription a few years back since I got sucked into the Apple One bundle.

What really grabs me about Apple Music is the relationships with curators, promoters, and labels such as The Warehouse Project, Boiler Room, Fabric, and Defected. Plus they have their own series, Beats In Space, and provide DJ-mix-themed genres, although you have to dig a little. This provides some serendipitous discovery without needing to go completely algorithmic. Great folks like Boiler Room just put up phenomenal events on Apple Music, and the archives live on in perpetuity for someone like me to fall into.

Have to acknowledge the creator-focused platforms Mixcloud and SoundCloud. They’re both great sources for sets directly from DJs. Bonus: they integrate nicely with the Sonos platform. They just don’t provide the personal track library that Apple Music does. He said grudgingly.

Now if only Apple would do something about their janky desktop app.


20 Years on Last.fm ?!

Happened to be looking at my profile on Last.fm and noticed the following

Screenshot of Last.fm profile - Joined June 5, 2005

In case you can’t make out the text from the screen capture, there’s a “scrobbling since June 5 2005” hiding in there.

Hang on! I’ve been scrobbling for 20 years 😱 ?!

Just for calibration, that’s one year longer than The Site Formerly Known as Twitter has existed, and one year less than TheFacebook.

Some good ’ole Web 2.0 services just keep hanging in there.


Searching Mass Programming Resistance

Boy am I a dope.

I’ve been fiending for search capabilities on this here ’blog for ages. Never got around to putting something together, dreading the grind through HTML and JavaScript coding. So I just limped along with site:crossjam.net modifiers in search engines.

About this time last year, I posted about pagefind. pagefind is a JavaScript toolkit for creating embedded search indexes to go along with statically served pages. Looked easy peasy to run and use. Lame old me still dragged his feet.

Finally for some Friday hacking, I decided to push through and see if I could get pagefind based search working in an experimental fashion.

After reacquainting myself with the pagefind docs, I had it working in an hour 🙄. Still have a few bits to polish, but once it’s done there will be an honest-to-gosh search page.
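
For anyone else dragging their feet, the indexing step really is a one-liner. A sketch assuming a pelican-style output/ directory; adjust the path for your generator:

```shell
# Crawl the built HTML and emit a search bundle into output/pagefind/
npx -y pagefind --site output
```

The generated bundle ships its own UI component, so the search page itself is mostly a matter of including pagefind’s JS and CSS on a page.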


A New Breed of AI Engineers

Being a Systems guy, I don’t really know much about ML and AI. In my attempt to keep up, I’m trying to draft off the best that I know of. Andrew Ng is a legend, so I’m subscribed to his newsletter “The Batch”. The most recent edition, entitled Meet The New Breed of GenAI Application Engineers neatly summarizes a lot of what’s going on beyond the various echo chambers:

There’s a new breed of GenAI Application Engineers who can build more-powerful applications faster than was possible before, thanks to generative AI. Individuals who can play this role are highly sought-after by businesses, but the job description is still coming into focus. Let me describe their key skills, as well as the sorts of interview questions I use to identify them.

Skilled GenAI Application Engineers meet two primary criteria: (i) They are able to use the new AI building blocks to quickly build powerful applications. (ii) They are able to use AI assistance to carry out rapid engineering, building software systems in dramatically less time than was possible before. In addition, good product/design instincts are a significant bonus.

About two years ago, I would have been one to scoff at the notion of AI Engineers, thinking of them as overblown prompt tweakers and API cobblers. I’ve come around because of two major shifts.

First, systematically, robustly, and safely building capabilities on top of LLMs really does require engineering. We’ve moved past the point of just bolting a chat window into an app and calling it a day. Compound AI systems (Go Bears!) with agentic components are a whole new ballgame.

Second, the shape and quality of system building is changing. The baseline velocity of software development is going up. Vibe coding may lead to crap applications but there are enough folks doing the required experimentation to lead to better outcomes. Still have concerns about the security, ops, and maintenance aspects but those become worthy avenues to provide value.

If you click through on Ng’s newsletter, that list of AI building blocks is daunting. Good taste on which of them to early adopt, lag, and deep dive will be another defining skill. As always, those who can discern the enduring qualities will be well positioned.


Synology and Sonos

I have a Synology NAS and a few Sonos devices. Pulling these two nuggets of information together in case someone else needs to rework their setup. Previously I had all my music data on one machine, but since the NAS has plenty of space, it made sense to migrate the Apple Music library and then point Sonos in the right direction.

First you need to create a new library in Apple Music:

You can have more than one music library in Music. For example, you could have a library of holiday music that wouldn’t appear in Music the rest of the year. Or you could keep some of your music in a library on an external storage device.

Then these Sonos instructions illustrate how to get your old data over to the NAS library.

Choose in Apple Music under the File menu the ‘Import’ option. Select the folder where your music files are stored on your local drive. The Apple Music app will now copy all the files in the selected folder and subfolders to the NAS and update its music library. Bear in mind that all files will be copied to the NAS but not be moved, so the initial Apple Music library and local music folders will remain without change on your local drive.

This definitely can take a while. Unfortunately, Apple Music goes through a few different phases during import, only has minimal messages, and definitely no progress reporting. Oh well.


OpenRouter

Link parkin’: OpenRouter

From the OpenRouter docs

OpenRouter provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Get started with just a few lines of code using your preferred SDK or framework.

I actually have an API key already, just need to get into some mischief with it.
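
For when the mischief starts: the endpoint is OpenAI-compatible, so nothing beyond the standard library is strictly required. A minimal sketch; the model name is illustrative, and the key comes from your own account:

```python
import json
import urllib.request

def chat_request(model, prompt, api_key):
    """Build an OpenAI-style chat completion request against OpenRouter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        "https://openrouter.ai/api/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# To actually send it (network plus a real key required):
#   import os
#   req = chat_request("mistralai/mistral-7b-instruct", "hello",
#                      os.environ["OPENROUTER_API_KEY"])
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["message"]["content"])
```

The official SDKs work too by pointing their base URL at openrouter.ai, which is the whole appeal of the unified API.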

Good podcast episode with the CEO of OpenRouter, Alex Atallah.


TIL NASA API

TIL, NASA has an API:

Welcome to the NASA API portal. The objective of this site is to make NASA data, including imagery, eminently accessible to application developers. This catalog focuses on broadly useful and user friendly APIs and does not hold every NASA API.

Requires obtaining an API key, which I assume is pretty straightforward. Wonder if its image search is good for querying Webb and Hubble telescope imagery.
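
For a first poke, the portal’s APOD (Astronomy Picture of the Day) endpoint is the classic hello world. A standard-library sketch; DEMO_KEY is the documented rate-limited key, so swap in your own once you have it:

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.nasa.gov/planetary/apod"

def apod_url(api_key="DEMO_KEY", date=None):
    """Build an Astronomy Picture of the Day query URL."""
    params = {"api_key": api_key}
    if date:
        params["date"] = date  # YYYY-MM-DD
    return API_BASE + "?" + urllib.parse.urlencode(params)

# Fetch it (network required):
#   with urllib.request.urlopen(apod_url()) as resp:
#       meta = json.load(resp)
#       print(meta["title"], meta["url"])
```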

Via the Pybites community forum


Locally Resolving .local

Link parkin’ a GitHub gist, because it leads to some fun pure local configuration and I can’t quite find anything similar in a web page or blog post.

The trick is that you can configure macOS DNS to direct resolution of domains with a bit of file tweaking in /etc/resolver. For example, handle .test by adding an entry in /etc/resolver/test that points to 127.0.0.1 as a DNS nameserver. Then if you install and configure dnsmasq to listen for DNS requests on your machine, it can take over resolution of that domain and point entries at services anywhere, your machine in particular. This includes wildcarding subdomains. A tricky bit is closely reading the dnsmasq docs on the --address option and knowing you can use it in a dnsmasq.conf file.
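
Concretely, the two config pieces for a .test setup might look like this; the dnsmasq line is just the --address option in conf-file form:

```shell
# /etc/resolver/test -- macOS sends *.test lookups to this resolver:
#   nameserver 127.0.0.1

# dnsmasq.conf -- wildcard every *.test name (subdomains included)
# back to this machine:
#   address=/test/127.0.0.1

# Sanity check once dnsmasq is listening on 127.0.0.1:
dscacheutil -q host -a name whatever.myapp.test
```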

What’s this good for? I used it when working with Dokku, which allocates subdomains under a root domain for services it dynamically builds and creates. Configured to use .test as its root, all the services resolved nicely thanks to dnsmasq pointing back to my laptop. This makes local experimentation and demonstration of “push to deploy” PaaS services easy, fun, and portable. YMMV.

Of note, there’s a new(ish) PaaS on the block that I’d like to try out: coolify


The MLX and mlx-lm Projects

Have to take a moment to recommend the MLX project.

MLX is a NumPy-like array framework designed for efficient and flexible machine learning on Apple silicon, brought to you by Apple machine learning research.

The Python API closely follows NumPy with a few exceptions. MLX also has a fully featured C++ API which closely follows the Python API.

But ultimately it’s the application of MLX in the mlx-lm CLI tool that’s really impressive. I’ve been able to use mlx-lm to run LLM model weights from Hugging Face in easy one-liners. There’s a great Hugging Face MLX Community that works exceptionally hard to convert the latest open weight models into the MLX format. mlx-lm also supports using GGUF files directly, which blows open the space of available models. Based on an HF overview article, here’s an example:

$ uv tool install mlx-lm
$ uv tool install 'huggingface_hub[cli]'
#
# You'll need to
# 1) Accept terms and conditions for the Mistral model for your account on the HF site
# 2) Create an HF token in your account to login with from the cli
#
$ huggingface-cli login
...
$ mlx_lm.chat --model mistralai/Mistral-7B-Instruct-v0.2
... lots of downloading of model weights ...
[INFO] Starting chat session with mistralai/Mistral-7B-Instruct-v0.2.
The command list:
- 'q' to exit
- 'r' to reset the chat
- 'h' to display these commands
>> hello
Hello! How can I help you today? If you have any questions or need
assistance with something, feel free to ask. I'm here to help. If you
just want to chat or share some thoughts, that's fine too. Let me know
what's on your mind. :) 
>> Who is your maker
I don't have a maker in the traditional sense, as I am an artificial
intelligence and don't have a physical body or a creator. I was
designed and developed by a team of engineers and computer
scientists. If you have any other questions or need help with
something, just let me know! :) 
>>
>> /bye
Goodbye! It was nice chatting with you. If you have any other
questions or need assistance in the future, don't hesitate to
ask. Have a great day! :) 
>> q
$ 

I have a three-year-old 24 GB M2 MacBook Pro, so I don’t get the best tokens-per-second output. Still impressive how easy it is to run a decent-sized LLM with little to no development needed. There’s even a handy plugin for Simon Willison’s llm tool.
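
A hedged sketch of that plugin route; llm-mlx is the plugin’s name, the model is one of the mlx-community conversions, and it assumes llm itself is already installed:

```shell
llm install llm-mlx
llm mlx download-model mlx-community/Llama-3.2-3B-Instruct-4bit
llm -m mlx-community/Llama-3.2-3B-Instruct-4bit 'hello'
```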

Messing around with LLMs is almost cookie cutter at this point.


Yeah, What Thomas Said

It’s a sort of classic Internet, spicy hot take, grenade in the hole rant (old Usenet heads will recognize the type), but there’s a lot in Thomas Ptacek’s My AI Skeptic Friends Are All Nuts that had me nodding my head. And I’m not even an AI hipster. (Although a bit of a fly.io fanboy)

Tech execs are mandating LLM adoption. That’s bad strategy. But I get where they’re coming from.

Some of the smartest people I know share a bone-deep belief that AI is a fad — the next iteration of NFT mania. I’ve been reluctant to push back on them, because, well, they’re smarter than me. But their arguments are unserious, and worth confronting. Extraordinarily talented people are doing work that LLMs already do better, out of spite.

All progress on LLMs could halt today, and LLMs would remain the 2nd most important thing to happen over the course of my career.

Ptacek has the bona fides to back this up. To be sure, his focus is on coding and software development assistance, but that’s why I nodded my head so much. Can’t claim to be jumping in with both feet, doing a ton of vibe coding, and deploying to prod with my co-pilot. However, my gut tells me this change is on the order of the impact that compilers had on programming. In 3-5 years we’ll be going, “did we really do it that way?”.

YMMV.

P.S. When I grow up, I want to write like Thomas Ptacek


100 Days of Fabric

Maybe later this year I should do a 100 Days of Fabric blog series. The idea would be to listen at least once to every mix from my collection of the Fabric series and post about it. Stretch goal would be to run these listens consecutively as a streak. Hard but doable.

From the data side, it would be a nice side quest to auto generate the skeleton of each post from the Discogs Data Dumps. Work with AI to develop the code including the embedding and styling for track lists. Maybe throw in an LLM generated summary and related artists section.

Another side quest would be to develop an app that could serve oEmbed content for track listings from slices of the Discogs Dataset.
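
The oEmbed side of that side quest is a small, well-specified payload. A hypothetical response builder; version, type, html, width, and height come from the oEmbed 1.0 spec, while the tracklist markup is obviously placeholder:

```python
import html

def tracklist_oembed(title, tracks, author="Discogs slice"):
    """Build an oEmbed 'rich' response dict for a track listing.

    version, type, html, width, and height are what the oEmbed spec
    requires for the rich type; the rest is optional metadata.
    """
    items = "".join(
        f"<li>{html.escape(artist)} - {html.escape(name)}</li>"
        for artist, name in tracks
    )
    return {
        "version": "1.0",
        "type": "rich",
        "title": title,
        "author_name": author,
        "html": f"<ol class='tracklist'>{items}</ol>",
        "width": 400,
        "height": 300,
    }
```

Serve that as JSON from an endpoint URL and any oEmbed consumer should be able to embed the listing.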


3 Years of Kagi

Kagi, the alternative search engine and browser developer, turned three this week.

We have come a long way since introducing Kagi Search and Orion Browser as a public beta in 2022. What started as a quiet rebellion against ad-driven search engines tracking your every click became a complete dedication to building an alternative, humane web.

I signed up for Kagi Ultimate in October of last year after giving the free version a bit of a test drive and liking the results. Or more accurately, the distinctly customer-centered UX vice Google Search. Can’t remember when it happened, but I also switched over to the Orion browser shortly thereafter, mainly to maximize Kagi Search usage but also for no-nonsense ad blocking. It’s also sort of neat that they have a search API.

Unlike many other subscriptions, when the bill for Kagi arrives I give canceling no consideration. YMMV, but I can recommend Kagi. Fingers crossed for no rug pulls, no third-act betrayal reveals, and many more years of success.

Heck, if John Gruber is a paying customer that’s saying something.

P.S. podsearch.david-smith.org is surprisingly hard to find.


podverse.ai

Link parkin’: podverse, from Matt D. Welsh (Go Bears!)

AI superpowers for your podcast.

Automatic episode transcripts, summaries, AI chat, and more. Take your podcast to the next level.

Welsh also open sourced the project and posted about it.

Podverse is a web app that brings AI superpowers to podcasts. Check out the live site at https://podverse.ai!

This project is meant as a demonstration of a full-stack webapp that leverages AI capabilities.

Of interest relative to the retrocast


Python Package Cookiecutter

Link parkin’: python-package-cookiecutter

There are many cookiecutter templates, but this one is mine and I’m sharing it with you. With it, you can quickly create a full-featured Python package designed to be managed with uv, a default typer command-line interface, optional settings using pydantic-settings and logging using my favorite logger, loguru. Best of all, testing, code quality checks, and publishing to PyPI are all baked in and ready to go.

I’ve been locked on to Simon Willison’s click-app cookiecutter, mainly due to inertia, but this new one hits all my recent developer buttons: uv, loguru, and typer. Also, TIL: Poe the Poet and pydantic-settings


Soco and sonos-lastfm

Link parkin’

sonos-lastfm

This script automatically scrobbles music playing on your Sonos speakers to Last.fm.

SoCo

SoCo (Sonos Controller) is a Python library that allows you to control Sonos speakers programmatically.

I have four Sonos speakers and generally enjoy them. They make streaming audio from my ripped files and from Internet services easy and pleasant. The speakers were acquired a few years ago when Sonos supported last.fm scrobbling (tracing music plays).

Suffice it to say, Sonos has had quite a bad sequence of quarters from the software development perspective. Not much impact on me really. Apple Music playlist scanning has been crappy for a few years now, and since Sonos seems to rely on an outdated way to get access to that data, I’m not holding my breath. But then Sonos support for Last.fm integration bit the dust.

So I’m looking forward to trying sonos-lastfm to see if it actually works. Even if not, SoCo looks like fun just for messing around with my speakers from Python. There’s also dreams of doing my own playlist management and getting back to the nirvana of “album shuffle” that was once an iTunes feature.

Here’s the kicker. My last.fm integration still seems to be working for some mysterious reason. I’m sure it’ll go poof the minute I attempt to reconfigure the speakers or tweak the service. And it has been occasionally missing some scrobbles if I don’t have the Sonos desktop app running on a machine somewhere. Whatevs.


humble-cli, My Savior!

I have an unfortunate addiction to Humble Bundle book bundles. It’s a convenient way to get a lot of tech ebook content at a reasonable price. Also, I can get graphic novels and an occasional speculative fiction package of interest, usually in DRM free formats and not through a global monopolist.

That said, for all the time I’ve been buying from Humble Bundle, I’ve only had the web interface to download my purchases. For bulk downloads the UX wasn’t great so I turned to the Humble Bundle Downloader extension for Firefox. Did the trick for quite a bit, but age and rot seem to be setting in. Plus the extension was the only reason to launch Firefox. But the downloader did have the nice feature that it would rename files to match titles listed on the download page, making the results much more human friendly.

Difficulties downloading a recent bundle put me on tilt and I cast about for a better approach.

Enter humble-cli, “The missing CLI for downloading your Humble Bundle purchases”. humble-cli is, you guessed it, a command line interface tool for interacting with your Humble Bundle purchases. It is sweet. Here’s a quick sample of how it lists purchases

crossjam@burningchrome:~$  ~/.cargo/bin/humble-cli list
...
a7nAf3UH3Vru8d3n | Humble Tech Book Bundle: Computer Science the Fun Way by No Starch                                   |    1.42 GiB | -
Ay3VN5rcSKrmt3d4 | Humble Tech Book Bundle: Data Engineering and Management by Pragmatic                                |  462.04 MiB | -
sRs4EtvxGf4hhDhb | Humble Tech Book Bundle: DevOps 2025 by O'Reilly                                                     |  348.66 MiB | -
bXPKrycBCSqbkhE7 | Humble Comics Bundle: Warren Ellis by Image Comics                                                   |    8.44 GiB | -
tDH8P68hkVWFnvNU | Humble Tech Book Bundle: Machine Learning, AI, and Bots by O'Reilly 2025                             |    1.07 GiB | -
nT68upU5Kuhwxp2Y | Brian Michael Bendis' Jinxworld Bundle                                                               |   16.96 GiB | -
crossjam@burningchrome:~$

The tool is a Rust binary that can be easily installed via cargo install. There’s a minor bit of finagling around in browser cookie storage to go through an auth sequence which could be smoothed out, but I’m not really complaining. Quite performant for downloading too, while having a similar file naming approach to the Firefox extension.

So far, so good. I’m looking forward to putting humble-cli through further paces and reorganizing my collection.


Another Quick Galaxy Hit

NGC 2283. Just because:

Image Description: A spiral galaxy seen close up and almost face on. It is filled with puffy, patchy clouds of hot gas and dust. Red, orange and yellow colors indicate light emitted by different particles. The brightest colors are in the center and along the two spiral arms, which wind out from the center. Star clusters hide in the gas along the arms. A few large, bright white stars are prominent in the foreground, near to us.


Quick Vibe Coding Thoughts

As Simon Willison points out, “Vibe coding is having a moment.” Here are a couple of related random thoughts. Just thinking out loud.

Does it make any sense for an open source project to craft issues that are highly amenable to vibe solutions? These could be a means of addressing low stakes work. Alternatively, a really gentle on ramp for new contributors.

Coding for generative art is a really low stakes form of development. Wonder how well vibe coding works for an environment like processing, Cinder, or openFrameworks.


Speaking of OEmbed

Speaking of embedding and Flickr, here’s a quick demo using a galactic image of NGC 602 that I was hipped to by the Space Telescope Advent Calendar

Source: NASA’s James Webb Space Telescope


Rolling Up The Sleeves

Sometimes, you just gotta roll up your sleeves to try and get back in the groove.

Sprinkled throughout this blog is pelican usage of pyembed-markdown, a Python-Markdown extension for plugging in pyembed. Pyembed queries OEmbed endpoints and generates appropriate HTML to, surprise, embed the content from the endpoint.

Unfortunately, the combo had been busted for a bit. The plugin hadn’t kept up with how Python-Markdown registers extensions, the way tagging was done had changed, and when there were errors probing an endpoint, pelican wouldn’t generate anything.

So I spent a leisurely mid-afternoon uncovering the key issues, locating the point for a fix, and committing a small change to a personal fork of pyembed-markdown. After a little elisp magic with ripgrep, rg.el, and wgrep-ag, everything updated properly, even leaving nice messages about bitrot as appropriate.


TalkPython Deep Dives

Michael Kennedy has added Episode Deep Dives to Talk Python.

Have you ever listened to a podcast episode and later wished there was a quick, detailed way to revisit the best points—without scrubbing through the entire recording again? That’s exactly why we launched Episode Deep Dives at Talk Python. This feature provides a rich, structured walkthrough of each podcast episode, giving you a snapshot of all the essential details and takeaways in one convenient place. Whether you’re curious about a specific tool our guest mentioned or you want to recall the main ideas days after listening, Episode Deep Dives makes it easy to dive right back into the heart of the conversation.

The quick look is really good. Kennedy doesn’t go into depth on how they’re created (“Each one of these deep dives does take some time and work to generate.”), but I suspect there’s some LLM machinations in there, supplementing a bit of manual human labor. Even so, dives already exist for over a year’s worth of episodes.

I also listened to the embedded “conversation”. My curiosity was piqued as to the participants whose identities weren’t provided. It was pretty entertaining and provocative for about 10 minutes and then it got repetitive. Also, none of the speakers actually mentioned names and at a few points I was confused whether there were two or three distinct voices. I’m pretty sure it’s an AI generated audio overview, ala NotebookLM. Whatever it was, a couple of interesting notions around automatic content generation and community building. Needed some editing.

Looking forward to taking advantage of these deep dives, a behind the scenes Talk Python episode, and the proliferation of the feature to other podcasts.


What. Is. Happening?

It’s the first day of 2025. If there’s anyone paying attention, this blog has come back to life once again. Why?

Previously I hinted at some personal challenges that were seemingly surmounted, freeing up emotional and cognitive energy. That was a false dawn. Lots of other issues cropped up in work and life. Nothing cataclysmic but definitely draining. No need to detail them here. Some are resolved and some are ongoing.

I need an outlet though to provide some release. I’m also aiming to generate some supplemental material for the overall technical portfolio. Why?

The answer, as to everything in technology for 2024, is LLMs. In my day job, I work on MLOps for data intensive workloads in the medical informatics space. (I’m easy to find on LinkedIn if you need details.) Being in tech long term, I’m also just generally interested.

I’m of the “promise and peril” persuasion in relation to LLMs. Skeptical of much of the perilous hype but hopeful regarding some of the technology promise. Simon Willison has a great recap of lessons learned about LLMs in 2024 and possibly says it best (apologies for the heavy quoting):

A drum I’ve been banging for a while is that LLMs are power-user tools—they’re chainsaws disguised as kitchen knives. They look deceptively simple to use—how hard can it be to type messages to a chatbot?—but in reality you need a huge depth of both understanding and experience to make the most of them and avoid their many pitfalls.

If anything, this problem got worse in 2024.

We’ve built computer systems you can talk to in human language, that will answer your questions and usually get them right! … depending on the question, and how you ask it, and whether it’s accurately reflected in the undocumented and secret training set.

There’s a flipside to this too: a lot of better informed people have sworn off LLMs entirely because they can’t see how anyone could benefit from a tool with so many flaws. The key skill in getting the most out of LLMs is learning to work with tech that is both inherently unreliable and incredibly powerful at the same time. This is a decidedly non-obvious skill to acquire!

There is so much space for helpful education content here, but we need to do a lot better than outsourcing it all to AI grifters with bombastic Twitter threads.

I think telling people that this whole field is environmentally catastrophic plagiarism machines that constantly make things up is doing those people a disservice, no matter how much truth that represents. There is genuine value to be had here, but getting to that value is unintuitive and needs guidance.

Those of us who understand this stuff have a duty to help everyone else figure it out.

So I’m back to, among other things, dig into this LLM stuff with purpose and intent.


le Carré, Smiley, and The Circus

Instead of the common comprehensive look at the past year, I’m going to close with focused observation on one specific thing I did this year.

Read a lot of spy novels.

read more ...


HuggingChat

Link parkin’: HuggingChat UI

A chat interface using open source models, e.g. OpenAssistant or Llama. It is a SvelteKit app and it powers the HuggingChat app on hf.co/chat.

From the docs:

Open source chat interface with support for tools, web search, multimodal and many API providers. The app uses MongoDB and SvelteKit behind the scenes. Try the live version of the app called HuggingChat on hf.co/chat or setup your own instance.

via Jose Antonio Lanz at Decrypt

If you haven’t noticed a theme yet, I’ve become interested in frontend user interfaces for LLM applications.


LibreChat

Link parkin’: LibreChat

LibreChat is the ultimate open-source app for all your AI conversations, fully customizable and compatible with any AI provider — all in one sleek interface

All the gloss on the home page is a bit much for me, but the real proof is in grabbing the source code and test driving.


Acorn 8

Announced a few days back by Gus Mueller:

Acorn 8 has been released!

This is a major update of Acorn, and is currently on a time-limited sale for $19.99. It’s still a one time purchase to use as long as you’d like, and as usual, the full release notes are available. I want to highlight some of my favorite things below.

I don’t do much image editing, but when I do, I use Acorn. Do Gus a solid and buy a copy.


OpenwebUI

Link parkin’: Open WebUI, a well done, open source, frontend to LLM APIs. From the Open WebUI docs:

Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI interface designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs.

Interestingly, the project has an academic-looking paper attached on building a platform for LLM evaluation and auditing.

Via a positive overview from Simon Willison


Podcast Aggregation Self-Hosting

Link parkin’: a few self-hostable podcast frameworks:

PinePods

PinePods is a Rust based app that can sync podcasts for individual accounts that relies on a central database with a web frontend and apps available on multiple platforms

Podgrab

Podgrab is a self-hosted podcast manager which automatically downloads the latest podcast episodes. It is a light-weight application built using Go.

PodFetch

PodFetch is a simple, lightweight, fast, and efficient podcast downloader for hosting your own podcasts.


DBOS Stateful Workflows

DBOS (the company) has come across my radar via two vectors. Ben Lorica had a newsletter update and podcast interview with the founders. Meanwhile the open source DBOS framework was mentioned on a recent episode of Python Bytes.

This could be a useful platform for my purposes, although I’m a bit hesitant since it’s new and not “boring”. However, it builds on top of PostgreSQL and is the result of a well grounded research effort out of MIT and Stanford.

The interview was a bit “salesy” but not overbearing. Also, Lorica tended to steer the discussion in the direction of AI and agentic use cases.

From the documentation introduction

DBOS is a serverless platform for building reliable backend applications. Add lightweight annotations to your app to durably execute it, making it resilient to any failure. Then, deploy your app to the cloud with a single command.

For a data engineer, the framework is probably worth a test drive, whether to compare against Prefect or against building your own systems.
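To make the “durably execute” pitch concrete, here’s a toy sketch of the idea: checkpoint each step’s result so a restarted workflow replays completed work instead of redoing it. This is an illustration of the concept only, not DBOS’s actual API (DBOS checkpoints to PostgreSQL; the SQLite stand-in here is just for a self-contained example).

```python
# Toy sketch of durable execution: each step's result is checkpointed
# (here to SQLite), so re-running the workflow after a crash skips
# steps that already completed. NOT the real DBOS API.
import json
import sqlite3


class DurableWorkflow:
    def __init__(self, db_path=":memory:"):
        # Use an on-disk path in real life so state survives a restart.
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS steps (name TEXT PRIMARY KEY, result TEXT)"
        )

    def step(self, name, fn):
        row = self.db.execute(
            "SELECT result FROM steps WHERE name = ?", (name,)
        ).fetchone()
        if row:  # already ran once: replay the checkpointed result
            return json.loads(row[0])
        result = fn()
        self.db.execute(
            "INSERT INTO steps VALUES (?, ?)", (name, json.dumps(result))
        )
        self.db.commit()
        return result


wf = DurableWorkflow()
a = wf.step("fetch", lambda: 21)
b = wf.step("double", lambda: a * 2)
# Had the process died between the two steps, a restart against the same
# database would replay "fetch" from its checkpoint rather than rerun it.
```

The real framework layers decorators, queues, and cloud deployment on top of this basic record-and-replay idea.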


uv scripts

Trey Hunner, a seasoned Python trainer and developer, has been experimenting with uv for a number of tasks including short script development while also replacing pipx and pyenv, two tools I routinely use. He hasn’t completely embraced a substitution but uv does provide an intriguing upside:

The biggest win that I’ve experienced from uv so far is the ability to run an executable script and have any necessary dependencies install automagically.

This doesn’t mean that I never make Python package out of my Python scripts anymore… but I do so much more rarely. I used to create a Python package out of a script as soon as it required third-party dependencies. Now my “do I really need to turn this into a proper package” bar is set much higher.

This relies on a Python packaging standard, inline script metadata (PEP 723), for embedding dependency metadata directly in a script.
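The mechanism is just a specially formatted comment block at the top of the file. A minimal example (the dependencies list is empty here so the script is self-contained; real scripts would list third-party packages):

```python
# /// script
# requires-python = ">=3.11"
# dependencies = []  # e.g. ["requests", "rich"] for third-party packages
# ///
"""Run with: uv run greet.py -- uv reads the block above and installs deps."""
import sys


def greet(name: str) -> str:
    return f"Hello, {name}!"


if __name__ == "__main__":
    print(greet(sys.argv[1] if len(sys.argv) > 1 else "world"))
```

Since the metadata lives in comments, the file is still a perfectly ordinary Python script when run without uv.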

Pushing this to the limit, would it be possible to declare xonsh as a dependency and then write the rest of the script in that shell language?

Via PythonBytes episode 415


LaunchBar Actions Editor

Did a little more thinking about LaunchBar Actions and drifted into considering how they could combine with Simon Willison’s llm framework. Simon mostly demonstrates llm as a powerful CLI tool for interacting with modern AI models. The CLI is porcelain around extremely sophisticated Python module plumbing. LaunchBar could make a nice alternative porcelain.

So how does one go about creating LaunchBar Actions? The code examples I’ve seen are pretty straightforward. I could likely reverse engineer the directory structure and code interface even if documentation didn’t exist. However, the fine folks at Objective Development added a LaunchBar Action Editor in version 6.4 of the tool. Well what do you know!? 😲

And of course there is documentation for authoring actions.
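As a sketch of how the llm-as-plumbing idea might look: LaunchBar script actions can emit a JSON array of item dictionaries on stdout, so a Python action could shell out to the `llm` CLI and format its reply as items. Everything below is hypothetical glue, not a finished action; consult the official action docs for the exact bundle layout and item keys.

```python
#!/usr/bin/env python3
# Hypothetical default script for a LaunchBar action that sends the
# typed argument to Simon Willison's `llm` CLI (assumed to be on PATH)
# and returns the response lines as LaunchBar items.
import json
import subprocess
import sys


def to_items(text):
    # LaunchBar script actions return a JSON array of item dicts on stdout.
    return json.dumps(
        [{"title": line} for line in text.splitlines() if line.strip()]
    )


def main(prompt):
    out = subprocess.run(
        ["llm", prompt], capture_output=True, text=True
    ).stdout
    print(to_items(out))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

The heavy lifting stays in llm’s Python plumbing; the action is just a thin porcelain layer that shapes text into LaunchBar’s item format.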


MarkItDown

Link parkin’: MarkItDown

MarkItDown is a utility for converting various files to Markdown (e.g., for indexing, text analysis, etc). It supports:

  • PDF
  • PowerPoint
  • Word
  • Excel
  • Images (EXIF metadata and OCR)
  • Audio (EXIF metadata and speech transcription)
  • HTML
  • Text-based formats (CSV, JSON, XML)
  • ZIP files (iterates over contents)

This should definitely come in handy.

Via Daring Fireball


Fastmail Hardware

Rob Mueller, Fastmail’s CTO, blogged a piece that caught my eye about how Fastmail, a long standing hosted email service, builds on their own hardware as opposed to public cloud infrastructure. Part of the reason is that Fastmail predates the availability of the major public cloud providers. Even so, they’ve never seen a reason to move despite the persistent cloud evangelism that has swept the tech industry.

A big part is the storage pricing of the blob services, S3 and its competitors. This interests me because I have a pile of Discogs data that I’ve been hoarding. Pushing it all into the cloud feels appealing, but I don’t want to break the bank.

Mueller breaks down the surprising costs of storage at Fastmail scale. Here’s the money graf on the choice:

It’s interesting seeing the spread of prices here. Some also have a bunch of weird edge cases as well. e.g. “The S3 Glacier Flexible Retrieval and S3 Glacier Deep Archive storage classes require an additional 32 KB of data per object”. Given the large retrieval time and extra overhead per-object, you’d probably want to store small incremental backups in regular S3, then when you’ve gathered enough, build a biggish object to push down to Glacier. This adds implementation complexity.
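That per-object overhead adds up faster than you’d think. A quick back-of-the-envelope check on the 32 KB figure from the quote (the object count is an illustrative assumption, not a Fastmail number):

```python
# Back-of-the-envelope math for the Glacier per-object overhead quoted
# above: 32 KB of extra billed data per object.
KB = 1024
objects = 1_000_000  # e.g. a million small incremental backup objects
overhead_bytes = objects * 32 * KB
overhead_gib = overhead_bytes / (1024 ** 3)
print(f"{overhead_gib:.1f} GiB of pure per-object overhead")  # ~30.5 GiB
```

Roughly 30 GiB of billed storage carrying no data at all, which is exactly why batching small backups into bigger objects before pushing to Glacier makes sense.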

Ultimately though, if you know your use case intimately, have a decent handle on the cost envelope, and can employ enough operational expertise, doing it yourself is an advantage.

Bonus, TIL about Wasabi.


Launchbar Actions for Kagi

A few months back, I ponied up for a high end subscription to the Kagi search engine. Meanwhile, the LaunchBar app is embedded in my fingertips. I was poking around for ways to plug Kagi into LaunchBar (using Kagi search ’natch). Found two decent looking plugins:

These look simple enough that I should spend some time to learn how to write LaunchBar actions myself. If there isn’t an analogous Kagi action in Python there should be.


Embeddings with llm-gguf

I find the ability to create multi-dimensional embedding vectors from deep learning models quite fascinating. There’s an obvious application pattern in Retrieval Augmented Generation (RAG) with current LLMs. However, useful embedding models come in a much wider range of scales and capabilities than general language models. In principle, it’s quite possible to train custom embedding models at a reasonable cost in terms of compute hardware, data scale, and time.

Last month, Simon Willison updated his llm-gguf plugin to support embedding models distributed as GGUF files.

The LLM docs have extensive coverage of things you can then do with this model, like embedding every row in a CSV file / file in a directory / record in a SQLite database table and running similarity and semantic search against them.
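The similarity-search step is simple once you have the vectors. A toy illustration using hand-made 3-d stand-ins for real model output (actual embedding vectors run to hundreds or thousands of dimensions, but the math is identical):

```python
# Rank documents against a query by cosine similarity of their
# embedding vectors. The vectors here are tiny made-up stand-ins
# for what an embedding model would actually produce.
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.9, 0.2],
}
query = [1.0, 0.0, 0.1]

# Most similar document first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

Tools like llm handle the storage and batching; the retrieval side of RAG is ultimately this ranking applied at scale.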

This could come in handy since I have a few piles of content laying around where using embeddings to supplement search and retrieval would be an interesting experiment.


I Stand Corrected

Previously, I thought The Atlantic’s “Space Telescope Advent Calendar” only collected images from 2024. But my daily source investigation of today’s selection led to this discovery on Flickr, uploaded in 2017. Turns out anything is fair game.

We regret the error.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.