
NuExtract 1.5

Link parkin’: NuExtract 1.5

We introduce NuExtract 1.5, the new version of our foundation model for structured extraction. NuExtract 1.5 is multilingual, can handle arbitrarily long documents, and outperforms GPT-4o in English while being 500 times smaller. As usual, we release it under MIT license.

Before diving into the details of what’s new, let’s discuss what this is all about. NuExtract is a family of small open-source models that do only one thing: they extract information from documents and return a structured output (JSON). It turns out that, because they only do this one thing, they are very good at it.
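To make "return a structured output (JSON)" a little more concrete, here is a minimal sketch of the idea, not NuExtract's actual API. It assumes a hypothetical template whose keys name the fields to extract, and checks that a JSON response has the same shape. The names `TEMPLATE`, `conforms`, and `raw_output` are made up for illustration, and the response string is hand-written rather than produced by the model.

```python
import json

# Hypothetical template: keys name the fields to extract,
# empty strings and lists are placeholders for the values.
TEMPLATE = {"model": {"name": "", "license": "", "compared_to": []}}


def conforms(template, output):
    """Return True if `output` has the same nested keys and value kinds as `template`."""
    if isinstance(template, dict):
        return (
            isinstance(output, dict)
            and set(output) == set(template)
            and all(conforms(template[key], output[key]) for key in template)
        )
    if isinstance(template, list):
        return isinstance(output, list)
    # Leaf placeholders are strings in this sketch.
    return isinstance(output, str)


# Hand-written stand-in for a model response (an assumption, not real output).
raw_output = (
    '{"model": {"name": "NuExtract 1.5", "license": "MIT", '
    '"compared_to": ["GPT-4o"]}}'
)

parsed = json.loads(raw_output)
print(conforms(TEMPLATE, parsed))
```

A check like this is handy in front of any extraction model: a response that fails to parse or drops a field gets caught before it reaches downstream code.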

Models built by NuMind, NuMind on Hugging Face

Via Simon Willison

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.