birkin/code4lib_2026_highlights.md

## code4lib_2026_highlights.md

      
    code4lib-2026 highlights

Notes:

not all talks are listed -- this is just my highlights
below contains my raw quick-typed notes, with a Summary of what resonated for me


Day 1 — March 2, 2026

On Searching for Library Standards that Align with Library Values


Eric Phetteplace
Systems Librarian, California College of the Arts

Notes:

IFLA's statement
Eric: hard to justify the work required for cataloging.
List of alternatives.
Promising vocabularies:

Homosaurus
Getty (thesaurus of geographical names, among others)
Wikidata -- not really intended for library-authority-control. Need to review the terms that seem overlapping.


Summary:
- diversify the vocabularies we use as a hedge against political-influence on terms.

Artificial Intelligence Ethics Regulation in the United States


Wilhelmina Randtke
Head of Libraries Technologies and Systems, Georgia Southern University Libraries

Notes:

DoD & corporations biggest players
reminder that ethics is a big part of the 2021-2025 AI act
instead of govt having lots of rules -- rules required things like companies to have policies.
some procedural design-trees were covered in autonomous contexts.
there was a lot of industry-activity on the part of corporations to stave off govt regulation in favor of self-regulation/policies.
2023 -- executive order fostering anti-bias work.
2025 -- multiple ExOrders -- removal of DEI anti-bias work as itself being biased -- also look for ways to penalize states for slowing AI work -- regulating electricity-generation for data-centers and chips.
currently: mostly preventing states from regulating -- may not be actionable -- but precedent, like federal drinking-age.
haven't been privacy-related federal rules.
Uniform Law Commission -- their frameworks are often adopted by states -- eg child-support rules.
ULC -- bad framework for privacy -- hasn't been adopted.

Summary:

great history.
interesting policy perspective from the early days: wanting to have regulation, but not in a stifling way.
interesting commentary on federal law trumping state law, eg child support.
interesting references to Uniform Law Commission frameworks.

Replacing Legacy Sites with Low Maintenance Statically Hosted Web Archive-powered Mirrors (remote)


Ilya Kreymer
(affiliation not listed)

Notes:

review this talk!
webrecorder
browsertrix
archiveweb
replaywebpage
webarchive powered mirror

WACZ

main benefit -- can access one html page without downloading the entire WACZ file
magic via "service-workers", a standard web technology that allows intercepting any request on a domain and returning a dynamic response in the browser. (me: learn more about this)

cool

eg loaded from drupal -- mirrored site: static-site

my followup notes:

WACZ = Web Archive Collection Zipped (a Webrecorder specification; the LOC also publishes a format description for it)
A WARC file is the core archival container: it stores the captured web resources themselves as a long sequence of records. It is the standard archival format for harvested web content.
A WACZ file is a ZIP-based package around WARC data, plus indexes and metadata for replay/distribution. The important practical feature is that, because it is packaged with indexes and structured as a ZIP, replay tools can often fetch only the portions needed to render a specific archived page, instead of downloading the entire archive first. The spec explicitly describes WACZ as optimized for random access to packaged WARC data, and the Library of Congress description says clients can read portions on demand using HTTP Range requests.
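
A minimal Python sketch of why the ZIP packaging matters. This is not a real WACZ reader: the member names are stand-ins loosely modeled on the WACZ layout, `CountingReader` is a hypothetical helper of mine, and an in-memory buffer stands in for what a replay tool would do with HTTP Range requests. It only shows that a ZIP reader can pull one small member out of a large archive while touching a small fraction of its bytes.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that tallies how many bytes are actually read from it."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size: int = -1) -> bytes:
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_demo_archive() -> bytes:
    """Build a ZIP with one bulky member and one tiny member (names are illustrative)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", b"x" * 1_000_000)  # stand-in for WARC data
        zf.writestr("pages/pages.jsonl", b'{"url": "https://example.org/"}\n')
    return buf.getvalue()


def read_member(data: bytes, name: str) -> tuple[bytes, int]:
    """Return one member's content plus the number of bytes read to get it."""
    reader = CountingReader(data)
    with zipfile.ZipFile(reader) as zf:
        content = zf.read(name)
    return content, reader.bytes_read


if __name__ == "__main__":
    archive = build_demo_archive()
    content, touched = read_member(archive, "pages/pages.jsonl")
    print(f"archive size: {len(archive)} bytes; bytes read: {touched}")
```

The reader seeks to the ZIP's central directory at the end of the file, finds the member's offset, and reads only that slice. Over HTTP, each of those seeks-and-reads would become a Range request.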

Summary:

great thought-inspiring talk
worth a review by ITS/DT team
my thinking: most useful if facets expose all of site
for me: outside of initial WARC work scope -- but think more about:

how to build WACZ files from the downloaded WARCs
what additional metadata would be needed to build the manifests (indexes and metadata for replay), than what I'm downloading (or can that metadata be derived from inspecting the downloaded WARCs? or those long filenames?)


Tales from the Online Archives of California Replatforming


Chad Nelson and Amy something?
California Digital Library

Notes:


OAC based on XTF xml platform that was old and fragile


70K finding aids from 350+ archival institutions in CA (EAD, MARC, even some documents)


impetus -- dying operating system


no config-management; no tests


heavily used and deeply linked by other institutions


ArcLight -- based on blacklight


(traject)


django admin -- airflow (data and workflow) -- arclight


s3 rds for data-storage


textract, sos (SQS?), MWAA (AWS-managed Airflow)


cloudfront for arclight/solr


they built the django admin first


"infrastructure as code"


arclight tricky for non-blacklight users


one-writer, multiple-readers for solr -- more complicated than expected


Release

really rough
solr up and down / arclight kept crashing
solr-queries were super-expensive
kept blocking bots, but hard
took a week to get stable
baseline automated deployment and testing really helped pushing out experiments for improvements

django and airflow -- opinionated and reliable and flexible


airflow moves toward application instead of framework


django just framework


arclight full tool


amy's takeaway -- much easier to modify flexible tool than customize complete tool.


arclight feels most risky due to lack of rails knowledge and only use of solr.


Summary:

we heard about Apache Airflow at least three times from this conference. Our team should investigate it and file it away as something to know about for future project-discussions.
replacement used django-admin -- they were very happy with that.
"infrastructure as code" -- mentioned: worth thinking about.
solr queries were expensive (me: caching? -- especially if it can be auto-updated only on new ingestion?)
amy's takeaway --> much easier to modify flexible tool than customize complete tool
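
me, later: a rough Python sketch of the caching idea in the summary above, assuming search results only change when something new is ingested. `QueryCache`, `run_query`, and `on_ingest` are hypothetical names of mine, not anything from the OAC stack, and a real deployment would more likely lean on an HTTP cache or Solr's own caches.

```python
from typing import Callable, Dict, List


class QueryCache:
    """Cache query results until the next ingest invalidates them."""

    def __init__(self, run_query: Callable[[str], List[str]]) -> None:
        self._run_query = run_query  # the expensive call, e.g. a search request
        self._store: Dict[str, List[str]] = {}

    def on_ingest(self) -> None:
        """Call after new material is indexed; drops every cached result."""
        self._store.clear()

    def search(self, query: str) -> List[str]:
        """Return cached results when available, otherwise run and remember the query."""
        if query not in self._store:
            self._store[query] = self._run_query(query)
        return self._store[query]
```

The design choice is that invalidation is tied to ingestion rather than to a timer, so repeated (or bot-driven) queries between ingests never reach the search backend twice.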

Digital Preservation From Scratch (remote)


Natasha Fisher
Digital Archivist, The ArQuives

Summary: i'm not replicating all notes -- my general sense: this and other digital-preservation talks are worth reviewing by ITS team to build osmosed knowledge re a repository-adjacent field.

A workflow for automating content detection in dissertation PDFs


Peter Cerda
(affiliation not listed)
Aurelia Hudak
(affiliation not listed)
Rachel Woodbrook
(affiliation not listed)
(everybody from University of Michigan Library)

Notes:

UofM acquired digitized materials from ProQuest
Historical requests: reach out to author if possible -- investigate copyright status -- if no copyright notice, open access as if dissertation is in public domain.
lots of legwork -- is there a way to automate that workflow?
flow:

extract first and last 15-pages of pdf.
contains searchable text? scans for copyright-related phrases.


uses fuzzy-matching -- read the other two points on that slide
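
The flow above can be sketched in Python. This is my own guess at the shape of it, not the Michigan team's code (theirs is in Deep Blue Data): the phrase list, the 0.85 threshold, the outcome labels, and the function names are all assumptions, and it takes page text as already extracted from the PDF.

```python
from difflib import SequenceMatcher
from typing import List, Sequence

# Hypothetical phrase list; the talk's actual list was on a slide I didn't capture.
PHRASES = ("copyright", "all rights reserved")


def pages_to_scan(pages: Sequence[str], n: int = 15) -> List[str]:
    """Keep only the first and last n pages, where notices usually appear."""
    if len(pages) <= 2 * n:
        return list(pages)
    return list(pages[:n]) + list(pages[-n:])


def fuzzy_contains(text: str, phrase: str, threshold: float = 0.85) -> bool:
    """True if some run of words in text is close to phrase (tolerates OCR errors)."""
    words = text.lower().split()
    size = len(phrase.split())
    for i in range(len(words) - size + 1):
        window = " ".join(words[i:i + size])
        if SequenceMatcher(None, window, phrase).ratio() >= threshold:
            return True
    return False


def triage(pages: Sequence[str]) -> str:
    """Sort a dissertation into one of three hypothetical buckets."""
    scan = pages_to_scan(pages)
    if not any(page.strip() for page in scan):
        return "no-searchable-text"  # probably needs OCR first
    for page in scan:
        if any(fuzzy_contains(page, phrase) for phrase in PHRASES):
            return "needs-review"  # possible copyright notice found
    return "no-notice-found"
```

Fuzzy matching matters here because scanned text often contains OCR slips such as "Copyrignt", which an exact search would miss.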

results:


great!


ran it on 13,000


made over 8,000 open-access


identified 1,500 for further review


code available at deep-blue-data (doi)


decreased number of access requests.


Summary:

cool talk worth ITS-folk reviewing for ideas about how to automate copyright-detection workflows.
worth sharing gist with Andrew Creamer.

Librarian as institutional metadata steward: Case studies in Research Information Management Systems administration


Clarke Iakovakis
Scholarly Services Librarian, Oklahoma State University

Notes:

Symplectic Elements -- vendor system to track research across university.
Library stepped forward and said -- hey, this dovetails with our interest in open-access and bibliographic-metadata.
1200+ public profiles averaging 20,000 clicks/month
internal web-interface (vendor?)
externally-facing web-directory
xml-based api
SSRS for reports

lots of data that does not come in with feeds: appointments, degrees, language-spoken.
custom jobs

science fair judges
datacite extracts (doesn't feed to system directly)

Appointments: no single-source.
Scopus and ORCID-ids
OCLC -- social interoperability in research support article.

Summary:

a line I liked (paraphrased): "Our Library stepped forward and said -- hey, this dovetails with our interest in open-access and bibliographic-metadata." Point: my perspective: if we want to replace VIVO for purposes of sustainability/features -- ok, reasonable. But I'd like this framing as a counter to the perspective that maintaining a faculty-list has nothing to do with our mission.
worth a review if we evaluate vendor products

BookReconciler: An Open-Source Tool for Metadata Enrichment and Work-Level Clustering (remote)


Melanie Walsh
(U.Washington)
Matt Miller
(LOC)
Dan something from Emory

GREAT PRESENTATION -- review this and share it with circ-folk.
Notes:

start with minimal info like Title -- and then add something like ISBN -- and tool will grab more info.
Post45 Data Collective -- Literary and Cultural Datasets -- Melanie's the director or something. -- around for 5 years.
have 5-different "Book of Salt" entries from different datasets. Challenges of reconciling them.
goal -- don't require lots of technical expertise to improve reconciliation.
built reconciliation tool on top of OpenRefine.
one-click install-app -- launches a little internal server that can connect to open-refine.
loc, worldcat, openlibrary, wikidata, viaf, hathitrust, google books. (gbooks best -- checkout their eval-paper) -- connects via api -- except for hathitrust, for which they grab the full data and do lookups from an sqlite db.
bit.ly/BookReconciler
looking for others to use it and contribute to it.
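
me, later: a small Python sketch of the "grab the full data and do lookups from an sqlite db" pattern noted above for HathiTrust. The table layout, column names, and sample row are placeholders I made up, not BookReconciler's actual schema or real HathiTrust data.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_local_index(rows: Iterable[Tuple[str, str, str]]) -> sqlite3.Connection:
    """Load (isbn, title, record_id) rows into an indexed in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (isbn TEXT, title TEXT, record_id TEXT)")
    conn.execute("CREATE INDEX idx_records_isbn ON records (isbn)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces so differently formatted ISBNs compare equal."""
    return "".join(ch for ch in raw if ch.isdigit() or ch in "xX").upper()


def lookup(conn: sqlite3.Connection, raw_isbn: str) -> List[Tuple[str, str]]:
    """Return (title, record_id) pairs for an ISBN, or an empty list."""
    cursor = conn.execute(
        "SELECT title, record_id FROM records WHERE isbn = ?",
        (normalize_isbn(raw_isbn),),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Placeholder ISBN and record id, for illustration only.
    conn = build_local_index([("0000000001", "The Book of Salt", "example-1")])
    print(lookup(conn, "0-000-00000-1"))
```

A local index like this trades freshness for speed and avoids API rate limits, which seems like a sensible fit for a bulk dataset.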

Summary:

don't remember the details of why I wrote down the great-presentation comment. I suspect it's related to my sense of how DT has skills that could be useful for circ-folk, either to do some things for/with them -- or to convey knowledge/tools we have that might be really useful if they're not already familiar with them. (example: i'm thinking of Justin and I think others on our team having familiarity with OpenRefine.)

New (and Old-Fangled) HTML Elements: What is Accessible Really?


Katherine “Kate” Deibel
Systems Librarian, PCOM

Notes:


ARIA: Accessible Rich Internet Applications


accessible semantic elements:

attributes and properties you apply to it
scripts for that element
etc


 -- firefox and safari on iOS announce group role, but others don't


mailto and ftp have good screenreader support -- SMS and iphone links and something else don't (JAWS)


newer html5 elements:

<dialog> element -- use that if you need a dialog, instead of bootstrap modal, though that's better than others.
<section> element, <summary>, <details> -- stuff's complicated
<ul role="list"> (good)
screenreaders don't handle <dt> and <dd> very well -- but kate says they can be semantically useful.


Summary:

context: Kate's a long-time avid champion of accessibility.
basically, IIRC: there are debates about whether these semantic elements help or hurt accessibility, and that a bit too-much emphasis has been solely on screen-reader support for some assessments.
this line from my notes: "accessibility is not just about screen-reader-support -- so more broadly, accessibility should be about being able to be acted upon by "assistive technologies"."

Scary Stories to Tell in the Dark (Mode)


Robin Davis
Associate Head, User Experience, NC State University Libraries
Meredith Wynn
(affiliation not listed)

NC State usability widget.
Good talk -- worth a review.
firefox dev-tools offers click-switch without changing os setting.
Summary:

my main takeaway is separate from the actual interesting tool: NC State has a long history of incorporating -- into their project-planning -- ASSESSMENT.

Day 2 – March 3rd

Cybersecurity Preparedness for Libraries: A 2026 Action Plan (remote)


Blake Carver

Sys admin, Lyrasis


Notes:

passkey -- little cryptographic key stored in your password manager that replaces the password
"admin-rights are a vulnerability -- not an enabler"
principle of least-privilege
HECVAT -- Higher Education Community Vendor Assessment Toolkit
backup, training, hardening, planning

Summary:

Joe would like this -- though he likely already knows this stuff.
though i've heard the concept before, I liked the framing: "admin-rights are a vulnerability -- not an enabler"

Old Maps for New Apps: Making and Using Georeferenced Sanborn Maps at Scale


Adam Cox

Geospatial software developer, "Legion GIS?"


Notes:

django stack -- using postgres -- their code is on github
me: what are the licensing issues? https://library.brown.edu/create/libnews/new-library-subscription-to-sanborn-maps-for-rhode-island/ implies ours are not publicly accessible -- I'm sure LOC's are.
suggesting "a georeferencing commons"
principles

georeferencing as data creation
iterative and collaborative work
independent stages
embrace complex structures in the source material


prepare --> georeference --> trim

define regions
ground control points
polygon masks

GREAT full-processing overview.
georeference-a-thons
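
me, later: to make the "ground control points" step concrete, here is a pure-Python sketch that fits an affine transform from exactly three control points and maps pixel coordinates to longitude/latitude. It is an illustration under my own assumptions, not the project's code: the function names and the sample coordinates are made up, and real georeferencing tools typically use more points and other transform types.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m: Sequence[Sequence[float]], v: Sequence[float]) -> Tuple[float, ...]:
    """Solve m @ x = v for a 3x3 system using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick three that form a triangle")
    solution: List[float] = []
    for col in range(3):
        replaced = [list(row) for row in m]
        for row in range(3):
            replaced[row][col] = v[row]
        solution.append(det3(replaced) / d)
    return tuple(solution)


def fit_affine(gcps: Sequence[Tuple[Point, Point]]) -> Callable[[float, float], Point]:
    """gcps: three ((pixel_x, pixel_y), (lon, lat)) pairs. Returns a pixel-to-world function."""
    matrix = [[px, py, 1.0] for (px, py), _ in gcps]
    lon_a, lon_b, lon_c = solve3(matrix, [world[0] for _, world in gcps])
    lat_a, lat_b, lat_c = solve3(matrix, [world[1] for _, world in gcps])

    def to_world(px: float, py: float) -> Point:
        return (lon_a * px + lon_b * py + lon_c, lat_a * px + lat_b * py + lat_c)

    return to_world
```

Each control point says "this pixel on the scan is this place on the ground"; once three are pinned down, every other pixel on the sheet gets a coordinate too, which is what makes overlaying the map possible.
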
Summary:

i just love maps
their stack: django + postgres. Me: postgres has long had great db fields for geospatial data -- has mysql implemented these? if so, in a standard or "add-on" way? if "add-on" -- is this worth requesting that OIT do this for our dbs so we have the ability to store geospatial data for future projects?
worth a review to understand the process of gathering together necessary map metadata to do cool things.
makes me think we should definitely consult with frank if we have a bdr-ingestion-process for maps -- to determine metadata we might be able to ingest to potentially offer cool viewers.

Decoding the Past: Exploring AI-Based Handwritten Text Recognition in Digital Collections (remote)


Aaron Pahl

University of Alabama / Birmingham


Notes:

started with logbooks to extract data into more usable format (ssheet?)
tried 4 free-tools and 3 paid-tools
evals: accuracy; table-structure-preserved; use for other projects
ggl-cloud-vision --> abbyy --> tesseract
testing process of suite of tools made somewhat obsolete by improvements in AI tools.
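
me, later: my notes don't say which accuracy metric the presenter used. One common way to score transcription accuracy is character error rate (CER): the edit distance between a hand-checked transcription and the tool's output, divided by the length of the hand-checked one. A minimal Python sketch, with function names of my own choosing:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,  # delete char_a
                    current[j - 1] + 1,  # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the reference length; 0.0 means a perfect match."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)
```

Running the same reference pages through each tool and comparing CER would give a repeatable baseline, which is handy when re-testing after the tools improve.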

Summary:

interesting talk, but for me the most interesting thing was that last fourth of the talk, that improvements to ai-models have been so dramatic that it may well have made obsolete the testing process the presenter had been using.
me: this is a repeated theme I've seen over the last few months, with coding and in other realms: people re-experimenting with existing ai-tools, in standard ways, are often very surprised at improved results, and folk experimenting in new ways (ie having a tool make a plan, manually updating a plan -- and then having the tool implement the updated plan), are often shocked by how good the results are.

Spec before Tech: Delivering digital objects using archival principles with DadoCM


Gregory Wiedeman

University Archivist, University at Albany, SUNY


Maureen Cresci Callahan

Director of Archives and Special Collections, University of Connecticut


GREAT TALK -- ask C&N to watch this.
Notes:


All archives have a DAM problem   :)    (digital-asset-management)


the DAMs info doesn't give you context that a finding-aid does


an epistemological problem


same info in three different collections

letters from a congress-person
records of a local communist party (same info, different context)
personal papers of a judge who prosecuted communists


purpose is evidence -- not just information


archival data has a graph structure

think of linked-data without URIs


archivists guide you to material in meaningful chunks -- context included


metadata already has a schema -- many times details just not filled out yet


inheritance often used in archival data -- "collection note" -- can be overridden


DadoCM -- IMLS grant -- goal: to manage archival description and digital objects without data duplication

mapping
portland-common-data-model --fileset/file...version?


the DAM-image is incorporated into the archival-interface
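
me, later: the inheritance point in the notes above (a collection-level note applies to everything beneath it unless overridden) can be sketched in a few lines of Python. This is my own illustration, not DadoCM's data model; `Component`, `notes`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Component:
    """One node in an archival hierarchy: collection, series, folder, or item."""

    title: str
    parent: Optional["Component"] = None
    notes: Dict[str, str] = field(default_factory=dict)

    def resolve(self, key: str) -> Optional[str]:
        """Return the nearest value for key, walking up through the ancestors."""
        node: Optional[Component] = self
        while node is not None:
            if key in node.notes:
                return node.notes[key]
            node = node.parent
        return None


if __name__ == "__main__":
    collection = Component("Collection", notes={"access": "Open for research"})
    series = Component("Series 1", parent=collection)
    letter = Component("Letter", parent=series, notes={"access": "Restricted"})
    print(series.resolve("access"), "|", letter.resolve("access"))
```

The value is stored once, at the level where the archivist wrote it, and looked up on demand, which fits the stated goal of avoiding data duplication.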


Summary:

note my "GREAT TALK -- ask C&N to watch this."
IIRC, my thinking: broadening all of our knowledge of EADs. eg, I know a lot about the concept of EADs, but had never heard that framing that DAMs focus on the specific digital object, while EADs focus on the context of the digital object. That's so useful.
above is conceptual -- i'd like to review their talk re the specific ways they worked with their EADs before we embark on significant RIAMCO work.

Unveiling Boston Public Library’s Hidden Collections Using the WorldCat Metadata API


Jay L. Colbert

Special Projects Cataloger, Boston Public Library


Mike Williams

Me -- another good talk in terms of widening perspective
Notes:

Snapicat
inspired by BookOps' easy-to-adapt Python interface for the WorldCat Metadata API
really good Inspired slide
i think ended up on a react stack
worldcat-api documentation lacking in some ways -- eg which bibliographic... something version

Me: what's this for? if for searching -- why not the regular ILS?  Or is it for cataloging these into OCLC for the first time?  Maybe that's why they added minimal metadata, maybe to do lookups. Oh, maybe it's using the worldcat api to do lookups to get additional metadata to catalog it into the ILS.
ISBNs without OCNs (oclc control numbers)
Summary:

I missed part of this talk, I think I was trying to answer some email.
The reason I'd like to review it is in case it's part of that idea I've had about how ITS/DT might be able to contribute to other departments (besides hay) -- if this is a tool/process to enhance catalogers' work.

From Beta to RC: The Marriott Reparative Metadata Assessment Tool (MaRMAT)


Aiden de Boer

Student Programmer, J. Willard Marriott Library, University of Utah


Rachel Jane Wittmann

Interim Head of Digital Library Services, Metadata Strategies Librarian, J. Willard Marriott Library, University of Utah


Kaylee P. Alexander

Research Data Librarian, J. Willard Marriott Library, University of Utah


Notes:

python-based open-source tool to run on your data to see what data can be addressed for reparation.
reparative metadata, LCSH, and Sensitive Content -- Lexicons
initially launched alongside the inclusive metadata tool in 2024
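
me, later: a rough Python sketch of the general idea (run a lexicon of terms over metadata and report where they occur), so I remember what running it on extracted BDR metadata might involve. This is not MaRMAT's code or interface; the function names, record shape, and placeholder terms are mine.

```python
import re
from typing import Dict, Iterable, List, Pattern, Tuple


def compile_lexicon(terms: Iterable[str]) -> Pattern[str]:
    """Build one case-insensitive, whole-word pattern from a list of terms."""
    escaped = sorted((re.escape(term) for term in terms), key=len, reverse=True)
    if not escaped:
        raise ValueError("lexicon must contain at least one term")
    return re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)


def flag_records(
    records: Iterable[Dict[str, str]], lexicon: Pattern[str]
) -> List[Tuple[str, str, str]]:
    """Return (record id, field name, matched text) for every lexicon hit."""
    hits: List[Tuple[str, str, str]] = []
    for record in records:
        for field_name, value in record.items():
            if field_name == "id":
                continue  # the identifier itself is not descriptive metadata
            for match in lexicon.finditer(value):
                hits.append((record["id"], field_name, match.group(0)))
    return hits
```

The output is a review list for a human, not an automatic fix: which record, which field, and which term, so a cataloger can decide what to do with each hit.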

Summary:

just a cool project: worth seeing if we could run it on extracted BDR metadata -- and worth letting cataloging know about.

Lightning Talks:

Interoperable Ebook Standards — Rob Cartalano

libraryfutures.net/interoperable-ebook-standards-statement
OPDS-feed -- read more about that -- could be an interesting format to be able to export to.
EDRLab Thorium desktop-reading-platform
ME: really worth a look.
Summary:

I hadn't even known there was such a need for ebook-spec standardization.
I've long thought we should investigate ways to archive ebooks instead of pdfs where possible -- and have librarians work with upstream folk, like grad-school, to encourage this.

Lessons-Learned Building LLM-Powered Apps — Justin Uhr

Summary: OMG, WHAT A TALK!!! 😁

Copyright/AI-talk -- person?

- "sufficient human involvement"
- prompts alone do not  provide sufficient human control
- if ai reproduces exact output -- yes, infringement
- Meta case: training considered fair-use
- Anthropic case: training considered fair-use
- OpenAI case: underway

Summary:

good quick overview.

Wait?! We're Out of ORCiD API Keys? — Richard Higgins

- why are multiple ORCID member-IDs/accounts/whatever needed for the different library uses?
- shib "Comanage-Registry" can handle ORCID member-API support

Summary:

didn't fully understand the need he described for multiple accounts -- but worth reviewing / following up on -- if we were to enhance a get-ORCID service.

Happy devs and happy patrons: the wonders of "uv".


Birkin James Diana

Summary: OMG, WHAT A TALK!!! 😁

Supporting the Hidden Work: OSS Projects for Inventory and Weeding


Maccabee Levine


Senior Library Application Developer, Lehigh University


if you're working with google-app-scripts -- try clasp -- helps with syncing local code to google-app-scripts


Summary:

don't remember specifics of talk, but Maccabee's a good thinker, so i may review -- but do remember this takeaway: if you're working with google-app-scripts -- try clasp -- helps with syncing local code to google-app-scripts

Offering On-Demand Virtual Machines to Library Users


Seth Erickson

Notes:

students can use it for a quarter -- virtual machines run inside of k8s.
"Coder" -- open-source, self-hosted for providing cloud-based development environments.
they have a template for the form to create a VM -- changes each quarter.
uses terraform internally for configuration.
they set it up so you can click something which will open your local vscode and install a connector.
users have root access into their machines.
browser/vs-code, or coder-CLI
establishes a "wireguard" connection? look that up.
he's connected it to other backends, like an NSF something-or-other using "open-stack".

Summary:

very cool talk -- worth sharing with CDS -- not in lieu of their reclaim (which is specifically for persistent storage), but as an alternative for short-term needs.

Modularity: doing it all isn’t a good thing


Kyle Reiley
Lead Software Developer/Analyst, The University of Texas at Austin Libraries

Notes:


classic "build it for them" site.


last x percent of customization results in lots of tech-debt.


shifted to fully custom.


solved the platform problem but borrowed a new one.


shared infrastructure approach. took care of backend.


"portal" for frontend -- still in-process. sounds somewhat complex.


strapi -- when they improve search, every site improves -- same for accessibility.


LADI


one of best benefits is no longer seeing old sites decay.


Summary:

gist felt similar to our goal with the BDR-uploader-hub webapp: how can we offer multiple-separate-webapps, in ways that minimize lack-of-maintenance-decay -- and promote improvements auto-applying to all apps?

Spreadsheet to Service: Building a Zero-Cost Search Interface (The LACLI Story)


Eric Silberberg
Librarian, Queens College, City University of New York

Notes:

LACLI -- free online index of latin-american stuff/collections/resources.

Summary:

didn't take good notes -- but I think the gist was that folk can update a database -- and they have an automated system to produce a static site from it. That concept -- of programmatically producing static-sites from complex systems -- is, I think, worth thinking about -- and worth brainstorming which of our services we could do that with if we wanted to.
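
me, later: a minimal Python sketch of that concept, assuming the "database" can be exported as CSV with `title` and `url` columns. Those column names and the `render_site` function are my assumptions, not a description of how LACLI actually builds its site.

```python
import csv
import html
import io


def render_site(csv_text: str) -> str:
    """Turn CSV rows with 'title' and 'url' columns into one static HTML page."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    items = "\n".join(
        '<li><a href="{url}">{title}</a></li>'.format(
            url=html.escape(row["url"], quote=True),
            title=html.escape(row["title"]),
        )
        for row in rows
    )
    return f"<!doctype html>\n<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


if __name__ == "__main__":
    sample = "title,url\nArchive A & B,https://example.org/a\n"
    print(render_site(sample))
```

Writing the returned string to an `index.html` on any static host would complete the loop: people edit the spreadsheet, a scheduled job regenerates the page, and there is no server-side application to maintain.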

Day 3 – March 4th

Dr. J. Nathan Matias Closing Keynote


Dr. J. Nathan Matias

Assistant professor, Cornell University Department of Communication
Founder, Citizens and Technology Lab


Notes:


Agnotology -- coding and cultivating knowledge in the context of...


mayan-language book, in spain


1500s priest interested in learning about the mayan language -- also opposite, organized burning of mayan books


mayan book saved because it had been taken to spain


agnotology -- the study of ignorance


"not about blocking knowledge, but rather about establishing a metaknowledge regime that could be used to attack the whole idea..." (eg, of science)


citizens and technology lab informed by citizen-science movement


cultivating knowledge through human cooperation -- wikipedia visualization.


AI moderation-systems introduced in 2007-8 -- ai removes "low-quality" edits -- a decline started then

me: how do we know that's the cause of the decline, instead of correlation?


reciprocity (benefits) --> indirect-reciprocity (thanks) --> upstream-reciprocity (pay-it-forward)

his team was interested in the latter
how to measure the effect of the wikipedia "thanks" link


the "door" -- like libraries and the internet, wikipedia's views are declining due to generative-ai

problem: don't experience invitations to contribute to the body of knowledge
wikimedia's "futures lab" site exploring this, their mission, to protect and grow this knowledge repository


jefferson: fierce advocate for intellectual freedom


reporting, reviewing, and responding to online harassment -- WAM! org (women and media)


r/science -- can norms against harassment change behavior and grow participation online?

collect data  -- coordinate intervention -- manage ethics
posting the rules increased the chance of rule-compliance by > 8pct points, and increased participation by 70%

that study has been replicated 6 times by different communities


science for understanding, de-escalating, and something-else -- musk and meta have threatened researchers investigating online-hate.


Agnotology -- powerful folk will muddy waters, seek to control research, threaten researchers

coalition for independent technology research
CITR files amicus brief in x lawsuit to defend the rights of independent researchers

fed judge dismissed x's lawsuit -- credited CITR


have looked into automated-copyright-enforcement -- interesting that I can't guess whether they're for or against that. I think against, cuz they don't want communication chilled?


how to balance collecting data needed to protect the integrity of people's thoughts and speech?


one of the risks of AI is that it takes people out of the equation/info-flow

meghan partner -- ex-director of human rights data analysis group -- HRDAG


evaluation of ai-systems:

reliability
transparency
models of public involvement in ai research -- co-creators, participatory-models, others


5 areas where science of ai evals:

equipoise -- ??
measurement needs to be accurate
explanation -- example?
inference -- algorithm "fair", but application not
interpretation


people of both parties do trust universities -- higher-ed/july 30,000 poll


Summary:

cool talk -- i need to review... this person's lab/center does cool work/research -- one interesting point: how to create ai-enhanced workflows that augment humans in the loop, instead of replacing them?

Lightning Talks 3 (10:00AM–10:45AM ET)

(not all listed -- just highlighting some)
Corey Halpin: A Story About Keyboards


cool alternative to keyboards -- and tools to learn!
also chords -- and tool examples

Summary: just fascinating.

Cynthia Schwarz, MIT Libraries: Why You Need Governance


being the connection between people who actually use the systems
needed: functional users, tech person, project-manager
provide sounding board for new projects and ideas -- prioritize (ideally via agreed-upon rubric) -- flesh out work and implement

Summary: don't remember specifics, but I thought it was worth reviewing.

Kayla Camacho, NAMI national alliance on mental illness: Accessible Info for Users in Distress


emotions impact usability -- frustrations and dead-ends can really degrade confidence in tools
gave categories of things to think about
what's balance between conveying richness of data, without overwhelming users?
folk looking for simplification-offramp may be more susceptible to predatory influences
ideal goal not just to share, but to build relationship

Summary: I italicized the highlights above.

Andromeda Yelton: Your Magical Boolean Box


boolean satisfiability problem
"sat-solvers"? "stat-solver"? -- z3 from microsoft -- ah, "satisfy"
nice example slide
AtMost(), AtLeast(), Implies() -- variables can be numbers, not just bools
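
me, later: to remember what these constraints do, here is a tiny brute-force version in plain Python. It is not z3 and does not use z3's API: `at_most`, `at_least`, `implies`, and `solve` are hypothetical helpers named after the z3 functions in my notes, and trying every assignment only works for a handful of variables (a real solver is far smarter). The desk-staffing example is made up.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Sequence

Assignment = Dict[str, bool]


def implies(a: bool, b: bool) -> bool:
    """Logical implication: false only when a is true and b is false."""
    return (not a) or b


def at_most(k: int, *values: bool) -> bool:
    """True if no more than k of the values are true."""
    return sum(values) <= k


def at_least(k: int, *values: bool) -> bool:
    """True if k or more of the values are true."""
    return sum(values) >= k


def solve(
    names: Sequence[str], constraint: Callable[[Assignment], bool]
) -> Iterator[Assignment]:
    """Yield every true/false assignment of names that satisfies the constraint."""
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if constraint(assignment):
            yield assignment


def desk_rules(a: Assignment) -> bool:
    """Exactly two people on the desk, and alice only works when cara does."""
    staff = (a["alice"], a["bob"], a["cara"])
    return at_least(2, *staff) and at_most(2, *staff) and implies(a["alice"], a["cara"])


if __name__ == "__main__":
    for solution in solve(["alice", "bob", "cara"], desk_rules):
        print(solution)
```

The appeal of the "magical boolean box" is that you only state the rules; finding assignments that satisfy them is the solver's job.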

Summary: I liked this because it surfaced concepts I hadn't thought of.

Matt Sherman, Drexel: More Product Less Process in Digital Collections?

Summary: this is the 2nd time I've heard the viewpoint that archivists deal with boxes-folders, whereas digital repos deal with "items"

Stephano: Pocket Archive, how-small-can-you-go?


areas of conflict with real problems like limited funding and freedom-of-movement
near-zero budget, can work w/o internet connection, can incorporate into bigger systems (harvest site)
raspberry-pi -- $150.
makes static-site
very cool

Summary: was this the one I thought Connor would find interesting?