Notes:
- not all talks are listed -- this is just my highlights
- below are my raw quick-typed notes, each with a
Summary of what resonated for me
- Eric Phetteplace
- Systems Librarian, California College of the Arts
Notes:
- IFLA's statement
- Eric: hard to justify the work required for cataloging.
- List of alternatives.
- Promising vocabularies:
- Homosaurus
- Getty (thesaurus of geographical names, among others)
- Wikidata -- not really intended for library-authority-control. Need to review the terms that seem overlapping.
Summary: diversify the vocabularies we use as a hedge against political influence on terms.
- Wilhelmina Randtke
- Head of Libraries Technologies and Systems, Georgia Southern University Libraries
Notes:
- DoD & corporations biggest players
- reminder that ethics is a big part of the 2021-2025 AI act
- instead of govt imposing lots of rules -- the rules required things like companies having policies.
- some procedural design-trees were covered in autonomous contexts.
- there was a lot of industry-activity on the part of corporations to stave off govt regulation in favor of self-regulation/policies.
- 2023 -- executive order fostering anti-bias work.
- 2025 -- multiple ExOrders -- removal of DEI anti-biasing work as itself being biased -- also looking for ways to penalize states for slowing AI work -- regulating electricity-generation for data-centers and chips.
- currently: mostly preventing states from regulating -- may not be actionable -- but there's precedent, like the federal drinking-age.
- there haven't been privacy-related federal rules.
- Uniform Law Commission -- their frameworks are often adopted by states -- eg child-support rules.
- ULC -- bad framework for privacy -- hasn't been adopted.
Summary:
- great history.
- interesting policy perspective from the early days: wanting regulation, but not in a stifling way.
- interesting commentary on federal law trumping state law, eg child support.
- interesting references to Uniform Law Commission frameworks.
- Ilya Kreymer
- (affiliation not listed)
Notes:
- review this talk!
- webrecorder
- browsertrix
- archiveweb
- replaywebpage
- web-archive-powered mirror
WACZ
- main benefit -- can access one html page without downloading the entire WACZ file
- magic via "service-workers", a standard web technology that allows intercepting any request on a domain and returning a dynamic response in the browser. (me: learn more about this)
cool
- eg loaded from drupal -- mirrored site: static-site
my followup notes:
- WACZ = Web Archive Collection Zipped (a Webrecorder specification; LOC publishes a format description of it)
- A WARC file is the core archival container: it stores the captured web resources themselves as a long sequence of records. It is the standard archival format for harvested web content.
- A WACZ file is a ZIP-based package around WARC data, plus indexes and metadata for replay/distribution. The important practical feature is that, because it is packaged with indexes and structured as a ZIP, replay tools can often fetch only the portions needed to render a specific archived page, instead of downloading the entire archive first. The spec explicitly describes WACZ as optimized for random access to packaged WARC data, and the Library of Congress description says clients can read portions on demand using HTTP Range requests.
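me: a small sketch of the random-access idea above, using only Python's standard library. This is not the WACZ tooling itself. The member names (`archive/data.warc`, `pages/pages.jsonl`) and the `CountingReader` helper are my assumptions for illustration, and an in-memory buffer stands in for HTTP Range requests. The point is only that reading one small member of a ZIP touches a tiny part of the file.

```python
import io
import zipfile


class CountingReader(io.BytesIO):
    """In-memory file that records how many bytes are actually read."""

    def __init__(self, data: bytes):
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_demo_archive() -> bytes:
    """Build a ZIP with one large member and one small one (hypothetical layout)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.warc", b"x" * 500_000)  # stand-in for WARC data
        zf.writestr("pages/pages.jsonl", b'{"url": "https://example.org/"}\n')
    return buf.getvalue()


def read_member(data: bytes, name: str):
    """Return (member content, number of bytes read from the archive)."""
    reader = CountingReader(data)
    with zipfile.ZipFile(reader) as zf:
        # zipfile reads the central directory at the end of the file,
        # then seeks straight to the requested member.
        content = zf.read(name)
    return content, reader.bytes_read


if __name__ == "__main__":
    archive = build_demo_archive()
    content, used = read_member(archive, "pages/pages.jsonl")
    print(f"archive size: {len(archive)} bytes, bytes read: {used}")
```

A replay tool doing this over the network would issue Range requests for the same byte spans instead of seeking in a local buffer.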
Summary:
- great thought-inspiring talk
- worth a review by ITS/DT team
- my thinking: most useful if facets expose all of site
- for me: outside of initial WARC work scope -- but think more about:
- how to build WACZ files from the downloaded WARCs
- what additional metadata would be needed to build the manifests (indexes and metadata for replay), beyond what I'm downloading (or can that metadata be derived from inspecting the downloaded WARCs? or those long filenames?)
- Chad Nelson and Amy something?
- California Digital Library
Notes:
- OAC based on XTF xml platform that was old and fragile
- 70K finding aids from 350+ archival institutions in CA (EAD, MARC, even some documents)
- impetus -- dying operating system
- no config-management; no tests
- heavily used and deeply linked by other institutions
- ArcLight -- based on blacklight
- (traject)
- django admin -- airflow (data and workflow) -- arclight
- s3, rds for data-storage
- textract, sos, MWAS in Airflow
- cloudfront for arclight/solr
- they built the django admin first
- "infrastructure as code"
- arclight tricky for non-blacklight users
- one-writer, multiple-readers for solr -- more complicated than expected
Release
- really rough
- solr up and down / arclight kept crashing
- solr-queries were super-expensive
- kept blocking bots, but hard
- took a week to get stable
- baseline automated deployment and testing really helped pushing out experiments for improvements
django and airflow -- opinionated and reliable and flexible
- airflow moves toward application instead of framework
- django just framework
- arclight full tool
- amy's takeaway -- much easier to modify flexible tool than customize complete tool.
- arclight feels most risky due to lack of rails knowledge and only use of solr.
Summary:
- we heard about Apache Airflow at least three times from this conference. Our team should investigate it and file it away as something to know about for future project-discussions.
- replacement used django-admin -- they were very happy with that.
- "infrastructure as code" -- mentioned: worth thinking about.
- solr queries were expensive (me: caching? -- especially if it can be auto-updated only on new ingestion?)
- amy's takeaway --> much easier to modify flexible tool than customize complete tool
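me: a rough sketch of the caching idea from the solr bullet above, i.e. cache query results and throw the cache away only when something new is ingested. This is my assumption about one way to do it, not anything CDL described. `QueryCache`, `run_query`, and `on_ingest` are hypothetical names, and a plain Python function stands in for a real Solr call.

```python
class QueryCache:
    """Cache query results; drop them all whenever an ingest happens."""

    def __init__(self, run_query):
        self._run_query = run_query  # function that really hits the index
        self._results = {}
        self.backend_calls = 0       # how often the expensive call ran

    def search(self, query: str):
        if query not in self._results:
            self.backend_calls += 1
            self._results[query] = self._run_query(query)
        return self._results[query]

    def on_ingest(self):
        """Call this from the ingestion workflow after new data is indexed."""
        self._results.clear()


# Stand-in for the search index: a list the fake query function scans.
documents = ["finding aid one", "finding aid two"]


def fake_index_query(query: str):
    return [doc for doc in documents if query in doc]


if __name__ == "__main__":
    cache = QueryCache(fake_index_query)
    cache.search("aid")
    cache.search("aid")             # served from the cache
    documents.append("finding aid three")
    cache.on_ingest()               # new ingestion invalidates the cache
    print(cache.search("aid"), cache.backend_calls)
```

The trade-off: invalidating everything on ingest is simple and never serves stale results, but it only helps if ingests are much rarer than searches.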
- Natasha Fisher
- Digital Archivist, The ArQuives
Summary: i'm not replicating all notes -- my general sense: that this and other digital-preservation talks are worth reviewing by ITS team to build osmosed knowledge re a repository-adjacent field.
- Peter Cerda
- (affiliation not listed)
- Aurelia Hudak
- (affiliation not listed)
- Rachel Woodbrook
- (affiliation not listed)
- (everybody from University of Michigan Library)
Notes:
- UofM acquired digitized materials from ProQuest
- Historical requests: reach out to author if possible -- investigate copyright status -- if no copyright notice, open access as if dissertation is in public domain.
- lots of legwork -- is there a way to automate that workflow?
- flow:
- extract first and last 15-pages of pdf.
- contains searchable text? scans for copyright-related phrases.
- uses fuzzy-matching -- read the other two items on that slide
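me: a minimal sketch of the flow above, to make the steps concrete. It is not the Michigan team's code (theirs is in Deep Blue Data and likely differs). Assumptions: pages arrive as already-extracted text strings (PDF extraction is out of scope here), the phrase list and the 0.85 threshold are made up, and `difflib` from the standard library stands in for whatever fuzzy-matching they used.

```python
from difflib import SequenceMatcher

# Hypothetical phrase list; the real tool's list is not in my notes.
PHRASES = ["all rights reserved", "copyright by"]


def pages_to_check(pages, n=15):
    """Return the first and last n pages, without duplicates for short documents."""
    if len(pages) <= 2 * n:
        return list(pages)
    return list(pages[:n]) + list(pages[-n:])


def fuzzy_contains(text, phrase, threshold=0.85):
    """True if some run of words in text is close to phrase (tolerates OCR errors)."""
    words = text.lower().split()
    size = len(phrase.split())
    for i in range(max(len(words) - size + 1, 0)):
        window = " ".join(words[i:i + size])
        if SequenceMatcher(None, window, phrase).ratio() >= threshold:
            return True
    return False


def has_copyright_notice(pages):
    """Scan only the front and back matter for any copyright-related phrase."""
    return any(
        fuzzy_contains(page, phrase)
        for page in pages_to_check(pages)
        for phrase in PHRASES
    )


if __name__ == "__main__":
    scanned = ["Title page", "A11 rights reserved by the author"]  # OCR misread "All"
    print(has_copyright_notice(scanned))
```

Documents with no hit would still need the human check described in the notes; a sketch like this only narrows the pile.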
results:
- great!
- ran it on 13,000
- made over 8,000 open-access
- identified 1,500 for further review
- code available at deep-blue-data (doi)
- decreased number of access requests.
Summary:
- cool talk worth ITS-folk reviewing for ideas about how to automate copyright-detection workflows.
- worth sharing gist with Andrew Creamer.
Librarian as institutional metadata steward: Case studies in Research Information Management Systems administration
- Clarke Iakovakis
- Scholarly Services Librarian, Oklahoma State University
Notes:
- Symplectic Elements -- vendor system to track research across university.
- Library stepped forward and said -- hey, this dovetails with our interest in open-access and bibliographic-metadata.
- 1200+ public profiles averaging 20,000 clicks/month
- internal web-interface (vendor?)
- externally-facing web-directory
- xml-based api
- SSRS for reports
lots of data that does not come in with feeds: appointments, degrees, language-spoken.
custom jobs
- science fair judges
- datacite extracts (doesn't feed to system directly)
Appointments: no single-source.
Scopus and ORCID-ids
OCLC -- social interoperability in research support article.
Summary:
- a line I liked (paraphrased): "Our Library stepped forward and said -- hey, this dovetails with our interest in open-access and bibliographic-metadata." My perspective: if we want to replace VIVO for purposes of sustainability/features -- ok, reasonable. But I like this framing as a counter to the perspective that maintaining a faculty-list has nothing to do with our mission.
- worth a review if we evaluate vendor products
- Melanie Walsh
- (U.Washington)
- Matt Miller
- (LOC)
- Dan something from Emory
GREAT PRESENTATION -- review this and share it with circ-folk.
Notes:
- start with minimal info like Title -- and then add something like ISBN -- and tool will grab more info.
- Post45 Data Collective -- Literary and Cultural Datasets -- Melanie's the director or something. -- around for 5 years.
- have 5-different "Book of Salt" entries from different datasets. Challenges of reconciling them.
- goal -- don't require lots of technical expertise to improve reconciliation.
- built reconciliation tool on top of OpenRefine.
- one-click install-app -- launches a little internal server that can connect to open-refine.
- loc, worldcat, openlibrary, wikidata, viaf, hathitrust, google books. (gbooks best -- check out their eval-paper) -- connects via api -- except for hathitrust, for which they grab the full data and do lookups from an sqlite db.
- bit.ly/BookReconciler
- looking for others to use it and contribute to it.
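me: a sketch of the "grab the full data and do lookups from an sqlite db" pattern mentioned for hathitrust above. This is not BookReconciler's code. The table layout, column names, and the placeholder record are my assumptions; the idea is just that a local indexed table can replace per-record API calls.

```python
import sqlite3


def build_lookup_db(rows):
    """Load (isbn, title, author) rows into an in-memory SQLite table with an index."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (isbn TEXT, title TEXT, author TEXT)")
    conn.execute("CREATE INDEX idx_records_isbn ON records (isbn)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def lookup_by_isbn(conn, isbn):
    """Return all (title, author) matches for an ISBN, ignoring hyphens and spaces."""
    normalized = isbn.replace("-", "").strip()
    cursor = conn.execute(
        "SELECT title, author FROM records WHERE isbn = ?", (normalized,)
    )
    return cursor.fetchall()


if __name__ == "__main__":
    # Placeholder data, not a real bibliographic record.
    db = build_lookup_db([("0000000000001", "Example Title", "Example Author")])
    print(lookup_by_isbn(db, "000-0000000-001"))
```

For a real bulk file the database would live on disk and be rebuilt whenever a fresh dump is downloaded.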
Summary:
- don't remember the details of why I wrote down the great-presentation comment. I suspect it's related to my sense of how DT has skills that could be useful for circ-folk, either to do some things for/with them -- or to convey knowledge/tools we have that might be really useful if they're not already familiar with them. (example: i'm thinking of Justin and I think others on our team having familiarity with OpenRefine.)
- Katherine “Kate” Deibel
- Systems Librarian, PCOM
Notes:
- ARIA: Accessible Rich Internet Applications
- accessible semantic elements:
  - attributes and properties you apply to it, scripts for that element, etc
  - firefox and safari on iOS announce group role, but others don't
- mailto and ftp have good screenreader support -- SMS and iphone links and something else don't (JAWS)
- newer html5 elements:
  - <dialog> element -- use that if you need a dialog, instead of bootstrap modal, though that's better than others.
  - <section> element, <summary>, <details> -- stuff's complicated
  - <ul role="list"> (good)
  - screenreaders don't handle <dt> and <dd> very well -- but kate says they can be semantically useful.
Summary:
- context: Kate's a long-time avid champion of accessibility.
- basically, IIRC: there are debates about whether these semantic elements help or hurt accessibility, and some assessments have put a bit too much emphasis solely on screen-reader support.
- this line from my notes: "accessibility is not just about screen-reader-support -- so more broadly, accessibility should be about being able to be acted upon by "assistive technologies"."
- Robin Davis
- Associate Head, User Experience, NC State University Libraries
- Meredith Wynn
- (affiliation not listed)
NC State usability widget.
Good talk -- worth a review.
firefox dev-tools offers click-switch without changing os setting.
Summary:
- my main takeaway is separate from the actual interesting tool: NC State has a long history of incorporating -- into their project-planning -- ASSESSMENT.
- Blake Carver
- Sys admin, Lyrasis
Notes:
- passkey -- little cryptographic key stored in your password manager that replaces the password
- "admin-rights are a vulnerability -- not an enabler"
- principle of least-privilege
- HECVAT -- Higher Education Community Vendor Assessment Toolkit
- backup, training, hardening, planning
Summary:
- Joe would like this -- though he likely already knows this stuff.
- though i've heard the concept before, I liked the framing: "admin-rights are a vulnerability -- not an enabler"
- Adam Cox
- Geospatial software developer, "Legion GIS?"
Notes:
- django stack -- using postgres -- their code is on github
- me: what are the licensing issues? https://library.brown.edu/create/libnews/new-library-subscription-to-sanborn-maps-for-rhode-island/ implies ours are not publicly accessible -- I'm sure LOC's are.
- suggesting "a georeferencing commons"
- principles
- georeferencing as data creation
- iterative and collaborative work
- independent stages
- embrace complex structures in the source material
prepare --> georeference --> trim
- define regions
- ground control points
- polygon masks
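me: a sketch of what ground control points buy you in the georeference step above. With three points that tie pixel positions to real coordinates, you can solve for an affine transform and then place any pixel on the map. This is my assumption about the simplest case, not the speaker's code; real tools typically accept more points and other transform types. The coordinates are made up and the function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve_affine_row(pixels, values):
    """Solve value = a*x + b*y + c for three (x, y) pixels via Cramer's rule."""
    base = [[x, y, 1.0] for x, y in pixels]
    d = det3(base)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in base]
        for r in range(3):
            m[r][col] = values[r]  # swap one column for the target values
        coeffs.append(det3(m) / d)
    return tuple(coeffs)


def fit_affine(gcps):
    """gcps: three ((px, py), (lon, lat)) pairs. Returns (lon_row, lat_row)."""
    pixels = [pixel for pixel, _ in gcps]
    lon_row = solve_affine_row(pixels, [world[0] for _, world in gcps])
    lat_row = solve_affine_row(pixels, [world[1] for _, world in gcps])
    return lon_row, lat_row


def pixel_to_world(transform, px, py):
    """Apply the fitted transform to one pixel position."""
    (a, b, c), (d, e, f) = transform
    return (a * px + b * py + c, d * px + e * py + f)


if __name__ == "__main__":
    # Made-up control points for a 1000x1000 scan.
    gcps = [
        ((0, 0), (-71.40, 41.83)),
        ((1000, 0), (-71.39, 41.83)),
        ((0, 1000), (-71.40, 41.82)),
    ]
    print(pixel_to_world(fit_affine(gcps), 500, 500))
```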
GREAT full-processing overview.
georeference-a-thons
Summary:
- i just love maps
- their stack: django + postgres. Me: postgres has long had great db fields for geospatial data -- has mysql implemented these? if so, in a standard or "add-on" way? if "add-on" -- is this worth requesting that OIT do this for our dbs so we have the ability to store geospatial data for future projects?
- worth a review to understand the process of gathering together necessary map metadata to do cool things.
- makes me think we should definitely consult with frank if we have a bdr-ingestion-process for maps -- to determine metadata we might be able to ingest to potentially offer cool viewers.
- Aaron Pahl
- University of Alabama / Birmingham
Notes:
- started with logbooks to extract data into more usable format (ssheet?)
- tried 4 free-tools and 3 paid-tools
- evals: accuracy; table-structure-preserved; use for other projects
- ggl-cloud-vision --> abbyy --> tesseract
- testing process of suite of tools made somewhat obsolete by improvements in AI tools.
Summary:
- interesting talk, but for me the most interesting thing was that last fourth of the talk, that improvements to ai-models have been so dramatic that it may well have made obsolete the testing process the presenter had been using.
- me: this is a repeated theme I've seen over the last few months, with coding and in other realms: people re-experimenting with existing ai-tools, in standard ways, are often very surprised at improved results, and folk experimenting in new ways (ie having a tool make a plan, manually updating a plan -- and then having the tool implement the updated plan), are often shocked by how good the results are.
- Gregory Wiedeman
- University Archivist, University at Albany, SUNY
- Maureen Cresci Callahan
- Director of Archives and Special Collections, University of Connecticut
GREAT TALK -- ask C&N to watch this.
Notes:
- All archives have a DAM problem :) (digital-asset-management)
- the DAMs info doesn't give you context that a finding-aid does
- an epistemological problem
- same info in three different collections
  - letters from a congress-person
  - records of a local communist party (same info, different context)
  - personal papers of a judge who prosecuted communists
- purpose is evidence -- not just information
- archival data has a graph structure
  - think of linked-data without URIs
- archivists guide you to material in meaningful chunks -- context included
- metadata already has a schema -- many times details just not filled out yet
- inheritance often used in archival data -- "collection note" -- can be overridden
- DadoCM -- IMLS grant -- goal: to manage archival description and digital objects without data duplication
  - mapping
  - portland-common-data-model -- fileset/file...version?
- the DAM-image is incorporated into the archival-interface
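me: a tiny sketch of the inheritance point above: a note set at the collection level applies to everything beneath it unless a lower level overrides it. This is my own illustration, not the DadoCM data model. `Component`, `effective_note`, and the sample notes are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Component:
    """One level of archival description (collection, series, folder, ...)."""
    title: str
    parent: Optional["Component"] = None
    notes: dict = field(default_factory=dict)

    def effective_note(self, key):
        """Walk up the hierarchy and return the nearest value for this note."""
        node = self
        while node is not None:
            if key in node.notes:
                return node.notes[key]
            node = node.parent
        return None


if __name__ == "__main__":
    collection = Component("Example Papers", notes={"access": "Open for research"})
    series = Component("Correspondence", parent=collection)
    folder = Component("Letters, 1950", parent=series,
                       notes={"access": "Restricted"})  # override at folder level
    print(series.effective_note("access"))
    print(folder.effective_note("access"))
```

The description is stored once at the level where it applies, which is the "without data duplication" goal in the notes.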
Summary:
- note my "GREAT TALK -- ask C&N to watch this."
- IIRC, my thinking: broadening all of our knowledge of EADs. eg, I know a lot about the concept of EADs, but had never heard that framing that DAMs focus on the specific digital object, while EADs focus on the context of the digital object. That's so useful.
- above is conceptual -- i'd like to review their talk re the specific ways they worked with their EADs before we embark on significant RIAMCO work.
Unveiling Boston Public Library’s Hidden Collections Using the WorldCat Metadata API
- Jay L. Colbert
- Special Projects Cataloger, Boston Public Library
- Mike Williams
Me -- another good talk in terms of widening perspective
Notes:
- Snapicat
- inspired by BookOps' easy-to-adapt Python interface for the WorldCat Metadata API
- really good Inspired slide
- i think ended up on a react stack
- worldcat-api documentation lacking in some ways -- eg which bibliographic... something version
Me: what's this for? if for searching -- why not the regular ILS? Or is it for cataloging these into OCLC for the first time? Maybe that's why they added minimal metadata, maybe to do lookups. Oh, maybe it's using the worldcat api to do lookups to get additional metadata to catalog it into the ILS.
ISBNs without OCNs (oclc control numbers)
Summary:
- I missed part of this talk, I think I was trying to answer some email.
- The reason I'd like to review it is in case it's part of that idea I've had about how ITS/DT might be able to contribute to other departments (besides hay) -- if this is a tool/process to enhance catalogers' work.
- Aiden de Boer
- Student Programmer, J. Willard Marriott Library, University of Utah
- Rachel Jane Wittmann
- Interim Head of Digital Library Services, Metadata Strategies Librarian, J. Willard Marriott Library, University of Utah
- Kaylee P. Alexander
- Research Data Librarian, J. Willard Marriott Library, University of Utah
Notes:
- python-based open-source tool to run on your data to see what data can be addressed for reparation.
- reparative metadata, LCSH, and Sensitive Content -- Lexicons
- initially launched alongside the inclusive metadata tool in 2024
Summary:
- just a cool project: worth seeing if we could run it on extracted BDR metadata -- and worth letting cataloging know about.
- libraryfutures.net/interoperable-ebook-standards-statement
- OPDS-feed -- read more about that -- could be an interesting format to be able to export to.
- EDRLab Thorium desktop-reading-platform
- ME: really worth a look.
Summary:
- I hadn't even known there was such a need for ebook-spec standardization.
- I've long thought we should investigate ways to archive ebooks instead of pdfs where possible -- and have librarians work with upstream folk, like grad-school, to encourage this.
Summary: OMG, WHAT A TALK!!! 😁
- "sufficient human involvement"
- prompts alone do not provide sufficient human control
- if ai reproduces exact output -- yes, infringement
- Meta case: training considered fair-use
- Anthropic case: training considered fair-use
- OpenAI case: underway
Summary:
- good quick overview.
- why are multiple ORCID member-IDs/accounts/whatever needed for the different library uses?
- shib "Comanage-Registry" can handle ORCID member-API support
Summary:
- didn't fully understand the need he described for multiple accounts -- but worth reviewing / following up on -- if we were to enhance a get-ORCID service.
- Birkin James Diana
Summary: OMG, WHAT A TALK!!! 😁
Supporting the Hidden Work: OSS Projects for Inventory and Weeding
- Maccabee Levine
- Senior Library Application Developer, Lehigh University
- if you're working with google-app-scripts -- try clasp -- helps with syncing local code to google-app-scripts
Summary:
- don't remember specifics of talk, but Maccabee's a good thinker, so i may review -- but do remember this takeaway: if you're working with google-app-scripts -- try clasp -- helps with syncing local code to google-app-scripts
- Seth Erickson
Notes:
- students can use it for a quarter -- virtual machines run inside of k8.
- "Coder" -- open-source, self-hosted for providing cloud-based development environments.
- they have a template for the form to create a VM -- changes each quarter.
- uses terraform internally for configuration.
- they set it up so you can click something which will open your local vscode and install a connector.
- users have root access into their machines.
- browser/vs-code, or coder-CLI
- establishes a "wireguard" connection? look that up.
- he's connected it to other backends, like an NSF something-or-other using "open-stack".
Summary:
- very cool talk -- worth sharing with CDS -- not in lieu of their reclaim (which is specifically for persistent storage), but as an alternative for short-term needs.
- Kyle Reiley
- Lead Software Developer/Analyst, The University of Texas at Austin Libraries
Notes:
- classic "build it for them" site.
- last x percent of customization results in lots of tech-debt.
- shifted to fully custom.
- solved the platform problem but borrowed new one.
- shared infrastructure approach. took care of backend.
- "portal" for frontend -- still in-process. sounds somewhat complex.
- strapi -- when they improve search, every-site-improves -- same for accessibility.
- LADI
- one of best benefits is no longer seeing old sites decay.
Summary:
- gist felt similar to our goal with the BDR-uploader-hub webapp: how can we offer multiple-separate-webapps, in ways that minimize lack-of-maintenance-decay -- and promote improvements auto-applying to all apps?
- Eric Silberberg
- Librarian, Queens College, City University of New York
Notes:
- LACLI -- free online index of latin-american stuff/collections/resources.
Summary:
- didn't take good notes -- but I think the gist was that folk can update a database -- and they have an automated system to produce a static site from it. That concept -- of programmatically producing static-sites from complex systems -- is, I think, worth thinking about -- and worth brainstorming which of our services we could do that with if we wanted to.
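me: a minimal sketch of the "database in, static site out" concept from the summary above, standard library only. It is not LACLI's system. The `resources` table, its columns, the file names, and the placeholder rows are all my assumptions.

```python
import html
import sqlite3
import tempfile
from pathlib import Path


def make_demo_db():
    """Create a small in-memory database with placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE resources (id INTEGER PRIMARY KEY, name TEXT, description TEXT)"
    )
    conn.executemany(
        "INSERT INTO resources (name, description) VALUES (?, ?)",
        [("Example & Archive", "Placeholder description."),
         ("Sample Index", "Another placeholder.")],
    )
    conn.commit()
    return conn


def build_site(conn, out_dir):
    """Write one HTML page per row plus an index page; return the page count."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    rows = conn.execute(
        "SELECT id, name, description FROM resources ORDER BY name"
    ).fetchall()
    links = []
    for rid, name, description in rows:
        page = (f"<html><body><h1>{html.escape(name)}</h1>"
                f"<p>{html.escape(description)}</p></body></html>")
        (out_dir / f"resource-{rid}.html").write_text(page, encoding="utf-8")
        links.append(f'<li><a href="resource-{rid}.html">{html.escape(name)}</a></li>')
    index = "<html><body><ul>" + "".join(links) + "</ul></body></html>"
    (out_dir / "index.html").write_text(index, encoding="utf-8")
    return len(rows) + 1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_site(make_demo_db(), tmp), "pages written")
```

Re-running the build after each database update is the whole "automation"; the output needs nothing but a file server.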
- Dr. J. Nathan Matias
- Assistant professor, Cornell University Department of Communication
- Founder, Citizens and Technology Lab
Notes:
- Agnotology -- coding and cultivating knowledge in the context of...
- mayan-language book, in spain
- 1500s priest interested in learning about the mayan language -- also opposite, organized burning of mayan books
- mayan book saved because it had been taken to spain
- agnotology -- the study of ignorance
- "not about blocking knowledge, but rather about establishing a metaknowledge regime that could be used to attack the whole idea..." (eg, of science)
- citizens and technology lab informed by citizen-science movement
- cultivating knowledge through human cooperation -- wikipedia visualization.
- AI moderation-systems introduced in 2007-8 -- ai removes "low-quality" edits -- a decline started then
  - me: how do we know that's the cause of the decline, instead of correlation?
- reciprocity (benefits) --> indirect-reciprocity (thanks) --> upstream-reciprocity (pay-it-forward)
  - his team was interested in the latter
  - how to measure the effect of the wikipedia "thanks" link
- the "door" -- like libraries and the internet, wikipedia's views are declining due to generative-ai
  - problem: don't experience invitations to contribute to the body of knowledge
  - wikimedia's "futures lab" site exploring this, their mission, to protect and grow this knowledge repository
- jefferson: fierce advocate for intellectual freedom
- reporting, reviewing, and responding to online harassment -- WAM! org (women and media)
- r/science -- can norms against harassment change behavior and grow participation online?
  - collect data -- coordinate intervention -- manage ethics
  - posting the rules increased the chance of rule-compliance by > 8pct points, and increased participation by 70%
  - that study has been replicated 6 times by different communities
- science for understanding, de-escalating, and something-else -- musk and meta have threatened researchers investigating online-hate.
- Agnotology -- powerful folk will muddy waters, seek to control research, threaten researchers
  - coalition for independent technology research
  - CITR files amicus brief in x lawsuit to defend the rights of independent researchers
  - fed judge dismissed x's lawsuit -- credited CITR
- have looked into automated-copyright-enforcement -- interesting that I can't guess whether they're for or against that. I think against, cuz they don't want communication chilled?
- how to balance collecting data needed to protect the integrity of people's thoughts and speech?
- one of the risks of AI is that it takes people out of the equation/info-flow
  - meghan partner -- ex-director of human rights data analysis group -- HRDAG
- evaluation of ai-systems:
  - reliability
  - transparency
  - models of public involvement in ai research -- co-creators, participatory-models, others
- 5 areas where science of ai evals:
  - equipoise -- ??
  - measurement needs to be accurate
  - explanation -- example?
  - inference -- algorithm "fair", but application not
  - interpretation
- people of both parties do trust universities -- higher-ed/july 30,000 poll
Summary:
- cool talk -- i need to review... this person's lab/center does cool work/research -- one interesting point: how to create ai-enhanced workflows that augment humans in the loop, instead of replacing them?
(not all listed -- just highlighting some)
- cool alternative to keyboards -- and tools to learn!
- also chords -- and tool examples
Summary: just fascinating.
- being the connection between people who actually use the systems
- needed: functional users, tech person, project-manager
- provide sounding board for new projects and ideas -- prioritize (ideally via agreed-upon rubric) -- flesh out work and implement
Summary: don't remember specifics, but I thought it was worth reviewing.
- emotions impact usability -- frustrations and dead-ends can really degrade confidence in tools
- gave categories of things to think about
- what's balance between conveying richness of data, without overwhelming users?
- folk looking for simplification-offramp may be more susceptible to predatory influences
- ideal goal not just to share, but to build relationship
Summary: I italicized the highlights above.
- boolean satisfiability problem
- "sat-solvers"? "stat-solver"? -- z3 from microsoft -- ah, "satisfy"
- nice example slide
- AtMost(), AtLeast(), Implies() -- variables can be numbers, not just bools
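me: a toy sketch of the satisfiability idea, to make the bullets above concrete. It is plain Python that tries every assignment, not z3: the `at_most`, `at_least`, and `implies` helpers are my own stand-ins named after the z3 functions in my notes, and they do not follow z3's API. The staffing puzzle is made up.

```python
from itertools import product


def at_most(k, *values):
    """True if no more than k of the boolean values are true."""
    return sum(values) <= k


def at_least(k, *values):
    """True if k or more of the boolean values are true."""
    return sum(values) >= k


def implies(p, q):
    """Logical implication: if p holds then q must hold."""
    return (not p) or q


def solve(names, constraint):
    """Yield every true/false assignment to names that satisfies constraint."""
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if constraint(assignment):
            yield assignment


def desk_rules(a):
    """Made-up rules: exactly two people on desk, and alice only works with carol."""
    staff = (a["alice"], a["bob"], a["carol"])
    return (at_least(2, *staff)
            and at_most(2, *staff)
            and implies(a["alice"], a["carol"]))


if __name__ == "__main__":
    for solution in solve(["alice", "bob", "carol"], desk_rules):
        print(solution)
```

Brute force doubles in cost with every added variable, which is the reason real solvers like z3 exist; the constraint-writing style is the part that carries over.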
Summary: I liked this because it surfaced concepts I hadn't thought of.
Summary: this is the 2nd time I've heard the viewpoint that archivists deal with boxes-folders, whereas digital repos deal with "items"
- areas of conflict with real problems like limited funding and freedom-of-movement
- near-zero budget, can work w/o internet connection, can incorporate into bigger systems (harvest site)
- raspberry-pi -- $150.
- makes static-site
- very cool
Summary: was this the one I thought Connor would find interesting?