Updates

November 03, 2004

OAI update

There is now a way to check on the status of what has been added to the Library's OAI metadata provider using this Nand search.

September 16, 2004

OAI Metadata Provider

Metadata previously exposed for the CIC Metadata Portal are now available for browsing and searching. Try a search for "hutchins" or browse by collection (arranged by contributing institution).

August 20, 2004

Century of Progress Preview

A preview of the new digital Century of Progress collection was demonstrated at the Supervisors' meeting. The Century of Progress project, jointly undertaken by staff from Crerar, DLDC, Preservation, and the Special Collections Research Center, has produced an online searchable version of the checklist of publications from the 1933 Century of Progress World's Fair (originally created in 1937 by staff of the John Crerar Library), and has digitized a portion of our holdings (around 350 of 1022 pamphlets were digitized).

August 02, 2004

OAI-PMH Provider Ready

An OAI-PMH provider has been installed, and data have been successfully harvested from it for the CIC OAI metadata harvesting project. More details will be provided when the CIC has a front end allowing our collections to be searched.
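
For readers unfamiliar with harvesting, the sketch below shows how an OAI-PMH request is formed: every request is an HTTP query against one base URL, with a `verb` argument and verb-specific parameters. The base URL here is a placeholder, not the Library's actual provider address, and the helper name `oai_request_url` is made up for illustration.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, **params):
    """Build an OAI-PMH request URL (hypothetical helper, not a library API)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # Skip parameters the caller left unset.
    query.update({k: v for k, v in params.items() if v is not None})
    return base_url + "?" + urlencode(query)


# Placeholder base URL; a harvester would fetch this URL and parse the XML reply.
url = oai_request_url(
    "http://example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)
print(url)
```

A real harvester would also follow the `resumptionToken` returned with long result lists; that step is omitted here.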

February 04, 2004

EAD update

The code which generates frontmatter and inventory for finding aids has been updated to conform to the EAD 2002 standard.

December 17, 2003

Century of Progress Demo Launched

A preliminary version of the Century of Progress digital collection was launched in time for the release of the December issue of the alumni magazine, which included a short piece on the Century of Progress pamphlet collection. The article included a link to the Web site, which contains a sample of 19 digitized pamphlets. The digital pamphlets are made available as DjVu files, which support zooming in and out of the images as well as easy printing or saving of the entire digital pamphlet in one step.

November 06, 2003

RLG Cultural Materials update

Images and metadata for The First American West collection have finished being uploaded to RLG. When RLG processes it, it will be the second University of Chicago Library collection represented in RLG Cultural Materials. The first, American Environmental Photographs, was made public on 1 October. Here is RLG's announcement to us of that fact.

Dear Charles and Daniel,

first contribution to RLG Cultural Materials loaded
yesterday, October 1. Your "American Environmental
Photographs, 1891-1936: Images from the University
of Chicago Library" is featured on the RLG Cultural
Materials home page under "New Collections".
University of Chicago is also now listed among the
other contributing institutions in the "What's Inside"
section.

We appreciate all your efforts to make this happen.
We look forward to adding "The First American West"
in the near future.

With warmest regards,

Karen

Karen Smith-Yoshimura
RLG

October 07, 2003

Staff Directory Launched

This week we launched a new searchable/browsable version of the staff directory and an accompanying static web page listing departmental contact information (general numbers, fax, email, etc.) and linking to maps and library addresses. The searchable database is based on data pulled from regularly maintained Personnel databases. In addition to the searchable database, an automatically generated PDF file with a simplified alphabetic listing of all staff is created each time the database is re-indexed. The database relies on the Nand database program, which was developed in-house.

September 19, 2003

Early China Web Site Launched

The Society for the Study of Early China (SSEC) web site was launched today. The site is a collaboration between the Library, faculty on campus, and the SSEC: the DLDC designed and maintains the site's templates and hosts the site on its servers, while the faculty and the SSEC produce and maintain the content. The SSEC intends the web site to be a locus for scholarly exchange and communication in the field. In addition to information about the SSEC, the site provides abstracts of each issue of the SSEC journals and publishes research papers, notes, databases, and bibliographies on early China.

Electronic Journals A-Z Search Interface Launch

Today we launched a new version of the Electronic Journals A-Z List. The list is based on an export from the SFX database, which is then indexed by our multipurpose Nand search/browse tool.

- Search by Title or ISBN in addition to browsing alphabetically
- Browse by Database. (e.g., Catchword, Elsevier Science Direct)
- Limit on the number of search results per page so that load times are
reduced for the browse results on letters which contain large numbers
of titles (e.g., A's and J's).
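
The per-page limit in the last item can be pictured with a minimal sketch. This is not how Nand implements it (its internals are not described here); the function name `paginate` and the default page size of 50 are assumptions for illustration.

```python
def paginate(titles, page, per_page=50):
    """Return one page of results and the total page count.

    Hypothetical helper: pages are numbered from 1, and per_page is an
    assumed default, not the limit the A-Z list actually uses.
    """
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # Ceiling division; an empty result list still counts as one page.
    total_pages = max(1, -(-len(titles) // per_page))
    start = (page - 1) * per_page
    return titles[start:start + per_page], total_pages


# Example: 120 titles under one letter are served as three smaller pages.
journals = [f"Journal {n}" for n in range(120)]
last_page, pages = paginate(journals, page=3)
print(len(last_page), pages)
```

Serving a slice instead of the full list is what keeps load times down for letters with many titles.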

August 14, 2003

Annual Report

Published 2003 DLDC Annual Report.

July 24, 2003

Installed Spam Filtering Software

Migrated the Library mail server and installed spam filtering software which will flag incoming messages to library mail aliases that are likely to be spam. Staff will be able to use this flag to filter such messages. More information is available in the Spam FAQ.

June 12, 2003

Launched new Art Page with Master's Theses Database

Replaced The Art Reserve Kiosk on /e/ with a newly designed Art and Architecture subject page. This new page points to the now productionized Masters' Papers in Art History & Visual Arts database which was implemented using Nand technology.

June 10, 2003

Productionized DLDC wiki

Set up production version of the DLDC wiki, including creating topical wikis for Nand and Unix Tips.
Translated all the material from our earlier test wiki (based on Wikit software) into the syntax of the wiki software we chose for productionization (Swiki).
Created Wiki for PREMIS group at Charles's request. (He is a member of this OCLC/RLG sponsored working group).

DODL Framework Document

Digital Library Federation Framework for a Distributed Open Digital Library.

June 09, 2003

DLF Meeting Notes

Posted my notes from the DLF Spring Forum on the DLDC's staffweb pages. They are available at
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day1.html
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day2.html
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day3.html
These notes are posted as taken so you may need to ask me for further comments on items of interest. For descriptions of the topics of each session see the DLF Spring Forum Schedule.

DLF Steering Committee Minutes

Minutes of the Meeting of the DLF Steering Committee

June 06, 2003

Cultural Materials

Told LC location of AEP2003 metadata for pickup. Gave RLG AEP2003 metadata, and AEP2003 thumbnails, galleries and access copies as a bzip2 tar file.
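
A bzip2-compressed tar file of this kind can be produced with Python's standard `tarfile` module, as in the sketch below. The directory and archive names are placeholders; the actual packaging was presumably done with the command-line tools, and the helper name `make_bzip2_tar` is made up for illustration.

```python
import tarfile
from pathlib import Path


def make_bzip2_tar(source_dir, archive_path):
    """Pack source_dir into a bzip2-compressed tar archive.

    Hypothetical helper; the paths passed in are the caller's choice.
    """
    # Mode "w:bz2" writes a tar stream compressed with bzip2.
    with tarfile.open(archive_path, "w:bz2") as tar:
        # Store entries relative to the directory name, not the full path.
        tar.add(source_dir, arcname=Path(source_dir).name)
    return archive_path


# Usage (placeholder paths):
#   make_bzip2_tar("aep2003", "aep2003.tar.bz2")
```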

June 04, 2003

United States Newspapers held by the University of Chicago Library

Productionized the catalog: http://www.lib.uchicago.edu/e/su/news/usnewsp/.
The link to the searchable NAND database is at the bottom of the page.

June 03, 2003

Historical Documents from Northern Italy

Unveiled the catalog: http://www.lib.uchicago.edu/e/spcl/findaid/rosenthal/.
The link to the searchable NAND database is at the bottom of the page.

Short Title Manuscript Catalog

Productionized the catalog: http://www.lib.uchicago.edu/e/spcl/findaid/stc/.
The link to the searchable NAND database is at the bottom of the page.

June 02, 2003

Century of Progress

Productionized the checklist: http://www.lib.uchicago.edu/e/spcl/findaid/century/.
The link to the searchable NAND database is at the bottom of the page.

May 30, 2003

Archiving

/data/web/storage/pres and /data/web/storage/spcl together have 336 GB
(or 1/3 TB) of space available. 33 MB (1% of available storage) is
currently in use.

/data/web/storage/arch has 414 GB (or somewhat shy of 1/2 TB)
available. 280 GB (41% of available storage) is currently in use for
the "TIFF farm," and the Annex and StaffInfo archives.

May 29, 2003

NSDL 19-21 May 2003

NSDL: National STEM Digital Library. STEM: Science, Technology,
Engineering, and Mathematics.

Lee Zia presented a progress report on NSF's NSDL at the CNI Spring 2003 Task
Force Meeting. "To date three sets of grants have been made in three
tracks: 1) Collections, 2) Services, and 3) Targeted Research."
http://www.cni.org/tfms/2003a.spring/abstracts/PB-NSDL-Zia.html

I participated in one of 15 panels reviewing the current round of
grant proposals. We considered 10 proposals, 9 in the collections
track and 1 in the services track.

Participating was a good experience for understanding more about the
NSDL and how we might go about crafting a proposal should that become
something we (the University) want to do. I learned something from the
other members of the panel, and they learned something from me, a
librarian who does digital library development.

DLF 14-16 May 2003

Official summary: http://www.diglib.org/forums/spring2003/DLFForumMay03rev.htm

My notes follow.

Developers Forum.

the experience of those who have built them is that portals don't
attract users. (the rest of my notes from the developers forum are on
paper, because i found i couldn't look at others, talk to others, and
input into my handheld at the same time. i need me one of them
notebooks that can read real handwriting. note to keith: using
graffiti is still too slow imo.)

Opening plenary.

From the official summary:

Professor James Boyle of the Duke University Law School, speaking on
"Public and Private Initiatives in Copyright Reform," declared that
copyright laws restrict access to almost all of twentieth century
culture.

My notes on this and the rest of the DLF Forum follow.

lifetime plus 70, or 95 if work for hire

the farther you go back the less value of the copyright protection

it's economically irrational

1. 98 percent of works have at most five years of commercial life and
most not that. ten years out moves to 99 percent

napster kazaa morpheus

copyright was there to promote the spread of knowledge
the system worked to inhibit unfair practices among industry rivals

2. all work is copyrighted the moment it is fixed in material form

james boyle, creative commons
the losses from failed sharing
creative commons a second-best solution
www creative commons org
has a checklist, attribution, derivative works, commercial
non-commercial, etc.
creates a lawyer-readable license
then has a commons deed, which has a human-readable version
also has a machine-readable version
''i want pictures of the empire state building which can be used w/
attribution''
launched these licenses in december
mit w/ common courseware will be using it

archival material or material which we ourselves generate

3. losses from misunderstanding of fair use

has opennesses and vaguenesses

need to exercise fair use because fair use can be exercised only when
someone is not losing profit by the activity

www chillingeffect(s) org

aaup

contingent evaluation (made-up figures)

publishers provide a credentialling mechanism for which they do not
pay

sunsite --> ibiblio

to what extent do licenses bind third parties? unclear: in some cases
it is and in some cases it isn't. contract law is increasingly moving
to say it is

stress academic freedom and free speech

=====

breakout session 2

"The Bibliographic Enrichment Advisory Team." David Williamson,
Library of Congress

links from marc 856 to onix toc

use prime ocr 95% accurate

200 hits / hr on dtocs

these files on the web are being indexed by google and yahoo

google does not index m'data tags because of the porn sites

there's no way to format that text as a marc field

exact same record as they're sending to amazon

---

"OCLC Metadata Switch." Thom Hickey, Chief Scientist; Jean Godby,
Research Scientist; Diane Vizine-Goetz, Research Scientist, OCLC

reusable models

REST, SOAP, etc.

web services (technical term) for the digital library:
register
search
resolve
navigate
decompose (ddc numbers, the name in a list, etc.)
enhance
transform

in the genre of science fiction what is the most common location?
(turns out it's mars)

... and combines genre and subject information

GEM

(godby has a paper on record transformer; ask her for it)

---

eprints in cornell's archive are not peer-reviewed because so many of
them are published in peer-reviewed journals
disciplinary repository

the work we do in the univ of bc is archived by the library
circle of gifts
doesn't need to be archived everywhere

---

"RedLightGreen." Merrilee Proffitt, RLG

jon udell's library lookup site uses isbns

citation building format e.g. mla

prefers the dtd approach because schema uses attributes which they
don't index

[see my CNI 28-29 April 2003 notes for the rest of the presentation,
since it was presented there as well]

=====
PLENARY 2

"The DLF Today and the Case for the Distributed Online Digital Library
(DODL)." David Seaman, Director, Digital Library Federation

[David Seaman began by talking about some of what the DLF does/has
done.]

digital formats registry

registry of digital masters w/ oclc

jewell e-resources mgmt--xml format for e-licenses

cataloging of visual resources guidelines in draft now

tei for libraries guidelines: version 2

production workflow good practices
.. workflow designs
.. filenaming choices
.. lessons learned
.. mgmt software used or developed

survey of digital production tools

db- vs text-based xml delivery tools [need to give guidance on these
for the community]

initiatives can just take off
the DLF can fund meetings, travel, conference calls, publications
not a democracy, couldn't work
that's what dues fund

[He continued by talking about the DODL.]

gives the library directors something to do

the immediate challenges are emotional conceptual and organizational

the focus is on service to support scholarship and teaching

the hope is that this will drive new content building

capital fund, money put aside for large strategic project

executive summaries of initiatives such as mets

there's no reason why a large part of this should not be systematic
study

=====

breakout session 4

"FEDORA Digital Repository Implementation at UVa--How, Why, and What
We're Doing with It." Leslie Johnston, Director, Digital Services
Integration, University of Virginia Library.

[this talk inspired me to look closely at Fedora. it seems to address
needs that we've already identified as being things we need to start
working on now, and uses a selection model that we've already
identified as being the way we prefer to work here.]

fedora
the first release is this friday, 1.0 w/ documentation
mozilla public license
www fedora info
the fedora architecture is based on object models
objects can be simple or complex
metadata inline or not
behaviors
have code objects as well

disseminators are containers
(buckets)

fedora uses mets objects while retaining files in their original
formats
communication and public relations dept [at virginia did the] graphic
design

what formats need to be delivered on the fly

get static view get dynamic view

tei ead dc vra core

uva descmeta
(merrilee's rlg format, event based)
gdms is how they represent collections of objects

gdms is a tool to create fedora objects

uses tamino (for EADs) and opentext

perl xslt mrsi

automatic tiff to gif

second step will be atomistic control at the file level

the same object can have multiple parents but new notion is to have a
primary parent. also part of phase 2

repository that has been selected for collection

subject librarians and user services librarians, usability group in
communications and publications

looking at ipedo as an alternative to opentext

---
"The University of California's Collection Management Initiatives:
Findings on Use of and Preference for Digital Journals." Gary
S. Lawrence, Director, Library Planning and Policy Development,
California Digital Library.

new issues popular in print format [note to self: but perhaps can
reduce binding budget]

eighty percent of respondents said that backfiles dont go back far
enough

results entirely consistent w/ outsell's finding that office and home
use dominates

ten or thirty three to one in favor of digital

shared print archive elsevier and acm

the reasons for preference for current issues in paper will be looked
at -- habit? age?

relied entirely on vendor-supplied data
---

"Cushman Exposed: Exploiting Controlled Vocabularies to Enhance
Browsing and Searching of an Online Photograph Collection." Michelle
Dalmau, Interface and Usability Specialist, and Jenn Riley, Digital
Media Specialist, Indiana University Digital Library Program.

expose structure to facilitate browsing

date genre subject location and combination of categories

integration of thesaurus search w/ database searching

some people want to browse from broader to narrower and others the
reverse

only one third of thesaurus terms used, an implementation problem

dynamic search and browse

late summer early fall launch of cushman collection

lead-in vocabulary
thesaurus mgmt issues

the ui may mask the controlled vocabulary structure
not interrupt normal browsing behavior
-----

breakout session 6

"A Registry for Digital Format Representation Information." Stephen
L. Abrams, Digital Library Program Manager, Harvard University
Library; MacKenzie Smith, Associate Director for Technology, MIT
Libraries.

global digital registry format

identification
validation
transformation
characterization
risk assessment
delivery

ingest sip validation sip to aip transformation [pronounces "aip" as
"ape"]
access
preservation planning, sip to aip transformation

mime types
insufficient level of detail
granularity--too coarse, eg tiffs compressed in different ways
data and governance models
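
The granularity point can be made concrete. Two TIFF files share the MIME type image/tiff whether they are uncompressed or LZW-compressed; the difference only shows up in the Compression tag (259) inside the file. The sketch below reads that tag from the first image file directory. It is an illustration of why a finer-grained registry is wanted, not code from the Harvard/MIT proposal, and it handles only classic TIFF, not BigTIFF.

```python
import struct

# Common values of the TIFF Compression tag (259).
COMPRESSION_NAMES = {
    1: "uncompressed",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    7: "JPEG",
    32773: "PackBits",
}


def tiff_compression(data):
    """Return the Compression tag value from the first IFD of a TIFF."""
    # Bytes 0-1 give the byte order: "II" little-endian, "MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF: bad byte-order mark")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF: bad magic number")
    (entry_count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    for i in range(entry_count):
        start = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        tag, field_type, count = struct.unpack(
            order + "HHI", data[start:start + 8]
        )
        if tag == 259:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack(order + "H", data[start + 8:start + 10])
            return value
    return 1  # the TIFF default when the tag is absent: no compression


def describe_tiff(data):
    """Pair the coarse MIME type with the finer compression detail."""
    code = tiff_compression(data)
    return "image/tiff", COMPRESSION_NAMES.get(code, f"unknown ({code})")
```

A format registry at the right granularity would let a repository record the second value, not just the first.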

pronom--public records office uk
diffuse in europe, it's a website

ietf media features is a follow-on to the mime mechanism content
negotiation between client and server

owner maintainer identifier name alias taxo[nom?]y / ontology typing
subtyping eg svg a subtype of xml

registry service, register interest for updates obsolescence new tools
etc

executive summary for money

format typing mechanism at the appropriate level of granularity

hul harvard edu formatregistry

---
"Strategies for Implementing Preservation Metadata in Digital
Archiving Systems." Rebecca Guenther, Library of Congress

PREMIS

a strategy for finding strategies for preservation metadata

dealing w/ aip

content vs preservation description information

60 elements
what's a minimal core
automatic generation
apply by object type or object behavior
more practical view, best practices document

core data dictionary
format for recording (xml schema) and pilot programs

lack of common vocabulary
significant attributes

www oclc org/research/pmwg

[See also notes from last CNI, Priscilla Caplan's presentation]

---
"PDF/A: A New Digital Preservation Format." Bill LeFurgy, Digital
Initiatives Project Manager, Library of Congress

the goal is to have pdf/a accepted as an iso standard
up to two years to approve the standard

needs of document producers
.. ease of creation, fits neatly into a workflow, should be flexible
needs of the users
.. easy to search, ease of discovery, getting an exact appearance
of the original document
contention
requirements for archival repositories or others that might be
maintaining documents over time
.. no proprietary formats
should work well tomorrow
homogeneity
support m'data as far as you can for discovery provenance preservation
activities
how to make that as easy and painless as possible
want to support required significant properties

embedded fonts no encryption standard color space limits on compression

xmp extensible m'data platform to associate m'data w/ object

xmp introduced w/ pdf 1.4 (acrobat 5)
dc medium mgmt rights mgmt
has broad extensibility
can embed m'data w/in the binary file but shows up as plain text
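
Because an XMP packet is stored as readable XML between `<?xpacket begin=...?>` and `<?xpacket end=...?>` markers, a simple byte scan can often recover it without parsing the host format. The sketch below assumes the packet is stored unfiltered (not compressed or encrypted), which will not hold for every file; `extract_xmp_packet` is a made-up helper name.

```python
def extract_xmp_packet(data):
    """Return the first XMP packet found in a byte string, or None.

    A naive scan for the xpacket wrapper; assumes the packet is stored
    as plain text inside the binary file.
    """
    start = data.find(b"<?xpacket begin=")
    if start == -1:
        return None
    end_marker = data.find(b"<?xpacket end=", start)
    if end_marker == -1:
        return None
    # The trailer is itself a processing instruction closed by "?>".
    end = data.find(b"?>", end_marker)
    if end == -1:
        return None
    return data[start:end + 2].decode("utf-8", errors="replace")


# Usage (placeholder file name):
#   with open("document.pdf", "rb") as fh:
#       packet = extract_xmp_packet(fh.read())
```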

still looking to see where xmp is used in real world
cant do data typing or validation
cant automatically compare schema w/mdata for validation
thinks some kind of partial validation would be possible

stephen abrams is involved

the default output of acrobat may be pdf/a according to an adobe
engineer

pdf as a page description language vs xml which divorces presentation
and content

[in response to why pdf and not xml from the floor:]

the courts were not comfortable that xml could handle all the
formatting needed
also xml is less easy to implement
need absolute assurance that formatting will be one hundred percent
the same

[see also notes from last CNI]

=====

[I had lunch with Leslie Johnston after the Forum about Fedora. here
are some notes.]

pid system? -- not handles, but can use an external handle server to
register the pids.

how best to get going w/ fedora?

talk to northwestern; they and tufts are farther along.
go to the fedora info site, which will have all the contacts for deployment
partners

cornell is participating in the coding effort and will be for the next
two years
next year access control

havent implemented controls for primary and secondary parents, the
kinship metadata [cool concept which came out in the presentation:
documents can have multiple parents, so can be linked to from multiple
places, but want to be able to indicate a primary parent, to indicate
the original context of the document]

system in java but supports disseminators in perl

solaris
wayland, russ local lead implementor to talk about technical details

tufts oki apis for learning mgmt system

can use whatever metadata thing we want
ead tei gdms and vra core not yet [i believe the "not yet" applies
only to vra core]

the vra core does not have good encoding guidelines and will change
soon: [will] add guidelines and constraints

1.25 fte code support and disseminator--java programmer
other does ingest files conform to standards: librarian w/ system
librarian user support background

fedora info to get the software

=====

METAe
the metadata engines product
automated conversion of printed documents into fully tagged xml

claus gravenhorst

stu schneider

content conversion specialists
docWORKS/METAe
product launch march 2003
aug 2003, ifla berlin

software for use in libraries and service bureaus
biblioteca statale a baldini italy

docWORKS engine
image preprocessing
layout analysis
character recognition
structural analysis

output: tiff mets alto jpeg
alto: physical representation of each page

has mdata for the various elements of the book

can link from logical to physical structure

univ of innsbruck is hosting the alto standard

claus.gravenhorst at ccs-gmbh.de

---
[after the METS portion of the meeting:]

can put a wrapper around it saying that it is not xml for descriptive
metadata

can output to other dtds

ccs ruleset
the engineers adapt the ruleset to your needs
in 2004 want client side rules adaptation
abbyy ocr
fraktur yes but gothic fonts will not work

one operator can handle 5000 pages a day

get samples at beginning of a project

quicktime to mets structmap for video files

the lc mets sheet music page turner guy seems related to fedora

nyu mets creation tool
useful for small scale production
mysql plos peml apache front end
sourceforge

300 pp 45 minutes on fast pc
but can have client server scenario

greyscale
integrates spell checking
wants to partner
tiff images 300 dpi ftn greyscale

-----
METS update

the behavior sec is now recursive
june/july fedora 1.1 will support this
technical m'data for text--didnt catch name

mets:: ims, scorm
just need to make sure that they interoperate

profiles: extension lang and controlled vocabulary

more exx inc tei

mets: standards process
fast fast track for registering

mets opening day
who when what
aiming at dc oct 27th-28th

half day intensive introduction to mets
half day for people already implementing it
half day hard core for developers where do you start
the practicalities

nancy's rights schema
not intended to be actionable so content guard wont sue us all

object types vs profiles
it's a mgmt distinction
object types can be used to class profiles

will be a public registry for profiles
will have a profiles xml schema

may have generic profile one for page-turned objects

nyu website: xslt for html page-turned objects
sourceforge site xslt suite for mets objects
rick moa2 disseminator, converting that to work w/ mets, page turning
part is done, the descriptive part is using mods, not yet finished
rlg will make their viewer publicly available
lc sheet music page turner
in conjunction w/ tei encoding
(eric? lc)
fedora does tei page turning, chris wrote it

can link it appears from mets to mpeg7 because the hooks are there

consult back and forth on what works and what doesnt nyu and indiana
-----

"Robotic retrieval and scanning." Sayeed Choudhury, Johns Hopkins
University Library.

a data capture framework and testbed for cultural materials

hopkins
built a remote shelving facility but wanted to be able to browse it
telerobotics
''computer system surgery''
has conducted surgery in singapore
constrained the problem
CAPM supplementing the existing system physical delivery

can use different arms
hyper redundant systems
eg robotic snakes
unanticipated uses
simulated the user experience not just ask them a question
separate cost and benefits team
two to thirty dollars per use cost
sixtyfive dollars benefit from semester
levy collection of sheet music is at hopkins
institute of museum and library services imls

adaptive optical musical recognition software
if makes the same mistake repeatedly
can teach it
guido is the representation
output is midi, mp3
is this a happy or sad song
was the melody reused in another piece
tore it down and built it up more modularly and called it gamera. its
a framework which includes classifiers etc
medical [f]rench ancient greek early modern english
uses python will pop up code in a separate window for manipulation
perseus tufts computational humanists

nsf information technology research project
tactile robotics: robots that feel
92 percent recognition rate
parc hidden markov models to go from local to global recognition

dke jhu edu/
CAPM
gamera
---
"The Preservation Productivity Paradox in the Modern Digital Library."
Catherine Aster, Stanford University Libraries, and Stuart K. Snydman,
Digital Library Projects Manager, Stanford University Libraries.

two million pages in 24 weeks
swiss startup, 4DigitalBooks

scanning at 300 dpi greyscale

1160 pp/hr theoretical max
650 pp/hr --real throughput
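
A quick check of what these rates imply, assuming (the notes do not say so) that the two-million-page figure and both rates refer to a single scanner: at 650 pp/hr the job needs roughly 3,077 scanner-hours, or about 128 hours a week over 24 weeks.

```python
def scanning_hours(pages, pages_per_hour):
    """Total scanner-hours needed for a job at a given rate."""
    return pages / pages_per_hour


def hours_per_week(pages, pages_per_hour, weeks):
    """Average weekly scanner-hours needed to finish in the given time."""
    return scanning_hours(pages, pages_per_hour) / weeks


PAGES = 2_000_000  # from the notes
WEEKS = 24         # from the notes

# Rates from the notes; the single-scanner assumption is ours.
for label, rate in (("theoretical max", 1160), ("real throughput", 650)):
    total = scanning_hours(PAGES, rate)
    weekly = hours_per_week(PAGES, rate, WEEKS)
    print(f"{label}: {total:,.0f} hours total, {weekly:.0f} hours/week")
```

Under that assumption the target is reachable only with near round-the-clock operation at the real throughput.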

have metadata entry capture templates for different kinds of
materials eg books journal articles etc

uses barcode (i think) to create filenaming

workflow includes derivative making and ocr'ing

abbyy and prime ocr, which reach 99 percent accuracy; that is enough to create a
searchable index if the image will be presented to the user

descr and technical mdata is uploaded into a relational db
strips tiff headers

processing ten to twelve books per day
5mb each
lzw compressing all master tiff images [i told him afterwards about
bzip2]

dpi is determined on a project by project basis

1800 pp / hr if no scanning

www-sul stanford edu white paper
kitoss [sp?] is a competing system

theyre scanning two-up and page splitting

can reorder pages and record it in the metadata
ims is content packaging standard and lom is metadata
xslt stylesheet brian tingle of cdl

amazon has a web services front end

yee has a software tool, javascript i think, which allows users to
recombine objects from different data feeds

simplifying assumption

ideally wants the universal canvas where can drop all kinds of things
in

david greenbaum and david yee
---
"Schema-Driven XML Editor for Metadata Capture." Stephen L. Abrams,
Digital Library Program Manager, Harvard University Library

xml used as encoding form to record the metadata
concentrates on descriptive metadata

generic tool
hide the xml from the cataloger

schema directed editor
will adapt its behavior to conform to a given schema

Xorro: a schema driven editor

were assuming data centric documents, ie fielded input rather than
full text

first youre asked to select top level elements

serializes to valid instance document

swing client or building swing based guis

SOM schema object model jdom and xerces

name groups and global attributes
namespace qualified

bil: biomedical image library
ted: templated database, built around tamino
used for anthropological reports

aes audio engineering society

1.3:1 java apis
will be made available under gnu license, open source
can be deployed client side or server side

needs sax parser they use xerces
need to add documentation

off the shelf xml tools dont hide the xml

stephen_abrams at harvard edu

watch diglib [for announcement when tool is ready]
or email stephen to register interest
---
"The OAI Static Repository: a file-based approach to exposing metadata
via the OAI-PMH." Herbert Van de Sompel, Los Alamos National
Laboratory, Research Library

[will make the slides available on his website]

CNI 28-29 April 2003

opening plenary:

J. C. Herz, principal of Joystick Nation Inc.:
"Everquest for Knowledge Workers: What Organizations Can Learn from
Online Games and Other Social Software"

This was an extraordinarily good presentation (some others thought
otherwise). The speaker was able to use the approach she had presented
consistently when it came time to answer questions from the audience.
For references to her published work see http://www.cni.org/tfms/2003a.spring/plenary.html.

everquest 450,000 people, pay sony 12.95 a month to do this

questions information as a fungible
engineers like well-defined problems
information is implicit, is in people
it's in a constellation of knowledge that's unique to that individual
writers and teachers have a gift for making information explicit
we need to find ways to make implicit information explicit,
cannot abstract it out so easily because exists in a social context,
''the mesh''. engineers like to talk about it in terms of bandwidth
[and] servers
believes the value is the context
the systems that create that context is where the real value
lies. mentioned libraries and curators
google search link ranking but links are put there by people, so are
really harnessing the collective efforts of many people
metadata: tell us what you have we'll tell you what you've got
social software: software that supports group interaction, instant
messaging, weblogs, wikis
groups are different than lots of individuals
email is not social software, because targets individuals not groups,
the cc line changes it
clay shirky, one of her colleagues
groups of neurotics would thwart efforts to heal them finally because
they didn't want to threaten the existence of the group
weblogs have a temporal context, unlike the average web page
unlike a home page has a link to source, so the group faces out
together rather than facing inward
create a kind of edge awareness for the group
but also have monitoring of other groups' information flows as well
so know who the players are
wikis: can track back and remove.
persistent stores of knowledge which accretes. that thing stays there
as an artifact of the process
a bunch of people sitting around chatting becomes very
performative. flaming, informative, etc. native to the structure of
communication. not the domain.
average weekly usage of everquest is 20 hrs per week. doesn't stay
still. people want to go back.
lots of information referencing each other. lots of links create a
mesh. that's where the value is. can start to chart the stuff and
really start to measure the (...) of information
not applicable to all fields, but to fields where time is an issue.
one positive attribute is either cheap or free
wikipedia
post then process. can correct the information later
''my readers know more than i do'' and that's a good thing
it's all about leverage
the speed of exchange
vs the static world
dynamic attractors -- people who can attract others to their way of
seeing the context, show how you can put things in context
512 inbound links creates authority
the archive: scientific archive
we'd rather have it fast and can review it sooner. 512 inbound links
is peer review
making more strides in the left-brained fields
w/ humanists harder, because less measurable and more inward looking
the strongest network, someone will soon get tenure for a blog, can
measure the strength of the blog
knowing vs editing
books freeze the process at some point
the dead tree has a role
but the stream has a value, the electric network of human interchange

when you share knowledge you're not losing power; you're gaining
leverage, because the person who puts the flag up becomes the leader
control vs illusion of control
have to be sophisticated enough to use it. putting the information out
there and then become a hub
the group is its own worst enemy
when you set up these persistent environments
even though they're customers, the structure of the medium creates a
feeling of citizen-like entitlement for the participants
18 pages into the book online want the tree
thinks there is a complementarity btw copyright and fair use
stealing attention from the journalist's original article and the
context in the blog, but if it encourages people to go to the
original, they're out of their minds [if they don't allow it] :-)
groups of editors are always contentious
the core group on a wiki should cohere or fork, because the internal
core group serves as the fire brigade
these modalities encourage political participation
trent lott went down because of blogs
news cycle goes into the past
but a blog allows things to accrete
kept it alive until went back up into the mainstream media
also on the right: christian conservatives are blogging
it does build political traction
the field tilts towards transparency
kellogs crackle k
top link on google was about the recall
says the links are explicit
there's something which persists
the things we value are beyond technology, e.g., the authority of the
individual
it's the authority we care about, not the mechanisms of establishing
it.
mentioned institutional costs (in current ways of establishing
authority)
most of the value you just described comes in from the edges, starts
pushing the center
you can't stop because of the slowness of the process
communism :: metadata
explicit vs implicit metadata
if i link to your blog or mentor your paper
keep the ecosystem healthy
can't ultimately preserve the mesh in amber, and doesn't think we
should, the preservation issue
dead hill--(i think) my readers know more than i do
teaches a graduate class (where?)
let the students build a blog or wiki for the class
the group is important. collusion is a bad word because it
violates the industrial process
can expect or allow them to do it collectively, like shipbuilding

=====

RedLightGreen -- RLG
Merrilee Proffitt presenting (very nice job which included quicktime
videos of students being surveyed)

effort to put the rlg catalog on the web as a free resource
rlg catalog 45 million titles 100 some odd million records 23 years
from oai dublin core catalog to a web resource
needed a target audience
informed enthusiasts but not (professional) researchers
rlg's advisors said oai would not be popular for this type of resource
re-envisions the library catalog as an intuitive web-based research
tool tuned to undergraduates' search behaviors which meet their
information needs
google-like simple search box, also an advanced search, but not the
first thing you see
mercury example (the planet, the car, the god, etc.)
undergraduates are busy and place a premium on their time
maps [for many meant] maps to library
scores sports

students are savvy and discriminating about what they would find on
the web
must build a bridge to their universe

frbr comes in because don't understand notion of edition in initial
stages of their research
work and expression are collapsed in title cluster; includes related
works and adaptations
manifestation is an edition
item is item
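
[a minimal sketch of the title-cluster idea noted above, not
RedLightGreen's actual method: the records, field names, and the use of
publication year to stand in for an edition are all assumptions made up
for illustration. work and expression share one key (author plus a
crudely normalized title), editions sit under the cluster, and items
sit under editions.]

```python
from collections import defaultdict

# Hypothetical item-level records; every value here is invented.
records = [
    {"author": "Doe, Jane", "title": "Mercury Rising", "year": 1990, "copy_id": "c1"},
    {"author": "Doe, Jane", "title": "Mercury Rising", "year": 1990, "copy_id": "c2"},
    {"author": "Doe, Jane", "title": "mercury rising.", "year": 2001, "copy_id": "c3"},
    {"author": "Doe, Jane", "title": "The Planet Mercury", "year": 1995, "copy_id": "c4"},
]


def normalize(title):
    """Crude title key: lowercase, drop punctuation, collapse whitespace."""
    kept = "".join(ch for ch in title.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def cluster(recs):
    """Group item records into title clusters, then into editions."""
    clusters = defaultdict(lambda: defaultdict(list))
    for rec in recs:
        # Work and expression are collapsed: one key per author + title.
        cluster_key = (rec["author"], normalize(rec["title"]))
        # Assumption: publication year approximates the manifestation (edition).
        clusters[cluster_key][rec["year"]].append(rec["copy_id"])
    return clusters


clusters = cluster(records)
for (author, title), editions in clusters.items():
    print(author, "/", title)
    for year, items in sorted(editions.items()):
        print("   edition", year, "items:", items)
```

[keeping the author in the key means a cluster never crosses authors,
which matches the answer given from the floor later in these notes.]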

version 16 of modifying lc marc to xml dtd for this project for
loading it into db2. removed 2000 elements

juniors or seniors in soc sci or hum and had to have written two or
more research papers 15 pages or more in length
user testing conducted in april and may 2002
undergraduates really interested in journal articles
browse option would not be used because uncertain whose point of
view it represented
ILL not popular because unknown or too slow
my bibliography very popular
standard citation formats [wanted for sources in the catalog]: "it
takes me twice as long to format the bibliography as it does to write
the paper"

[students in this study were drawn from] stanford and santa clara at
least [i.e., i didn't catch the rest]
the students were paid for their time
the study was done at rlg

expecting it to go live by july or august
create an account, home library, choose a bibliographic style etc.
use number of copies to test relevance
the most recent english language edition displays first
jon udell's LibraryLookup site
tries to catch spelling errors
pilot phase aug-dec 2003
columbia nyu swarthmore [are in the pilot phase]

www rlg org / redlightgreen/
mgp at notes rlg org

[in response to questions from the floor:]

wireframes done in s'thing like photoshop. fooled people into thinking
it was a product [the intention was not, however, to fool; it just
happened]

title cluster does stay w/in same author [so not a superwork in
svenonius' sense]

the local catalog is optimized for known-item searching

doesn't work w/ sfx because would obscure the user behaviour from them

might seek corporate sponsorship or small institutional contribution
CLiMB
wants to develop a set of tools that can be opened up to the rest of
the community

=====

PDF/A
pdf archival

aiim and others to define characteristics of an archival pdf file
libraries strongly represented in this group
international standard

[PDF/A is needed by the courts system. a representative of this
system, stephen levenson, presented first.]

engaged in it for seven years
need accurate rendering of documents for the work that they do
e.g. page references
but word processors change pagination if you change printers etc.
docketing :: indexing
complaint response summons etc. [what the legal system needs to track]
pdf is not a proprietary adobe version, has a spec.
corel's version 9 pdf could not be read
version 10 is still large
keep things a minimum of 20 years
then to nara, which keeps it till the end of the republic
use filters to make sure they can read them, stop executables
hope with pdf/a to put those tools on the desk of the creator
decided to go with an iso standard
decided could not standardize on a particular word processor's format
keep records to protect citizens' rights. don't want to lease that to
microsoft because of the need to "reup"
electronic paper emulation at first
color transmits value--"the part marked in red is what i disagree
with"
adobe will respect the core, the pdf/a core
gov't and pharmaceutical are co-chairs
dentists' records, retention
pdf/a defining future technical standards
pdf/a take font definition and stick it in the file because not all
fonts persist
australian victorian ...
xml mdata and pdf for their files
rolled their own
pdf/x iso 15930 is their model
an exchange model read by printers
can render color accurately
time life only accepts pdf/x
want this supported by reading devices of the future
the mgmt of electronic records is the mgmt of m'data
not much there today in pdf but will expand it
media is not included
(indecs mpeg21)
pdf/a based on 1.4 pdf
envisions save as pdf/a
www aiim org/standards
stephen levenson
-----
william g lefurgy library of congress
aiim and npes

fonts make documents not self-contained
pdf 1.4 offers an xml-based type of metadata capability
completely published description of format [but]
adobe drives the specification [and]
no assurance that they will continue to make the standard public
flexible so can contain executables e.g., javascript
so want to make it an iso-owned standard
stable subset of pdf
embed the fonts, standard color spaces
exploring use of xmp for metadata
descriptive schema based on a subset of dc and is extensible
embed an xml packet so visible as plain text, and parsable as such
based on rdf/w3c recommendation
xmp sdk is open source
some vendors already use it
the utility of rdf is up in the air
cannot do typing w/ xmp
a collaboration of four technical committees in iso
will go in front of them in september
niso will be [...] way to get comments to the group
would take about two years if no problems
but pat harris foresees bumps w/ four groups involved
steve abrams of harvard will also be involved
the library world will have good representation

=====

Enhancing Interoperability
between Digital Libraries and Educational Technology
via XML Crosswalks

Raymond Yee
Technology Architect, Interactive University Project
University of California, Berkeley

xml crosswalks
scorm
mets to rss because used by weblogs
libraries, educational technology, weblogging
IMS content packaging and IMS metadata

RSS0.9x and 1.0

moa2 to mets

rss aggregator, one newsfeed is his weblog
generates an rss channel from his weblog

ims, scorm educational technology

openoffice.org, its file formats are native xml formats

rss is not a flat structure

mets to rss is lossy
opml outline processing markup language preserves the hierarchical
structure

web opml for directories and outlines

mets
header
descr
admin
filesec
structmap
..div
..fptr
ims
manifest
metadata
organizations (structure)
..organization
resources (files)
..resource identifier

rss (doesn't allow for recursion)
channel
title
..descr tags for channel
item
..title
..link
..description

opml
head
body
.. outline
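
[a rough sketch of why the outlines above make mets to rss lossy while
opml keeps the hierarchy. this is not the presenter's crosswalk code:
the nested dict stands in for a mets structmap of divs, the labels are
invented, and the rss output omits elements a real feed would need
(link, description).]

```python
import xml.etree.ElementTree as ET

# Hypothetical nested structure standing in for a METS structMap of divs.
struct_map = {
    "label": "Album",
    "children": [
        {"label": "Page 1", "children": [
            {"label": "Photo 1a", "children": []},
            {"label": "Photo 1b", "children": []},
        ]},
        {"label": "Page 2", "children": []},
    ],
}


def to_rss(root_div):
    """Flatten every div into a channel-level item; nesting is lost."""
    rss = ET.Element("rss", version="0.92")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = root_div["label"]

    def walk(div):
        for child in div["children"]:
            # Items can only hang off the channel, never off another item.
            item = ET.SubElement(channel, "item")
            ET.SubElement(item, "title").text = child["label"]
            walk(child)

    walk(root_div)
    return rss


def to_opml(root_div):
    """Mirror the div hierarchy as nested outline elements."""
    opml = ET.Element("opml", version="1.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = root_div["label"]
    body = ET.SubElement(opml, "body")

    def walk(div, parent):
        for child in div["children"]:
            outline = ET.SubElement(parent, "outline", text=child["label"])
            walk(child, outline)

    walk(root_div, body)
    return opml


rss_doc = to_rss(struct_map)
opml_doc = to_opml(struct_map)
print(ET.tostring(rss_doc, encoding="unicode"))
print(ET.tostring(opml_doc, encoding="unicode"))
```

[in the rss output the two photos become siblings of their page; in the
opml output they stay nested inside it.]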

simile
semantic interoperability of metadata and information in unlike
environments

UN translator approach to the many to many translation problem
russian to chinese
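
[a small sketch, assuming the "UN translator" note means translating
through one shared intermediate instead of writing a crosswalk for
every pair of formats. the field tables are invented for illustration
and are not real mets, rss, or ims mappings.]

```python
# Hypothetical per-format tables mapping native field names to pivot names.
TO_PIVOT = {
    "mets": {"label": "title", "href": "link"},
    "rss": {"title": "title", "link": "link"},
    "ims": {"title": "title", "identifier": "link"},
}


def crosswalk_counts(n_formats):
    """Directed crosswalks needed: every pair vs. to and from one pivot."""
    pairwise = n_formats * (n_formats - 1)
    via_pivot = 2 * n_formats
    return pairwise, via_pivot


def translate(record, source, target):
    """Translate source -> pivot -> target using the per-format tables."""
    to_pivot = TO_PIVOT[source]
    pivot = {to_pivot[k]: v for k, v in record.items() if k in to_pivot}
    # Invert the target table so pivot names map back to native names.
    from_pivot = {name: native for native, name in TO_PIVOT[target].items()}
    return {from_pivot[k]: v for k, v in pivot.items() if k in from_pivot}


print(crosswalk_counts(6))
print(translate({"label": "Sample", "href": "http://example.org/1"}, "mets", "rss"))
```

[with six formats the pairwise approach needs 30 directed crosswalks
and the pivot approach 12; the pivot only pays off once there are more
than three formats, and anything the pivot cannot express is dropped.]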

mackenzie encourages thinking about reusability of schemas

rss nothing profound but has incredible traction, can ignore it or
just get in there

mackenzie: might think whether can put rdf somewhere there in the
middle

w/ audiovisual materials, might have harder time mapping to ims w/
some tags

weblog information
whose weblog
what it links to
date
who wrote it

berkeley moa2 comes out of a database so all regular

journey to topaz
a jar of dreams
uchida became a children's author

[he writes in python]

in open office has not found a way except to embed images
raises intellectual property issues

mets document or scorm object

will do ims to mets, that's a use case that's important

demo'd the scholar's box

crosswalks on his website and on the weblog
refs will be put up on cni site

=====

marc in xml preservation metadata
there is a marc21 to sgml dtd
it's still up on the website
marc xml takes a very slim approach
2709 marc 21 marc classic
marcxml is round-trippable w/ marc 21 w/o loss
can edit the xml
can easily craft interesting displays

marc xml bus
have marc8 character set mappings into unicode

mix: metadata for image exchange

mods: not completely round-trippable

rich m'data format but simpler than marc
semantics richer than marc
particularly had electronic resources in mind, complex digital objects
that had relationships to express
people wanted something richer than dc but not so rich as marc and
wanted language-based tags
mods 2.0 now available
want to be able to express citation information better
revisions in the next month or so to accommodate that and a few
corrections, enhancements, compatible w/ earlier mods records

mods instead of moa2 gdbm?

minerva, lc's web archiving project
webarchivist.org suny institute of technology

lc uses mods for most mets packages that accompany electronic
resources
derived from marc record in catalog in some sense

cni lifecycle approach to the project

PREMIS: preservation metadata implementation strategies
pres m'data a subcategory of adm m'data

it does specify rights mgmt info

oais: iso 14721:2002
wanted to archive space data

as we road test oais find it might not be perfectly suited to our work

[priscilla is a good model for how to present potentially complex
information understandably; her intellectual approach to this area
might lead to something significant if it's not subordinated to other
concerns]

preservation mdata assoc w/ aip
fixity information, like a checksum
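
[a minimal sketch of checksum-style fixity, assuming sha-256 as the
digest; the notes do not say which algorithm any archive uses. the
function names and sample bytes are made up for illustration.]

```python
import hashlib
import io


def fixity_digest(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, read in chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream, recorded_digest, algorithm="sha256"):
    """Compare a fresh digest with the one recorded at ingest."""
    return fixity_digest(stream, algorithm) == recorded_digest


# Record the digest at ingest, then re-check later copies against it.
original = b"archival information package payload"
recorded = fixity_digest(io.BytesIO(original))

print(is_intact(io.BytesIO(original), recorded))
print(is_intact(io.BytesIO(original + b"!"), recorded))
```

[the first check prints True and the second False, since a single added
byte changes the digest. for real files the same functions work on a
file opened in binary mode.]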

[present these slides to the mdata group]

oclc and rlg thought it was time to move after a year of review

the charge of premis

there is a great variation in what archives ingest

different volumes of ingestion

need to support migration emulation etc.

interesting to see how one standard [might] do [it] all

managing these at the file level not the logical level

significant attributes, notion slide, coming out of the arts
community when dealing w/ digital. arty but also coming out of other
communities

e.g. timing could be a significant attribute, sts to find out have to
ask the author

[PREMIS] has been scoped to cultural heritage institutions

practical problems ''the most interesting slide''

May 28, 2003

CNI 5-6 December 2002

Here are some highlights from last Autumn's CNI meeting.

opening plenary cliff lynch:

digital rights management: nonsensical and cynical at the same time,
because fundamentally about restriction, not enabling

look at creative commons: unlike digital rights management technology,
which tries to restrict use of digital works, creative commons is
providing ways to encourage permitted sharing and reuse of works
(creative commons dot org)

institutional repositories and learning management systems are related

digitized course catalog loses historical record: a strategic problem
through lack of planning. problems like this repeat.

at the december portland fall 2003 meeting, want one faculty member to
describe work at breakouts from a faculty-centric perspective

we need to think about metadata in another way, created curated used
used indecs-like definition of m'data
used ''assertion'' [about metadata]
(compare <indecs>: "The principle of Designated Authority.
The author of an item of metadata should be securely identified." (2.3)
That is, not only resources, but metadata about those resources, need
to identify their creators. See also
The A-Core: Metadata about Content Metadata and
AC - Administrative Components, from the DCMI
Administrative Metadata Working Group)

Z39.50 next generation (zing) technology refreshment
shouldn't throw out analysis that has gone on

benchmark database for images similar to trec for text
amico has a new home at the university of toronto

authorization authentication and access: shibboleth

institutional repositories are important

documenting rights: there will be a report up on that shortly
(''creative commons'')

[need a] privileged place in cyberspace for teaching

computational m'data to extract m'data from works

the ''wired'' classroom is now potentially wireless

our concerns as creators of content are very different from commercial
concerns

DRM: thinks it's a misnomer because intended to allow content
producers to have continual control over their content [while
allowing for an] ''acceptable user experience''

appropriate access tied to people not devices

the industry wants to control, while research and education wants to
establish ownership

indecs basis for mpeg 21
rights expression languages
can't program fair use

e-books and distance education might be good arenas for drm; teach act
may need narrow drm channel

odrl and xrml have no data model and are insufficient; indecs has a
data model

oasis working w/ xrml

in re: special collections kinds of rights management issues: create
ad hoc metadata scheme knowing that will try to crosswalk in a few
years. minimize the variety you have to document. look at creative
commons. spcl kinds of restrictions can't be automated easily. avoid
gratuitous complexity.

EAD
nancy fleck and michael seadle michigan state

nancy is head of technical services; michael heads the digital and
multimedia center (among other things)
national gallery of the spoken word
useful metaphor in presenting digital sound, nat'l portrait gallery
ead collection vs individual item access
description contains interpretation, for example, who we think is
speaking: mckinley might have written a speech but not delivered it

vs treating them as monographs, would flood the catalog artificially
a sound recording is more like a journal article in size
collected speeches of would be the major organizing principle
maintain the gallery metaphor
can have links to images sound files and description of who the
speaker was

use of the encodinganalog attribute which can go into any tag and
say which marc field it goes into (for ead --> marc crosswalking)
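
[a small sketch of how an ead to marc crosswalk could read the
encodinganalog attribute. this is not the michigan state code: the ead
fragment, its values, and the marc field numbers chosen are invented
for illustration.]

```python
import xml.etree.ElementTree as ET

# Hypothetical EAD fragment; content and MARC field choices are invented.
EAD_FRAGMENT = """
<did>
  <origination encodinganalog="100">Example, Speaker</origination>
  <unittitle encodinganalog="245">Collected speeches</unittitle>
  <unitdate encodinganalog="260">1900-1910</unitdate>
  <physdesc>12 sound recordings</physdesc>
</did>
"""


def ead_to_marc_fields(ead_xml):
    """Collect (marc field, text) pairs from elements that name an analog."""
    root = ET.fromstring(ead_xml)
    fields = []
    for element in root.iter():
        marc_field = element.get("encodinganalog")
        if marc_field:
            # Collapse internal whitespace in the element's own text.
            text = " ".join((element.text or "").split())
            fields.append((marc_field, text))
    return fields


marc_fields = ead_to_marc_fields(EAD_FRAGMENT)
for field, value in marc_fields:
    print(field, value)
```

[elements without the attribute, like physdesc here, are simply
skipped, so the encoder decides tag by tag what reaches the marc
record.]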

[from here on out my notes on this and subsequent presentations
consist of more technical detail than might be useful here]

May 27, 2003

DLF 4-6 November 2002

institutional repositories: "more useful as a way to talk about things
than as a technology"

This forum has been summarized on the DLF site:
http://www.diglib.org/forums/fall2002/dlf-fall02summary.htm

TEI 12 October 2002

One of the most interesting presentations was by Chris Turner of the
LEADERS Project, University College London. Here is a description from
the XML4LIB archives, which is fuller than my notes.

Soon after this presentation, the third LEADERS progress report was
issued; it contains the following, which summarizes what I heard at
the TEI presentation, and which particularly caught my attention at
that time.

"... we have also identified the need to build models and rules that
can deal with structures and features such as overlaid data, textual
and numerical data presented in complex tables and the presence of
formulae and mathematical expressions within the text. Data within
archive documents can be described as overlaid when an underlying
layer of data is used as the basic structure onto which further data
(other layer(s)) is applied. The underlying layer of data is usually
printed, and the overlaying layers are usually handwritten on top of
the printed structure. Such structures and features are often found in
archival material (particularly administrative archives) but the TEI's
current encoding scheme will need to be developed if they are to be
comprehensively dealt with. The team are looking at a variety of
examples of these structures and features across a range of documents
from the UCL Archive. Rules and models for encoding are currently
being formulated and tested by the team." (2.3)