There is now a way to check on the status of what has been added to the Library's OAI metadata provider using this Nand search.
Metadata previously exposed for the CIC Metadata Portal are now available for browsing and searching. Try a search for "hutchins" or browse by collection (arranged by contributing institution).
A preview of the new digital Century of Progress collection was demonstrated at Supervisor's meeting. The Century of Progress project, jointly undertaken by staff from Crerar, DLDC, Preservation, and the Special Collections Research Center, has produced an online searchable version of the checklist of publications from the 1933 Century of Progress World's Fair (originally created in 1937 by staff of the John Crerar Library), and has digitized a portion of our holdings (around 350 of 1022 pamphlets were digitized).
An OAI-PMH provider has been installed, and data have been successfully harvested from it for the CIC OAI metadata harvesting project. More details will be provided when the CIC has a front end allowing our collections to be searched.
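For readers unfamiliar with how a harvester talks to an OAI-PMH provider, here is a minimal sketch of how a harvest request is formed. The base URL is a hypothetical placeholder, not the Library's actual endpoint, and the helper name is made up for illustration; the `verb`, `metadataPrefix`, and `set` parameters and the `oai_dc` prefix come from the OAI-PMH 2.0 protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration.
BASE_URL = "http://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


print(list_records_url(BASE_URL))
```

A harvester would fetch this URL and then follow any `resumptionToken` in the response to retrieve the remaining records.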
The code which generates frontmatter and inventory for finding aids has been updated to conform to the EAD 2002 standard.
A preliminary version of the Century of Progress digital collection was launched in time for the release of the December issue of the alumni magazine, which included a short piece on the Century of Progress pamphlet collection. The article included a link to the Web site, which contains a sample of 19 digitized pamphlets. The digital pamphlets are made available as DjVu files, which support zooming in and out of the images as well as easy printing or saving of the entire digital pamphlet in one step.
Images and metadata for The First American West collection have finished being uploaded to RLG. When RLG processes it, it will be the second University of Chicago Library collection represented in RLG Cultural Materials. The first, American Environmental Photographs, was made public on 1 October. Here is RLG's announcement to us of that fact.
Dear Charles and Daniel,
first contribution to RLG Cultural Materials loaded
We appreciate all your efforts to make this happen.
We look forward to adding "The First American West"
in the near future.
With warmest regards,
Karen
Karen Smith-Yoshimura
RLG
This week we launched a new searchable/browsable version of the staff directory and an accompanying static web page listing departmental contact information (general numbers, fax, email, etc.) and linking to maps and library addresses. The searchable database is based on data pulled from regularly-maintained Personnel databases. In addition to the searchable database, an automatically-generated PDF file with a simplified alphabetic listing of all staff is created each time the database is re-indexed. The database relies on the Nand database program, which was developed in-house.
The Society for the Study of Early China (SSEC) web site was launched today. This web site is a collaboration between the Library, faculty on campus, and the SSEC: the DLDC designed and maintains the site's templates and hosts the site on its servers, while the faculty and the SSEC produce and maintain the site's content. The SSEC intends the web site to be a locus for scholarly exchange and communication in the field. The site not only includes information about the SSEC but also provides abstracts of each issue of the SSEC Journals and publishes research papers, notes, databases, and bibliographies on early China.
Today we launched a new version of the Electronic Journals A-Z List. The list is based on an export from the SFX database which is then indexed by our multipurpose Nand search/browse tool.
- Search by Title or ISBN in addition to browsing alphabetically
Published 2003 DLDC Annual Report.
Migrated the Library mail server and installed spam filtering software which will flag incoming messages to library mail aliases that are likely to be spam. Staff will be able to use this flag to filter such messages. More information is available in the Spam FAQ.
Replaced The Art Reserve Kiosk on /e/ with a newly designed Art and Architecture subject page. This new page points to the now productionized Masters' Papers in Art History & Visual Arts database which was implemented using Nand technology.
Set up production version of the DLDC wiki, including creating topical wikis for Nand and Unix Tips.
Translated all the material from our earlier test wiki (based on Wikit software) into the syntax of the wiki software we chose for productionization (Swiki).
Created Wiki for PREMIS group at Charles's request. (He is a member of this OCLC/RLG sponsored working group).
Digital Library Federation Framework for a Distributed Open Digital Library.
Posted my notes from the DLF Spring Forum on the DLDC's staffweb pages. They are available at
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day1.html
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day2.html
http://www.lib.uchicago.edu/staffweb/depts/dldc/notes/dlf/dlf-03spring-day3.html
These notes are posted as taken so you may need to ask me for further comments on items of interest. For descriptions of the topics of each session see the DLF Spring Forum Schedule.
Minutes of the Meeting of the DLF Steering Committee
Productionized the catalog: http://www.lib.uchicago.edu/e/su/news/usnewsp/.
The link to the searchable NAND database is at the bottom of the page.
Unveiled the catalog: http://www.lib.uchicago.edu/e/spcl/findaid/rosenthal/.
The link to the searchable NAND database is at the bottom of the page.
Productionized the catalog: http://www.lib.uchicago.edu/e/spcl/findaid/stc/.
The link to the searchable NAND database is at the bottom of the page.
Productionized the checklist: http://www.lib.uchicago.edu/e/spcl/findaid/century/.
The link to the searchable NAND database is at the bottom of the page.
/data/web/storage/pres and /data/web/storage/spcl together have 336 GB
(or 1/3 TB) of space available. 33 MB (1% of available storage) is
currently in use.
/data/web/storage/arch has 414 GB (or somewhat shy of 1/2 TB)
available. 280 GB (41% of available storage) is currently in use for
the "TIFF farm," and the Annex and StaffInfo archives.
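A note on reading these figures: 280 GB against 414 GB looks like about 68%, not 41%. The numbers are consistent, though, if "available" means free space and the percentage is used / (used + available), rounded up, which is how `df`-style reports commonly compute it. That reading is an assumption, not something stated above. A minimal sketch of the arithmetic (the function name is made up for illustration):

```python
import math


def use_percent(used_gb, available_gb):
    """Used space as a percentage of used + available, rounded up.

    Assumes 'available' means free space, as in df-style output.
    """
    return math.ceil(100 * used_gb / (used_gb + available_gb))


# /data/web/storage/arch: 280 GB used, 414 GB available
print(use_percent(280, 414))    # 41

# pres + spcl: 33 MB (0.033 GB) used, 336 GB available
print(use_percent(0.033, 336))  # 1
```

Under this reading, the "1%" for pres and spcl is simply the smallest nonzero value the rounding can produce.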
NSDL: National STEM Digital Library. STEM: Science, Technology,
Engineering, and Mathematics.
Lee Zia presented a progress report on NSF's NSDL at the CNI Spring 2003 Task
Force Meeting. "To date three sets of grants have been made in three
tracks: 1) Collections, 2) Services, and 3) Targeted Research."
http://www.cni.org/tfms/2003a.spring/abstracts/PB-NSDL-Zia.html
I participated in one of 15 panels reviewing the current round of
grant proposals. We considered 10 proposals, 9 in the collections
track and 1 in the services track.
Participating was a good experience for understanding more about the
NSDL and how we might go about crafting a proposal should that become
something we (the University) want to do. I learned something from the
other members of the panel, and they learned something from me, a
librarian who does digital library development.
Official summary: http://www.diglib.org/forums/spring2003/DLFForumMay03rev.htm
My notes follow.
Developers Forum.
the experience of those who have built them is that portals don't
attract users. (the rest of my notes from the developers forum are on
paper, because i found i couldn't look at others, talk to others, and
input into my handheld at the same time. i need me one of them
notebooks that can read real handwriting. note to keith: using
graffiti is still too slow imo.)
Opening plenary.
From the official summary:
Professor James Boyle of the Duke University Law School, speaking on
"Public and Private Initiatives in Copyright Reform," declared that
copyright laws restrict access to almost all of twentieth century
culture.
My notes on this and the rest of the DLF Forum follow.
lifetime plus 70, or 95 if work for hire
the farther you go back the less value of the copyright protection
it's economically irrational
1. 98 percent of works have at most five years of commercial life and
most not that. ten years out moves to 99 percent
napster kazaa morpheus
copyright was there to promote the spread of knowledge
the system worked to inhibit unfair practices among industry rivals
2. all work is copyrighted the moment it is fixed in material form
james boyle, creative commons
the losses from failed sharing
creative commons a second-best solution
www creative commons org
has a checklist, attribution, derivative works, commercial
non-commercial, etc.
creates a lawyer-readable license
then has a commons deed, which has a human-readable version
also has a machine-readable version
''i want pictures of the empire state building which can be used w/
attribution''
launched these licenses in december
mit w/ common courseware will be using it
archival material or material which we ourselves generate
3. losses from misunderstanding of fair use
has opennesses and vaguenesses
need to exercise fair use because fair use can be exercised only when
someone is not losing profit by the activity
www chillingeffect(s) org
aaup
contingent evaluation (made-up figures)
publishers provide a credentialling mechanism for which they do not
pay
sunsite --> ibiblio
to what extent do licenses bind third parties? unclear: in some cases
it is and in some cases it isn't. contract law is increasingly moving
to say it is
stress academic freedom and free speech
=====
breakout session 2
"The Bibliographic Enrichment Advisory Team." David Williamson,
Library of Congress
links from marc 856 to onix toc
use prime ocr 95% accurate
200 hits / hr on dtocs
these files on the web are being indexed by google and yahoo
google does not index m'data tags because of the porn sites
there's no way to format that text as a marc field
exact same record as they're sending to amazon
---
"OCLC Metadata Switch." Thom Hickey, Chief Scientist; Jean Godby,
Research Scientist; Diane Vizine-Goetz, Research Scientist, OCLC
REST, SOAP, etc.
web services (technical term) for the digital library:
register
search
resolve
navigate
decompose (ddc numbers, the name in a list, etc.)
enhance
transform
in the genre of science fiction what is the most common location?
(turns out it's mars)
... and combines genre and subject information
GEM
(godby has a paper on record transformer; ask her for it)
---
eprints in cornell's archive are not peer-reviewed because so many of
them are published in peer-reviewed journals
disciplinary repository
the work we do in the univ of bc is archived by the library
circle of gifts
doesn't need to be archived everywhere
---
"RedLightGreen." Merrilee Proffitt, RLG
jon udell's library lookup site uses isbns
citation building format e.g. mla
prefers the dtd approach because schema uses attributes which they
don't index
[see my CNI 28-29 April 2003 notes for the rest of the presentation,
since it was presented there as well]
=====
PLENARY 2
"The DLF Today and the Case for the Distributed Online Digital Library
(DODL)." David Seaman, Director, Digital Library Federation
[David Seaman began by talking about some of what the DLF does/has
done.]
digital formats registry
registry of digital masters w/ oclc
jewell e-resources mgmt--xml format for e-licenses
cataloging of visual resources guidelines in draft now
tei for libraries guidelines: version 2
production workflow good practices
.. workflow designs
.. filenaming choices
.. lessons learned
.. mgmt software used or developed
survey of digital production tools
db- vs text-based xml delivery tools [need to give guidance on these
for the community]
initiatives can just take off
the DLF can fund meeting and travel conference calls publications
not a democracy, couldn't work
that's what dues fund
[He continued by talking about the DODL.]
gives the library directors something to do
the immediate challenges are emotional conceptual and organizational
the focus is on service to support scholarship and teaching
the hope is that this will drive new content building
capital fund, money put aside for large strategic project
executive summaries of initiatives such as mets
there's no reason why a large part of this should not be systematic
study
=====
breakout session 4
"FEDORA Digital Repository Implementation at UVa--How, Why, and What
We're Doing with It." Leslie Johnston, Director, Digital Services
Integration, University of Virginia Library.
[this talk inspired me to look closely at Fedora. it seems to address
needs that we've already identified as being things we need to start
working on now, and uses a selection model that we've already
identified as being the way we prefer to work here.]
fedora
the first release is this friday, 1.0 w/ documentation
mozilla public license
www fedora info
the fedora architecture is based on object models
objects can be simple or complex
metadata inline or not
behaviors
have code objects as well
disseminators are containers
(buckets)
fedora uses mets objects while retaining files in their original
formats
communication and public relations dept [at virginia did the] graphic
design
what formats need to be delivered on the fly
get static view get dynamic view
tei ead dc vra core
uva descmeta
(merrilee's rlg format, event based)
gdms is how they represent collections of objects
gdms is a tool to create fedora objects
uses tamino (for EADs) and opentext
perl xslt mrsi
automatic tiff to gif
second step will be atomistic control at the file level
the same object can have multiple parents but new notion is to have a
primary parent. also part of phase 2
repository that has been selected for collection
subject librarians and user services librarians, usability group in
communications and publications
looking at ipedo as an alternative to opentext
---
"The University of California's Collection Management Initiatives:
Findings on Use of and Preference for Digital Journals." Gary
S. Lawrence, Director, Library Planning and Policy Development,
California Digital Library.
new issues popular in print format [note to self: but perhaps can
reduce binding budget]
eighty percent of respondents said that backfiles dont go back far
enough
results entirely consistent w/ outsell's finding that office and home
use dominates
ten or thirty three to one in favor of digital
shared print archive elsevier and acm
the reasons for preference for current issues in paper will be looked
at -- habit? age?
relied entirely on vendor-supplied data
---
"Cushman Exposed: Exploiting Controlled Vocabularies to Enhance
Browsing and Searching of an Online Photograph Collection." Michelle
Dalmau, Interface and Usability Specialist, and Jenn Riley, Digital
Media Specialist, Indiana University Digital Library Program.
expose structure to facilitate browsing
date genre subject location and combination of categories
integration of thesaurus search w/ database searching
some people want to browse from broader to narrower and others the
reverse
only one third of thesaurus terms used, an implementation problem
dynamic search and browse
late summer early fall launch of cushman collection
lead-in vocabulary
thesaurus mgmt issues
the ui may mask the controlled vocabulary structure
not interrupt normal browsing behavior
-----
breakout session 6
"A Registry for Digital Format Representation Information." Stephen
L. Abrams, Digital Library Program Manager, Harvard University
Library; MacKenzie Smith, Associate Director for Technology, MIT
Libraries.
global digital registry format
identification
validation
transformation
characterization
risk assessment
delivery
ingest sip validation sip to aip transformation [pronounces "aip" as
"ape"]
access
preservation planning, sip to aip transformation
mime types
insufficient level of detail
granularity--too coarse, eg tiffs compressed in different ways
data and governance models
pronom--public records office uk
diffuse in europe, it's a website
ietf media features is a follow-on to the mime mechanism content
negotiation between client and server
owner maintainer identifier name alias taxo[nom?]y / ontology typing
subtyping eg svg a subtype of xml
registry service,register interest for updates obsolescence new tools
etc
executive summary for money
format typing mechanism at the appropriate level of granularity
hul harvard edu formatregistry
---
"Strategies for Implementing Preservation Metadata in Digital
Archiving Systems." Rebecca Guenther, Library of Congress
PREMIS
a strategy for finding strategies for preservation metadata
dealing w/ aip
content vs preservation description information
60 elements
what's a minimal core
automatic generation
apply by object type or object behavior
more practical view, best practices document
core data dictionary
format for recording (xml schema) and pilot programs
lack of common vocabulary
significant attributes
www oclc org/research/pmwg
[See also notes from last CNI, Priscilla Caplan's presentation]
---
"PDF/A: A New Digital Preservation Format." Bill LeFurgy, Digital
Initiatives Project Manager, Library of Congress
the goal is to have pdf/a accepted as an iso standard
up to two years to approve the standard
needs of document producers
.. ease of creation, fits neatly into a workflow, should be flexible
needs of the users
.. easy to search, ease of discovery, getting an exact appearance
of the original document
contention
requirements for archival repositories or others that might be
maintaining documents over time
.. no proprietary formats
should work well tomorrow
homogeneity
support m'data as far as you can for discovery provenance preservation
activities
how to make that as easy and painless as possible
want to support required significant properties
embedded fonts no encryption standard color space limits on compression
xmp extensible m'data platform to associate m'data w/ object
xmp introduced w/ acrobat 5 (pdf 1.4)
dc medium mgmt rights mgmt
has broad extensibility
can embed m'data w/in the binary file but shows up as plain text
still looking to see where xmp is used in real world
cant do data typing or validation
cant automatically compare schema w/mdata for validation
thinks some kind of partial validation would be possible
stephen abrams is involved
the default output of acrobat may be pdf/a according to an adobe
engineer
pdf as a page description language vs xml which divorces presentation
and content
[in response to why pdf and not xml from the floor:]
the courts were not comfortable that xml could handle all the
formatting needed
also xml is less easy to implement
need absolute assurance that formatting will be one hundred percent
the same
[see also notes from last CNI]
=====
[I had lunch with Leslie Johnston after the Forum about Fedora. here
are some notes.]
pid system? -- not handles, but can use an external handle server to
register the pids.
how best to get going w/ fedora?
talk to northwestern they and tufts farther along.
go to the fedora info site will have all the contacts for deployment
partners
cornell is participating in the coding effort and will be for the next
two years
next year access control
havent implemented controls for primary and secondary parents, the
kinship metadata [cool concept which came out in the presentation:
documents can have multiple parents, so can be linked to from multiple
places, but want to be able to indicate a primary parent, to indicate
the original context of the document]
system in java but supports disseminators in perl
solaris
wayland, russ local lead implementor to talk about technical details
tufts oki apis for learning mgmt system
can use whatever metadata thing we want
ead tei gdms and vra core not yet [i believe the "not yet" applies
only to vra core]
the vra core does not have good encoding guidelines and will change
soon: [will] add guidelines and constraints
1.25 fte code support and disseminator--java programmer
other does ingest files conform to standards: librarian w/ system
librarian user support background
fedora info to get the software
=====
METAe
the metadata engines product
automated conversion of printed documents into fully tagged xml
claus gravenhorst
stu schneider
content conversion specialists
docWORKS/METAe
product launch march 2003
aug 2003, ifla berlin
software for use in libraries and service bureaus
biblioteca statale a baldini italy
docWORKS engine
image preprocessing
layout analysis
character recognition
structural analysis
output: tiff mets alto jpeg
alto: physical representation of each page
has mdata for the various elements of the book
can link from logical to physical structure
univ of innsbruck is hosting the alto standard
claus.gravenhorst at ccs-gmbh.de
---
[after the METS portion of the meeting:]
can put a wrapper around it saying that it is not xml for descriptive
metadata
can output to other dtds
ccs ruleset
the engineers adapt the ruleset to your needs
in 2004 want client side rules adaptation
abbyy ocr
fraktur yes but gothic fonts will not work
one operator can handle 5000 pages a day
get samples at beginning of a project
quicktime to mets structmap for video files
the lc mets sheet music page turner guy seems related to fedora
nyu mets creation tool
useful for small scale production
mysql plus perl apache front end
sourceforge
300 pp 45 minutes on fast pc
but can have client server scenario
greyscale
integrates spell checking
wants to partner
tiff images 300 dpi ftn greyscale
-----
METS update
the behavior sec is now recursive
june/july fedora 1.1 will support this
technical m'data for text--didnt catch name
mets:: ims, scorm
just need to make sure that they interoperate
profiles: extension lang and controlled vocabulary
more exx inc tei
mets: standards process
fast fast track for registering
mets opening day
who when what
aiming at dc oct 27th-28th
half day intensive introduction to mets
half day for people already implementing it
half day hard core for developers where do you start
the practicalities
nancy's rights schema
not intended to be actionable so content guard wont sue us all
object types vs profiles
it's a mgmt distinction
object types can be used to class profiles
will be a public registry for profiles
will have a profiles xml schema
may have generic profile one for page-turned objects
nyu website: xslt for html page-turned objects
sourceforge site xslt suite for mets objects
rick moa2 disseminator, converting that to work w/ mets, page turning
part is done, the descriptive part is using mods, not yet finished
rlg will make their viewer publicly available
lc sheet music page turner
in conjunction w/ tei encoding
(eric? lc)
fedora does tei page turning, chris wrote it
can link it appears from mets to mpeg7 because the hooks are there
consult back and forth on what works and what doesnt nyu and indiana
-----
"Robotic retrieval and scanning." Sayeed Choudhury, Johns Hopkins
University Library.
a data capture framework and testbed for cultural materials
hopkins
built a remote shelving facility but wanted to be able to browse it
telerobotics
''computer system surgery''
has conducted surgery in singapore
constrained the problem
CAPM supplementing the existing system physical delivery
can use different arms
hyper redundant systems
eg robotic snakes
unanticipated uses
simulated the user experience not just ask them a question
separate cost and benefits team
two to thirty dollars per use cost
sixtyfive dollars benefit from semester
levy collection of sheet music is at hopkins
institute of museum and library services imls
adaptive optical musical recognition software
if makes the same mistake repeatedly
can teach it
guido is the representation
output is midi, mp3
is this a happy or sad song
was the melody reused in another piece
tore it down and built it up more modularly and called it gamera. its
a framework which includes classifiers etc
medical [f]rench ancient greek early modern english
uses python will pop up code in a separate window for manipulation
perseus tufts computational humanists
nsf information technology research project
tactile robotics: robots that feel
92 percent recognition rate
parc hidden markov models to go from local to global recognition
dke jhu edu/
CAPM
gamera
---
"The Preservation Productivity Paradox in the Modern Digital Library."
Catherine Aster, Stanford University Libraries, and Stuart K. Snydman,
Digital Library Projects Manager, Stanford University Libraries.
two million pages in 24 weeks
swiss startup, 4digitalbooks
scanning at 300 dpi greyscale
1160 pp/hr theoretical max
650 pp/hr --real throughput
have metadata entry capture templates for different kinds of
materials eg books journal articles etc
uses barcode (i think) to create filenaming
workflow includes derivative making and ocr'ing
abbyy and prime ocr, that does 99 percent accuracy which is enough to create a
searchable index if image will be presented to user
descr and technical mdata is uploaded into a relational db
strips tiff headers
processing ten to twelve books per day
5mb each
lzw compressing all master tiff images [i told him afterwards about
bzip2]
dpi is determined on a project by project basis
1800 pp / hr if no scanning
www-sul stanford edu white paper
kitoss [sp?] is a competing system
theyre scanning two-up and page splitting
can reorder pages and record it in the metadata
ims is content packaging standard and lom is metadata
xslt stylesheet brian tingle of cdl
amazon has a web services front end
yee has a software tool, javascript i think, which allows users to
recombine objects from different data feeds
simplifying assumption
ideally wants the universal canvas where can drop all kinds of things
in
david greenbaum and david yee
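[a quick sanity check on the stanford scanning figures above. this is a sketch under an assumption the notes do not confirm: that the "two million pages in 24 weeks" target and the pp/hr rates refer to the same single scanner. the function name is made up for illustration.]

```python
def hours_per_week(total_pages, weeks, pages_per_hour):
    """Scanner hours needed per week to finish total_pages in the given weeks."""
    return total_pages / pages_per_hour / weeks


# At the real throughput of 650 pp/hr
print(round(hours_per_week(2_000_000, 24, 650), 1))   # 128.2

# At the theoretical maximum of 1160 pp/hr
print(round(hours_per_week(2_000_000, 24, 1160), 1))  # 71.8
```

[a week has 168 hours, so at the real throughput a single machine would have to run close to around the clock; the target presumably implies long shifts or more than one scanner.]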
---
"Schema-Driven XML Editor for Metadata Capture." Stephen L. Abrams,
Digital Library Program Manager, Harvard University Library
xml used as encoding form to record the metadata
concentrates on descriptive metadata
generic tool
hide the xml from the cataloger
schema directed editor
will adapt its behavior to conform to a given schema
Xorro: a schema driven editor
were assuming data centric documents, ie fielded input rather than
full text
first youre asked to select top level elements
serializes to valid instance document
swing client or building swing based guis
SOM schema object model jdom and xerces
name groups and global attributes
namespace qualified
bil: biomedical image library
ted: templated database, built around tamino
used for anthropological reports
aes audio engineering society
1.3:1 java apis
will be made available under gnu license, open source
can be deployed client side or server side
needs sax parser they use xerces
need to add documentation
off the shelf xml tools dont hide the xml
stephen_abrams at harvard edu
watch diglib [for announcement when tool is ready]
or email stephen to register interest
---
"The OAI Static Repository: a file-based approach to exposing metadata
via the OAI-PMH." Herbert Van de Sompel, Los Alamos National
Laboratory, Research Library
[will make the slides available on his website]
opening plenary:
J. C. Herz, principal of Joystick Nation Inc.:
"Everquest for Knowledge Workers: What Organizations Can Learn from
Online Games and Other Social Software"
This was an extraordinarily good presentation (some others thought
otherwise). The speaker was able to use the approach she had presented
consistently when it came time to answer questions from the audience.
For references to her published work see http://www.cni.org/tfms/2003a.spring/plenary.html.
everquest 450,000 people, pay sony 12.95 a month to do this
questions information as a fungible
engineers like well-defined problems
information is implicit, is in people
it's in a constellation of knowledge that's unique to that individual
writers and teachers have a gift for making information explicit
we need to find ways to make implicit information explicit,
cannot abstract it out so easily because exists in a social context,
''the mesh''. engineers like to talk about it in terms of bandwidth
[and] servers
believes the value is the context
the systems that create that context is where the real value
lies. mentioned libraries and curators
google search link ranking but links are put there by people, so are
really harnessing the collective efforts of many people
metadata: tell us what you have we'll tell you what you've got
social software: software that supports group interaction, instant
messaging, weblogs, wikis
groups are different than lots of individuals
email is not social software, because targets individuals not groups,
the cc line changes it
clay shirky, one of her colleagues
groups of neurotics would thwart efforts to heal them finally because
they didn't want to threaten the existence of the group
weblogs have a temporal context, unlike the average web page
unlike a home page has a link to source, so the group faces out
together rather than facing inward
create a kind of edge awareness for the group
but also have monitoring of other groups' information flows as well
so know who the players are
wikis: can track back and remove.
persistent stores of knowledge which accretes. that thing stays there
as an artifact of the process
a bunch of people sitting around chatting becomes very
performative. flaming, informative, etc. native to the structure of
communication. not the domain.
average weekly usage of everquest is 20 hrs per week. doesn't stay
still. people want to go back.
lots of information referencing each other. lots of links create a
mesh. that's where the value is. can start to chart the stuff and
really start to measure the (...) of information
not applicable to all fields, but to fields where time is an issue.
one positive attribute is either cheap or free
wikipedia
post then process. can correct the information later
''my readers know more than i do'' and that's a good thing
it's all about leverage
the speed of exchange
vs the static world
dynamic attractors -- people who can attract others to their way of
seeing the context, show how you can put things in context
512 inbound links creates authority
the archive: scientific archive
we'd rather have it fast and can review it sooner. 512 inbound links
is peer review
making more strides in the left-brained fields
w/ humanists harder, because less measurable and more inward looking
the strongest network, someone will soon get tenure for a blog, can
measure the strength of the blog
knowing vs editing
books freeze the process at some point
the dead tree has a role
but the stream has a value, the electric network of human interchange
when you share knowledge you're not losing power; you're gaining
leverage, because the person who puts the flag up becomes the leader
control vs illusion of control
have to be sophisticated enough to use it. putting the information out
there and then become a hub
the group is its own worst enemy
when you set up these persistent environments
even though they're customers, the structure of the medium creates a
feeling of citizen-like entitlement for the participants
18 pages into the book online want the tree
thinks there is a complementarity btw copyright and fair use
stealing attention from the journalist's original article and the
context in the blog, but if it encourages people to go to the
original, they're out of their minds [if they don't allow it] :-)
groups of editors are always contentious
the core group on a wiki should cohere or fork, because the internal
core group serves as the fire brigade
these modalities encourage political participation
trent lott went down because of blogs
news cycle goes into the past
but a blog allows things to accrete
kept it alive until went back up into the mainstream media
also on the right: christian conservatives are blogging
it does build political traction
the field tilts towards transparency
kellogs crackle k
top link on google was about the recall
says the links are explicit
there's something which persists
the things we value are beyond technology, e.g., the authority of the
individual
its the authority we care about, not the mechanisms of establishing
it.
mentioned institutional costs (in current ways of establishing
authority)
most of the value you just described comes in from the edges, starts
pushing the center
you can't stop because of the slowness of the process
communism :: metadata
explicit vs implicit metadata
if i link to your blog or mentor your paper
keep the ecosystem healthy
can't ultimately preserve the mesh in amber, and doesn't think we
should, the preservation issue
dead hill--(i think) my readers know more than i do
teaches a graduate class (where?)
let the students build a blog or wiki for the class
the group is important. collusion is a bad word because it
violates the industrial process
can expect or allow them to do it collectively, like shipbuilding
=====
RedLightGreen -- RLG
Merrilee Proffitt presenting (very nice job which included quicktime
videos of students being surveyed)
effort to put the rlg catalog on the web as a free resource
rlg catalog 45 million titles 100 some odd million records 23 years
from oai dublin core catalog to a web resource
needed a target audience
informed enthusiasts but not (professional) researchers
rlg's advisors said oai would not be popular for this type of resource
re-envisions the library catalog as an intuitive web-based research
tool tuned to undergraduates' search behaviors which meet their
information needs
google-like simple search box, also an advanced search, but not the
first thing you see
mercury example (the planet, the car, the god, etc.)
undergraduates are busy and place a premium on their time
maps [for many meant] maps to library
scores sports
students are savvy and discriminating about what they would find on
the web
must build a bridge to their universe
frbr comes in because don't understand notion of edition in initial
stages of their research
work and expression are collapsed in title cluster; includes related
works and adaptations
manifestation is an edition
item is item
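[a minimal sketch in python of the title-cluster idea above, not rlg's
code: the Edition fields and the author-plus-title key are assumptions
made for illustration; real clustering would need uniform titles and
much more normalization]

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    """One manifestation (an edition); the field names are hypothetical."""
    author: str
    title: str
    year: int
    language: str


def normalize(text):
    # Lowercase and collapse whitespace so trivial variants match.
    return " ".join(text.lower().split())


def cluster_key(edition):
    # Work and expression are collapsed into one key. Keeping the author in
    # the key means a cluster never crosses authors.
    return (normalize(edition.author), normalize(edition.title))


def title_clusters(editions):
    """Group editions (manifestations) into title clusters."""
    clusters = defaultdict(list)
    for edition in editions:
        clusters[cluster_key(edition)].append(edition)
    return dict(clusters)
```

[each value in the returned dict is the list of editions a student
would see after choosing one title cluster]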
version 16 of modifying lc marc to xml dtd for this project for
loading it into db2. removed 2000 elements
juniors or seniors in soc sci or hum and had to have written two or
more research papers 15 pages or more in length
user testing conducted in april and may 2002
undergraduates really interested in journal articles
browse option would not be used because uncertain whose point of
view it represented
ILL not popular because unknown or too slow
my bibliography very popular
standard citation formats [wanted for sources in the catalog]: "it
takes me twice as long to format the bibliography as it does to write
the paper"
[students in this study were drawn from] stanford and santa clara at
least [i.e., i didn't catch the rest]
the students were paid for their time
the study was done at rlg
expecting it to go live by july or august
create an account, home library, choose a bibliographic style etc.
use number of copies to test relevance
the most recent english language edition displays first
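[a sketch of the two ordering rules just noted, under assumptions: each
cluster carries a "copies" count and each edition a "year" and a
"language" code; these dict keys are invented for the example and are
not rlg's data model]

```python
def rank_clusters(clusters):
    """Order title clusters by number of copies held, most held first."""
    return sorted(clusters, key=lambda c: c["copies"], reverse=True)


def lead_edition(editions, language="eng"):
    """Pick the edition to display first: the most recent one in the
    preferred language, falling back to the most recent of any language."""
    preferred = [e for e in editions if e["language"] == language]
    pool = preferred or editions
    return max(pool, key=lambda e: e["year"])
```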
jon udell's library lookup site
tries to catch spelling errors
pilot phase aug-dec 2003
columbia nyu swarthmore [are in the pilot phase]
www rlg org / redlightgreen/
mgp at notes rlg org
[in response to questions from the floor:]
wireframes done in s'thing like photoshop. fooled people into thinking
it was a product [the intention was not, however, to fool; it just
happened]
title cluster does stay w/in same author [so not a superwork in
svenonius' sense]
the local catalog is optimized for known-item searching
doesn't work w/ sfx because would obscure the user behaviour from them
might seek corporate sponsorship or small institutional contribution
CLiMB
wants to develop a set of tools that can be opened up to the rest of
the community
=====
PDF/A
pdf archival
aiim and others to define characteristics of an archival pdf file
libraries strongly represented in this group
international standard
[PDF/A is needed by the courts system. a representative of this
system, stephen levenson, presented first.]
engaged in it for seven years
need accurate rendering of documents for the work that they do
e.g. page references
but word processors change pagination if you change printers etc.
docketing :: indexing
complaint response summons etc. [what the legal system needs to track]
pdf is not a proprietary adobe version, has a spec.
corel's version 9 pdf could not be read
version 10 is still large
keep things a minimum of 20 years
then to nara, which keeps it till the end of the republic
use filters to make sure they can read them, stop executables
hope with pdf/a to put those tools on the desk of the creator
decided to go with an iso standard
decided could not standardize on a particular word processor's format
keep records to protect citizens' rights. don't want to lease that to
microsoft because of the need to "reup"
electronic paper emulation at first
color transmits value--"the part marked in red is what i disagree
with"
adobe will respect the core, the pdf/a core
gov't and pharmaceutical are co-chairs
dentists' records, retention
pdf/a defining future technical standards
pdf/a take font definition and stick it in the file because not all
fonts persist
australian victorian ...
xml mdata and pdf for their files
rolled their own
pdf/x iso 15930 is their model
an exchange model read by printers
can render color accurately
time life only accepts pdf/x
want this supported by reading devices of the future
the mgmt of electronic records is the mgmt of m'data
not much there today in pdf but will expand it
media is not included
(indecs mpeg21)
pdf/a based on 1.4 pdf
envisions save as pdf/a
www aiim org/standards
stephen levenson
-----
william g lefurgy library of congress
aiim and npes
fonts make documents not self-contained
pdf 1.4 offers an xml-based type of metadata capability
completely published description of format [but]
adobe drives the specification [and]
no assurance that they will continue to make the standard public
flexible so can contain executables e.g., javascript
so want to make it an iso-owned standard
stable subset of pdf
embed the fonts, standard color spaces
exploring use of xmp for metadata
descriptive schema based on a subset of dc and is extensible
embed an xml packet so visible as plain text, and parsable as such
based on rdf/w3c recommendation
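[a minimal sketch of what "visible as plain text, and parsable as such"
can mean in practice: scan the raw bytes of a file for the xpacket
markers that delimit an xmp packet, then hand the packet to an xml
parser. it assumes the packet is stored uncompressed; the sample bytes
are invented, and this is not a pdf/a validator]

```python
import re
import xml.etree.ElementTree as ET

# An XMP packet sits between two processing instructions.
PACKET_RE = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>", re.DOTALL
)
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample: some non-XML bytes followed by an embedded packet.
SAMPLE = (
    b"%PDF-1.4 ...other bytes... "
    b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">docket entry</rdf:li>'
    b"</rdf:Alt></dc:title>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b'<?xpacket end="w"?>'
)


def extract_xmp(data):
    """Return the parsed root of the first XMP packet, or None if absent."""
    match = PACKET_RE.search(data)
    if match is None:
        return None
    return ET.fromstring(match.group(1).strip())


def dc_values(root, name):
    """Collect the text of every Dublin Core element with the given name."""
    values = []
    for element in root.iter("{%s}%s" % (DC_NS, name)):
        text = "".join(element.itertext()).strip()
        if text:
            values.append(text)
    return values
```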
xmp sdk is open source
some vendors already use it
the utility of rdf is up in the air
cannot do typing w/ xmp
a collaboration of four technical committees in iso
will go in front of them in september
niso will be [...] way to get comments to the group
would take about two years if no problems
but pat harris foresees bumps w/ four groups involved
steve abrams of harvard will also be involved
the library world will have good representation
=====
Enhancing Interoperability
between Digital Libraries and Educational Technology
via XML Crosswalks
Raymond Yee
Technology Architect, Interactive University Project
University of California, Berkeley
xml crosswalks
scorm
mets to rss because used by weblogs
libraries, educational technology, weblogging
IMS content packaging and IMS metadata
RSS0.9x and 1.0
moa2 to mets
rss aggregator, one newsfeed is his weblog
generates an rss channel from his weblog
ims, scorm educational technology
openoffice.org, its file formats are native xml formats
rss is not a flat structure
mets to rss is lossy
opml outline processing markup language preserves the hierarchical
structure
web opml for directories and outlines
mets
header
descr
admin
filesec
structmap
..div
..fptr
ims
manifest
metadata
organizations (structure)
..organization
resources (files)
..resource identifier
rss (doesn't allow for recursion)
channel
title
..descr tags for channel
item
..title
..link
..description
opml
head
body
.. outline
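[a sketch of why mets to rss is lossy while opml keeps the hierarchy,
using the structures listed above. this is not raymond yee's crosswalk;
the sample structmap, the function names, and the choice of LABEL as
the only carried-over value are assumptions for illustration]

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DIV = "{%s}div" % METS_NS

# Invented sample: a structMap with nested divs.
SAMPLE_METS = """<mets xmlns="http://www.loc.gov/METS/">
  <structMap>
    <div LABEL="pamphlet">
      <div LABEL="chapter 1">
        <div LABEL="page 1"/>
        <div LABEL="page 2"/>
      </div>
      <div LABEL="chapter 2"/>
    </div>
  </structMap>
</mets>"""


def first_div(mets_root):
    """Return the top-level div of the structMap."""
    return mets_root.find("{%s}structMap/{%s}div" % (METS_NS, METS_NS))


def mets_to_rss_items(mets_root):
    """Flatten every div into an rss <item>; the nesting depth is lost."""
    items = []
    for div in mets_root.iter(DIV):  # document order, depth ignored
        item = ET.Element("item")
        ET.SubElement(item, "title").text = div.get("LABEL", "")
        items.append(item)
    return items


def div_to_outline(div):
    """Map a div to an opml <outline>, recursing so the nesting survives."""
    outline = ET.Element("outline", {"text": div.get("LABEL", "")})
    for child in div.findall(DIV):
        outline.append(div_to_outline(child))
    return outline
```

[the rss side returns one flat list of items; the opml side returns a
tree of the same shape as the structmap]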
simile
semantic interoperability of metadata and information in unlike
environments
UN translator approach to the many to many translation problem
russian to chinese
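[a sketch of the un translator idea: translate every format into one
shared middle vocabulary and back out, instead of writing a crosswalk
for every pair. the hub field names and the two tiny mappings are
invented for the example; they are not simile's or anyone's schema]

```python
def pairwise_count(n):
    """One-way crosswalks needed if every format maps to every other."""
    return n * (n - 1)


def hub_count(n):
    """One-way crosswalks needed with a hub: one in and one out per format."""
    return 2 * n


# Hypothetical mappings: native field name -> hub field name.
HUB_MAPS = {
    "rss": {"title": "title", "description": "description", "link": "identifier"},
    "opml": {"text": "title", "url": "identifier"},
}


def translate(record, source, target):
    """Translate a flat record via the hub; fields with no hub or target
    equivalent are dropped, so the result can be lossy."""
    to_hub = HUB_MAPS[source]
    hub = {to_hub[k]: v for k, v in record.items() if k in to_hub}
    from_hub = {hub_name: native for native, hub_name in HUB_MAPS[target].items()}
    return {from_hub[k]: v for k, v in hub.items() if k in from_hub}
```

[with four formats (mets, ims, rss, opml) that is 12 pairwise
crosswalks against 8 through a hub, and the gap widens as formats are
added]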
mackenzie encourages thinking about reusability of schemas
rss nothing profound but has incredible traction, can ignore it or
just get in there
mackenzie: might think whether can put rdf somewhere there in the
middle
w/ audiovisual materials, might have harder time mapping to ims w/
some tags
weblog information
whose weblog
what it links to
date
who wrote it
berkeley moa2 comes out of a database so all regular
journey to topaz
a jar of dreams
uchida became a children's author
[he writes in python]
in open office has not found a way except to embed images
raises intellectual property issues
mets document or scorm object
will do ims to mets, that's a use case that's important
demo'd the scholar's box
crosswalks on his website and on the weblog
refs will be put up on cni site
=====
marc in xml preservation metadata
there is a marc21 to sgml dtd
it's still up on the website
marc xml takes a very slim approach
2709 marc 21 marc classic
marcxml is round-trippable w/ marc 21 w/o loss
can edit the xml
can easily craft interesting displays
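[a small sketch of crafting a display from marcxml with python's
standard library. the namespace is the marc 21 slim one; the sample
record is made up for the example and the display format is arbitrary]

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented sample record, not taken from any catalog.
SAMPLE_RECORD = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <leader>00000nam a2200000 a 4500</leader>
  <controlfield tag="001">demo0001</controlfield>
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Uchida, Yoshiko.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Journey to Topaz /</subfield>
    <subfield code="c">Yoshiko Uchida.</subfield>
  </datafield>
</record>"""


def subfields(record, tag, code):
    """Return the text of every subfield `code` in datafields with `tag`."""
    path = "marc:datafield[@tag='%s']/marc:subfield[@code='%s']" % (tag, code)
    return [sf.text for sf in record.findall(path, MARC_NS)]


def brief_display(record):
    """Build a one-line title and author display from fields 245 and 100."""
    title = " ".join(subfields(record, "245", "a")).rstrip(" /")
    author = " ".join(subfields(record, "100", "a"))
    return "%s | %s" % (title, author)
```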
marc xml bus
have marc8 character set mappings into unicode
mix: metadata for image exchange
mods: not completely round-trippable
rich m'data format but simpler than marc
semantics richer than marc
particularly had electronic resources in mind, complex digital objects
that had relationships to express
people wanted something richer than dc but not so rich as marc and
wanted language-based tags
mods 2.0 now available
want to be able to express citation information better
revisions in the next month or so to accommodate that and a few
corrections, enhancements, compatible w/ earlier mods records
mods instead of moa2 gdbm?
minerva, lc's web archiving project
webarchivist.org suny institute of technology
lc uses mods for most mets packages that accompany electronic
resources
derived from marc record in catalog in some sense
cni lifecycle approach to the project
PREMIS: preservation metadata implementation strategies
pres m'data a subcategory of adm m'data
it does specify rights mgmt info
oais: iso 14721:2002
wanted to archive space data
as we road test oais find it might not be perfectly suited to our work
[priscilla is a good model for how to present potentially complex
information understandably; her intellectual approach to this area
might lead to something significant if it's not subordinated to other
concerns]
preservation mdata assoc w/ aip
fixity information, like a checksum
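[a minimal sketch of fixity as a checksum: record a digest when the
object is ingested, recompute it later, and compare. sha-256 and the
function names are my choices for the example, not something the
presentation specified]

```python
import hashlib
import io


def fixity(stream, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a binary stream, read in chunks."""
    digest = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_intact(stream, recorded_digest, algorithm="sha256"):
    """True if the stream still matches the digest recorded at ingest."""
    return fixity(stream, algorithm) == recorded_digest


# At ingest: store this value alongside the object as preservation metadata.
recorded = fixity(io.BytesIO(b"archival information package"))
```

[for a file on disk, pass open(path, "rb") as the stream]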
[present these slides to the mdata group]
oclc and rlg thought it was time to move after a year of review
the charge of premis
there is a great variation in what archives ingest
different volumes of ingestion
need to support migration emulation etc.
interesting to see how one standard [might] do [it] all
managing these at the file level not the logical level
significant attributes, notion slide, coming out of the arts
community when dealing w/ digital. arty but also coming out of other
communities
e.g. timing could be a significant attribute, sts to find out have to
ask the author
[PREMIS] has been scoped to cultural heritage institutions
practical problems ''the most interesting slide''
Here are some highlights from last Autumn's CNI meeting.
opening plenary cliff lynch:
digital rights management: nonsensical and cynical at the same time,
because fundamentally about restriction, not enabling
look at creative commons: unlike digital rights management technology,
which tries to restrict use of digital works, creative commons is
providing ways to encourage permitted sharing and reuse of works
(creative commons dot org)
institutional repositories and learning management systems are related
digitized course catalog loses historical record: a strategic problem
through lack of planning. problems like this repeat.
at the december portland fall 2003 meeting, want one faculty member to
describe work at breakouts from a faculty-centric perspective
we need to think about metadata in another way, created curated used
used indecs-like definition of m'data
used ''assertion'' [about metadata]
(compare <indecs>: "The principle of Designated Authority.
The author of an item of metadata should be securely identified." (2.3)
That is, not only resources, but metadata about those resources, need
to identify their creators. See also
The A-Core: Metadata about Content Metadata and
AC - Administrative Components, from the DCMI
Administrative Metadata Working Group)
Z39.50 next generation (zing) technology refreshment
shouldn't throw out analysis that has gone on
benchmark database for images similar to trec for text
amico has a new home at the university of toronto
authorization authentication and access: shibboleth
institutional repositories are important
documenting rights: there will be a report up on that shortly
(''creative commons'')
[need a] privileged place in cyberspace for teaching
computational m'data to extract m'data from works
the ''wired'' classroom is now potentially wireless
our concerns as creators of content are very different from commercial
concerns
DRM: thinks it's a misnomer because intended to allow content
producers
to have continual control over their content [while allowing for an]
''acceptable user experience''
appropriate access tied to people not devices
the industry wants to control, while research and education wants to
establish ownership
indecs basis for mpeg 21
rights expression languages
can't program fair use
e-books and distance education might be good arenas for drm; teach act
may need narrow drm channel
odrl and xrml have no data model and are insufficient; indecs has a
data model
oasis working w/ xrml
in re: special collections kinds of rights management issues: create
ad hoc metadata scheme knowing that will try to crosswalk in a few
years. minimize the variety you have to document. look at creative
commons. spcl kinds of restrictions can't be automated easily. avoid
gratuitous complexity.
EAD
nancy fleck and michael seadle michigan state
nancy is head of technical services; michael heads the digital and
multimedia center (among other things)
national gallery of the spoken word
useful metaphor in presenting digital sound, nat'l portrait gallery
ead collection vs individual item access
description contains interpretation, for example, who we think is
speaking: mckinley might have written a speech but not delivered it
vs treating them as monographs, would flood the catalog artificially
a sound recording is more like a journal article in size
collected speeches of would be the major organizing principle
maintain the gallery metaphor
can have links to images sound files and description of who the
speaker was
use of the encodinganalog attribute, which can go into any tag and
say which marc field it goes into (for ead --> marc crosswalking)
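[a sketch of how such a crosswalk could read the attribute: walk the
ead, and wherever an element carries encodinganalog, file its text
under the marc field named there. the sample finding aid, the "245$a"
style of value, and the function name are assumptions for the example,
not michigan state's encoding practice]

```python
import xml.etree.ElementTree as ET

# Invented sample; the content is made up for illustration.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did>
      <unittitle encodinganalog="245$a">Collected speeches of William McKinley</unittitle>
      <origination>
        <persname encodinganalog="100">McKinley, William</persname>
      </origination>
      <physdesc>
        <extent encodinganalog="300$a">12 sound recordings</extent>
      </physdesc>
    </did>
  </archdesc>
</ead>"""


def ead_to_marc_fields(root):
    """Map each encodinganalog value to the texts of elements carrying it."""
    fields = {}
    for element in root.iter():
        target = element.get("encodinganalog")
        if target:
            # Join nested text and collapse whitespace.
            text = " ".join("".join(element.itertext()).split())
            fields.setdefault(target, []).append(text)
    return fields
```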
[from here on out my notes on this and subsequent presentations
consist of more technical detail than might be useful here]
institutional repositories: "more useful as a way to talk about things
than as a technology"
This forum has been summarized on the DLF site:
http://www.diglib.org/forums/fall2002/dlf-fall02summary.htm
One of the most interesting presentations was by Chris Turner of the
LEADERS Project, University College London. Here is a description from
the XML4LIB archives, which is fuller than my notes.
Soon after this presentation, the third LEADERS progress report was
issued; it contains the following, which summarizes what I heard at
the TEI presentation, and which particularly caught my attention at
that time.
"... we have also identified the need to build models and rules that
can deal with structures and features such as overlaid data, textual
and numerical data presented in complex tables and the presence of
formulae and mathematical expressions within the text. Data within
archive documents can be described as overlaid when an underlying
layer of data is used as the basic structure onto which further data
(other layer(s)) is applied. The underlying layer of data is usually
printed, and the overlaying layers are usually handwritten on top of
the printed structure. Such structures and features are often found in
archival material (particularly administrative archives) but the TEIs
current encoding scheme will need to be developed if they are to be
comprehensively dealt with. The team are looking at a variety of
examples of these structures and features across a range of documents
from the UCL Archive. Rules and models for encoding are currently
being formulated and tested by the team." (2.3)