- Clean up and standardize notes in the 856$y$z
- Eliminate proxying from all open-access titles; identify all OA titles to Endeca using Item Cat2
- Add proxy prefix to all paid titles
- Eliminate duplicate records for the same resource, unless necessitated by separate data sources for separate vendors
- Remove all ebook data from print records, creating separate e-resource record
- Provide LC call numbers for all ebooks
- Add "ebook" suffix to all LC call numbers lacking suffix or with "eb"
No 1xx/245 (1176 records)
- The majority appear to be brief records that duplicate IEEE Xplore or NetLibrary/EbscoHost titles for which we also have a full record; the brief record can be deleted
- Follow URL to find title, search title in Sirsi to determine dups
856 missing proxy (8703 records)
- Majority of these are for open access titles (Project Gutenberg, ECU Digital Library, California Digital Library, Google books, National Transportation Board, etc.)
- Proxy can be added to commercial URLs using global edit
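The proxy rule above can be sketched in Python as a rough illustration of the logic a global edit would apply. The prefix is the one shown in the addendum's paid-title example; the host list and the function names are hypothetical placeholders, not an agreed list of open-access sources.

```python
from urllib.parse import urlparse

# Prefix taken from the paid-title example in the addendum.
PROXY_PREFIX = "http://proxying.lib.ncsu.edu/index.php?url="

# Hypothetical, incomplete set of hosts that should never be proxied.
# The real list would come from the Item Cat2 = OPENACCESS review.
UNPROXIED_HOSTS = {"www.gutenberg.org", "books.google.com", "archive.org"}


def needs_proxy(url: str) -> bool:
    """Return True for a URL that is neither already proxied nor open access."""
    if url.startswith(PROXY_PREFIX):
        return False
    host = (urlparse(url).hostname or "").lower()
    return host not in UNPROXIED_HOSTS


def add_proxy(url: str) -> str:
    """Prepend the proxy prefix when needed; otherwise return the URL unchanged."""
    return PROXY_PREFIX + url if needs_proxy(url) else url
```

Matching on the host name rather than on the whole URL keeps the check stable when a provider changes its paths, but it assumes each provider is either entirely open access or entirely paid.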
Reference in Item Cat2 (8 records)
- The Item Cat2 just needs to be set to blank
Duplicate titles (8668 records)
See Ebook Duplicate Project Guidelines for more instructions.
- 360Marc/OCLC conflicts for same resource- keep ssj, add OCLC# from dup; delete OCLC dup
- 360Marc ssib/ssj for same resource- delete ssib
- 360Marc duplicate records (same ssj#)- keep most recently updated; delete dup
- NetLibrary/EbscoHost records- delete EbscoHost, as NetLibrary resolves to EbscoHost
- Multiple vendor duplicates- choose the better record and add the valid link from the deleted record to it, BUT if the record you retain is a SerSol record, the extra URL would not be protected upon overlay. In this case, contact Christee about the possibility of turning on the other provider version in SerSol. ***See addendum at bottom
- Series name duplicates- these have no indication of which volume their URL points to; add volume to 245 (e.g., |nVolume 55)
- Lecture notes in mathematics sub-series- these should get volume numbering in 245 as above
- Multiple volume monographs/annuals- these should get volume/year numbering in 245 as above
- Delete holdings in OCLC from deleted records
- General note on subfields in 856: not necessary to rearrange, but usual order is: |y,|u,|z. Also, ok to add |yView resource online (NCSU only), but not necessary with current Endeca interface. Probably best to create a text macro if you feel the need to add.
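For the same-ssj# rule in the list above (keep the most recently updated record, delete the rest), a minimal Python sketch follows. The record layout (a dict with `catkey` and `updated`) and the function name are assumptions for illustration; they do not reflect an actual Sirsi export format.

```python
from datetime import date


def split_keep_delete(records):
    """Given records that share one ssj#, return (record to keep, records to delete).

    Each record is a hypothetical dict with a 'catkey' and an 'updated' date.
    The most recently updated record is kept; on a tie the first one wins.
    """
    keep = max(records, key=lambda rec: rec["updated"])
    delete = [rec for rec in records if rec is not keep]
    return keep, delete


same_ssj = [
    {"catkey": "1001", "updated": date(2013, 5, 1)},
    {"catkey": "1002", "updated": date(2014, 1, 15)},
]
keep, delete = split_keep_delete(same_ssj)
```

The records returned in `delete` would still need their OCLC holdings removed, per the list above.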
MulVer records (14557 records)
See Ebook Mulver Project Guidelines for more instructions.
- Probably needs to be a separate project to generate additional e-only records; talk to Monographs
For more instructions on handling duplicates and unmulvering records: See Ebook cleanup cheatsheet
Titles lacking LC call numbers (14734 records)
Titles lacking 856 (309 records)
- Appear to be forgotten bibs, particularly brief bibs, which were replaced by full records but not overlaid or deleted
- Some appear to be brief records for Kindle titles
- Need to be checked against Sirsi for dups on edition, probably deleted if they have no orders attached
- This is probably a separate project as well, perhaps can be somewhat automated if OCLC # present
Call numbers lacking “ebook” extension (15119 records)
- May be amenable to API work
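As a starting point for that API work, the suffix rule from the goals list (add "ebook" where the suffix is missing or is "eb") could look roughly like this in Python. The sketch assumes the suffix is always the last whitespace-separated token of the call number; the function name is hypothetical.

```python
def with_ebook_suffix(call_number: str) -> str:
    """Return the call number ending in 'ebook'.

    Assumes the suffix, when present, is the final whitespace-separated token.
    A trailing 'eb' is replaced, an existing 'ebook' is left alone, and
    anything else gets ' ebook' appended. Runs of spaces are collapsed.
    """
    parts = call_number.split()
    if not parts:
        return call_number  # nothing to do for an empty call number
    last = parts[-1].lower()
    if last == "eb":
        parts[-1] = "ebook"
    elif last != "ebook":
        parts.append("ebook")
    return " ".join(parts)
```

Call numbers that lack a Cutter would still pass through this function, so they need to be caught by a separate check.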
Call numbers with wrong class scheme (1894 records)
- API to flip ClassScheme to LC for provided catkeys
- A few XX should be flipped to AUTO manually
Multiple conflicting 035 (279 records)
- Probably should be re-run after duplicate titles query is finished
Call numbers lacking Cutter (673 records)
- Some of these are ALPHANUM with LC class scheme- these should have class scheme changed
- Remainder just need Cutter and “ebook” extension
Titles needing OPENACCESS in ItemCat2 (221742 records)
- Needs further work to eliminate some categories
Addendum (per CP 2/04/14)
- I have now cleaned up 251,000 URLs so that there is consistency in display and practice, but since last week a couple more have come in with free-text 856 public notes. I personally think that it is important to be consistent here, if for no other reason than ease of batch editing should we decide to globally change text in the future. Here is what I did to try to achieve consistency. Note that the order of subfields and content IS meant to be prescriptive:
- |y Access note- if none is provided, this will default to "View resource online (NCSU only)" in Endeca, but I personally would prefer that we be consistent, so that any future system migrations will not be reliant on what the new system can provide.
- |u Proxy prefix if needed
- |z Part information (spelled out, so that it can be seen: Volume 1, Part 1, etc.)
- |z Provider/package
- For paid titles:
- |yView resource online (NCSU only)|uhttp://proxying.lib.ncsu.edu/index.php?url=http://...|zSpringerLink [ACM Digital Library, etc.]
- For HathiTrust (per Kristen W.)
- |zHathiTrust Digital Library
- Note that the HathiTrust link should not be proxied. HathiTrust does not support authentication.
- For open access:
- |yOpen access e-resource|uhttp://...|zNational Academies Press [AQD:Internet Archive, etc.]
- There are a few e-book URLs that begin with |3. This subfield should be reserved for URLs that point to auxiliary information, like finding aids, local directions, TOCs, and so on. OCLC should be cursed for putting provider information in |3! You will still find tons of 856s in our catalog that don't stick to this model, simply because it would be too time-consuming to move the providers to the end of the string if they are at the beginning presently. Same for 856 without the "View resource online (NCSU only)". Other than that the notes should all be uniform. No more "|zView online resource", "|zOpen access title available from National Academies Press", "|yView online issue from Springer's LINK", or "|3View text online from RAND".
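The prescribed subfield order (|y access note, |u URL, |z part information, |z provider) can be sketched as a small Python helper, for example to generate the text for a macro. The function and parameter names are hypothetical. HathiTrust is not covered because the addendum does not give its |y text.

```python
PROXY_PREFIX = "http://proxying.lib.ncsu.edu/index.php?url="


def build_856(url, provider, open_access=False, part=None):
    """Assemble 856 subfields in the prescribed order: |y, |u, |z part, |z provider.

    Paid titles get the NCSU-only note and the proxy prefix; open-access
    titles get the open-access note and the bare URL.
    """
    if open_access:
        note, link = "Open access e-resource", url
    else:
        note, link = "View resource online (NCSU only)", PROXY_PREFIX + url
    subfields = ["|y" + note, "|u" + link]
    if part:
        subfields.append("|z" + part)  # e.g. "Volume 1", spelled out
    subfields.append("|z" + provider)
    return "".join(subfields)
```

Building the string from one function is one way to keep the note text identical across records, which is what makes later batch edits straightforward.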
Let's keep a list of issues that we encounter with these, as some issues will need working through with Monographs. Here are some of the issues encountered thus far:
Duplicates: Liz has created several views of the data to view potential duplicates based on OCLC # and ISBN. The breakdown of these is as follows:
OCLC#: Using OCLC # matches there are only 138 duplicates. All but one represent two titles; one represents three titles. In most cases, these were duplicated due to a mismatch on flexkey (title control #), but in some cases there are even duplicate flexkeys!
ISBN#: Matches based on ISBN make up 14459 duplicates for which there are 21203 titles. Some of these are duplicated as many as 24 times, as eBrary appears to be turning some series (see “Advances in medicine and biology”) into “analytics” by repeating the set record once for each volume! Liz has removed a lot of duplicates that resulted from many single records having ISBNs for multiple volumes in a set. A high percentage of duplicates represent Springer titles mulver’d early on and then turned on in Serials Solutions. Sadly, the e-only records often lack call numbers and subject headings, which the print records have. There are some records duplicated due to multiple platforms.
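The kind of grouping behind these duplicate views can be illustrated with a short Python sketch. This is not Liz's actual query. The record layout (a dict with a `catkey` plus lists under `isbn` and `oclc`) is an assumption made for the example.

```python
from collections import defaultdict


def duplicate_groups(records, key):
    """Map each identifier to the catkeys that share it, keeping only real duplicates.

    `records` is a list of hypothetical dicts such as
    {"catkey": "1001", "isbn": ["9780000000001"], "oclc": ["123"]}.
    `key` selects which identifier list to match on ("isbn" or "oclc").
    """
    groups = defaultdict(set)
    for rec in records:
        for value in rec.get(key, []):
            groups[value].add(rec["catkey"])
    return {value: sorted(catkeys) for value, catkeys in groups.items() if len(catkeys) > 1}


sample = [
    {"catkey": "1001", "isbn": ["9780000000001"], "oclc": ["123"]},
    {"catkey": "1002", "isbn": ["9780000000001", "9780000000002"], "oclc": ["456"]},
    {"catkey": "1003", "isbn": ["9780000000003"], "oclc": ["789"]},
]
isbn_dups = duplicate_groups(sample, "isbn")
```

A record carrying ISBNs for several volumes of a set, like the second sample record, will match every volume, which is the source of the false duplicates mentioned above.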
655: We have some of these as 655 -0 (correct), 655 -4 (less correct, but workable), and 655 -7 |2lcsh or lcgft (lcgft is just plain incorrect; lcsh is correct, but means that these file differently than those with -0). We have 451,424 Item type EBOOK but only 183,838 that have 655. Discussions with RIS, IT and CM indicate that all “Electronic books” and “Electronic journals” 655s should be removed, as these should be searched through the format facet rather than the genre.
856: To proxy or not to proxy. I assume that, if these are not Item Cat2=OPENACCESS, they should be proxied, but am I correct? That’s my assumption!--cp
Shadowed: Some records that are shadowed have fully functioning URLs. Are they out of license, or did they simply get forgotten at some point? Follow up with CPA on what needs to be done to determine if they can be turned on or should be kept shadowed.
PATRONDISC/PATRONOWN: Are these records replaced by fuller records when they become available or when the purchase is made or are they static after download? There are 22,766 of these currently in the catalog and all have flexkey of simple 8-digit numeric, making them susceptible to overlay by wrong record. I’m pretty sure that these are static on download. I don’t know if YBP has a bib notification service for them. – cp
OPENACCESS: Other than those already marked, are there others which should be? National Academy Press, for example. Yes, for NAP “Open books” collection. I was planning on having Dawn do an API to turn this on for these, since Item Cat2 will not be affected by SerSol overlay of bibs. I turned off proxying for these in the SerSol KB, but there doesn’t seem to be a way of changing the non-proxied titles 856$y to “Open access e-resource” from “NCSU only”. Might have to have a chron report in Sirsi do this. – cp