In September 2014, Access Verification statistics were downloaded and assessed to determine if there was a continued need to perform Access Verification across all electronic collections annually. Through late 2013 and 2014, staff were processing approximately 400 Access Verification titles per month.
Breakdown of the Data
-More than half the titles verified were from the largest 5% of databases
-More than half the databases had a 0% error rate, though only 2 of these had more than 100 titles
-One third of all verified titles were verified in the last six months of data
-Error rates have significantly dropped since 2012 (from over 14% to less than 3%)
Distribution of Errors
-Small databases (fewer than 100 titles) had an error rate nearly 50% higher than larger databases, and very small databases (fewer than 10 titles) had the highest error rate
-“Coverage Date” errors were the most frequent type of error at 65%, and comprised 82% of errors in the largest databases (more than 500 titles)
-“Access Denied” errors were the second most frequent of the 8 types of errors at 13%
-Small databases generated nearly half as many “Coverage Date” errors per title as large databases, but 2.5 times as many “Access Denied” errors
The Access Verification data exported from E-Matrix covered 31 months, from November 2011 to September 2014 (no verification occurred during a 5-month period due to technical problems in E-Matrix). In total, 21,248 titles were verified from 285 unique databases. The distribution of titles across databases is extremely uneven. Only 11.93% of the databases have more than 100 titles; however, these databases account for 83.66% of the titles. Databases with 500 or more titles represent less than 5% of all databases but contain more than half of the titles verified. None of these 500-plus-title databases has a 0% error rate, and only 2 databases with more than 100 titles have a 0% error rate.
The total error rate for all titles combined was 5.52%, and 55.09% of the databases produced 0% errors, resulting in a 0% median error rate across databases. Databases with fewer than 100 titles had an error rate nearly 50% higher than those with 100 titles or more, and packages with fewer than 10 titles had the highest error rate at 8.79%, which is 71% higher than the rate for those with 100 titles or more. Four databases contained more than 100 titles and had error rates higher than 10%. One of these is not a single-publisher database (“Single Journals”), one is being cancelled (“De Gruyter Online”), and one is the IEEE/IEL Collection, a database largely unmanaged at the title level. Cambridge Journals Online is the remaining “big” database with an error rate above 10%.
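The gap between the combined error rate and the 0% median can be illustrated with a minimal sketch. The database names and counts below are hypothetical placeholders, not figures from the E-Matrix export; they only show how a few large databases can drive the combined rate while most databases report no errors.

```python
from statistics import median

# Hypothetical (titles_verified, errors_found) per database; NOT real export data.
databases = {
    "db_a": (600, 40),
    "db_b": (8, 0),
    "db_c": (5, 0),
    "db_d": (12, 0),
    "db_e": (150, 9),
}

def pooled_error_rate(dbs):
    """Error rate (%) over all titles combined, regardless of database."""
    total_titles = sum(titles for titles, _ in dbs.values())
    total_errors = sum(errors for _, errors in dbs.values())
    return 100.0 * total_errors / total_titles

def median_error_rate(dbs):
    """Median of the per-database error rates (%)."""
    return median(100.0 * errors / titles for titles, errors in dbs.values())

print(f"pooled: {pooled_error_rate(databases):.2f}%")
print(f"median: {median_error_rate(databases):.2f}%")
```

In this toy set, three of the five databases have no errors, so the median is 0% even though the pooled rate is well above zero.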
The highest error rates occurred in the middle of 2012, with 6-month rates ranging from 11% to over 14% (rates calculated by averaging the previous 6 months). Six-month averages have remained well below 5% throughout 2014, and the total error rate for the last 6 months was 2.88%. The titles verified in those last 6 months represent a third of all titles verified since Nov. 2011.
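One way to read the 6-month rate is as a trailing mean of monthly error rates. The sketch below assumes that interpretation (the report does not say whether months were weighted by the number of titles verified), and the function name and input values are hypothetical.

```python
def trailing_six_month_rates(monthly_rates, window=6):
    """Return the unweighted mean of each run of `window` consecutive months.

    The first result corresponds to the first month that has a full
    window of data behind it (inclusive of that month).
    """
    results = []
    for end in range(window - 1, len(monthly_rates)):
        chunk = monthly_rates[end - window + 1 : end + 1]
        results.append(sum(chunk) / window)
    return results

# Hypothetical monthly error rates in percent; NOT taken from the export.
example = [12.0, 12.0, 12.0, 12.0, 12.0, 12.0, 6.0]
print(trailing_six_month_rates(example))
```

A title-weighted version would instead sum errors and titles over the window before dividing, which can differ noticeably when monthly volumes are uneven.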
By far, the largest proportion of errors relates to “Coverage Dates” at 65.02%. The remaining 34.98% of errors are divided between “Access Denied” (13.48%), “Other/Unknown” (12.80%), “EZ Proxy” (4.27%), “URL Incorrect” (2.13%), “E-Matrix” (1.62%), “Title Change” (0.68%), and “Transfer Title” (0.00%). Packages containing 100 or more titles generated an above-average share of “Coverage Date” errors, with 72.31% of their errors falling into this category. Packages with fewer than 100 titles generated nearly half as many “Coverage Date” errors relative to the total number of titles and produced 2.5 times as many “Access Denied” errors. More than 82% of errors within packages containing 500 or more titles were related to “Coverage Dates.” There was little variation in the distribution of error types between databases with fewer than 10 titles and databases with fewer than 100 titles.
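As a quick arithmetic check on the breakdown above, the shares of the seven non-“Coverage Date” error types can be summed and compared with the stated remainder. The percentages are those quoted in the paragraph; the variable names are illustrative only.

```python
# Shares (%) of each error type, as quoted in the paragraph above.
coverage_dates = 65.02
other_types = {
    "Access Denied": 13.48,
    "Other/Unknown": 12.80,
    "EZ Proxy": 4.27,
    "URL Incorrect": 2.13,
    "E-Matrix": 1.62,
    "Title Change": 0.68,
    "Transfer Title": 0.00,
}

# Round to two decimals to absorb floating-point noise in the sum.
remaining = round(sum(other_types.values()), 2)
total = round(coverage_dates + remaining, 2)

print(f"remaining share: {remaining}%")
print(f"total: {total}%")
```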
The attached spreadsheet details the breakdown of SerSol databases into several categories and defines which categories are to be verified, as well as the frequency of verification: Access Verification Summary