
By Joyce Chapman (Libraries Fellow)

May 2011


Executive Summary

In 2010-2011, the Metadata & Cataloging (M&C) department at NCSU Libraries conducted a test to evaluate the effectiveness of current workflows whereby M&C staff manually enhance metadata for Special Collections’ digital images. The study was led by Joyce Chapman and analyzed differences in use rates (quantified as unique page views for the purposes of the test) for images that had undergone metadata enhancements (“enhanced images”) versus images that had not (“unenhanced images”) in a single collection over a five-and-a-half-month period. Findings indicate that manual metadata enhancements greatly increase the use of our digital image collections. Enhanced images received roughly four times as much use as unenhanced images: 80% of unique page views were of enhanced images and only 20% were of unenhanced images. When we analyzed keywords in Google searches that led visitors to images, we found that 88% led to enhanced images, and 28% of all Google search strings included person names, which were available only in enhanced images. We also examined the traffic sources that led visitors to images. We found that 43.6% of traffic came from Google, 14% came from NCSU sites (such as the alumni blog or the library news blog), and 10% came from sports blogs. Twenty-three percent was direct traffic.

Background

For the past two years, M&C has provided staff time on a regular basis to enhance metadata for materials selected by SCRC staff for digitization. Metadata enhancements performed by M&C staff vary from project to project and may include assigning subject headings, transcribing information from the back of physical images (such as the names of people or events featured in the image, the specific type of farm equipment shown, etc.), adding geographic locations, or creating descriptive titles. M&C’s time contribution to digital object metadata enhancements has not been closely tracked over time, nor has there ever been a performance comparison between enhanced and unenhanced images. To discover whether we are making the most effective use of our time, in fall 2010 M&C decided to conduct A/B testing to compare the performance of enhanced and unenhanced images using data from Google Analytics.

Methodology

The study was conducted using A/B testing. A/B testing, common in web-based marketing research, involves a performance comparison between a control group and a single variable test group. If more than one variable is at play during testing, it becomes difficult to know which variable was responsible for performance differences. The variable in NCSU Libraries’ A/B testing was the presence or absence of the department’s generic metadata enhancements as a whole. The effect of any single metadata enhancement on discovery was not tested.

The plan to conduct this study was formulated by Erin Stalberg (Metadata & Cataloging), Jason Ronallo (Digital Library Initiatives), Brian Dietz (Special Collections Research Center), and Joyce Chapman (Metadata & Cataloging). In October 2010, the Digital Program Librarian (Brian Dietz) selected an appropriate collection for use in testing. The collection was a set of photographs from UA023.15, the College of Natural Resources Records. Prior to the test there were no College of Natural Resources images in Historical State. The collection contained 1,195 digital images. Approximately half the images were allocated to testing group A (enhanced images), and half were assigned to testing group B (unenhanced images).

Three staff members from the Metadata and Data Quality section of M&C performed manual enhancements to testing group A in October 2010. Images from testing group A were assigned fairly evenly among staff for enhancements, and each staff member was asked to report the total number of minutes spent on all enhancements at the conclusion of their work. In this way, we could calculate the average number of minutes spent on enhancements per image.

Once enhancements to testing group A were complete, all images from testing groups A and B were moved to the production server on November 1, 2010, where they became simultaneously discoverable on the open web. On April 21, 2011 (after five and a half months), data on unique page views, search strings, and traffic sources was exported from Google Analytics. The Digital Collections Technology Librarian (Jason Ronallo) wrote a script that exported a log from the Special Collections Asset Management System containing data related to all images involved in testing (such as the ID and whether an image belonged to testing group A or B). Data from the two sources was combined and analyzed by Chapman.
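The step of combining the two data sources can be sketched as follows. This is a minimal illustration only, not the script used in the study: the column names (`image_id`, `unique_page_views`, `group`), the sample rows, and the function names are all assumptions made for the example. The one point it is meant to show is that images missing from the analytics export must be counted as having zero views, because the asset log, not the analytics export, lists every image in the test.

```python
import csv
import io
from collections import defaultdict

# Hypothetical analytics export: one row per image page that received views.
ANALYTICS_CSV = """image_id,unique_page_views
img0001,3
img0002,1
img0004,2
"""

# Hypothetical asset-management log: every image in the test and its group.
ASSET_LOG_CSV = """image_id,group
img0001,A
img0002,B
img0003,A
img0004,A
img0005,B
"""


def combine(analytics_csv, asset_log_csv):
    """Join the two exports on image ID; unviewed images get 0 views."""
    views = {
        row["image_id"]: int(row["unique_page_views"])
        for row in csv.DictReader(io.StringIO(analytics_csv))
    }
    combined = []
    for row in csv.DictReader(io.StringIO(asset_log_csv)):
        combined.append({
            "image_id": row["image_id"],
            "group": row["group"],
            "views": views.get(row["image_id"], 0),
        })
    return combined


def summarize(records):
    """Per group: number of records, records viewed at least once, total views."""
    summary = defaultdict(lambda: {"records": 0, "viewed": 0, "views": 0})
    for rec in records:
        stats = summary[rec["group"]]
        stats["records"] += 1
        if rec["views"] > 0:
            stats["viewed"] += 1
        stats["views"] += rec["views"]
    return dict(summary)


records = combine(ANALYTICS_CSV, ASSET_LOG_CSV)
summary = summarize(records)
print(summary)
```

Driving the join from the asset log rather than from the analytics export is what makes the "never viewed" counts in the appendix recoverable.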

Findings
Use data

Findings indicate that manual metadata enhancements greatly increase the use of our digital image collections. Enhanced images received roughly four times as much use as unenhanced images: of all unique page views for the collection, 80% were of enhanced images and only 20% were of unenhanced images. Ninety-two percent of unenhanced images had yet to receive a single page view after five and a half months. Over the same period, the share of enhanced images viewed at least once was three times that of unenhanced images.
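The percentages in this section follow directly from the raw counts listed in the appendix. The short calculation below reproduces them; the counts are taken from the appendix, while the variable names and the `pct` helper are introduced here for illustration only.

```python
# Raw counts from the appendix of this report.
GROUP_A_RECORDS, GROUP_B_RECORDS = 606, 589   # enhanced, unenhanced
GROUP_A_VIEWED, GROUP_B_VIEWED = 147, 48      # records with at least one view
GROUP_A_VIEWS, GROUP_B_VIEWS = 216, 55        # unique page views


def pct(part, whole):
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


total_views = GROUP_A_VIEWS + GROUP_B_VIEWS

share_a = pct(GROUP_A_VIEWS, total_views)              # 79.7
share_b = pct(GROUP_B_VIEWS, total_views)              # 20.3
use_ratio = round(GROUP_A_VIEWS / GROUP_B_VIEWS, 1)    # 3.9, "roughly four times"

viewed_rate_a = pct(GROUP_A_VIEWED, GROUP_A_RECORDS)   # 24.3
viewed_rate_b = pct(GROUP_B_VIEWED, GROUP_B_RECORDS)   # 8.1
viewed_ratio = round(viewed_rate_a / viewed_rate_b, 1) # 3.0

never_viewed_b = pct(GROUP_B_RECORDS - GROUP_B_VIEWED, GROUP_B_RECORDS)  # 91.9

print(share_a, share_b, use_ratio, viewed_rate_a, viewed_rate_b,
      viewed_ratio, never_viewed_b)
```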

When we step back and consider the two testing groups together as a whole, only 16% of the images in the collection (whether enhanced or unenhanced) had been viewed at all after five and a half months. A comparison with the use of recent monograph purchases at our library over a similar time period (the past six months) showed a similar pattern: 15% had circulated at least once. Whether these percentages should be considered low or high cannot be answered without benchmark data.

For more information on questions of general discoverability, see the recent NCSU Libraries report by Chapman, Ronallo, and Lai, “Search Engine Optimization Study: Metadata Practices and Google Indexing of Historical State images.”

Another question the M&C department had was whether there was a significant difference in the average number of subject headings assigned to viewed and unviewed enhanced resources. We found that there was not (a mean of 1.1 versus 1 subject heading). Unenhanced images had no subject headings.

Search strings

We examined the keywords used in Google search strings that led visitors to view images. Over the course of the testing period, a total of 100 search strings resulted in page views. We found that 88% of these searches led to enhanced images. Note that Google Analytics records the keyword for any direct or referral visit as "not set"; these visits are excluded from this analysis.

The M&C department also had questions about search terms in relation to person names and about the use of synonyms for “NCSU.” Person names were available only in enhanced images, and we found that 28% of all search strings included person names. One of the questions that arose in another recent study conducted at NCSU Libraries was the extent to which users search for different synonyms of “North Carolina State University.” We found that 38% of the search strings that led to unique page views in this study included such a synonym. Of these searches, 79% used the term “NCSU,” 13% used the term “NC State,” and 8% used the term “North Carolina State.” This does not signify that “NCSU” is the most popular synonym in searching; it tells us only that our metadata provides the most users with the most hits when that synonym is searched. In fact, data from Google Insights tells us that the synonym “NC State” has been almost three times as popular a search term as “NCSU” on Google over the past year (though it is unclear what percentage of searches for “NC State” relate to the university and not the state of North Carolina). Currently, it appears that our metadata is optimized for the synonym “NCSU,” though this is not necessarily the most used synonym. For more discussion on the issue of synonym branding, see the recent NCSU Libraries report by Chapman, Ronallo, and Lai, “Search Engine Optimization Study: Metadata Practices and Google Indexing of Historical State images.”
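A synonym tally of this kind can be sketched in a few lines. The example below is an assumption-laden illustration rather than the method used in the study: the sample search strings are invented, the regular expressions are one plausible way to match each synonym, and each search string is counted once under the first synonym it matches. The final lines recompute the reported shares from the occurrence counts given in the appendix (30, 5, and 3).

```python
import re
from collections import Counter

# Hypothetical patterns for each synonym, checked in this order.
SYNONYM_PATTERNS = {
    "NCSU": re.compile(r"\bncsu\b", re.IGNORECASE),
    "NC State": re.compile(r"\bn\.?c\.? state\b", re.IGNORECASE),
    "North Carolina State": re.compile(r"\bnorth carolina state\b", re.IGNORECASE),
}


def tally_synonyms(search_strings):
    """Count search strings by the first university synonym they contain."""
    counts = Counter()
    for text in search_strings:
        for label, pattern in SYNONYM_PATTERNS.items():
            if pattern.search(text):
                counts[label] += 1
                break
    return counts


def shares(counts):
    """Whole-number percentage of synonym-bearing searches per synonym."""
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.items()}


# Invented sample data, for illustration only.
sample = [
    "ncsu forestry students 1950",
    "nc state hofmann forest",
    "north carolina state wood products lab",
    "NCSU pulp and paper",
    "sawmill photographs",
]
sample_counts = tally_synonyms(sample)

# Occurrence counts reported in the appendix of this report.
reported = Counter({"NCSU": 30, "NC State": 5, "North Carolina State": 3})
reported_shares = shares(reported)
print(sample_counts, reported_shares)
```

Counting each string under only one synonym keeps the shares summing to roughly 100%; a string containing two synonyms would need a different rule if both were to be credited.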

Traffic sources

The sources that led visitors to view images were also analyzed. We found that 43.6% of all traffic came from Google (including Google engines in several other countries), 14% came from NCSU sites (such as the alumni blog or the library news blog), and 10% came from sports blogs. Twenty-three percent was direct, meaning the visitor typed the URL of an image directly into their browser, clicked on a bookmark, or clicked on a link to the image in a PDF, Word document, or email.

Timing data

The department did not have precise data on time spent on metadata enhancements prior to this study. We found that the mean number of minutes spent per image on manual enhancements was seven. Five minutes is the target time per image set by Chapman and Dietz in 2011. Note that when comparing this to timing data for several other collections, we find that the average shifts from project to project, as each project is slightly different.

Appendix

Group A: images with enhanced metadata
Group B: images without enhanced metadata

General stats

Measure                                                  Raw number   % of total
Total records in the collection                          1195         N/A
Total unique views for the collection                    271          N/A
No. records in Group A (enhanced)                        606          50.7%
No. records in Group B (unenhanced)                      589          49.3%
No. records that were viewed (>=1 view)                  195          16.3%
No. records that were never viewed                       1000         83.7%
No. Group A records that were viewed                     147          24.3%
No. Group A records that were never viewed               459          75.7%
No. Group B records that were viewed                     48           8.1%
No. Group B records that were never viewed               541          91.9%
No. unique page views accounted for by Group A images    216          79.7%
No. unique page views accounted for by Group B images    55           20.3%

Subject headings

Group               Mean no. subject headings   Median no. subject headings
Group A viewed      1.11                        1
Group A unviewed    1.01                        1
Group B viewed      0                           0
Group B unviewed    0                           0

Search terms

Measure                                          Value
Total search terms (not including "not set")     100
% of searches with person name included          28%
% of search term hits that are enhanced (A)      88%
% of search term hits that are unenhanced (B)    12%

Synonym analysis

Synonym                 % of synonym searches (n=38)   No. of occurrences
NCSU                    79%                            30
NC State                13%                            5
North Carolina State    8%                             3