Unless you happen to possess luck on a superhuman scale, bad data will lead to bad decisions. Alas, the situation is not symmetrical: good data may or may not lead to good decisions. Good data can be corrupted in context — by the misinterpreter, by the inattentive, by the intrusion of luck of the bleakest kind. The publishing business operates with data that no self-respecting industry would tolerate (can you imagine an executive at Exxon Mobil not knowing how many cars are on the road, how many miles they drive, and how much gasoline they consume?), and within publishing, book publishers have the worst of it, with no hard evidence about who actually purchases and uses their products, assuming they are purchased and used and not simply accessed on a pirate site somewhere or, in their print form, serving to dress up a furniture store.
In an attempt to improve the quality of data on the book industry, Ithaka S+R has just released a preliminary report on book acquisition patterns in academic libraries. Katherine Daniel of Ithaka has published a blog post summarizing the project here, and the report itself, prepared by Katherine, Roger Schonfeld, and myself, can be found here. This project was made possible by the generous support of the Andrew W. Mellon Foundation. In this blog post I want to review some of the implications of the report, comment on its preliminary status, and explain how it came to be.
First, the background. About 10 years ago, a friend who consults in the public library sector told me that she believed that Amazon accounted for 10% of all public library book purchases. I was astonished. Amazon is a retailer, not a wholesaler: how could this be? My friend asked the client who had commissioned her study for permission to share the report with me, but they declined, stating that the report was proprietary. Since the obvious place to go for information about Amazon is Amazon, I asked a high-ranking member of the trade book business to make an introduction for me to a counterpart at Amazon. I got to speak to the head of Amazon's book operations, who told me that he couldn't help me because, he said, Amazon does not know whether libraries buy books from them. This remark was so blatantly dishonest that I resolved to try to find an answer somehow.
After a number of false starts, I put this project before Roger, whose team at Ithaka S+R came up with an intriguing (and, for me at least, entirely new) strategy. Since more and more libraries were moving to a new generation of ILS (integrated library system), it might be possible to get a data feed from the vendors of these systems. This was no small ambition. First the vendors had to be persuaded to participate (they were). Then the libraries had to grant permission to access their data (they did). Then Ithaka had to develop tools for ingesting the data and putting it into a useful form (mission accomplished). Finally, queries had to be run against that data, and that remains part of the ongoing project. Indeed, one of the things we are now contemplating is what other kinds of questions can be put to this data set, questions that go far beyond the original one: how many books does Amazon sell to academic libraries? Note that this method of gathering data eliminates the guesswork. Roger's group has been looking at the same data that libraries use to run their organizations.
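To make the final step concrete, here is a minimal sketch of the kind of query one might run against an ingested set of ILS order records to measure vendor share. The record fields and vendor strings are purely illustrative assumptions on my part; the actual WorldShare and Alma feeds, and Ithaka's tooling, are not public. One wrinkle the sketch does capture: a single vendor such as Amazon can appear under several name variants in local acquisition records, so some normalization is needed before totals mean anything.

```python
from collections import defaultdict

# Hypothetical ILS order records (fields and values invented for illustration).
orders = [
    {"vendor": "GOBI Library Solutions", "amount": 120.00},
    {"vendor": "Amazon", "amount": 45.50},
    {"vendor": "AMAZON.COM", "amount": 30.00},
    {"vendor": "Baker & Taylor", "amount": 60.00},
]

def vendor_spend(records):
    """Total spend per vendor, folding obvious name variants together."""
    totals = defaultdict(float)
    for rec in records:
        name = rec["vendor"].strip().lower()
        # Collapse common spellings of the same vendor into one key.
        if "amazon" in name:
            name = "amazon"
        totals[name] += rec["amount"]
    return dict(totals)

print(vendor_spend(orders))
```

Trivial as it looks, the normalization step is the point: if "Amazon" and "AMAZON.COM" are counted as separate vendors (or, worse, lumped under a generic "retail" category), Amazon's true share of library acquisitions disappears from the totals.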
You can get all the details of the methodology from the report linked to above, but to summarize a couple of highlights: in our sample, Amazon is the second largest provider of books to academic libraries. So the notion that Amazon is not a wholesaler gets thrown out the window. Another surprising bit of information (highlighted to me by Rick Anderson) is how few books were acquired on approval plans. The fact is that we have had assumptions about the academic book market that probably are just not true.
Has bad data led to bad decisions? Yes. It is not uncommon to hear university press personnel moan that libraries have stopped buying scholarly monographs, partly because library budgets have been appropriated by the Big Deals from STM publishers. The "evidence" for this argument is two-fold. First, unit sales have dropped over the years, so that fall-off had to come from somewhere. Second, academic publishers can look at their sales figures to their principal wholesalers (YBP, now part of EBSCO and renamed GOBI; Baker & Taylor; Ingram; etc.), and those figures show an ongoing decline over many years.
So about those bad decisions based on partial or bad data: let's cite three. Decision #1: if you believe that libraries have stopped buying books, you get tempted to put them all into a heavily discounted aggregation and sell on the basis of price. The problem here is that many libraries apparently took the discounted aggregations in place of the full-price books they were already purchasing. Decision #2: demand-driven acquisition. Once again, if you think libraries are not buying your books, why not put your books into a DDA program? Two problems here: the library may have been buying your books, but you don't know that; and DDA, even when it works for publishers, delays payments for months, even years. This means that DDA books must be priced at a huge premium. Has anyone noticed a sharp uptick in DDA book prices? I didn't think so. Decision #3: open access monographs. Part of the reason (not the only reason) for the enthusiasm for OA books is the belief that libraries are not buying books, so OA is the only way to make them available. But libraries are buying books; it's simply that those purchases are not showing up in the figures at EBSCO and B&T.
Had publishers known the figures for Amazon, they would have realized that a big part of the alleged fall-off in library sales was simply channel-switching, that is, books that were formerly sourced through YBP and others were now being sourced through Amazon. Since Amazon was classified as a retail account, its impact on library acquisitions was overlooked. This piece of bad data has cost academic book publishers millions of dollars.
As for the question of whether avaricious STM journal publishers are depriving academic book publishers of their livelihood, the answer goes beyond the scope of the Ithaka study and I will not comment on it here (but — pssst! — if you want to sell more books to libraries, publish better books and enhance their integration with a variety of library tools and systems).
We are calling this a preliminary report because more data is forthcoming this autumn, which will be loaded into the Ithaka systems and analyzed. Currently the analysis uses data from 54 institutions, all of which use the OCLC WorldShare ILS. By autumn another group of institutions will be providing data through the Ex Libris Alma ILS; we anticipate that the total number of participating libraries will be in the range of 150-200. This brings us to a very important question: Is the data we are working with now representative of the U.S. academic library community as a whole, and will it be representative when we add more libraries this fall? I think the proper answer to these questions is no. To get a representative sample we need more libraries, distributed across a broader range of institutional types and sizes. The Ex Libris data will help (Ex Libris has significant market share among the ARL institutions), but we still won't have a truly representative sample. So, at best, we can call the current data suggestive and directional, but it is not definitive.
I anticipate that the inclusion of the Ex Libris data will continue to show that Amazon is a major vendor to academic libraries, but its market share will drop (because Ex Libris's customers are among the largest libraries, and the larger the library, the less likely that Amazon will be a sizable vendor). Also, it seems likely that the Ex Libris libraries are more likely to purchase books through approval plans. Another question we will be exploring is the ratio of print to ebooks, as the current data show surprisingly small numbers for ebooks. But we have to wait for the information to come in before making any judgments.
Beyond OCLC and Ex Libris, it will be a challenge to get a fully representative data set for academic libraries. To do that, among other things, Ithaka will have to get data from other ILS vendors, whose architectures may not lend themselves to the data extraction method that has been employed thus far. We will have to find a way around this technical obstacle.
With all these caveats in place, we will begin further analysis of the expanded data set this fall. We anticipate being able to make some comments about the subject areas for which libraries are most eagerly collecting titles and also about the university press sector. For example, what does the aggregate ILS data tell us about specific programs — at Michigan, at Princeton, at Duke, and elsewhere? And how can that data help these presses make sharper decisions, improve their business performance, and make their already valuable programs even more valuable to the research community?
In the meantime, let’s contemplate how much easier this all would be if Amazon were willing to answer one simple question.