Alternative title: “Lessons learned from scanning every single book in a college library.”
An age-old question among Amazon booksellers is “Which category is the most profitable?” Or in more practical terms, when the starter gun goes off at your local library’s preview sale, to which section should you run first to begin scanning? I wanted to see if data could come to the rescue and help us solve this conundrum.
The Experiment: To answer questions about a library sale, why not go straight to the source? The idea is straightforward – simply scan every single book at a local library and then run home to analyze the results. However, scanning more than 100,000 titles at one time doesn’t fit my criteria for an enjoyable day (or more likely, an entire week). Fortunately, I happen to know the director at a college library and she was able to provide me with a spreadsheet of every single ISBN in their catalogue. From there, I simply ran those ISBNs through Amazon’s API to find the current market values. As a bonus, each book was already categorized with the Library of Congress Classification System, which means we could easily find the values for each book category. It was a win-win scenario: I provided the director with a valuation of their entire collection that they could use for insurance purposes, and I was able to use the data to write this blog article!
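Every Library of Congress call number starts with one to three capital letters, so the general class and its subclass can be read straight off the front of the call number. Mathematics (QA), for example, rolls up under Science (Q). The minimal Python sketch below assumes the spreadsheet has a call-number column in the usual “QA76.73 .P98” form. The helper name `lcc_class_and_subclass` is hypothetical and is not part of any library or of the original analysis.

```python
import re
from typing import Optional, Tuple

# Leading run of one to three letters, e.g. "QA" in "QA76.73 .P98 2010".
# Assumes call numbers are stored in this common LCC form (hypothetical input).
_LCC_LETTERS = re.compile(r"\s*([A-Z]{1,3})")


def lcc_class_and_subclass(call_number: str) -> Optional[Tuple[str, str]]:
    """Return (general class, subclass) for an LCC call number, or None."""
    match = _LCC_LETTERS.match(call_number.upper())
    if match is None:
        return None  # no leading letters, so not an LCC call number
    letters = match.group(1)
    # First letter = general class (Q = Science); full run = subclass (QA = Math).
    return letters[0], letters


print(lcc_class_and_subclass("QA76.73 .P98 2010"))  # ('Q', 'QA')
```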
Here’s a 30,000-foot view of the data:
# of books in the library’s collection: 148,000
% of books that were fiction: <1% (that’s not a typo – and yes, you should be salivating at the percentage of non-fiction books that college libraries keep in stock!)
A few comments regarding the analysis: Many books on Amazon are listed for insanely high prices, which are typically the result of a repricer gone haywire, as shown here:
To combat those ridiculous prices, I excluded every result in which the lowest used price was higher than $500. It’s not a perfect solution, but it helps avoid severely skewed data. Additionally, I removed any category with fewer than 100 titles so that every category average rests on a reasonably large sample. It’s not a perfectly controlled experiment, but as the saying goes, it’s “good enough for government work”!
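The two cleanup rules above (drop any book whose lowest used price is over $500, then drop any category with fewer than 100 titles) translate into a short filter. This is a sketch under a few assumptions the post doesn’t state: each book is represented as a hypothetical `(category, lowest_used_price)` pair, books with no used offer (`None`) are dropped, and the 100-title minimum is counted after the price filter.

```python
from collections import Counter

PRICE_CEILING = 500.00  # lowest used price above this is treated as a repricer error
MIN_TITLES = 100        # categories with fewer titles than this are dropped


def clean(books, price_ceiling=PRICE_CEILING, min_titles=MIN_TITLES):
    """Filter (category, lowest_used_price) pairs using the two rules above.

    Assumption: a price of None means no used offer, and such books are dropped.
    """
    # Rule 1: keep books whose lowest used price is at or below the ceiling.
    priced = [(cat, price) for cat, price in books
              if price is not None and price <= price_ceiling]
    # Rule 2: count surviving titles per category and drop the small categories.
    counts = Counter(cat for cat, _ in priced)
    return [(cat, price) for cat, price in priced if counts[cat] >= min_titles]
```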
The Results: Without further ado, let’s showcase the Top 10 General Categories, ranked by highest average used price:

Categories that failed to make the list include Science, Geography, Education, Naval Science, Medicine, Social Sciences, and History of the Americas.
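Ranking the general categories by average used price is a group-by-and-mean over the cleaned data. A minimal sketch, again assuming hypothetical `(category, lowest_used_price)` pairs; the function name `top_by_average` is illustrative only:

```python
from collections import defaultdict


def top_by_average(books, n=10):
    """Rank categories by mean lowest used price, highest first."""
    totals = defaultdict(float)  # sum of prices per category
    counts = defaultdict(int)    # number of books per category
    for cat, price in books:
        totals[cat] += price
        counts[cat] += 1
    averages = {cat: totals[cat] / counts[cat] for cat in totals}
    # Highest average first; ties broken alphabetically so the output is stable.
    return sorted(averages.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

The same function works one level down: feed it subclass labels (such as “QA”) instead of general-class labels (such as “Q”).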
Overall, these categories are quite broad, so I went one level down in the classification system to home in on these Top 10 Sub-Categories, again ranked by highest average used price:

As mathletically inclined individuals would be quick to point out, averages can be misleading: a handful of expensive titles can pull a category’s average up even when most of its books are cheap. We can instead ask, “In which categories is a book most likely to have a cheapest used offer of at least $20.00?” Here are those results:
As you can see, many of the same categories appear in both rankings. If this single academic library is representative of book categories in general, the data should serve as a decent starting point for planning your course of action at your next big library sale. If you happen to be attending a sale in the Denver area, look for me in the Music section!
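The second ranking swaps the average for a hit rate: the share of a category’s books whose cheapest used offer is at least $20.00. Checking which categories show up in both top-10 lists is then a set intersection. A minimal sketch, under the same hypothetical `(category, lowest_used_price)` assumption; `top_by_hit_rate` and `shared_categories` are illustrative names:

```python
from collections import defaultdict


def top_by_hit_rate(books, threshold=20.00, n=10):
    """Rank categories by share of books with lowest used price >= threshold."""
    hits = defaultdict(int)    # books at or above the threshold, per category
    counts = defaultdict(int)  # all books, per category
    for cat, price in books:
        counts[cat] += 1
        if price >= threshold:
            hits[cat] += 1
    rates = {cat: hits[cat] / counts[cat] for cat in counts}
    # Highest hit rate first; ties broken alphabetically.
    return sorted(rates.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


def shared_categories(ranking_a, ranking_b):
    """Categories that appear in both ranked (category, score) lists."""
    return {cat for cat, _ in ranking_a} & {cat for cat, _ in ranking_b}
```

Unlike the average, the hit rate isn’t moved much by a single outlier: in a made-up category with one $400 title and three $1 titles, the average is above $100 but the hit rate is only 25%.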
P.S. I realize that every library’s collection will vary quite a bit, depending on geography and on who assembled the collection in the first place. Your mileage may vary. Do you agree or disagree with the categories as listed here? Comment below to continue the conversation!
P.P.S. Textbooks aren’t a category in the Library of Congress Classification System. Thus, there’s no mention of them in this post. Until just now, that is.
P.P.P.S. A few people have pointed out that Math (and other science categories) would likely rank higher on the list. I would agree with that as a general rule, especially for technical books in fields that don’t become outdated quickly. Math is a sub-category within the Library of Congress system, and it rolls up under Science, which came in 11th place in terms of highest average price. Could this data be misleading? Certainly, although that’s not my intent at all with this post. A single library’s data can be misleading, so if anyone has larger datasets they’d like me to consider, I’d be happy to take another run at this analysis!