January 2015 Author Earnings Report

Executive Summary

  • AuthorEarnings reports analyze detailed title-level data on 33% of all daily ebook sales in the U.S.
  • 30% of the ebooks being purchased in the U.S. do not use ISBN numbers and are invisible to the industry’s official market surveys and reports; all the ISBN-based estimates of market share reported by Bowker, AAP, BISG, and Nielsen are wildly wrong.
  • 33% of all paid ebook unit sales on Amazon.com are indie self-published ebooks.
  • 20% of all consumer dollars spent on ebooks on Amazon.com are being spent on indie self-published ebooks.
  • 40% of all dollars earned by authors from ebooks on Amazon.com are earned by indie self-published ebooks.
  • In mid-year 2014, indie-published authors as a cohort began taking home the lion’s share (40%) of all ebook author earnings generated on Amazon.com while authors published by all of the Big Five publishers combined slipped into second place at 35%.


Full Report

U.S. ebook sales have plateaued — or are even declining, relative to print — declare some widely-cited industry statistics. Publishing pundits opine that readers’ Kindles are all “full” now, and talk about the “glut” of ebooks. News articles imply that consumers are abandoning ebooks and are returning to print books, and then those articles speculate about whether ebooks were “just a fad.” Other pundits assert that indie authors will no longer be able to compete with the Big Five traditional publishers, now that those publishers have begun to price some of their ebooks lower.

Lots of speculation. Lots of flawed studies based on 2008 methodologies. Lots of inaccurate statistics. And very few facts.

As always, we turn to the data for real answers.

This is our fifth quarterly Author Earnings report. It is based on a data snapshot of 120,000 of the best selling ebooks on Amazon, giving us a deep cross-sectional data sample comprising roughly 50% of Amazon’s daily ebook sales. According to the publishing industry’s most oft-cited estimate, Amazon controls 67% of the U.S. ebook market. Thus the title-level data used in our analysis includes roughly 33% of all daily ebook sales in the U.S. No other industry survey or ebook market-size estimate comes close to this level of accuracy or detail.
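The coverage figure above is simply the product of two shares. A minimal sketch of that arithmetic, using the report's rounded inputs (the variable names are ours, chosen only for illustration):

```python
# Share of Amazon's daily ebook sales captured by the snapshot (from the report).
amazon_sample_share = 0.50
# Amazon's share of the U.S. ebook market (the industry estimate cited above).
amazon_us_market_share = 0.67

# Fraction of all U.S. daily ebook sales covered by the title-level data.
us_coverage = amazon_sample_share * amazon_us_market_share
print(f"Estimated U.S. coverage: {us_coverage:.1%}")
```

Because both inputs are rounded estimates, the result should be read as "roughly a third" rather than as a precise figure.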

(Later in the report, we’ll discuss some of the widely-cited official ebook market surveys and industry-wide sales estimates that the industry news sites and pundits rely on for their numbers — and we’ll show you why those surveys and estimates are so remarkably wrong.)

The methodology used in this report is identical to that of our four previous reports, published in February, May, July, and October of 2014. We capture real-time data from Amazon.com’s thousands of public ebook bestseller lists and sublists. Using a software “spider,” we grab a snapshot of each of the hundreds of thousands of listed books and how well they are selling.

We then group these 120,000 bestselling Amazon ebook titles by publisher type, separating them into:

  • Indie Published (self-published titles)
  • Small or Medium Publisher (any publisher that publishes more than one author, but is not an imprint of one of the Big Five Publishers, not an Amazon Publishing imprint, and not a known self-published author collective)
  • Amazon Publishing (Amazon’s semi-traditional publishing imprints, such as Montlake, Thomas & Mercer, Skyscape, etc.)
  • Big Five Published (the imprints of Penguin Random House, HarperCollins, Hachette, Macmillan, and Simon & Schuster)
  • Uncategorized Single-Author Publisher (publishers who published only a single author in our dataset, but that were not the self-publishing DBA or LLC of a well-known indie author)

We looked at the number of titles in each category that held slots on Amazon’s thousands of hierarchical bestseller lists and sublists, as well as the daily unit sales of each, the gross dollars spent by consumers on each, and the share of author dollar revenue (in the form of royalties or revenue share) going to each.

Earlier reports have described in some detail the methodology we use. More importantly, with each of our reports we also provide a public link to the full raw data set as a 120,000-line spreadsheet, showing all books and calculations the report numbers are based on. (We anonymize author and title names, for privacy.) The spreadsheet also contains “live” versions of the charts and graphs, so the curious can try changing key parameters (like the number of daily sales at each Amazon sales rank, or traditional publishing royalty rates) to see how those changes would affect the overall numbers.
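For readers who want a feel for the spreadsheet's calculations, here is a rough sketch of how shares by publisher type could be tallied from per-title rows. Everything in it is an assumption made for illustration: the sample rows, the `AUTHOR_SHARE` rates, and the `shares` helper are hypothetical and are not the report's actual data, parameters, or code.

```python
from collections import defaultdict

# Hypothetical author share of the consumer price, by publisher type.
# Illustrative placeholders only, not the report's actual parameters.
AUTHOR_SHARE = {"Indie": 0.70, "Big Five": 0.175, "Small/Medium": 0.30}

# Hypothetical rows: (publisher type, price in dollars, estimated daily unit sales).
titles = [
    ("Indie", 3.99, 120),
    ("Indie", 2.99, 80),
    ("Big Five", 9.99, 150),
    ("Small/Medium", 5.99, 50),
]

def shares(titles, author_share):
    """Return {metric: {publisher_type: fraction of total}} for three metrics."""
    totals = {
        "units": defaultdict(float),
        "gross": defaultdict(float),
        "author": defaultdict(float),
    }
    for pub_type, price, daily_units in titles:
        gross = price * daily_units
        totals["units"][pub_type] += daily_units
        totals["gross"][pub_type] += gross
        totals["author"][pub_type] += gross * author_share[pub_type]
    result = {}
    for metric, by_type in totals.items():
        grand_total = sum(by_type.values())
        result[metric] = {t: v / grand_total for t, v in by_type.items()}
    return result

result = shares(titles, AUTHOR_SHARE)
for metric, by_type in result.items():
    print(metric, {t: round(v, 3) for t, v in by_type.items()})
```

Changing the entries in `AUTHOR_SHARE` plays the same role as changing the royalty-rate parameters in the live spreadsheet: the unit and gross-dollar shares stay fixed while the author-earnings shares move.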

Enough about methodology. Let’s dive right into the data.

The January 2015 Author Earnings Report

Somewhat to our surprise, despite all the overheated media coverage of Amazon’s 2014 negotiations with Big Five publishers and the ongoing discussion of the Kindle Unlimited (KU) program’s effect on indie self-publishers, the pie charts for this author earnings report are almost boring this time around. They look so similar to our October 2014 snapshot, we had to double-check to make sure we hadn’t simply regraphed last quarter’s data by mistake. Let’s start with the mix of titles on the bestseller lists and sublists.


Not much to say here. The number of bestseller list and sublist slots grabbed by different publishing types is nearly identical to what we saw back in October. Indie self-publishers and the Big Five Publishers have each gained a percent since then, but that lies within the bounds of statistical error and is not necessarily significant.

The more interesting takeaway is this:

The increasing prevalence of lower-priced Big Five titles has had no measurable effect on the Big Five’s share of titles on Amazon’s daily-sales-based ebook bestseller lists.

Similarly, the new contracts that Big Five publishers Macmillan and Simon & Schuster have signed with Amazon.com restore agency pricing, allowing them to set their own final retail prices for many ebooks. Both have done so for the majority of the titles we captured:

  • 81.6% of Simon & Schuster titles in our dataset were tagged with “This price was set by publisher” on their Amazon.com product page.
  • 94.4% of Macmillan titles in our dataset were tagged with “This price was set by publisher” on their Amazon.com product page.

But what effect has Macmillan and Simon & Schuster’s return to agency pricing had on the overall ebook market? Apparently not much.

The return to agency pricing by two of the Big Five has had no measurable effect on the Big Five’s share of titles on Amazon’s daily-sales-based ebook bestseller lists.

Let’s look at something more interesting now: daily unit sales.



There’s an awful lot of blue on that chart.

At least a third of all paid ebook unit sales on Amazon.com are Indie self-published ebooks.

But the 33% shown is an extremely conservative lower bound on the true indie market share. The real number is almost certainly several percent higher, because the vast majority of the Uncategorized Single-Author Publisher ebooks are also self-published titles — we simply didn’t have the time (or energy) to check all ten thousand of them, one by one. And what we’ve labeled as Small or Medium Publishers — a designation we use for all publishers that are not the Big Five and not Amazon Publishing Imprints  — includes a significant chunk of multi-author collectives and tiny indie micropresses publishing through KDP. Many in the industry would classify that fraction under self-published ebooks as well.

In our past reports on Barnes & Noble’s ebook sales, we found the ratio of ebook sales by publisher type to be roughly the same on Barnes & Noble as on Amazon, and together Amazon and Barnes & Noble command at least 75% of the U.S. ebook market. The large indie ebook market share is not an Amazon-only phenomenon. It’s safe to conclude that at least a third of all paid ebook unit sales in the U.S. are Indie self-published ebooks.

But publishing industry pundits usually prefer to talk about dollar market share instead of unit market share. They point to the higher average price of traditionally-published books and say that publishers bank dollars, not numbers of books sold. So what about gross consumer dollars spent on ebooks?


At least a fifth of all consumer dollars spent on ebooks on Amazon.com are being spent on Indie self-published ebooks.

Again, that’s a conservative lower bound. The true indie dollar market share is most likely a few percent higher, including most of the Uncategorized Single-Author Publisher gross dollars and a small portion of the Small or Medium Publisher gross dollars. Projecting to the remaining non-Amazon third of the ebook market, we can conclude that at least a fifth of all consumer dollars spent on ebooks in the U.S. are being spent on Indie self-published ebooks.

The Big Five publishers as a cohort still command just over half of consumer dollars spent on ebooks. But this website is titled Author Earnings, not Publisher Earnings. Our focus is always on authors and how much they take home in earnings, rather than how much money is spent on corporate publisher overhead. We are primarily interested in the portion of that gross consumer spend that goes to authors in the form of traditionally-published ebook royalties or self-published ebook revenue share.

Let’s look at that all-important chart of dollar author earnings next.


40% of all dollars earned by authors from ebooks on Amazon.com are earned by Indie self-published ebooks.

A quick aside on Kindle Unlimited (KU). The indie share of author earnings includes 8% from KU borrows of indie books. In our last report, KU was a brand new part of the author-earnings landscape. To account for it accurately, we crowdsourced borrow-versus-buy ratios from hundreds of indie authors participating in KU, and found that they averaged 1:1 (half KU borrows, half full-price purchases). We used that 50% borrow ratio as a baseline in our author earnings calculations, although we found that plugging in any other ratio instead, even 0% borrows or 100% borrows, made little difference in the overall numbers and pie charts. In November, when Amazon.com announced the size of the October KU “pot” at $5.5 million and the indie per-borrow payout at $1.33, we could now double-check our crowdsourced KU-borrow ratio of 50%. So we did:

$5.5 million / $1.33 = 4,135,338 indie KU borrows in October

That is about 48% of the 8,561,293 paid monthly downloads (purchases + borrows) of Indie & Uncategorized books in KU shown by our data, quite close to the 50% we originally crowdsourced. Perhaps the wisdom of crowds is a thing, after all.
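The cross-check above can be reproduced in a few lines. This is only a sketch of the arithmetic, using the figures quoted in this report; the variable names are ours.

```python
ku_pot_dollars = 5_500_000   # October KU fund announced by Amazon
payout_per_borrow = 1.33     # indie per-borrow payout, in dollars
paid_downloads = 8_561_293   # purchases + borrows of Indie & Uncategorized KU books in our data

# Number of borrows implied by dividing the fund by the per-borrow payout.
implied_borrows = ku_pot_dollars / payout_per_borrow
# Borrows as a fraction of all paid downloads (purchases + borrows).
borrow_share = implied_borrows / paid_downloads

print(f"Implied borrows: {implied_borrows:,.0f}")
print(f"Borrow share of paid downloads: {borrow_share:.1%}")
```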

It’s worth taking another look at the above pie chart of author earnings and considering how little this information surprises us now, compared to last year.

Only seven months ago, the idea that indie self-published authors and their ebooks were outearning all authors published by the Big Five publishers combined was jaw-dropping heresy. Today, it’s boring — a widely-acknowledged fact among knowledgeable authors, if not industry pundits. Many authors who publish both ways point out their earnings disparity in favor of their self-published titles, and so this data is no longer surprising.

But what is surprising is how consistent each of our quarterly snapshots has been. And because of that quarter to quarter consistency, we can discern a few broader trends…


Across 12 Months of Quarterly Snapshots, The Broader Trends

From snapshot to snapshot, a percent or two difference isn’t statistically significant. But when comparing five consecutive AuthorEarnings data snapshots over a period of 12 months, a clear trend becomes visible. The most notable change over the last few quarters is the continued progressive growth of indie market share at the expense of traditionally published ebooks. Here, we can see it in unit sales terms, in gross consumer dollar terms, and in the all-important metric of author earnings.


Somewhere between May and July of 2014, Indie Published authors as a cohort began taking home the lion’s share of all ebook author earnings generated on Amazon.com, while authors published by all of the Big Five publishers combined slipped into second place.

We are only looking at one year here, and digital publishing is still in its infancy, as is the transformation of the publishing industry — all claims of “stabilization” and “plateaus” notwithstanding. It remains to be seen what the future holds. But it’s apparent that indie self-publishing remains as viable and robust a publishing option as it was a year ago, and an increasing number of authors — perhaps even the majority, according to Digital Book World’s 2015 publishing survey — now see indie self-publishing as their first choice, and traditional publishing as a backup plan.

Which brings us to an interesting question:


If One-Third of All EBooks Purchased In The U.S. Are Now Obviously And Verifiably Self-Published, Why Do Publishing Industry News Outlets And Pundits Continue To Claim Otherwise?

Back in 2013, Amazon.com and Barnes & Noble separately announced that consumer purchases of self-published ebooks already made up over 25% of all their ebook sales. By early 2014, our first Author Earnings reports found that on both channels, the number was closer to 30%. Over the last 12 months, indie market share has grown another several percent, and today indie books make up more than 33% of all ebooks sold.

You don’t need a fancy software “spider” to see this. Anyone with a web browser can verify it for herself or himself. There’s nothing we have done in our AuthorEarnings data capture and analysis that a curious person with Internet access, a notepad, and a pencil cannot confirm by browsing a few of Amazon’s or Barnes & Noble’s Top-100 lists for different genres, looking at the listed publisher and overall sales rank for each of the top-selling books on those lists, and doing a little basic math. That’s essentially all our software “spider” did, but for a vastly larger number of titles.

So if the one-third share of the ebook market now supplied by indie self-publishers is so clearly and verifiably obvious, why do industry news sites like Publishers Lunch and veteran traditional-publishing pundits like Mike Shatzkin claim otherwise? Why do they continue to insist that indie self-published ebooks only make up a tiny share of the market, and cannot possibly account for a significant volume of sales?

The answer is simple. Bad data.

All of these industry pundits rely on three officially-recognized sources of ebook market size estimates and projections: AAP/BISG BookStats, AAP StatShot, and now Nielsen PubTrack.

Each of these sources arrives at its ebook market-size estimates by collecting self-reported data from a small subset of participating publishers (1,919 for BookStats, 1,200 for StatShot, 30 for PubTrack) and then using average per-title sales numbers from those participating publishers to project the size of the entire ebook market. They do this by multiplying those average per-title sales numbers by the number of active ebook ISBNs (International Standard Book Numbers) purchased from Bowker by the many tens of thousands of non-participating publishers and indie self-published authors. (Here is BookStats describing their methodology.)

In fact, the organizations disseminating these statistics (the AAP, BISG, Nielsen, Bowker) explicitly state that they cannot track self-published books which do not use ISBNs, and that the self-published segment of the market might be underrepresented in their numbers.

But all of them nonetheless make the assumption that the vast majority of ebooks — including self-published ebooks — do in fact use ISBNs.

From AAP/BISG BookStats:

“While the self-publishing market continues to grow in terms of the number of books published, BookStats is limited in its ability to ascribe total value to this group, especially in the case where ISBNs are not utilized. While self-publisher outreach was attempted, response was relatively low. It can be concluded that this sector is underrepresented for these reasons.”

From Bowker:

“Bowker’s analysis is based on ISBN registrations in the U.S. The vast majority of books in all formats have an ISBN.”

All of the industry’s official ebook market-size estimates thus rest on a single key assumption: that ebooks without ISBNs do not represent a significant portion of consumer purchases, and can thus be safely ignored in calculations of ebook sales and market share.

With captured title-by-title data in our hands which represented 50% of Amazon’s daily unit and dollar ebook sales, we were in a position to definitively check that key assumption.

So we did.

Title by title, we checked whether each book in our data set had an ISBN or not.

All 120,000 of them.

And we found that the key assumption underlying all of the industry’s cited ebook statistics and official estimates of market size is wildly, wildly wrong.


The Invisible “Shadow Industry”: Indie Ebooks Without ISBNs

One of the most-widely-cited “official” sources for ebook market size is the annual AAP/BISG BookStats report.

BookStats claims there were 512.7 million ebooks sold in the US in 2013.

But BookStats only counts ebooks with ISBNs.

Here’s what they aren’t seeing…


30% of all ebook purchases in the U.S. do not have an associated ISBN.

Nor are these non-ISBN ebooks lower-selling titles by any means. Many of them are among the bestselling ebooks in the U.S.

In the January 21 dataset, we found that:

20% of Amazon’s overall Top-10 selling ebooks did not have ISBNs.

16% of Amazon’s overall Top-100 selling ebooks did not have ISBNs.

34% of Amazon’s overall Top-1,000 selling ebooks did not have ISBNs.

37% of Amazon’s overall Top-10,000 selling ebooks did not have ISBNs.

Ebooks without ISBNs also represent a large and growing portion of the ebook market as measured in consumer dollars, too:


16% of consumer dollars spent on ebooks were spent on ebooks without ISBNs.

But again we are primarily interested in the portion of that gross consumer spend that actually goes to authors in the form of ebook royalties or ebook revenue share, rather than the portion that is dissipated on publisher corporate overhead.

Looking at what portion of total ebook author earnings lie invisibly within the no-ISBN “shadow industry,” we see:


28% of all ebook dollars earned by authors were earned on ebooks without ISBNs.

And therefore invisible to all officially-recognized industry estimates cited by pundits.

It is no wonder the typical publishing-industry pundit’s reaction to our initial Author Earnings reports has been a mix of dismissal, denial, and suspicion. After all, those pundits have seen official “proof” that our data is wrong… they are paying thousands of dollars each quarter to the AAP, BISG, and Bowker for data surveys and reports that say otherwise.

But the ISBN-based self-published sales that the AAP, BISG, Bowker, and Nielsen include in their reports are only the tip of the iceberg — the other 90% of the self-published ebook market lies invisibly underwater.


Wait! Aren’t ISBNs Necessary To Sell Ebooks Thru Retailers Other Than Amazon.com?

No. They aren’t.

Despite Bowker’s misleading FAQ claims that “most vendors” require an ISBN to sell your book, none of the major ebook vendors actually do. ISBN-less ebooks can be sold on Barnes & Noble, Apple, Kobo, and most other places where ebooks are sold.


Still, Doesn’t Buying An ISBN Confer Some Advantage In The Crowded Ebook Marketplace?

In other words, do books with ISBNs noticeably outperform their ISBN-less peers in unit and/or dollar sales?

From the data, the opposite seems to be true.

The top several-thousand indie titles without ISBNs outsold and outearned the top several-thousand indie titles with ISBNs.

  • The top 3,830 indie titles without ISBNs sold an average of 42 copies a day and earned their authors an average of $70 a day.
  • The top 3,830 indie titles with ISBNs sold an average of 24 copies a day and earned their authors an average of $52 a day.

Indie books that use ISBNs are selling fewer copies and making less money than their ISBN-less peers.

We caution against reading too much into this finding. Correlation does not equal causation. These results may well be due to a third factor, such as the higher average book prices for books with ISBNs, or a lower average level of entrepreneurial sophistication among those indies still believing they need to purchase ISBNs. But the “why” is of course speculation; we are simply dispelling a common myth using hard data.
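A quick look at the per-copy numbers implied by the two averages above is consistent with the pricing explanation. This is a sketch only: it uses the rounded daily averages quoted in the bullets, and per-copy author earnings are only a rough proxy for price.

```python
# Daily averages for the top 3,830 indie titles in each group (from the report).
no_isbn = {"copies": 42, "author_dollars": 70}
with_isbn = {"copies": 24, "author_dollars": 52}

# How much more the no-ISBN group sells and earns per title per day.
unit_ratio = no_isbn["copies"] / with_isbn["copies"]
earnings_ratio = no_isbn["author_dollars"] / with_isbn["author_dollars"]

# Average author earnings per copy sold in each group.
per_copy_no_isbn = no_isbn["author_dollars"] / no_isbn["copies"]
per_copy_with_isbn = with_isbn["author_dollars"] / with_isbn["copies"]

print(f"Units: {unit_ratio:.2f}x, earnings: {earnings_ratio:.2f}x")
print(f"Per copy: ${per_copy_no_isbn:.2f} (no ISBN) vs ${per_copy_with_isbn:.2f} (ISBN)")
```

The ISBN-less group sells more copies per title but earns less per copy, which is what one would expect if those titles are priced lower on average.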

What we can say for sure is that the clear lack of any material benefit in the marketplace makes the cost of purchasing an ISBN for an ebook very difficult to justify — the same money would be far better invested instead in better professional editing, proofreading, formatting, cover art, and the like.

As the data shows, the vast majority of indie authors are choosing not to spend money on ISBNs for their ebooks, and that number is growing year over year. And surprisingly, it’s not just self-publishers that are making the choice to abandon ebook ISBNs.

A growing number of non-Big-Five traditional publishers are choosing to forgo the unnecessary expense of ebook ISBNs, too.

Grouping the books in our data set by the year of publication, we see a definite year-over-year trend. Fewer and fewer of the bestselling ebooks are using ISBNs each year. Nearly 25% of indie ebooks sold that were published in 2011 used ISBNs, but three years later, only 13% of indie ebooks sold that were published in 2014 use ISBNs.

87% of ebooks purchased that were published in 2014 by indie author-publishers did not have ISBNs.

Even more interestingly, a growing number of small or medium traditional publishers are no longer using ISBNs either. In 2011, 92% of the bestselling ebooks published by non-Big-Five traditional publishers used ISBNs, but last year, only 70% of the bestselling ebooks by non-Big-Five traditional publishers used ISBNs. Which means that…

30% of ebooks purchased that were published in 2014 by non-Big-Five traditional publishers did not use ISBNs.

Undoubtedly, the 30% of Small or Medium Publisher ebooks that are sold without ISBNs are almost all coming from tiny, agile, innovative smaller publishers and micropresses, rather than large corporate publishers. But these non-ISBN traditional ebooks are just as invisible to Bowker and the industry’s official statistics as the non-ISBN indie ebooks are.

ISBN-based analysis, reporting, and industry statistics about the overall ebook market are now so incorrect as to be meaningless.

When It Comes To Tracking Digital Books, The ISBN Is Officially Dead — It Just Hasn’t Been Buried Yet.

In fact, now we know why the regularly-cited industry stats about the ebook market size and its composition are so far off from observable reality. Because when we, too, put on ISBN-colored blinders and ignore all ebooks that don’t use them, we can see the exact same view of the industry — and roughly the same numbers — that the pundits continue to report. For fun, we’ve done that in the pie charts below.

(Just ignore the black “shadow industry” pie wedge, and pretend the rest of the pie is all that there is.)


Let’s Take A Look At The Numbers The Way The Industry Pundits See Them…




Again, just ignore the black “shadow industry” pie wedge — because if you’re a pundit looking at data from the AAP, the BISG, Nielsen, or Bowker, those ebook sales are invisible. To see what they are seeing, pretend the rest of the pie is all that there is.

Also keep in mind that if you’re a publishing industry news site or pundit, only the top two graphs (units sold and gross dollars earned by publishers) are meaningful to you. The third, showing author earnings — not so much.

When we put on our ISBN-colored blinders, we can clearly see why industry pundits don’t consider self-published ebooks to be a significant component of the publishing industry. Because when you cannot see the black no-ISBN “shadow industry” pie wedges, there’s very little blue left on the top two charts.

When one makes the fatal mistake of relying on ISBNs to estimate the ebook market, only 10% of unit sales and 7% of gross consumer ebook dollars appear to be going to self-published books.

A natural response might be that indie authors and small presses should adhere to publishing standards and purchase ISBNs and use them regularly. The onus is placed on them, as well as the cost. We offer another suggestion: Free ISBNs. CreateSpace offers free ISBNs for every print-on-demand book created using their service. The same should be true for all ebook editions. Indie authors are a savvy bunch, and expecting them to pay for something that does not benefit them at all (as evidenced by our data) is assigning blame to the wrong party.


The Fact That the Official Industry Statistics Fail To Include a Third of the Ebook Market Has Other Far-Reaching Implications

1) The claim that ebook sales are “plateauing” or “declining” becomes highly suspect.

When you look at the entire picture — including the rapidly-growing 30% of ebooks sold without ISBNs — what looks like a “plateau” to the industry pundits and their ISBN-based statistics suggests a different interpretation altogether: what they are actually observing is a progressive shift of ebook market share away from the traditionally-published “visible” portion of the industry that uses ISBNs… and toward the invisible “shadow industry” of ISBN-less self-published ebooks.

2) The claim that ebooks make up only 30% of all books purchased in the U.S. (or only 25% now, or even less, depending on who you ask) — and the claim that print books continue to account for over 70% of all U.S. book sales — both fall apart.

Given that those sales-by-format estimates fail to include a full 30% of all ebook sales that lie hidden in the “shadow industry”, the real market share commanded by the ebook format is far higher than what industry sources are reporting, and print books make up a far smaller portion.


But How Does The Industry Come Up With Those Ebook-vs-Print Market Share Estimates In The First Place?

The industry’s most widely-cited source of print-format book sales is Nielsen BookScan. BookScan estimates the nationwide number of print books sold but does not measure ebook sales. To generate those print-sales estimates, BookScan surveys a panel of nationwide retailers that includes both online and brick & mortar participants: primarily bookstores and mass market retailers. Nielsen BookScan claims they are capturing 80% of all print sales in the U.S. — an unsubstantiated claim that many authors have disputed — and which until recently did not include sales at Walmart, the largest mass-market retailer in America. There is also the curious but rarely remarked-upon fact that Nielsen BookScan’s survey of annual U.S. print sales and the AAP/BISG BookStats estimate of annual U.S. print sales differ by over 250%. But nonetheless, BookScan is the industry’s go-to source for the size of the print market, used to estimate the overall number of hardcovers, trade paperbacks, mass-market paperbacks, and board books sold each year in the U.S.

BookScan reported 620 million print books sold in 2013, and 635 million in 2014.

Let’s take BookScan’s 80% coverage claim for U.S. print sales at face value for a moment. (A dubious proposition, granted. Keep in mind that right now we are only describing how the mainstream publishing industry analysts come up with those estimates you keep hearing; we’re not trying to ascribe any particular credibility to them.)

Adding in the 20% of U.S. print sales that BookScan says their survey of retailers doesn’t capture, we get an estimated 775 million print books sold in the U.S. in 2013, and 794 million print books sold in the U.S. in 2014. Or that’s what Nielsen BookScan is claiming, anyway.
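The gross-up is a one-line division. A sketch of it, taking BookScan's 80% coverage claim at face value as the text does (variable names are ours):

```python
bookscan_coverage = 0.80  # share of U.S. print sales BookScan claims to capture
reported_print_millions = {2013: 620, 2014: 635}

# Scale each reported figure up to an implied total for the whole print market.
estimated_print_millions = {
    year: reported / bookscan_coverage
    for year, reported in reported_print_millions.items()
}

for year, total in estimated_print_millions.items():
    print(f"{year}: about {total:.0f} million print books")
```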

Ignoring the AAP/BISG BookStats-reported 512.7 million ebooks with ISBNs sold in the U.S. in 2013, Nielsen instead projects 205 million U.S. ebook sales for 2013, based on analysis by their recently-acquired PubTrack Digital subsidiary. PubTrack, on which the inimitable Kris Rusch dug up some info, apparently aggregates self-reported ebook sales data from “over 30 participating publishers” in a collaborative publisher data-sharing program.

These “Nielsen numbers” for ebook and print sales get presented to the industry at “Publishers Launch” conferences and are cited in Publishers Lunch articles titled “Real Data on Print Sales In The eBook Era — And the eBook Plateau.”

Using these “Nielsen numbers” we too can calculate the ratio of ebooks to print books the exact same way the pundits do:

205 million ebooks / (620 million print books + 205 million ebooks) = 25% of U.S. book sales are ebooks

Or, accounting for the 20% of all print book sales that Nielsen says they miss:

205 million ebooks / (775 million print books + 205 million ebooks) = 21% of U.S. book sales are ebooks
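Both ratios come from the same formula. A small sketch, with a helper name (`ebook_share`) that is ours rather than anything the industry sources use:

```python
def ebook_share(ebooks_millions, print_millions):
    """Ebooks as a fraction of combined ebook and print unit sales."""
    return ebooks_millions / (ebooks_millions + print_millions)

# Nielsen's 2013 figures, in millions of units (from the text above).
raw_share = ebook_share(205, 620)        # BookScan print figure as reported
adjusted_share = ebook_share(205, 775)   # print figure grossed up for the missing 20%

print(f"Unadjusted: {raw_share:.1%}, adjusted: {adjusted_share:.1%}")
```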

The problem with the above “Nielsen numbers” is this:

Amazon.com alone visibly sells over 560 million ebooks and 420 million print books a year in the U.S. When you include the 70 million audiobooks Amazon.com sells annually (split 60/40 between digital downloads and CD format), you get roughly a billion books of all formats that are being sold by Amazon.com each year — a number that is very much in line with the reported 41% share Amazon holds of all new-book sales of all formats in the U.S. and their 64% share of all online print book sales.
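To make the mismatch concrete, the sketch below puts Amazon's annual figures next to the grossed-up 2013 "Nielsen numbers" for the entire U.S. market. The comparison is illustrative: the Amazon figures are this report's annual estimates and may not line up exactly with calendar 2013.

```python
# Amazon.com annual U.S. unit sales, in millions (from this report).
amazon_millions = {"ebooks": 560, "print": 420, "audiobooks": 70}
amazon_total = sum(amazon_millions.values())

# Nielsen's 2013 estimates for the whole U.S. market, in millions.
nielsen_millions = {"ebooks": 205, "print": 775}
nielsen_total = sum(nielsen_millions.values())

amazon_ebooks_plus_print = amazon_millions["ebooks"] + amazon_millions["print"]

print(f"Amazon, all formats: {amazon_total} million")
print(f"Amazon, ebooks + print: {amazon_ebooks_plus_print} million")
print(f"Nielsen, entire U.S. market (ebooks + print): {nielsen_total} million")
```

On these inputs, Amazon's ebook and print sales alone are as large as Nielsen's estimate for every U.S. retailer combined.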

Which means that neither the “Nielsen numbers” for the overall size of the print market, nor those for the ebook market, make any sense at all. At best, Nielsen is capturing data on a far smaller subset of both markets than it claims. Or alternately, the AAP/BISG is vastly overstating their estimate of U.S. print sales, which is more than two and a half times as large as Nielsen BookScan’s.

Or, most likely of all, both sets of officially-cited industry data on overall print sales (from Nielsen BookScan and from the AAP/BISG BookStats) are wrong… but in opposite directions.

Have a headache yet?

If the industry-reported numbers from the AAP, the BISG, Bowker, and Nielsen all seem bizarrely inconsistent with each other, it’s because they are. And we aren’t the only ones noticing these discrepancies. Behind closed doors, the same industry news sites and pundits privately express deep doubts about the accuracy of the data they are publicly espousing.

From Publishers Lunch in 2013: “To us, the modest increases in the AAP’s restated direct data makes the far higher, statistically-modeled “estimates” the [AAP/BISG] BookStats produces highly suspect, especially for ebooks.”

Nor is the known inaccuracy of official industry data a particularly new concern among publishing insiders.

From Publishers Lunch back in 2009: “…we’ve looked a little bit at the confusingly broad distinctions in the Bowker counts of new titles published last year and tried to reckon with Amazon’s unsubstantiated glimpse into rising Kindle sales. … we very consciously do not report periodic numbers from the AAP, Census Bureau, and IDPF since they are so incomplete (and sometimes inconsistent) as to be more confusing than illuminating. Nielsen Bookscan is great for what it covers, but is also incomplete (and doesn’t capture certain things, like ebook sales, at all), and Bowker’s PubTrack is interesting at reflecting very specific buying patterns and demographics but is no substitute for actual market data.”

Perhaps that’s why the AAP and BISG both announced in mid-2014 that they would no longer be collaborating to produce BookStats, and that each would instead try to provide their own separate set of industry estimates in future years.

And remember, none of these “official” industry estimates have ever included or accounted for a full 30% of all ebook sales in the U.S. — the ones that go unreported and unmeasured because they lack ISBNs, and therefore lie in the “shadow industry.”

But those “shadow industry” books aren’t invisible to consumers. Readers are buying them in vast numbers — purchasing between 240 and 260 million of them a year in the U.S. alone. Readers don’t care whether books have ISBN numbers or not… most don’t even know what an ISBN is, and wouldn’t care if they did.

But don’t take our word for it.

You can check our data.

Or simply check the Amazon.com, Barnes & Noble, Apple, and Kobo bestseller charts.


Download the raw data this report is based on (.xlsx)

Creative Commons License
Author Earnings is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.


165 Responses to “January 2015 Author Earnings Report”

  1. G says:

    This reminds me of the church’s response to Galileo who was theorizing that the earth was not, in fact, the center of the universe.

    Big Pub is ignoring the obvious evidence of indie ebook dominance because they don’t want to admit they’ve lost the war. And they have. Indies know it.

    The notion that a little guy like me, with no fancy office, no access to B&N brick and mortar stores, no PR person, no cover art team, no muscle behind my books…that I can compete on the same level and outsell the big 5 publishing house books (which I do on a daily basis).

    That is absolutely anathema to trad publishers and all who suck at the teat. They can’t admit it, won’t see it, and will stick fingers in their ears if you try and point it out (as Hugh and data guy have done here).

    What could they do about it? Admit they suck? Admit they aren’t worth the huge chunk of royalties they feel entitled to take out of each and every author’s pie, not to mention the horrific contracts with non-competes, etc. etc.

    They will NEVER EVER admit this, no matter how good the data gets. They’ll spend every waking minute trying to discredit this data, in fact.

    Because otherwise they must admit they can’t, with all their money and power, compete with a little ant like me. A little bug they could supposedly squash underfoot.

    Great job guys. This stuff is shaking the very foundations of the publishing mythology that has been around forever.

  2. Once again, fabulous job guys. I look forward to reading each report as it’s released. Terrific information and useful for authors like me!

  3. Dazrin says:

    Great report once again!

    One error I see – the graph titled “Comparing Top 4,000 Best Selling Indie Books with and without ISBNs” shows the “Average Daily $ Author EarningS” for no ISBN at ~$44 and 70 copies, the text below says that it is 44 copies and $70. So either the bullets below are backwards or the column labels are backwards.

  4. Susan Illene says:

    One question I do have is in regard to the sales-to-ranking ratio you use. I’ve noticed since KU came out that I’m having to get a lot more sales to reach the same ranking as before. At first I ignored it since there were obviously going to be some growing pains with the new system, but it’s been consistent for months now. Also, when I compare my sales-to-ranking ratio to authors who have books in KU (none of mine are in the program) I’m finding they don’t have to have as many borrows/sales (combined) to reach the same rank. Have you all factored that in when doing your calculations? Maybe it won’t make too much of a difference, but I’ll be happy to provide my stats if you want them. I would love to see a comparison done.

    Regardless, thanks for the hard work you guys do. I appreciate it and read all your reports.

    • Data Guy says:

      We’ve left the sales-to-ranking ratio consistent over the course of our reports to make comparisons more direct and trends easier to see. But as you point out and we’ve also observed, each rank now means more sales/borrows than it did a year ago. Why? The overall ebook market has grown, and as a result, we’re actually now underestimating Amazon.com sales/borrows a little with our slightly-dated sales-to-ranking ratios.

      But we’re okay with that. We prefer being extremely conservative and understating — rather than overstating — the indie numbers.

      We haven’t observed the “ghost borrow” effect you mention happening with our own books, but others continue to report it. Here’s a writeup of an excellent experiment testing what does and doesn’t factor into Amazon’s sales rankings:


      Thanks for the offer of your data, Susan — depending on what we tackle next, we might later take you up on it. 🙂

      • Susan Illene says:

        Thanks for clarifying, Data Guy. I’m glad to hear you all have noticed it taking more sales to reach the same ranking as before, even if you are choosing to use the old ratio for now with your reports. It definitely proves the ebook market is growing!

        That’s strange you’re not seeing the difference between KU and non-KU books. Maybe one of these days I’ll have to put a book in and see how it does for comparison. For now, I can only go by my results vs. other authors’ and it’s been a big difference.

      • Nirmala says:

        I looked at the post you mentioned. The most troubling thing on there was a comment at the end that suggested that people might be scamming the ranking system by borrowing, returning, and reborrowing a book repeatedly on Kindle Unlimited. If every one of these “borrows” boosts ranking even before 10% has been read, this could be used to move a book up the ranks.

        I have no idea how prevalent this might be and whether it could ever have an effect on the data in these reports.

        • Data Guy says:

          …people might be scamming the ranking system by borrowing, returning, and reborrowing a book repeatedly on Kindle Unlimited…

          When I read that part, I actually tried it, Nirmala. With one of my own books. A few dozen times in a row. Purely as an experiment, of course… 😉

          It doesn’t work. My rank didn’t change.

          • Nirmala says:

            That’s a relief….especially since my wife and I just removed all of our books from Select. I would not like to think that people were using a trick like that to boost their rank when I can no longer try it myself….just as an experiment of course 🙂

          • Nirmala says:

            Seriously though, thanks for checking. It is reassuring to know Amazon can tell the difference between a legitimate borrow and something like this.

      • Edward W. Robertson says:

        Ghost borrows are real. It’s very simple to test—upload a book, enroll it in KU, then borrow the book. Don’t open the book. A few hours later, you’ll have a rank, but no borrows will be registered on your dashboard.

        Figuring out the overall ratio of ghost borrows to paid borrows is extremely challenging, but without accounting for them, the income estimates on KU titles are gonna be inflated.

  5. Jeff Dwyer says:

    Great work, gentlemen. I’ve lived long enough to have worked inside, and now outside the traditional pub world. The indie movement is a blessing to the talent that your data collection helps verify. Please stay with it. Nicely presented.

  6. Christy says:

    Thank you so much for doing this! I so look forward to these reports. Hugh, can I ask you — is it important that we get ISBN’s on our e-books? I didn’t realize that anyone was tracking ISBN when it comes to e-book sales. Or is it just an insider thing and it doesn’t affect lists like the USA Today or NYT bestseller lists? Thanks!

  7. Jan O'Hara says:

    Well done, you two. Amazing depth and breadth of data on an evolving industry. Amazing that you both do this for free! Count me as one of the grateful.

  8. Karen Myers says:

    One comment on the free ISBN from Createspace… That is one of CS’s bundle of ISBNs and, consequently, your book is listed in the Ingram database as “Publisher = Createspace”. (BTW, this is true EVEN IF you use your own purchased ISBN for the book, when you use CS’s expanded services.)

    Some bookstores see “Publisher = Createspace” as a flag for Amazon and are prejudiced about carrying such titles.

    So, this is not an entirely free option. Free in dollars, but not in distribution risk.
    BTW, I’m of the school that prefers to use ISBNs, and at the $1000/1000 price (raised recently), they worked out to $1/unit, so that’s as close to free as I needed. I already know where 100 of them will be used, so even if I stopped there, that’s $10/unit, and cheap enough.

    • This is why I’ve published through Ingram’s LightningSource division since the beginning of my indie career. Yes, it costs a bit to set up, but this is a business, worth investing in, and I can always tell bookstores and other bulk purchasers to just add my books to their Ingram orders — my imprint looks like any other small indie press.

      Now that we can even mark our POD titles as returnable, I’m finding a lot of doors open to us that weren’t even a couple of years ago. One friend of mine found that being able to cite that enabled her to land a book signing event at a local B&N — sort of a gold standard for author events in many folks’ minds. 🙂

  9. Bill Butler says:

    Thank you for the informative report and analyses. I am reminded of a parallel in the newspaper industry, which largely seemed incapable of accurately seeing the readership numbers as the internet exploded, other than their own declining ones. Until it was too late. Same story with the Big Three television networks, to a lesser degree.

  10. Mgon ♥ says:

    Awesome report. So exciting to see the data prove so many great things for indie authors… like NOT needing ISBN numbers! Woohoo 😀

    Thanks for all your hard work, and for doing so much for the indie community. It’s wonderful.

    God bless you!

  11. Vinny O'Hare says:

    Great report and very detailed.

    My question is out of the ebooks being bought from the Big 5 how many are classic books that people want on their kindle/iPad that are just replacing the printed version. In this case it would throw off how many people are buying new releases. Meaning the Big 5 stats are a one time purchase on a good chunk of their books.

    That can’t be good for the long term for them.

  12. Pharosian says:

    I learn so much reading these reports! It was good to see how the numbers come out when the non-ISBN book sales aren’t factored in. I always wondered how there could be such a disconnect between what Big Publishing was reporting and what Author Earnings was reporting. I wonder now whether Big Publishing will acknowledge the flaw in their claims, or whether (as someone commented above) they will spin, spin, spin, deny, deny, deny.

    Thanks, guys!

  13. Smart Debut Author says:

    Publicly, I doubt we’ll see anyone associated with Big Publishing even acknowledge a word. I mean, look at those old quotes from Publisher’s Lunch: they’ve known for years that the official data they are getting from Nielsen or AAP or whoever is crap.

    Privately, they’ll chalk up Amazon’s decision not to require ISBNs as some sort of insidious and deliberate sabotage plan aimed at the publishing industry.

    But it’s obvious that Amazon eschewed ISBNs for a far simpler and more benign reason: the cost of ISBNs was an unnecessary barrier to entry for the smallest publishers and self-published authors. Look at how punitive Bowker’s ISBN fee structure is toward folks only buying a few at a time:

    1 ISBN costs $125.00
    10 ISBNS cost $275.00
    100 ISBNS cost $525.00
    1000 ISBNS cost $1250.00

    An indie self-publisher buying ’em one at a time from Bowker is paying at least 100X more per ISBN than a Big Five publisher does. Bowker’s inflated ISBN costs at the low end were effectively another “gatekeeper” that kept the cash-strapped riffraff from self-publishing their handful of books. And that’s no accident; the industry liked it that way.

    Looks like that shit blew up in their faces pretty bad, huh?
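The per-ISBN arithmetic behind the "100X" figure in comment 13 can be checked with a short sketch. It uses only the four Bowker price tiers quoted in that comment; the variable names are illustrative, and it assumes a large publisher buys at the 1,000-ISBN tier.

```python
# Bowker price tiers as quoted in comment 13: (quantity, total price in USD).
tiers = [(1, 125.00), (10, 275.00), (100, 525.00), (1000, 1250.00)]

# Cost per single ISBN at each tier.
per_isbn = {qty: total / qty for qty, total in tiers}

# How much more a one-at-a-time buyer pays per ISBN than a buyer at the
# 1,000-ISBN tier (assumed here to stand in for a large publisher).
ratio = per_isbn[1] / per_isbn[1000]

for qty, cost in per_isbn.items():
    print(f"{qty:>5} ISBNs: ${cost:.2f} each")
print(f"Single-ISBN buyer pays {ratio:.0f}x the bulk per-unit price")
```

At the quoted prices the per-unit cost falls from $125.00 to $1.25, which matches the comment's 100X claim.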

  14. Thanks for the shout-out, Data Guy! And for this wealth of information. It makes my aching math brain feel so much better to see your numbers in black-and-white. I’m glad I waited on my next numbers post until you did this… 🙂

    • Data Guy says:

      You’re an inspiration, Kris!

      If you really want to go nuts with math, there’s the linked 24 MB Excel file with all the raw data, too. 🙂
      All kinds of other data in there that we didn’t get a chance to go into in this report: review counts and scores, genre categories that can be parsed, the Big Five’s numbers broken out individually, etc.

  15. Awesome, awesome stuff. Thanks, Hugh and Data Guy!

    I’d like to self-report as an indie with my own imprint name. Fiddlehead Press publishes Anthea Lawson, Anthea Sharp, and a couple multi-author bundles. A fair number of my titles are consistently in their Top 100 genre lists, and so are being counted – maybe as Small Press instead of Indie? Anyway, if in the future you wanted to skew Fiddlehead Press into the indie numbers, it might help clarify the data just that teeny bit more. 😉

    • Data Guy says:

      Fiddlehead was indeed classified as “Small or Medium Publisher” — reclassifying it as Indie now. 🙂

      Besides the Fiddlehead books, I found several of your other books and box sets in the data, too, including ones listing “Anthea Sharp” as the publisher, and ones with no listed publisher at all. Both types were already classified as indie.

  16. Ferran says:

    Is that “41% is amazon” [of “new purchases”, BTW] any better than the rest of the publishing world’s stats? Where does it come from? http://codexgroup.net/ doesn’t provide much info, does it?

    Take care

    • Data Guy says:

      Great question, Ferran. I just don’t know (yet).

      Codex Group apparently does some kind of consumer survey to get their numbers, and that 41% is cited like gospel by publishing pundits. But the details are fuzzy.

      The lack of hard numbers from the brick-and-mortar print side of the industry makes it difficult to ascertain whether that 41% is close, or way high, or even way low. Despite earnest attempts by BookScan, BookStats, and StatShot to suss out overall U.S. print sales, no one in publishing really seems to know… and the various “official” estimates that come from the above sources vary by 250%, which doesn’t help things.

      We can see this, though: Amazon sells 420 million print books a year in addition to their 562 million ebooks.
      (I’ve got the spider pulling data on over 400,000 of Amazon’s bestseller print books, too — fodder for future reports.)

      • Ferran says:

        Well, and most of it seems self-reported. To get a dataset that will be shared with competitors. In a culture with questionable business practices.


        Take care.

  17. ‘Your old road is rapidly agin’, please get out of the new one if you can’t lend a hand.’

    Indie publishers know this stuff.
    Trad publishers don’t want to know this stuff.
    But, if you’re a new writer and you’re still uncertain about where the future lies then please read this report.

    Great work with the ISBN’s. One of the weapons of the industry which now, thanks to you guys, has been completely blown apart.

    I look forward to your next report and the new ammunition you provide me with.

  18. Much as I love reading these reports, and encouraging as I find them, I did notice a touch of bias slipping past the mental editor:

    “Undoubtedly, the 30% of Small or Medium Publisher ebooks that are sold without ISBNs are almost all coming from tiny, agile, innovative smaller publishers and micropresses, rather than large corporate publishers.”

    However, I imagine the kickback you guys receive from the Agile Innovative Publishing lobby isn’t all that weighty…

  19. Kay Franklin says:

    Opened my email to find your update… 🙂

    As always this is really fascinating. My personal opinion with regards ebook and physical books is that generally people do prefer a physical copy. However, what ebooks allow is an instant read which in today’s society is of huge value.

    I will often purchase the kindle and then the hard back copy when it’s a book I know I want to delve into again and again. Ebooks are definitely here to stay. They complement physical books rather than detract from them.

    Thank you for compiling this report and encouraging indie authors to keep at it!

  20. Thanks so much for all the work you put into this. Great information.

    Just a small point – ISBN stands for International Standard Book Number, not Industry Standard Book Number.

  21. Alan Spade says:

    It hasn’t been sufficiently stressed in this report that the data encompasses the Christmas sales and earnings: so, that’s a response to the people who thought the data of the other reports was skewed because it didn’t encompass this strategic period.

    I think it’s the first report that I’ve read criticizing in such depth how much the official data based on ISBN is skewed. Thank you for that!

    There’s a thing that might very slightly alter the overall data, though. It’s related to Amazon publishing. Amazon’s imprints (including Amazon Crossing) benefit exclusively from Kindle First: these are ebooks chosen from Amazon imprints that become free for one month for all Amazon customers who have subscribed to Amazon Prime. So, each time a Prime member downloads a free Kindle First ebook, it is counted as one sale.

    In January 2015, for example, there have been four Kindle First ebooks constantly ranked in the Kindle Store overall top 20. Granted, the number is too small for the overall data to be skewed, but should the number of Amazon Kindle First ebooks increase, that would have to be taken into account.

  22. TheSFReader says:

    Dear Data Guy and Hugh, Thanks a lot for the information!

    If I may suggest something which may diminish some uncertainty in the Indie/Small/Single split:

    Add a form so that authors/publishers in any of these categories can give their own classification, either to confirm being Small/Single, or to be added to the Indie pool.

  23. TheSFReader says:

    Another observation: “When one makes the fatal mistake of relying on ISBNs to estimate the ebook market, only 10% of unit sales and 7% of gross consumer ebook dollars appear to be going to self-published books.”

    I don’t agree with this observation : when removing the no-isbn books, the proportions don’t change, but the total should still be 100%.

    As far as I understand mathematics, the numbers should be reshuffled as such:
    RemovedNoISBNPercentage = WithISBNPercentage * 100 / (100 – NoISBNPercentage)

    This would give for unit sales 14% for indie and 50% for Big5
    For gross, 8% for indie, and 60% for Big5
    For author earnings : 20% for indie, and 47% for Big5

    (I leave the other categories as an exercise 😉 )
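The rescaling formula in comment 23 can be sketched as a small function. The function name is illustrative, not something from the report, and the example assumes the report's 30% no-ISBN share applies to Amazon unit sales, which is what the commenter's 14% result implies.

```python
def share_excluding_no_isbn(with_isbn_pct, no_isbn_pct):
    """Rescale a market share so the ISBN-only market sums to 100%.

    with_isbn_pct: a category's ISBN-bearing sales as a percent of ALL sales.
    no_isbn_pct:   percent of ALL sales that carry no ISBN (assumed input).
    """
    return with_isbn_pct * 100 / (100 - no_isbn_pct)


# Indie unit sales: 10% of all units carry ISBNs; 30% of all units do not.
indie_units = share_excluding_no_isbn(10, 30)
print(f"Indie share of ISBN-only unit sales: {indie_units:.1f}%")
```

With those inputs the indie unit share comes out a little over 14%, in line with the commenter's figure. The dollar and author-earnings figures need their own no-ISBN shares, which the comment does not state.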

    • Data Guy says:

      Excellent point. 🙂

      In percentage terms, as you show, the indie market shares would appear a few percent higher when taken as a percentage of what’s left after removing the “shadow” pie wedge.

  24. TheSFReader says:

    Still nitpicking here, this time WRT the market shares : Is there any way to quantify how much of the Small Medium Publisher/Single –> Indie market share can be attributed to re-classification of the publishers to Indie ?

    One way would be to re-compute the market shares with a reference qualification, perhaps the latest one, and applying it to the previous reports. (Or the other way, use an old list and apply it to the newest data)

    • Data Guy says:

      Another absolutely excellent question.

      The answer is very little – I just checked. Less than 0.1% of what was originally classified as Small/Medium Publisher income has been reclassified over the course of the last few reports. What you are seeing there is actual market-mix shift.

      On the other hand, ~1.3% of what was “Uncategorized” income back in the Feb 2014 report has since been definitively classified as indie, while another ~0.2% of it has since moved into Small/Medium Publisher income.

  25. John says:

    So let me get this straight. Two guys armed with a spider they wrote to scrape data from publicly available data sets just showed up an entire industry, all for free?

    I’m sorry but I am rolling on the floor with laughter. This industry of gatekeepers just got their ass handed to them.

  26. Maggie Lynch says:

    Thanks Hugh and Data Guy! Great report as always. I admire anyone who actually enjoys doing this stuff. I’ve done a lot of data analysis in my past life in academia but admit that I never enjoyed it. I enjoyed the results but not the work, so my hat is off to you for undertaking this so often.

    I do have a couple of questions. I apologize in advance if these are already answered. When I read these reports I admit to somewhat scanning because I read a lot of reports from the industry and other places.

    First, is this data gathered only from Amazon U.S? I am assuming so, as you didn’t break out country information. Because I distribute worldwide, I am always interested in the differences between countries and their trajectory–which appears to be quickly rising–versus the U.S. market which seems to be stabilizing at a more modest rise each year now.

    Based on it being U.S. – centric, I wonder if the whole ISBN question is then different. My understanding is that other countries require ISBNs and that distributors who do have other country partners generate some type of ISBN (fake or otherwise, to meet the country’s needs) in order to distribute in those countries. Like a few other people said, when you purchase ISBNs in bulk the cost is not so daunting. I am one of those people who like to be able to track across all platforms. Also, in some countries (e.g., Canada) ISBNs are free.

    My final question is around the interesting data in the “Earnings by Publisher Type” graph. The graph shows “uncategorized single-author publisher” in a downward trend versus the decidedly upward trend of “indie published” titles. It seems to me this would be the same group of people. What is the differentiation in your mind and what data search did you use to differentiate it–a press name versus none?

    I also wonder about the slight downward trend of “small publishers” in terms of what search criteria you used to identify them and any parameters (e.g., two or more authors titles). Was it based on imprint name as Amazon calls it in their metadata? Account identifier matching? Or some other data. I ask because of author publishing cooperatives, like Windtree Press and Book View Cafe and others, where the authors are definitely indie but the imprint name is the press name. Are these being counted as indie published or small press?

    Thanks for any thoughts you have on these questions. Again, I bow to your expertise and your willingness to take this on at all.

    • Data Guy says:

      Thanks, Maggie! And good questions.

      Our data is from Amazon US only. One of these quarters, as time permits, we might also take a look at other countries: starting with the UK or Canada, probably – it’s very encouraging to hear about (and see) that number growing. But so far we’ve been focusing our efforts on where the vast majority of sales and author earnings are concentrated first.

      On Amazon UK, one of my own books was recently ranked in the overall Top-200 and was selling 150-ish books a day at that rank, and others report hovering in the Top-10,000 there with three to five sales a day. So I roughly figure that Amazon’s total UK sales add up to between a fifth and a third of Amazon US’s numbers: well ahead of the US ebook sales for Barnes & Noble or Apple, so Amazon UK’s probably a good next store to look at.

      ISBNs really are a non-issue for authors. They’re more of a problem for the publishing industry’s pundits and their ability to make meaningful observations about ebook trends. Bowker’s decision to charge steep rates for ISBNs in the US has rendered the industry blind to its fastest growing segment. All the breathlessly reported official ebook statistics and analysis are basically just misleading garbage now as a result: it remains to be seen whether the ISBN-based reporting will continue, or Bowker, AAP, et al will figure out new methods of tracking and estimating ebook sales. As we’ve shown, it can definitely still be done. 🙂

      Uncategorized are most likely 95%+ indies, but we didn’t have time to check them all. We started with the top-earning ones and worked down until the per-book earnings dropped below a certain threshold. The algorithm we used was manual: Googling the listed publisher names and author websites, and figuring out whether they were verifiably self-published or not. If we couldn’t find anything, we left them Uncategorized.

      Here’s more detail on what we did: http://authorearnings.com/note-on-methodology/

      Hope that helps.


    Dear Sir….Thank you for this report…

    The last 4 lines were cool…

  28. Data Guy says:

    This is kind of topical right now… 🙂

    Industry pundit Mike Shatzkin just wrote a long blog post in which — as an aside — he mentioned why the AuthorEarnings analysis is “doomed” to irrelevance: because some of his friends in publishing say that the true royalties at “big houses” are 40% rather than 25% when you factor in all the unearned advances they pay. He provides zero actual data in support of that statement but, just for fun, let’s see what the world would look like if Shatzkin’s 40% were literally accurate.

    I plugged Shatzkin’s magical 40% number into our calculations for traditional publishing’s royalties, instead of the 25% those contracts actually say. Here’s what we get:

    Author Earnings based on contract royalty terms: 35% Big Five vs 40% Indie

    Author Earnings based on Shatzkin-math: 42% Big Five vs 31% Indie

    Still not seeing why Mike Shatzkin thinks that a hypothetical 40%-of-net traditional royalty would make all the rest of the data irrelevant… but maybe it’s just me. 😉

    • Matt says:

      Wait, is this esteemed veteran publishing consultant Mike Shatzkin who you refer to as only an industry pundit? For shame!

      If you’re only looking at best selling authors who are paid over the top advances then your logic (or idea logic…) is screwy from the start. His Shatz-math is as bad as Whale-math, and he knows it. But when you’re sucking at the teat of the publishers what are you going to do? You have to side with them.

      Also, his fuzzy idea-logic only applies to author earnings and leaves the other conclusions unchanged i.e. indies have captured 33% of ebook sales and 19% of gross $ ebook sales. He’s trying to say one “rotten apple” (in his mind) spoils the bunch – blatantly false.

    • Publishers Lunch says:

      Here is what you are missing about Shatzkin’s point, and it’s critical if you are concerned about what authors earn. What traditionally-published authors “earn” is what is paid out to them by publishers, whether or not it corresponds to sales in the marketplace.

      The 5 largest trade houses have annual US sales in the range of $5 billion to $6 billion. If as a group they are paying out 40 percent of annual revenues to authors, then those authors are earning $2 billion to $2.4 billion a year.

      If I’m reading your spreadsheet correctly, you are saying that your data indicates indie authors are earning roughly $204 million a year on Amazon ($558,000/day).
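The two dollar figures in the comment above follow from simple multiplication, sketched here. The inputs are the commenter's own numbers ($5 billion to $6 billion in annual sales, a 40% payout, $558,000 per day); the variable names and the 365-day year are assumptions made for illustration.

```python
# Figures taken from the Publishers Lunch comment; nothing here is new data.
PAYOUT_SHARE = 0.40                     # claimed share of revenue paid to authors
big_five_sales_range = (5e9, 6e9)       # annual US sales, USD

# Implied annual author payouts at the low and high ends of the range.
big_five_author_pay = tuple(s * PAYOUT_SHARE for s in big_five_sales_range)

# Indie author earnings on Amazon, annualized over an assumed 365-day year.
indie_daily = 558_000
indie_annual = indie_daily * 365

low, high = big_five_author_pay
print(f"Big Five author payouts: ${low / 1e9:.1f}B to ${high / 1e9:.1f}B per year")
print(f"Indie author earnings on Amazon: ${indie_annual / 1e6:.0f}M per year")
```

Note that the two totals are not measured on the same basis: the Big Five range covers all formats and retailers, while the indie figure is ebook earnings on Amazon.com only.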

      • I was under the impression that Mr Shatzkin’s 40% number referred to royalties paid to authors, in that a contract rate of 25% was “actually” 40% because of unearned advances. I’m pretty sure no publisher pays anywhere close to “40 percent of annual revenues” to its authors. You may know something I don’t.

        • Publishers Lunch says:

          Thanks for the note — but we do know things you don’t about the industry. For better or worse, we have reported on the consumer trade publishing business every day for the past 15 years, and looked as closely as we can at many of the key business issues.

          Shatzkin is saying very much as we represented, and what publishers told him is what they have told us and agents as well. Big publishers pay roughly 40 percent of total revenues to authors. “Royalties” confuses the issue, as Mike explains, since the overwhelming percentage of traditionally-published authors never “earn out” their advances; their “effective royalty” (what they are paid, as a percentage of what the house takes in) is higher than the contractually stated royalty.

          The direct quote from Shatzkin’s blog post is:

          “In fact, I have been told by three different big houses what they calculated the percentage of their revenues paid to authors amounted to. We could call that the true royalty rate. The three numbers were 36, 40, and 42 percent….
          Take that on board. Big publishers are paying 40 percent of their revenue to authors! That leaves them 60 percent to pay everything else: overheads, manufacturing, and profits!”

          • TheSFReader says:

            Ohhh ! 40% !!! I wonder then with such a great risk why big-publishers don’t negotiate for lesser advances against higher royalties… Oh, looks like they negotiate for lower advances all right. Still waiting for higher royalty rates though…

          • Thanks for the note — but we do know things you don’t about the industry.

            Can you break down the sources of the five publishers’ revenue?

          • Nirmala says:

            Another question is: How are the advances/royalties distributed across all of the authors published by the big houses? If a handful of best-selling authors are receiving say 50, 60 or even 70% or more of the net revenues from their books, that would majorly skew the figure for the percentage of all revenue paid to all authors, especially since the brand name bestselling authors must be earning a higher percentage of overall revenue at most publishing houses than say an average mid-list author. And it would leave most traditionally published authors up in the cheap seats with small advances and net earnings close to the 25% in their contract (or even less for print books). It is a lot more likely to never earn out a multi-million dollar advance than it is to never earn out a $10,000 advance.

            It seems the point of this whole website is to show what self-published authors as a whole are earning. Personally I do not care if Stephen King is making way more per book sold than most other traditionally published authors. But I do care that I am making way more per book sold than most traditionally published authors.

          • G says:

            Hi Publisher’s Lunch.

            Are most authors regularly informed by their agents and acquiring editors that the advance they get is most likely to be the only money they’ll ever see for the book deal they’re signing?

            After all, since your contention is now that royalties aren’t as important because most books don’t earn out, then I wonder how often it is that authors are told this advance money is really the only payment they’re ever likely to get.

            I think that would be important information for authors to have in the future, since most that I know believe the advance is just the beginning–not the end of their income stream.

          • PG says:

            The 40% figure is a reflection of the blockbuster mentality of major publishers.

            Due to most-favored-nation royalty provisions that would require the publisher to increase royalties to dozens of other authors if they paid a higher royalty percentage to a blockbuster author, publishers increase the effective royalty to the blockbuster author by paying an advance that will never be earned out.

            For the large majority of traditionally-published authors, the advance is expected to earn out. Indeed, if such authors don’t earn out the advance on the first book of a multi-book contract, they may find that the publisher declines to publish the remaining contracted-for books.

            Because of the consolidation of financial statements of major New York publishers with the international media conglomerates that own them, I’m not aware of any way to check the accuracy of the 40% claim, but do know that authors who don’t earn out a $50-100,000 advance disappoint their publishers in ways that can harm those authors.

            Traditional publishing definitely divides the 1% authors from the 99% authors.

        • Gordon Horne says:

          Mr Shatzkin’s 40% is the amount of publisher revenue paid to authors.
          AuthorEarnings’ 25% is the common percentage of cover price paid to authors.

          The 40% figure does not change the number of units sold or the gross value of those units at all. It only changes the amount of money earned by authors if, in aggregate, authors are not earning out their advances and are being paid more than 25% of gross sales.

          Since unit sales and gross sales are independent of whether the 25% or 40% figure is used, we can take gross sales (the aggregate of all books’ cover prices) as unit “1” and Mr Shatzkin’s “revenue” as “X” and test whether traditionally published authors are in aggregate paid more than royalties would dictate.

          0.25 * 1 = 0.40 * X
          X = 0.25 / 0.40 = 0.625

          So revenue is equal to 62.5% of cover price if we assume the total dollar figure paid to authors is the same.

          Now publishers offer discounts of 40% to 50% to print distributors. Sometimes more for special clients. So we would expect revenue to be not more than 50% to 60% of cover price. An author earning 40% of revenue would be earning between 20% and 24% of the cover price of units sold. This is based on unit sales, which are not altered by what portion authors are paid.

          With our calculated 62.5% revenue, an author earning 40% of revenue would be earning 25% of cover price. (By definition, since we assumed total monies paid to authors was the same.)

          Assuming Big 5 Publishers have a 70/30 split with an ebook seller, an author earning 40% of revenue would be earning 28% of cover price (0.40 x 0.70 = 0.28). This is based on unit sales, which are not altered by what portion authors are paid.

          All of which leads us to the conclusion that within the accuracy of limited data, 25% royalties and 40% of revenue are the same thing, and Mr Shatzkin has said nothing of substance.
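The arithmetic in this comment can be checked with a short sketch. The variable and function names are illustrative only, and the 50% to 60% print revenue range and the 70/30 ebook split are the comment's own assumptions, not measured figures.

```python
# Compare "25% of cover price" with "40% of publisher revenue".
ROYALTY_OF_COVER = 0.25          # royalty expressed as a share of cover price
AUTHOR_SHARE_OF_REVENUE = 0.40   # Mr Shatzkin's share of publisher revenue


def author_share_of_cover(revenue_fraction_of_cover):
    """Author earnings as a fraction of cover price, given the fraction
    of cover price that reaches the publisher as revenue."""
    return AUTHOR_SHARE_OF_REVENUE * revenue_fraction_of_cover


# Revenue fraction at which the two figures describe the same payout.
break_even_revenue = ROYALTY_OF_COVER / AUTHOR_SHARE_OF_REVENUE

print_low = author_share_of_cover(0.50)   # 50% distributor discount
print_high = author_share_of_cover(0.60)  # 40% distributor discount
ebook = author_share_of_cover(0.70)       # assumed 70/30 split with the retailer

print(f"break-even revenue fraction: {break_even_revenue:.3f}")
print(f"print: {print_low:.0%} to {print_high:.0%} of cover price")
print(f"ebook: {ebook:.0%} of cover price")
```

Under these assumptions the print range sits just under 25% of cover price and the ebook figure just above it, which is the sense in which the two percentages are close.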

          • TheSFReader says:

            I’m not sure how comparable the situation in the US is to the one in France (where I live), but here is a report about French publishing (which includes data from Big5 groups, as evidenced in the list on p 4).


            As I understand it (though I could be interpreting things completely wrong, since I’m no great economist), “costs” related to authors are under 20% (see p 11).

            That includes print and ebooks (no great ebook market here). Here, royalty rates are around 10% of the publisher’s fixed selling price, for ebooks and print. Prices vary from 6-7€ for a MMPB to 22€ for a trade paperback. (Hardcover is almost non-existent, only for “luxury” editions and “European comics”.)

          • Nirmala says:

            I believe most Big 5 contracts are for 25% of net for ebooks even though the print book royalty is usually stated as a percentage of the list price. So 25% of net is probably the same as 25% of revenue as Shatzkin is using those terms.

      • Nirmala says:

        Seems like a case of apples to oranges here…..you compare the purported payout to authors from all US sales to what indies earn from ebook sales on Amazon. Data Guy is comparing ebook earnings on Amazon to ebook earnings on Amazon.

        To address your point, he would have to redo the charts that included print sales that are in this report to reflect Shatzkin’s 40% figure :

      • Data Guy says:

        Publishers Lunch:

        You misread the spreadsheet. That’s on me; apologies for any confusion. We’ll try to make the next one a little clearer.
        That $500K-ish daily number you refer to is only from the books we captured and measured — not from all of Amazon’s indie sales.

        Thank you for sharing your perspective — it’s welcomed and valued here. We’d prefer to see your data, though — as much of it as you can provide without violating confidentiality agreements.

        Can you share some?

        • Publishers Lunch says:

          Thanks for the reply, Data Guy. As I said, I couldn’t tell if I was deriving a correct conclusion for your data box or not. Is there a fuller number that you can share for your estimated total daily/annual earnings for indie authors?

          I answered some of your granular questions in the other thread/reply further down. Some things I’m allowed to divulge/discuss; others are on background. If you have other particular questions I didn’t address, let me know and I’ll try — or tell you that I can’t.

          • Data Guy says:

            Thank you for turning this into a two-way discussion, Publishers Lunch.
            It’s extremely helpful to all of us.

            I’ll absorb your answers and come back with more questions as I get them.

            I am also not averse to sharing information in more detail one-on-one, as long as we jointly respect the privacy of individual author incomes (measured or projected) and the confidentiality of the agreements you have with publishers and other sources.

            It’s a big industry with a lot of moving parts, and we’re all of us just trying to make sense of it. 🙂

          • Data Guy says:

            And sorry, to answer your question about estimated total Amazon ebook daily earnings for indie authors, the range I get is:

            $930,000 – $1,060,000 per day from Amazon.com

            which x 365 days a year projects to roughly

            $340 million – $390 million per year from Amazon.com
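As a quick check of the projection above, here is a minimal sketch that annualizes the quoted daily range. It assumes, as the comment does, that every day of the year looks like the sampled day; the function name is illustrative.

```python
DAYS_PER_YEAR = 365


def annualize(daily_dollars):
    """Project a daily figure to a year, assuming every day is the same."""
    return daily_dollars * DAYS_PER_YEAR


low = annualize(930_000)
high = annualize(1_060_000)
print(f"${low / 1e6:.0f}M to ${high / 1e6:.0f}M per year")
```

Rounded to the nearest ten million, the results agree with the range quoted above.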

          • Publishers Lunch says:

            Happy to participate and help where I am able to do so. All the available industry measures are incomplete or inexact in some fashion, and most also require some understanding of context. Everyone benefits from knowing/understanding more about those moving parts.

            We keep all kinds of things private/off the record, so no problems there. I actually have some granular questions/observations for you that are probably best dealt with through email. (I think you can see my not published address….)

  29. Wow, thanks for this. Absolutely fascinating. It seems that reported pub industry stats are in the same class as headlines bemoaning Hollywood’s “declining” box office; by failing to account for constantly-growing foreign receipts, the number-crunchers exclude 60-80% of total b.o. receipts. Makes for nice catchy headlines and sympathy-mongering–but does not reflect observable reality.

  30. I love reading the quarterly reports, and thank you so much for your hard work in compiling them.

    I have a question about what this means for individual authors. If, say, a million indies each make a single $1 sale every day, that’s $1,000,000 a day. At the end of the month, each of them will have made $30.

    On the other hand, if the Big Five only have, say, ten thousand authors in total and they also make $1,000,000 a day, that will leave each individual author with $100 a day, or about $3,000 a month: a hundred times as much.

    Is there a metric to show us the ratio between Indies and Big Five authors (not books or earnings)? Apologies if there is and I’ve missed it.

    Thanks again for helping bring some clarity into the fog of war 🙂

  31. Thanks for doing all the hard work for us. ISBNs going out of fashion for indies is yet another sign that one does not need to be a club member any more. Is eBook growth denial like climate change perhaps? 😉

  32. JoAnn Ross says:

    After 30+ years of being successfully and mostly happily published in hardcover, mass market, and trade with HQ and a bunch of what’s now the Big 5, just being able to track daily sales as a first year indie author (or maybe, in your charts as small Castlelough Publishing) has been a revelation. But this big picture analysis you guys do is amazing! Thank you!

    You’ve also proven what I figured out decades ago. That publishers could actually give me clarity in royalty statements if they wanted to. But it’s more advantageous to treat their authors like mushrooms. Keep them in the dark and feed them a bunch of bullshit. 🙂

  33. Question for data guy: if a book is sitting at say #60 on the day you gather data, what sales do you attribute to that book? And if one is at #125, what sales? Do you lump them together, and say anything between #20 and #100 we calculated “X” amount of sales? Or do you assign a sales figure for each ranking?


    • TheSFReader says:

      As far as I understand the XLS file, the sales for an ebook are estimated by linear interpolation between a series of steps (P3:R18 in the file).

      With the “default” values (changeable in the file if you want to experiment), #1 has an estimated 7000 daily sales,
      #20 -> 3000
      #60 -> 1615
      #100 -> 1000
      #125 -> 875

      As far as I understand, while the estimation could be off, it will be off consistently for each category, only changing some kind of “scale” factor, so the results are still valid.

      My only reservation is via the introduction of non-relative values in the otherwise “floating” analysis with the introduction of “average KU Payout”.
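The rank-to-sales lookup described in this comment can be sketched as piecewise-linear interpolation. This is only an illustration: it treats the five values quoted above as the breakpoints, whereas the actual step table in the spreadsheet (P3:R18) is not reproduced in this thread and may use different steps. The function name is hypothetical.

```python
from bisect import bisect_right

# (rank, estimated daily sales) pairs quoted in the comment above.
# Assumption: used here as interpolation breakpoints; the real
# spreadsheet table may contain other steps.
ANCHORS = [(1, 7000), (20, 3000), (60, 1615), (100, 1000), (125, 875)]


def estimate_daily_sales(rank):
    """Linearly interpolate daily sales between the two nearest anchors."""
    ranks = [r for r, _ in ANCHORS]
    if not ranks[0] <= rank <= ranks[-1]:
        raise ValueError("rank is outside the quoted range")
    i = bisect_right(ranks, rank) - 1      # index of the anchor at or below rank
    if i == len(ANCHORS) - 1:              # exactly on the last anchor
        return float(ANCHORS[-1][1])
    (r0, s0), (r1, s1) = ANCHORS[i], ANCHORS[i + 1]
    return s0 + (s1 - s0) * (rank - r0) / (r1 - r0)


for rank in (61, 110):
    print(rank, round(estimate_daily_sales(rank)))
```

With these breakpoints, ranks 61 and 110 come out close to the per-rank figures Data Guy gives elsewhere in this thread (1,600 and 950).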

      • Thanks for the reply.

        I’m not good enough with numbers to know if my calculations are correct. I guessed the total sales of books ranking at these levels: 61, 110, 275, 340, 660, 1331, 1851… would be approximately 2745, according to this data.

        Is that reasonably correct?

        • TheSFReader says:

          Don’t have Excel on this PC (for the duration of the weekend), but (60) + (100) is already around 2500, and with the fall-off, 2745 looks all right.

  34. TheSFReader says:

    Data Guy, me again with a new observation, hope you won’t mind…

    Checking out the file, I’ve seen that only the books up to rank 100 000 (that is roughly 32000 books) are taken into account for the sales/earnings etc. results.

    This is due to the fact that you count only ebooks that sell at least once a day. This omits the long tail of books that sell less, among which for example you could find high cost non-fiction.

    Would the results change much with weekly/monthly estimations, i.e. taking into account books that sell once a week or once a month? One can simply multiply the Rank/Sales estimation by 7 or 30 and add steps past the 100 000 rank to further the precision, right?

    • Data Guy says:

      Yeah, to make the spreadsheet simpler, we left out the roughly 13% of Amazon’s sales that live down in the deep long tail below rank 100,000. But we do account for them when scaling up our daily sample to estimate total daily or annual sales.

      The reason that we didn’t put them in the spreadsheet is we didn’t want to have to keep explaining to the less mathematically inclined folks how a book can sell a fraction of a copy in a day.

      Ranks 1 to 100,000 of the rank-to-sales curve add up to a total of 1,331,910 sales per day.
      Ranks 100,001 to 3 million+ add up to roughly 210,000 more sales per day.
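A small sketch of how the two totals above relate to the "roughly 13%" long-tail share. The constant names are illustrative, and the 210,000 figure is itself described as approximate.

```python
TOP_100K_DAILY_SALES = 1_331_910   # ranks 1 to 100,000, as quoted above
LONG_TAIL_DAILY_SALES = 210_000    # deeper ranks, quoted as "roughly"

total_daily_sales = TOP_100K_DAILY_SALES + LONG_TAIL_DAILY_SALES
long_tail_share = LONG_TAIL_DAILY_SALES / total_daily_sales

print(f"total: {total_daily_sales:,} sales per day")
print(f"long tail: {long_tail_share:.1%} of the total")
```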

      • TheSFReader says:

        But doesn’t it affect some results if the publication categorization differs from the “top ones” for these 13%?

        • Data Guy says:

          Good question.

          I redid the spreadsheet to check — the only noticeable effect was a three-quarter-percent gain in gross $ sales for Small/Medium Publishers… from the non-trade Textbook & Academic segment, mainly, made up of rarely-selling ebooks that cost hundreds of dollars each.

  35. Publishers Lunch says:


    Like you, we are concerned with the actual facts, and nothing but the facts. This assertion:

    The problem with the above “Nielsen numbers” is this:
    Amazon.com alone visibly sells over 560 million ebooks and 420 million print books a year in the U.S.

    is demonstrably, inarguably, factually incorrect. Or to use your phrase “wildly wrong.”

    Nielsen Bookscan captures every single print book sold by Amazon as part of its data (including self-published titles, such as CreateSpace books). Amazon’s print books are included in Nielsen’s count of 635 million books. I’m not at liberty to disclose in public Amazon’s share of those 635 million books, but it is a known number, and what you are saying Amazon “visibly sells” is *exponentially* wrong. So to people who understand the nuance of the various industry statistics (all incomplete in some fashion or another), that implies that if your methodology is so completely wrong on print books, it’s wrong (or at the very least dubious) on everything else.

    • Nirmala says:

      Does Bookscan capture every third party and used copy sold on Amazon? If these are included in calculating rank, they would also be included in Data Guy’s figures.

    • Data Guy says:

      Thank you for joining the discussion, Publishers Lunch. Your input is valued.

      Nielsen Bookscan captures every single print book sold by Amazon as part of its data (including self-published titles, such as CreateSpace books).

      In the case of my own print books BookScan clearly doesn’t. My BookScan numbers reflect a fraction of the CreateSpace sales reflected in my CreateSpace dashboard and which I’ve been paid for, let alone my Ingram-distributed Lightning Source sales.

      In my case, BookScan captures roughly 48%. For others, mileage may vary.

      BookScan’s reported 635 million U.S. print books sold for 2014 means roughly $5-$7 billion in industry-wide US print revenue for ALL publishers. Does that even remotely make sense, given that the Big Five alone report $10 billion+ in annual revenues? The numbers don’t match.

      Please help us make sense of them.

      • Publishers Lunch says:

        I don’t know that you can use your personal CreateSpace dashboard to reflect on Bookscan sales *at Amazon.com.* Sales through Amazon’s regular US bookstore should be there in full. Sales through any other outlets may or may not, depending on the extent to which the selling outlets are in Bookscan’s network.

        Some measure of potential Createspace variance aside, I still don’t see how there can be any remote validity in finding Amazon’s print sales are more than double what their own systems report through Bookscan. Do you have other explanations for that?

        A few macro numbers for your other questions. Your number on the Big Five is way too high with respect to US sales.

        The closest thing to an industry number on US sales is the AAP’s data. The approximately 1,200 reporting publishers include all the largest trade publishers (of which the five largest are a component), and most/all of the biggest distribution operations. That was $6.44 billion for 2013. eBooks comprised $1.47 billion of that (23%). So roughly in line with your Bookscan supposition. Though it’s important to note that Bookscan does not capture any data about sale price in the US; just units. (In some other territories, they have price data as well, but not here.)

        Remember when looking at the big houses’ public reports that their topline is worldwide, not US only — and the proportion of US sales diverges widely across the group. (HBG US is a minority portion of Lagardere revenues; S&S has a modest UK division and gets most of their sales in the US; Penguin Random House is a global giant so their numbers include significant contributions from the UK, Germany, Spanish-language territories and more; the proportion of US sales at Harper has declined with the addition of Harlequin; and so on.)

        • Data Guy says:

          This is very helpful — thanks!

          Full disclosure:

          Our Amazon print-sales estimate of 420 million print books is a work in progress, which is why we haven’t published a report yet on print specifically (other than the Bookscan-based print-versus-ebook author-earnings comparison a while back.)

          We have yet to collate (crowd-source) actual data on average Amazon print sales to rank ratios, the way many authors (including us) have already done for Amazon ebook sales.

          So how did we come up with that 420 million number for Amazon print sales?

          We based it on Amazon’s mixed-format bestseller lists, which rank print (and audio) books right alongside ebooks. The ebooks above and below each print book on those lists give us an upper and lower bound on how many copies that print book is selling. An audiobook and a Kindle book selling the same number of units definitely hold equivalent rankings on those lists; I can always tell within 20% how many audiobooks I am selling on any given day by checking the salesrank of the Kindle books just above and below it.

          The assumption underlying our print sales estimate is thus that those mixed-format lists are ranked based on each entry’s unit sales regardless of format — something a data-crowdsourcing effort will be able to confirm or refute definitively.

          • Publishers Lunch says:

            Thanks for the additional context; also helpful!

            I wonder about the inverse question this raises. If your assumptions/methodology on print units is off by the order of magnitude that knowledge of Bookscan-reported units would imply, does that mean your ebook estimates are off by a similar order of magnitude?

          • Publishers Lunch says:

            One more thing to add for your own utility. eBook sales are not necessarily highly seasonal, since they are self-purchased. But we know print book sales are highly seasonal; again, there’s a Bookscan line graph that makes this very clear. As I understand your ebook sales rank distributions, they assume any day is like every other day. That won’t work for print (e.g. December 15 is vastly different in total sales than, say, March 15).

          • Data Guy says:

            Excellent observation about print seasonality… I’ll look for that BookScan curve you mention.

            It’ll be a huge factor in the print numbers (but far less so for online sales than brick & mortar bookstores, perhaps). I have noticed that the census.gov estimates for December bookstore sales vs November bookstore sales show a nearly 2X jump…

            Your skepticism about using Amazon sales ranks to calculate average daily ebook sales is understandable, but I think unwarranted given the high degree of visibility we have into near-real-time sales info and rankings as high-, medium-, and low-selling indies. Rank-to-sales estimates get pressure-checked daily by thousands of authors, including myself, as we watch our books move up and down the charts each day. I’d conservatively state our margin of error is “within 20%” on ebook numbers… but privately, I’d put my money on our being far closer than that.

            Thank you also for the invitation to discuss some of the data further off-record. I think we’ll find a lot of opportunity for productive and mutually-beneficial collaboration — both on the record and off — as we refine our various pictures of the industry.

      • “Nielsen Bookscan captures every single print book sold by Amazon as part of its data (including self-published titles, such as CreateSpace books).”

        That doesn’t appear to be the case with me.

        One of my books (ISBN: 978-1475212600) has sold 25,943 copies in all formats as of end of year 2014. The ISBN is only attached to the print version, and there were 1,229 print sales. Out of those 1,229 print sales, 1,024 were sold by Amazon + its Expanded Distribution network, a combo of stores like B&N, Powells, RJ Julia, The Book Depository etc. The remainder were distributed directly by me to bookstores in UK/Ireland, or were sold at conferences etc. Out of those 1,024 sales via Amazon + its Expanded Distro network, 616 were sold on Amazon.com directly and thus should have been counted by BookScan.

        Bookscan shows 354 print sales for that title, as of end of year 2014.

        So even if we just restrict it to print sales via Amazon US ONLY, BookScan is only capturing 57% of my sales – and that’s to say nothing of all those e-book sales that it has no idea about.

        It was obvious to me from the start that the BookScan numbers were way off because I was in the curious situation of not having a single sale recorded for that title in BookScan until January 2013. The paperback edition of that book came out at the end of April 2012. In that period where BookScan didn’t record a single sale, I actually sold 249 print copies via Amazon.com alone, plus another 58 copies via Expanded Distro.

        So, yeah. The same pattern is visible on all my print titles. In my case at least, BookScan numbers are totally unreliable.

        Note: it’s entirely possible that Amazon is giving incorrect BookScan information in Author Central, but I guess someone could verify BookScan-recorded sales on the above ISBN via alternative means and see if it matches up with my figures.
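The capture rates implied by the figures in this comment can be worked out as follows. This is a sketch using only the numbers quoted above; the names are illustrative, and which denominator is the fair one depends on which outlets BookScan is supposed to cover, something this thread does not settle.

```python
BOOKSCAN_REPORTED = 354      # print sales shown by BookScan, end of 2014

# Author-reported print sales at increasing scope.
sales_by_scope = {
    "Amazon.com only": 616,
    "Amazon + Expanded Distribution": 1_024,
    "all print sales": 1_229,
}

capture_rates = {
    scope: BOOKSCAN_REPORTED / sold for scope, sold in sales_by_scope.items()
}

for scope, rate in capture_rates.items():
    print(f"{scope}: {rate:.0%} captured")
```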

        • Publishers Lunch says:

          Most of what you are seeing is a function of the data Amazon is displaying rather than what is in the “core” Bookscan data. Bookscan source data starts showing sales for this ISBN in the week of 7/29/12.

          I’m not authorized to release the exact data shown in source Bookscan, but I can tell you it’s less than a 15 percent variance with what you say was sold via Amazon.com US. So pretty close. I would imagine the remainder is explained by either the missing few months in that data (I don’t know how far back the CreateSpace data capture goes); or how Amazon treats certain sales (e.g. CreateSpace estore may not count as Amazon sales), etc.

          I can see if anyone can shed further light on that, but the starting point still stands. Any methodology that derives print sales via spidering Amazon needs to come very close to deriving the known print unit sales as reflected via Bookscan to have any credibility.

          • SpringfieldMH says:

            A few questions arise from this discussion.

            Is it for certain that Bookscan captures all Amazon print sales? Estimates or derived perhaps, but “all”? Which I naively take to mean pretty darn exact.
            The reason I ask is that I’ve seen recent complaints from critics of Amazon that Amazon is not being open with its numbers. Is Amazon releasing print sales numbers but not ebook and is that what their complaint is? Or are Bookscan’s Amazon numbers derived entirely from non-Amazon sources?

            I’m assuming Hugh and some other authors and author organizations can afford a subscription to Bookscan ($5,000/year for everything or $85 per ISBN number?).
            Would they be allowed to sign up?
            Would they then have access to the “source data” referred to here or is that some deeper level only “insiders” are privy to?
            Is the data in a form that they could verify, analyze as they see fit and use to check and calibrate their own work? Incorporate as data into their work?
            Would Bookscan terms allow them to report their calibration and conclusions?
            Their detailed analysis of Bookscan numbers and methods?

            One hopes that such discussions and access occur.

            The next Author Earnings report should prove even more interesting.


          • SpringfieldMH says:

            Just a followup… I did a bit of searching and, while I couldn’t come up with a clear answer, sounds like the amount of Bookscan access/data required for what I discussed is likely far more… in the tens or even hundreds of thousands of dollars.

          • Publishers Lunch says:

            Yes, Bookscan captures all Amazon print sales for certain, provided to them under a licensing arrangement with Amazon (as is the case for all of the other retailers they deal with: BN, Wal-Mart, Costco, BAMM, etc.) Amazon as a company discloses very few of the metrics about their business, yes. And within Bookscan, Amazon’s data is aggregated along with other similar retailers. There is not an Amazon line item. (In DG’s case, it was a matter of outlets carrying his print POD title.)

            Authors are allowed to enroll for paid services from Bookscan in a variety of ways — and that paid license has terms that govern their use. My understanding is that private use would be unrestricted, but public use would not be permitted.

  36. RE: ISBNs, while you say they aren’t necessary, isn’t it true that many distributors require an ISBN to reach certain channels? And that many channels require an ISBN to be listed? I know that if you use Draft2Digital or Smashwords, several of their channels can’t be reached without ISBNs. I also believe anyone getting into OverDrive requires an ISBN, at least through Smashwords they do. Also a few of the international distributors and even countries require ISBNs. If a person wants to maximize their reach an ISBN is required. In light of this, I would say it’s foolish NOT to have an ISBN.

    Also, as to the comment about using a free CreateSpace ISBN—that might save you a few dollars, but if you sell more than 2 print books a month, you’re losing money. SO, if you’re a big seller, like some indies are, and you use CreateSpace’s Expanded Distribution, you are costing yourself a LOT of money.

    • If a person wants to maximize their reach an ISBN is required. In light of this, I would say it’s foolish NOT to have an ISBN.

      If maximizing reach is the objective, then it would be foolish.

      But if maximizing profit is the objective, a general case for foolishness can’t be supported. An author who finds he makes more by exclusively distributing through Amazon Select does not need an ISBN. Since he can get an ISBN anytime he wants, there is nothing foolish about not having one.

      • Terrence, I agree if you want to be exclusive on Amazon, ISBNs aren’t required, but if you’re not exclusive, then maximizing reach is important.

        • Data Guy says:

          In the US, Amazon + Apple + Barnes & Noble + Kobo + most other ebook stores don’t require ISBNs. You don’t need an ISBN to reach the outlets where 95%+ of all US ebook sales happen.

          “Reach” can also be measured in terms of numbers of sales and readers, actual and potential, rather than how many (undertrafficked) outlets one’s books are theoretically available on.

          Indies are pretty business-savvy folks who vote with their wallets. The consensus (and data) shows that purchased ISBNs don’t bring in more sales or readers.

          I’m only bummed about them because they skew the industry’s stats and reporting so badly.

          • Data Guy: I realize you don’t need an ISBN if you go direct with all of those companies, but many authors use distributors to reach Apple, Kobo, OverDrive, Scribd, etc. and if you use the distributors, most of them require ISBNs for those channels. Certainly sales generated from Apple, Kobo and the others would more than compensate for the cost of an ISBN. (most distributors supply them, but if you wanted to buy one the cost would be covered) And does it really matter if an outlet is “Undertrafficked” as long as it’s covering the costs? Every sale is a sale.

            And while we’re speaking of business-savvy, you seem up on it, as I noticed you mentioned you were selling on LS, but many indie authors use CreateSpace’s free ISBN and stick with Expanded distribution in order to save a buck and end up costing themselves lots of money.

            Also, I had a question about how you calculated sales by ranking. If you don’t mind. What would be the approximate sales of books that ranked in these numbers?


            Thanks in advance,


          • Data Guy says:

            Yeah, I’m not endorsing or decrying ISBN usage. I bought a big batch of ’em when I started out, so the question is moot for my books. I’m just highlighting the reasons why so many indies don’t, and hoping the industry can do something to fix this (and the borked industry reporting that results from bad data.)

            To answer your question about average # of daily sales by rank:
            #61 = 1,600
            #110 = 950
            #275 = 375
            #340 = 266

  37. And does it really matter if an outlet is “Undertrafficked” as long as it’s covering the costs? Every sale is a sale.

    It matters. Each sale is indeed a sale. But an author also has to manage the resources devoted to handling those outlets.

    An author who moves in and out of Amazon Select may encounter considerable difficulty in getting all those outlets to delist his books when he chooses. He may also encounter difficulty in getting all the outlets to increase prices in a timely manner in order to avoid Amazon price matching. This happens both when using an aggregator like Smashwords, or when going directly to the outlet.

    This can lead to a situation where the least profitable outlets take the most management resources, and lead to losses on Amazon opportunities. An author has to determine at what point a low performing outlet is not worth the effort.

    This situation would be a function of the number of books an author has, the number of outlets in use, and the sales at each outlet. There is no simple answer. Each author has to look at his own situation.

  38. DataGuy: have you taken into account the effect of promotions like Bookbub? I have run some numbers on my own and using the numbers you provided, the sales are way off. Example: These are my rankings for the past 7 days:


    According to the numbers you provided, and a few guesstimates by me on the last couple, that should equate to about 3486 sales.

    The “actual” sales, however, were only 1908.

    That is a significant difference. If that is true, and we multiply that by the 40-50? Bookbub Promotions per day, and assuming that each promotion has a “ranking effect” below 5,000 for 7 days, that is about 300 numbers affected drastically.

    Would love to hear what you think.

    This would potentially have an even bigger effect on income because the higher the price, the faster the actual sales fall in relation to the ranking, producing larger false data ranks.

    • Data Guy says:

      …have you taken into account the effect of promotions like Bookbub?

      An excellent question. We have.

      There are two things to consider when looking at possible measurement errors and their impact:

      #1) Error Magnitude: What is the worst-case effect of the source of error upon what you are measuring?

      #2) Error Coherence: When you combine a large number of those measurements to get an average answer, do all the errors “pile up” or do they tend to “cancel each other out” instead?

      Let’s look at #1, Error Magnitude, first. We’ll examine that weeklong BookBub “ranking effect” using your numbers.

      A BookBub promo that sells 1,000 books on Day 1 gives a lingering ranking boost on Day 2 that is approximately equivalent to 500 same-day sales, to 250 sales on Day 3, 125 on Day 4, 62 on Day 6, 31 on Day 6, and 16 on Day 7. Add those post-Day-1 “ranking effects” up, divide by 6, and you get an average post-BookBub “boost” equivalent to 167 sales.

      Taking the absolute worst-case possible interpretation of your example above (which would be if BookBub’s “ranking effect” impacted all promoted books in the same direction, and was therefore additive), we would have
      167 sales x 300 books = 50,000 sales worth of error.

      In that absolute worst-case, BookBub’s “ranking effect” would have less than a 3% effect on any of our numbers, because Amazon sells over 1,542,000 books a day.
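
The worst-case arithmetic above can be re-run as a short sketch. The boost values, the 300 affected books, and the daily sales figure are taken from this thread; the function names are illustrative only. Summing the rounded boosts as listed gives an average slightly below the 167 quoted, and a worst-case share on the order of 3% of daily sales.

```python
# Lingering rank boost for Days 2-7, in same-day-sales equivalents (from the comment above).
boosts = [500, 250, 125, 62, 31, 16]
AFFECTED_BOOKS = 300            # commenter's estimate of books inside a promo "tail"
AMAZON_DAILY_SALES = 1_542_000  # daily ebook sales figure quoted in this thread


def average_boost(values):
    """Mean post-promo boost per day, in equivalent sales."""
    return sum(values) / len(values)


def worst_case_share(avg_boost, books, daily_sales):
    """Share of daily sales affected if every error pointed the same way."""
    return avg_boost * books / daily_sales


avg = average_boost(boosts)
share = worst_case_share(avg, AFFECTED_BOOKS, AMAZON_DAILY_SALES)
print(f"average boost: {avg:.0f} equivalent sales per book per day")
print(f"worst-case share of daily sales: {share:.1%}")
```

This treats every promoted book as over-reported by the full average boost, which is the deliberately pessimistic reading described above.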

      But the true impact of the “ranking effect” is nowhere even close to that. The math doesn’t really work that way.

      Because of #2, Error Coherence. BookBub “ranking effect” errors aren’t additive. On average, they all cancel each other out. Here’s why.

      In the case of BookBub and similar promotions which result in rank “spikes”, a bunch of those books will indeed overreport sales in our snapshots as shown (because they are on the way down). But another bunch of them will underreport sales by far more (because it takes many hours for Amazon sales rank to catch up with big sales “spikes”). The last time I ran a BookBub, I had reached 1,200 paid daily Amazon sales by lunchtime, but my rank didn’t hit the Top-100 until late evening. An authorearnings.com snapshot taken on that day would have dramatically underreported my sales. While overreporting yours.

      Across hundreds of thousands of measured books, these types of errors “cancel each other out.” The mathematicians call the effect “The Law Of Large Numbers” which is an incredibly overblown way of describing it, I think.
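
The cancellation argument can be illustrated with a small simulation. It is only a sketch: it assumes that each book's error is independent and equally likely to over-report as under-report, which is precisely the point under discussion in this thread, and the sales range and 50% error bound are made-up values.

```python
import random


def aggregate_relative_error(n_books, max_error=0.5, seed=0):
    """Relative error of the summed estimate when per-book errors are
    independent and zero-mean (an assumption, not a measured property)."""
    rng = random.Random(seed)
    true_total = 0.0
    measured_total = 0.0
    for _ in range(n_books):
        true_sales = rng.uniform(1, 1000)           # hypothetical true daily sales
        error = rng.uniform(-max_error, max_error)  # over- or under-report by up to 50%
        true_total += true_sales
        measured_total += true_sales * (1 + error)
    return (measured_total - true_total) / true_total


for n in (10, 1_000, 100_000):
    print(f"{n:>7} books: aggregate error {aggregate_relative_error(n):+.4f}")
```

Under these assumptions the aggregate error shrinks as the number of books grows. If the errors were systematically biased in one direction, they would not cancel, and the worst-case bound would apply instead.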

      • DataGuy: Thanks for the response. One more thing.

        I don’t mean to take up time here, but I would like to have a clear understanding of this, as the numbers are important, and I think we all agree that to have the correct numbers is the only thing that matters. Let me try to restate it as I see it. I understand what you’re saying about the up and down effect, but there is a problem in my mind. You have one day “up” and six to seven days “down”.

        So instead of looking at what should happen. Let’s look at what did happen.

        Day 1 reached rank of 275 — sold 950 books. According to you that rank of 275 would have only registered as 375 books, so we are at plus 575 books. (in other words we now have 575 more sales than you counted).

        Day 2 reached rank of 61 — sold 480 books. According to you that rank of 61 should have registered as 1600 books, so we have a negative of 1120 books. (this puts us at a –545 books for the two days).

        Day 3 reached rank of 110 — sold 258 books. According to you that rank of 110 should have registered 950 books, so we have a negative of 692 books. (this puts us at a –1237 books for three days).

        Days 4–7 sold 220 books. According to you the sales counted would have been 561, so we have a negative of 341 books. (this puts us at a –1578 for the seven days).

        I’ve been wracking my brain but I can’t see another solution. No matter how many books are launched each day from BB or whatever promo there is, there will be one day of sales where sales outweigh the count of the ranking, and then there will be 6+ days (approximately) where sales will lag behind the count of the ranking.

        All of this is assuming that rankings are read at the same time per day, at the peak.
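
The day-by-day tally above can be checked with a few lines of code. The figures are the ones given in this comment; only the function name is invented for the sketch.

```python
# (label, actual sales, sales implied by the rank-to-sales curve), as stated above
days = [
    ("Day 1", 950, 375),
    ("Day 2", 480, 1600),
    ("Day 3", 258, 950),
    ("Days 4-7", 220, 561),
]


def running_gap(rows):
    """Cumulative (actual - implied) sales after each row."""
    totals = []
    gap = 0
    for _, actual, implied in rows:
        gap += actual - implied
        totals.append(gap)
    return totals


gaps = running_gap(days)
for (label, actual, implied), gap in zip(days, gaps):
    print(f"{label}: actual {actual}, implied {implied}, running gap {gap:+d}")
```

The totals also match the 1908 actual and 3486 implied sales mentioned in the original question.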

        • Data Guy says:

          No worries — it was a good question, worth exploring.

          As an interesting and somewhat illustrative coincidence, your -1578 sale over-reporting is almost exactly offset by the +1500 or so sale under-reporting on the day of my BookBub. 🙂

          But more seriously, I notice that you’ve also specified that your book “reached” the ranks you mention, implying that those were the maximum ranks attained during the day. Keep in mind that a snapshot taken at a random time would not catch all books at their peak ranks that day.

          If you’re taking snapshots at random times instead of comparing peak rank each day, on average, one day waaaay up but underreported, followed by six days of drifting down at a 50% daily decay rate and thus overreported, together work out to a mathematical wash.

          Also, don’t forget consideration #1 – Error Magnitude. We’re only focusing on consideration #2 – Error Coherence — here. But even if we were to ignore that effect completely and treat the errors as unrealistically perfectly synchronized (such that all “pile up” and none “cancel out”), the worst-case net effect would have less than a 3% impact on any of the overall numbers.

          IRL, it’s under 1% — well within our margin of error.

  39. Nirmala says:

    There is a critique of the Authorearnings methodology here that I would like to hear your perspective on:

    He suggests that including lots of sub-genre lists gives an emphasis to indie books because most big publishers only shoot for getting on the top category list, e.g., Science Fiction. So a Big 5 book that is at 105 on the overall Science Fiction list but that does not appear on any sub-category list would not show up in your data even though it is outselling lots of books on the sub-category lists.

    If this is a problem, perhaps your spider could go further down the list on top category lists.

    Or perhaps, it is actually quite rare for a book that is selling well to only be categorized in the top category.


    • Nirmala says:

      I have a vague recollection that this was discussed once already on here, and that I might have even participated in that discussion. So if you know where it was discussed, you can just point me to the right comment thread 🙂

    • Data Guy says:

      I encourage anyone interested to go read the post; I don’t see much point, myself, because we found the exact opposite to be true in our data sets.

      The deeper we went into the sub-sub-sub-lists, the more traditionally-published books tended to dominate those sublists. Unsurprising, really, because the majority of the detailed sub-lists on Amazon, and thus most of the books on them, are nonfiction.

      • Your server failed and lost my first attempt at this, so apologies if there end up being two versions of this reply.

        As Nirmala has name-checked me, anonymously but with the wrong gender, I will reply. It is disappointing that you are unwilling to address a clear design flaw in the Author Earnings reports, as it renders them statistically untrustworthy. As your methodology (why is there not a page on the website about the methodology?) counts only best-sellers but burrows down into all best-seller lists, you are seriously under-reporting Big 5 sales in Science Fiction. There are no Amazon sub-sub-sub categories in science fiction, so that defence is irrelevant. Big 5 publishers do not go out of their way to get into sub-categories in the way that indies do, leading to a Big 5 dominated Science Fiction category and non-Big 5 dominated sub-categories. In the extreme case, where every book in the top-level Science Fiction category is Big 5 when your spider crawls by and no Big 5 book has been dragged by keywords into a sub-category, you would be comparing 100 Big 5 books to 1900 non-Big 5 books. To give a real world example: today David Mitchell’s Hachette-published Bone Clocks is #42 in Science Fiction, #82 in Literary Fiction, and not in any sub-categories. It is currently #1370 in the Kindle Store. When it falls past #100 in Science Fiction and Literary Fiction it will no longer be counted in your dataset, but you will be counting #95 in Science Fiction > Galactic Empire, which currently has a Kindle Store sales rank of #15,228 (Raymond Weil’s self-published Slaver Wars: First Strike).

        As you do not generally cite the figures for Science Fiction > Galactic Empire vs Science Fiction > Space Exploration, there is no need to go into the sub-categories: everything there is already in the top-level Science Fiction category, and you have the ability to exclude results by a rank judged to mean less than one sale per day. On Mike Shatzkin’s blog you commented that authors need solid and actionable data, but by going into sub-categories you are ensuring that your dataset is neither solid nor actionable.

        • Data Guy says:

          Hi, Mercia,

          As mentioned, anyone who is interested is encouraged to go to your blog and read your post.
          And then check the bestseller lists and sublists.

          Your observations and examples provided here simply do not match verifiable reality.

          Your “real world example” of a Big Five book that is not listed in any subcategories, Hachette-published Bone Clocks, is in fact listed in all of the following subcategories:

          Books > Literature & Fiction > Genre Fiction > Metaphysical
          Books > Literature & Fiction > Literary
          Books > Science Fiction & Fantasy > Fantasy > Coming of Age
          Books > Science Fiction & Fantasy > Science Fiction
          Kindle Store > Kindle eBooks > Literature & Fiction > Genre Fiction > Metaphysical
          Kindle Store > Kindle eBooks > Literature & Fiction > Literary Fiction > Mystery, Thriller & Suspense
          Kindle Store > Kindle eBooks > Literature & Fiction > Literary Fiction > Psychological
          Kindle Store > Kindle eBooks > Science Fiction & Fantasy > Fantasy > Coming of Age
          Kindle Store > Kindle eBooks > Science Fiction & Fantasy > Fantasy > Metaphysical & Visionary
          Kindle Store > Kindle eBooks > Science Fiction & Fantasy > Science Fiction > Metaphysical & Visionary

          (Just check “Look for Similar Items by Category” near the bottom of the Bone Clocks product page to see all the subcategories that book is listed under. BTW, KDP dashboard limits most indies to selecting far fewer subcategories, giving the Big Five books listed under this many different categories a significant visibility advantage and making them more likely to appear in our data set, not less.)

          The Amazon Sci-Fi Top-100 are in fact dominated by indies to a far greater extent than the subcategories:

          It is only when we include the Science Fiction subcategories, that the Big Five regain a little ground:

          The reason we spider subcategories is that there are only 100 books on each list. To do a statistically meaningful analysis, the top 100 in a broad category aren’t sufficient: we need a larger data set.

          Your assertion that we choose not to limit ourselves to the Top-100 and instead include subcategories in the hopes of deliberately skewing the results in favor of indies is bizarre, given that it has the opposite effect. But anyone can verify the above by looking at the category lists for a few minutes and deciding for herself or himself.

        • TheSFReader says:

          Mercia, Data Guy has already answered but left one part untouched:
          “you have the ability to exclude results by a rank judged to mean less than one sale per day”

          He answered my own comment to that effect there:


          “Ranks 1 to 100,000 of the rank-to-sales curve add up to a total of 1,331,910 sales per day.
          Ranks 101,000 to 3 million+ add up to roughly 210,000 more sales per day.
          I redid the spreadsheet to check — the only noticeable effect was a three-quarter-percent gain in gross $ sales for Small/Medium Publishers… from the non-trade Textbook & Academic segment, mainly, made up of rarely-selling ebooks that cost hundreds of dollars each.”
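
The two totals quoted above can be combined in a short sketch to see how much of the estimated daily volume sits in the long tail. The figures come from the quoted comment; the tail figure is described there as rough, so the share is approximate.

```python
HEAD_SALES = 1_331_910  # ranks 1 to 100,000, from the quoted comment
TAIL_SALES = 210_000    # lower-ranked titles, "roughly", from the quoted comment

total = HEAD_SALES + TAIL_SALES
tail_share = TAIL_SALES / total

print(f"estimated total daily sales: {total:,}")
print(f"share contributed by the long tail: {tail_share:.1%}")
```

The combined total is consistent with the figure of over 1,542,000 daily sales cited earlier in this thread.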

          • Liana,

            The note on methodology is focused on an indie-internal issue: publishers worried that they might be put into the single-author publisher or small and medium publisher category although they self-identify as indies, and Data Guy has done that reclassification for them. A methodology page would be a top-of-page link that could be clicked, so that anyone coming to read these reports can see the research methodology. That might sound an overly academic term to use, but this latest report decries inaccurate statistics and industry pundits ignoring the Author Earnings findings, and a little academic rigour can go a long way. A methodology page would include the following:

            1. spider design – currently we have to read this in the reports themselves and how many people are going to realise they have to do that? Among those people will be journalists and pundits whom Author Earnings are hoping to influence.
            2. dataset design – the question I asked above as to why sub-categories are burrowed into despite the fact that those sub-categories are not used in the published research. That will look to a dispassionate observer as a way to get the numbers looking worst for the Big 5, especially any Big 5 connected pundits and journalists or academics with a tendency to favour Big 5 publishers in their interpretations. If there is a spider limitation that should be noted here and possibly also under the next heading.
            3. research limitations – this would include any technical limitations of the spider, Kindle Unlimited and Amazon promotion of Amazon Publishing (e.g., Kindle First), difficulties of extrapolating from Amazon.com to the wider industry (e.g., authors who use ISBNs for epubs, but not for mobis), ranking being more sticky at non-Amazon retailers, etc.

            Such a methodology page will help not hinder Author Earnings because statistical research that acknowledges its limitations is generally more acceptable than research that has exactly the same limitations but does not acknowledge them. At least that is what academics tell students when it is explained that their thesis must begin with a methodology page. In the context of this project, such a methodology page acts as a beta testing programme in that by being up-front about limitations it allows others to suggest solutions.

            This report begins by criticising inaccurate statistics, and to convince those who need to be persuaded (rather than those who want to be persuaded), the inaccuracies in the Author Earnings dataset need to be acknowledged and hopefully over time resolved. I began writing about these reports because I followed the request at the end of this report to go check a few best-seller lists on Kobo. I cannot build a spider, so it would be a shame if design problems were to render these reports unusable to anyone except those who were already convinced, before the report was written, of the conclusion that indies are doing well.

          • TheSFReader says:

            Do you think a “Methodology page” link with the following content would be appropriate?

            “For you techies out there who geek out on methodology, the spider works like this: It crawls through all the categories, sub-categories, and sub-sub-categories listed on Amazon, starting from the very top and working its way down. It scans each product page and parses the text straight from the source html. Along with title, author, price, star-rating, and publisher information, the spider also grabs the book’s overall Amazon Kindle store sales ranking. This overall sales ranking is then used to slot each title into a single master list. Duplicate entries, from books appearing on multiple bestseller lists, get discarded.

            [O]ur spider is looking at a snapshot of sales rankings for one particular day […] Extrapolation is only useful for determining relative market share and theoretical earnings potential. Our conclusions assume that the proportion of self-published to traditionally published titles doesn’t change dramatically from day to day, and the similarity of [successive] datasets […] lends that assumption some support.

            […]The preponderance of nonfiction in [the] sample does not reflect market share. Rather, it reflects the many hundreds of detailed Amazon sub-sub-sub-category bestseller lists for non-fiction (Health, Fitness & Dieting > Alternative Medicine > Holistic, for example), that make lower-selling nonfiction more visible to the spider than equally low-selling fiction.”

            (from http://authorearnings.com/report/the-50k-report/)
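
The master-list step described in the quoted methodology (discard duplicates from books that appear on several bestseller lists, then slot titles by overall sales rank) might look roughly like the sketch below. This is not the actual spider code: the `Listing` fields, the `product_id` key, and the sample records are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    product_id: str    # hypothetical unique key for a product page
    title: str
    publisher: str
    price: float
    overall_rank: int  # overall Kindle store sales rank


def build_master_list(scraped):
    """Discard duplicate sightings and order titles by overall sales rank."""
    unique = {}
    for listing in scraped:
        unique.setdefault(listing.product_id, listing)  # keep the first sighting
    return sorted(unique.values(), key=lambda item: item.overall_rank)


# Made-up records: the same book scraped from two different sublists.
scraped = [
    Listing("B002", "Example Two", "Indie", 2.99, 15228),
    Listing("B001", "Example One", "Big Five", 9.99, 1370),
    Listing("B002", "Example Two", "Indie", 2.99, 15228),
]
master = build_master_list(scraped)
print([item.product_id for item in master])
```

Keying on a product identifier rather than the title avoids merging different books that happen to share a name.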

  40. Nirmala says:

    Thanks again for your endless willingness to answer our questions!

  41. enabity says:

    There is a relatively easy method for filling in the top sellers on Amazon without having to run your spider through every title on the site. If you keep a list of the URLs of all of the top 20-500k or so sellers from the day before (an amount of data that would easily fit in memory on any computer) and move to a system of feeding that list with titles from the best seller list, then you could eventually close most of the holes in your sample.
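
A minimal sketch of this seeding idea follows, assuming the crawler works from a list of product-page URLs. The function names and the placeholder URLs are hypothetical, and nothing here reflects how the Author Earnings spider is actually built.

```python
def build_crawl_frontier(previous_top_urls, bestseller_list_urls):
    """Merge today's bestseller-list entries with the retained list from the
    previous run, dropping duplicates while preserving order."""
    seen = set()
    frontier = []
    for url in list(bestseller_list_urls) + list(previous_top_urls):
        if url not in seen:
            seen.add(url)
            frontier.append(url)
    return frontier


def retain_top(ranked_pages, limit):
    """Keep the URLs of the best-ranked pages for the next run.
    ranked_pages maps URL -> overall sales rank (lower is better)."""
    ordered = sorted(ranked_pages.items(), key=lambda pair: pair[1])
    return [url for url, _ in ordered[:limit]]


# Placeholder URLs, for illustration only.
frontier = build_crawl_frontier(
    previous_top_urls=["https://example.com/a", "https://example.com/b"],
    bestseller_list_urls=["https://example.com/b", "https://example.com/c"],
)
kept = retain_top({"https://example.com/a": 500,
                   "https://example.com/b": 10,
                   "https://example.com/c": 90}, limit=2)
print(frontier)
print(kept)
```

Titles that have dropped off every bestseller list would still be revisited as long as they stay in the retained set, which is how the holes in the sample would gradually close.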

    • TheSFReader says:

      Not that sure about one day to the next, but maybe feeding one Author Earnings report’s books into the next one would work. I guess such a seeding wouldn’t be too difficult and might increase the coverage.

      • enabity says:

        I hope that there isn’t only one sample being taken per quarter, and that periodic samples are being taken over a period of time. Expecting that every title is being sampled daily is probably unreasonable, but even weekly or monthly samples would be improved by retaining the old top books list.

        • TheSFReader says:

          Actually, as far as I understand, the Earnings Reports are “spidered” once per period (roughly every 2-3 months), but that is no problem since the analysis relies on statistics rather than specifics: it looks at the “macro” level rather than the “micro”. Conclusions are not drawn per book but per group or category of books. Drawing conclusions from a single book in the Excel file would have no meaning, but from a macro point of view, there is no real problem.

          For example the proportions of books from indie vs Big5 in the Top100 will most probably be representative.

          “One sample is not enough” was the first reaction when the first report was posted. However, 1) from a statistical point of view, the methodology stands on its own, and 2) no later report (using the same methodology) has shown major variations, which again proves that it’s not a “one of a kind wonder”…

    • Data Guy says:

      Your suggestion has a lot of merit.

      A few things make doing so a little challenging: the once-per-quarter frequency of our data capture and the high turnover of the bestseller lists and sublists.
      (In our October report, we found that almost 80,000 of the 120,000 July bestsellers had since fallen off the lists to be replaced by 80,000 others.)

      We are getting a very comprehensive look at Amazon sales every time, though — the data “holes” are mostly down where titles are selling fewer than a handful of copies each. With each dataset, we’re capturing:

      – practically all of the top several hundred ranks
      – 95% of the top 1,000
      – 80% of the top 5,000
      – 68% of the top 10,000
      – 52% of the top 25,000
      – 42% of the top 50,000
      – 33% of the top 100,000
      – 11% of the top 1,000,000
      – some additional ones ranked in the 2,000,000-3,000,000 range (mostly from really specific nonfiction bestseller lists like “Renaissance Painter Biographies” or whatever.)

      Ideally, I’d like to grab all 3 million-ish every single day instead… 🙂

      But the comprehensiveness of our snapshots comes at a nontrivial technical cost. For the technically curious out there, the data collection for this last report used 40 enterprise-grade servers (with 8 high-speed CPUs each) to crawl Amazon’s best seller lists and product pages, sucking almost 600 Gigabytes of HTML webpages across the Internet and ripping their HTML apart to extract the information we need and store it into a MySQL database. Each run takes a few hours, after which we shut the servers down before they burn a hole in our bank accounts.

      Each report is thus a deep cross-sectional study of Amazon’s sales that day, but each is a single snapshot taken on a particular day. Their compositional consistency from quarter to quarter strongly suggests that we wouldn’t find much variation on the days in between, either. But perhaps we’ll try a longitudinal study in parallel at some point (or even better, someone else will) using a smaller set of titles.

      While Hugh and I both enjoy doing this and sharing what we learn, we would much rather spend more of our time writing our books…

      • Andrew says:

        Maybe Amazon Web Services could cut you a deal on data….


        Keep up the good work!

      • Ideally, I’d like to grab all 3 million-ish every single day instead…

        I’m reminded of the distributed systems SETI has where people donate their computer downtime to be used in a larger computing effort. A few other organizations have done the same. They get thousands of people to enroll their computers.

        The last one I was involved with incorporated the program into a screen saver, so each time the screen saver started, the computer was actually working as a node on a much larger project. The owner hits a key, screensaver disappears, and owner has full control.

  42. TheSFReader says:

    For those who may be interested, and with Data Guy’s authorization, I’ve posted on my blog a summary of various excerpts and comments regarding the Author Earnings methodology.


  43. Bartus Trust says:

    We wish to bring to your attention our recent experience with CreateSpace and Amazon. In July, 2015, we self-published a nonfiction book “Shaming Justice: The Arizona State Bar and Supreme Court” with the former, while the latter is distributing it by taking orders online. In our experience, these companies together are badly gouging authors. For a book listing at $20 we are receiving less than $8 per copy in royalties. We would anticipate a class-action lawsuit. Amazon is behaving in the typical fashion of American megacorporations, abusing the little people for its own profit, just because it can.

    Bartus Trust

  44. liz says:

    I am looking for indie writers who have posted testimonials about their earnings in the field of home publishing. Anyone, please? I am gathering them for a report to be published soon.
