7 Responses to “AuthorEarnings Methodology”

  1. Harry says:

    Hi DG

    This is so interesting and impressive. It’s kind of astonishing (A) that no one has sought to figure this out before – I seriously doubt that any of the Big 5 have done so for example – and (B) that the salesrank formula is so sweetly simple and elegant.

    A question: the very elegance of your formula suggests that it’s basically The One. But that would mean that if we had a month’s worth of daily sales for, let’s say, 1000 miscellaneous books and if we took salesrank scores at five minutes past midnight for each of those books, we should basically be able to rank those books pretty much perfectly. If there were little anomalies, they would probably be down to things like data not flowing evenly through the system, so that some salesranks lagged slightly behind others.

    Do you think that’s correct? Or do you think that on top of a basically super-sweet and simple formula, Amazon has sprinkled a little garnish, to do with, say, price or performance across different formats or some kind of (#purchases/#impressions) metric?

    This is speculation, of course, but I bet you’ve got thoughts!


    • Data Guy says:

      Hi, Harry,

      Thanks for the kind words.

      I’m aware that at least one of the Big 5 is doing something similar to AE (but smaller-scale) for their own use, and coming up with quite similar results to what we do. But of course, as a commercial business they are doing it for their own internal competitive-analysis reasons, and thus have zero incentive to share any of that data with the publishing community or authors at large. 🙂

      I think the reason the Amazon sales-rank formula is so simple and elegant stems partially from necessity.
      Several times a day, Amazon must recalculate sales ranks for each of the hundreds of millions of items they sell — it’s not just books.
      An algorithm like this one makes that recomputation extremely simple and computationally efficient. For each item they sell, they only need to keep track of a single value: its current recency-weighted cumulative-sales total.

      Each time Amazon wants to update salesranks, they can then do so very efficiently:
      1) They simply downscale each item’s current cumulative-sales total based on how much time has elapsed since it was last updated.
      2) Then they add to that downscaled total any additional unit sales that have accrued since then.
      3) Finally they re-sort and re-rank all of these updated totals for each item category (ebooks, books, appliances, etc.)
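      As a rough illustration of the three steps above, here is a minimal Python sketch. It assumes exponential decay with a hypothetical 24-hour half-life (Amazon has not published its actual decay rule or constant), and the names `Item`, `decay_factor`, `update_item`, and `rerank` are made up for this example. It is a sketch of the idea, not Amazon's code.

```python
from dataclasses import dataclass

HALF_LIFE_HOURS = 24.0  # hypothetical decay constant; Amazon's real value is not public


@dataclass
class Item:
    item_id: str
    category: str
    score: float = 0.0        # the single tracked value: recency-weighted cumulative sales
    last_update: float = 0.0  # time of the last update, in hours


def decay_factor(elapsed_hours: float) -> float:
    """Share of the old score that survives elapsed_hours of exponential decay."""
    return 0.5 ** (elapsed_hours / HALF_LIFE_HOURS)


def update_item(item: Item, new_sales: int, now: float) -> None:
    # Step 1: downscale the stored total for the time elapsed since the last update.
    item.score *= decay_factor(now - item.last_update)
    # Step 2: add the unit sales accrued since then.
    item.score += new_sales
    item.last_update = now


def rerank(items: list[Item]) -> dict[str, dict[str, int]]:
    # Step 3: re-sort each category by score (rank 1 = highest score).
    by_category: dict[str, list[Item]] = {}
    for it in items:
        by_category.setdefault(it.category, []).append(it)
    ranks: dict[str, dict[str, int]] = {}
    for category, group in by_category.items():
        group.sort(key=lambda it: it.score, reverse=True)
        ranks[category] = {it.item_id: pos + 1 for pos, it in enumerate(group)}
    return ranks


# Usage: three items, two update passes one day apart.
items = [Item("A", "ebooks"), Item("B", "ebooks"), Item("C", "books")]
for item, sales in zip(items, [10, 4, 1]):
    update_item(item, sales, now=24.0)
for item, sales in zip(items, [0, 8, 0]):  # a day later, only B sells
    update_item(item, sales, now=48.0)
print(rerank(items))  # {'ebooks': {'B': 1, 'A': 2}, 'books': {'C': 1}}
```

      Because each item carries only its score and the time of its last update, a refresh touches each item once, and the dominant cost is the per-category sort.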

      There are 2 reasons we won’t ever be able to predict a particular individual title’s sales rank dead-on, even if we have its exact daily sales and midnight rank:
      1) The recorded unit sales for a given day would be exactly the same regardless of whether most of those sales occurred right after midnight yesterday, or whether they occurred right before midnight today. However, the impact of those daily sales upon tomorrow’s sales-ranking in either case would differ by almost a factor of 2.
      2) Sales rank for a given title is a relative/comparative measure which depends not just on the sales of that title, but also on the sales of ALL OTHER titles as well. If we recomputed hourly, or at whatever intraday frequency Amazon does, we could theoretically match dead-on the recency-weighted cumulative sales numbers for every title we had sales data for. But that still wouldn’t give us a dead-on sales rank unless we had sales data for every title Amazon sells.
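      To put a number on point 1: under an exponential-decay assumption, a half-life of roughly one day is what would make early-versus-late timing worth almost a factor of 2. The sketch below assumes a hypothetical 24-hour half-life and uses a made-up helper, `contribution_at`, to compare ten sales made just after midnight at the start of a day with ten made just before midnight at its end.

```python
HALF_LIFE_HOURS = 24.0  # hypothetical: one day of decay halves a sale's weight


def contribution_at(sale_hours: list[float], measured_at: float = 24.0) -> float:
    """Recency-weighted value, at measured_at, of sales made at the given hours."""
    return sum(0.5 ** ((measured_at - t) / HALF_LIFE_HOURS) for t in sale_hours)


# Ten sales either way, so the recorded daily total is identical (10 units).
early = contribution_at([0.1] * 10)   # just after midnight at the start of the day
late = contribution_at([23.9] * 10)   # just before midnight at the end of the day
print(f"early={early:.2f}  late={late:.2f}  ratio={late / early:.2f}")
```

      With these assumptions the late batch is worth just under twice the early one, even though both days record exactly 10 unit sales.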

      But the good news is, when one is deriving unit sales from rankings for a whole bunch of titles at once — especially hundreds of thousands or millions of them, as AE does — the over/under errors on individual titles all statistically cancel each other out. The totals and averages you end up with are pretty much right on the money.
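      The cancellation argument can be checked with a toy simulation. The numbers below are entirely synthetic: 100,000 hypothetical titles with random true sales, each estimated with an independent, unbiased error of up to ±30%. The key assumption is that per-title errors are not systematically biased in one direction; a curve that consistently over- or under-estimates would not cancel out this way.

```python
import random

random.seed(42)
N_TITLES = 100_000
true_sales = [random.randint(1, 200) for _ in range(N_TITLES)]
# Hypothetical per-title estimation error: independent, unbiased, up to +/-30%.
estimates = [s * (1 + random.uniform(-0.3, 0.3)) for s in true_sales]

# Relative error on each individual title.
per_title_errors = [abs(e - s) / s for e, s in zip(estimates, true_sales)]
total_true = sum(true_sales)
total_est = sum(estimates)

print(f"mean per-title error: {sum(per_title_errors) / N_TITLES:.1%}")
print(f"error on the total:   {abs(total_est - total_true) / total_true:.2%}")
```

      Individual estimates are off by about 15% on average here, while the error on the aggregate total comes out far below 1%.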

      As far as adding “garnish” to the rankings goes, Amazon doesn’t. Only unit sales (and KU downloads) factor into sales rank and into position on Amazon’s bestseller lists.

      However, the same is not true for how Amazon keyword search results are ranked, how titles are ranked in Amazon’s “featured” and “new & popular” lists, and the like. All of those lists factor in a whole bunch of different metrics like the one you mentioned (#-of-sales / #-of-impressions), as well as specific keyword-relevancy scores, and likely also weighting for publisher-paid advertising co-op, Amazon’s own promotional preferences, etc.


      • Harry says:

        Superhelpful. Thank you!

        • Data Guy says:

          …if we had a month’s worth of daily sales for, let’s say, 1000 miscellaneous books and if we took salesrank scores at five minutes past midnight for each of those books, we should basically be able to rank those books pretty much perfectly…

          I forgot to mention that your excellent suggestion above is in fact more or less exactly how we did it, back in January 2016, when we updated our Amazon rank-to-sales curve. 🙂

  2. Tmunot says:

    Great stuff, thank you Data Guy.

  3. Mony Kim says:

    This is super helpful. Thank you for sharing, DG!

  4. Daniel Kenney says:

    Hi Data Guy!

    Great podcast with Mark and James. I’ve been trying to figure out where I can find the most recent version of the “curve” you referenced, with the most up-to-date information on how sales rank corresponds to actual numbers of daily sales.

    I was curious to see the latest numbers but can’t find them.


