Data Guy on the Author Earnings Methodology

A great exchange over at The Passive Voice. William Ockham, one of the great thinkers on all things publishing, put forward two of the criticisms he’s seen of the AE data. Data Guy chimes in with a third, and explains why none of the three alters the conclusions drawn from our reports. When people point to supposed “refutations” of our data, which range from vapid to non-existent, this is a good link to share in response.

I do apologize to those who find this information troubling, but it is a fair view of what is happening in the world of ebooks today. And all the trends we’ve seen point in the same direction.

I have seen a couple of interesting criticisms of the AE reports. The first is that Big 5 authors get substantially more in advances than the standard royalty rates would suggest (40% of gross was one figure mentioned). Assume for a minute that contention is true. Now take a look at the AE data on the percentage of Big 5 earnings from titles originally published before 2011. Those are books that have definitely earned out. Try to build a model from the AE data that makes that 40% figure work, and publishing starts to look like a very strange business indeed. Spoiler alert! I suspect that the 40% of gross comes from looking only at the first-year income of a publisher’s titles. The Big 5 has always followed what Kristine Kathryn Rusch (KKR) calls the “produce model” of selling books.

The other criticism of the AE data is that the Big 5 is getting more than the Amazon retail price for many of their bestsellers (i.e., Amazon is selling those titles at a loss). This could skew the numbers somewhat. If this is indeed a big issue, there is a simple way for those publisher insiders (looking at you, Jeremy Greenfield) to prove it: replicate the AE approach, but plug in the amount you believe the publisher is receiving. If that is beyond your technical ability, I would be happy to do it for you at my standard overtime rate of $250/hour.

  • William, I’ll play devil’s advocate and add one more semi-“valid” criticism to the two you mention. :)

    While our sales-to-rank curve has proven highly accurate at predicting daily sales for every rank up to the top 2 or 3 books, and is quite accurate *on average* even for those, the actual sales of the very top 2 or 3 books can vary significantly from day to day: from 5,000 to 10,000, or occasionally even higher.

    But it’s important to keep in mind that Amazon sells more than 1,500,000 ebooks a day (integrate the rank-to-sales curve, i.e. take the area under it, to calculate that). Even if we happen to capture our sample on the day the movie version of The Fault in Our Stars or Allegiant hits theaters, and sales of that #1 book spike to something wild like 20,000 units, it would make less than a 1% difference in any of the AE pie charts.

    Now for the other two criticisms you mention, which are the only other ones I’ve found interesting as well.

    The observation that Amazon pays traditional publishers based on a wholesale reduction of list, but sells those books at a substantial discount from list, is a valid one… and in fact, we built that assumption into our spreadsheets and pie charts.

    We modeled Amazon’s effective retailer cut as only 20% for traditionally-published books. In reality, some books will be discounted deeper, and some will be sold at full list, but an average of 20% felt about right. But if you want to try out different Amazon retailer-cut percentages in our spreadsheet, you can — just change the number highlighted in yellow and the graphs will update. :)

    Just for fun, when Hugh & I were discussing what number to use for Amazon’s effective retail cut on sales of traditionally-published books, I plugged in 0% to see what would happen. Even under such an extreme, non-credible reductio ad absurdum scenario (which would mean Amazon keeps nothing from, and so loses money on, *all* of its sales of traditionally-published books), the Indie share of Amazon ebook author earnings was still 34%, compared to the Big 5’s 41%.

    The other valid criticism you mentioned is based on the observation that some traditionally-published authors receive advances that never earn out, so they are effectively receiving net royalty rates higher than 25%, which our spreadsheets and pie charts don’t capture.

    While technically true, it’s also largely irrelevant, because those “extra” dollars — even if they add up to a significant total — only end up going to a tiny handful of authors at the very top of traditional-publishing’s pay scale. The payments these few megabestseller authors are receiving aren’t really “advances against royalties” in the true sense at all. These few authors have effectively negotiated the receipt of huge lump-sum payments for their books, instead of whatever nominal royalty rates are specified in their contracts (to avoid triggering escalator clauses).

    For the other 99.9% of traditionally-published authors, advances are no more than a loan made against their own future royalties. Thus those advances have zero net effect on our pie charts. Our Author Earnings charts say “Daily $ Revenue to Authors” because that’s what actually matters to 99.9% of traditionally-published authors, regardless of whether those daily earnings are still paying down the advance and bringing that first royalty check closer, or the advance has already been paid off and those earnings will be reflected on the author’s next royalty check directly.
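The arithmetic behind the “less than 1%” claim can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the AE spreadsheet: it treats each pie slice as a share of total daily units (the real charts are dollar-weighted), and it uses only figures quoted in the post: roughly 1,500,000 ebooks a day and a #1 book that normally sells around 5,000 units but spikes to 20,000. `total_daily_sales` is a hypothetical helper showing the discrete version of “the area under the rank-to-sales curve”; it accepts any curve function and does not contain AE’s fitted curve.

```python
def total_daily_sales(curve, max_rank):
    """Sum estimated daily sales over ranks 1..max_rank.

    This is the discrete version of the area under a rank-to-sales curve;
    `curve` is any function mapping a rank to estimated daily units.
    """
    return sum(curve(rank) for rank in range(1, max_rank + 1))


def max_share_shift(total_units, baseline_units, spike_units):
    """Worst-case change in any slice's unit share when one book spikes.

    If the slice that owns the book holds share s of T units and gains X extra
    units, its share moves by X * (1 - s) / (T + X); every other slice moves by
    s * X / (T + X). Both are at most X / (T + X).
    """
    extra = spike_units - baseline_units
    return extra / (total_units + extra)


# Figures from the post: ~1.5M ebooks/day; #1 normally ~5,000, spiking to 20,000.
shift = max_share_shift(1_500_000, 5_000, 20_000)
print(f"worst-case slice shift: {shift:.2%}")  # prints 0.99%
```

Starting from the 10,000-unit upper baseline instead makes the extra units, and therefore the shift, smaller still.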
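The retailer-cut sensitivity test Data Guy describes can be reproduced in miniature. The sketch below is hypothetical throughout: it assumes traditional authors earn 25% of the publisher’s net (the consumer price minus Amazon’s effective cut), assumes every indie title earns a flat 70% of price (ignoring KDP delivery fees and price bands), and runs on an invented two-title catalog, so its percentages will not match the 34%/41% figures from the real data. What it does show is the direction of the effect: lowering the cut raises traditional earnings while indie earnings stay put, so the indie share can only shrink.

```python
from dataclasses import dataclass

# Assumptions for illustration only (not taken from the AE spreadsheet):
TRAD_ROYALTY_ON_NET = 0.25  # traditional author's share of the publisher's net receipts
INDIE_ROYALTY = 0.70        # flat indie royalty on the sale price (KDP fees ignored)


@dataclass
class Title:
    price: float        # consumer price paid on Amazon, in dollars
    daily_units: float  # estimated daily unit sales
    indie: bool         # True for self-published titles


def author_daily_dollars(title: Title, retailer_cut: float) -> float:
    """Daily author earnings for one title under a given Amazon retailer cut."""
    if title.indie:
        per_sale = title.price * INDIE_ROYALTY
    else:
        publisher_net = title.price * (1.0 - retailer_cut)
        per_sale = publisher_net * TRAD_ROYALTY_ON_NET
    return per_sale * title.daily_units


def indie_share(titles, retailer_cut: float) -> float:
    """Fraction of total daily author earnings going to indie titles."""
    indie = sum(author_daily_dollars(t, retailer_cut) for t in titles if t.indie)
    total = sum(author_daily_dollars(t, retailer_cut) for t in titles)
    return indie / total


# Invented two-title catalog: the numbers are placeholders, not AE data.
catalog = [
    Title(price=12.99, daily_units=4000, indie=False),
    Title(price=3.99, daily_units=3000, indie=True),
]

for cut in (0.20, 0.0):  # the modeled 20% cut, then the 0% stress test
    print(f"retailer cut {cut:.0%}: indie share {indie_share(catalog, cut):.1%}")
```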
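The argument about advances comes down to one comparison: an author receives whichever is larger, the advance or the royalties actually earned. A minimal sketch, assuming the 25%-of-net contract rate used in the post and assuming, as the post implies, that an unearned advance is never repaid; the dollar amounts are invented for illustration.

```python
def total_paid(advance, cumulative_royalties):
    """Author's total receipts, assuming an unearned advance is never repaid."""
    return max(advance, cumulative_royalties)


def effective_royalty_rate(advance, publisher_net, contract_rate=0.25):
    """Author receipts as a fraction of the publisher's net ebook receipts."""
    royalties = publisher_net * contract_rate
    return total_paid(advance, royalties) / publisher_net


# Hypothetical examples (invented numbers):
print(effective_royalty_rate(advance=5_000, publisher_net=40_000))   # 0.25 (earned out)
print(effective_royalty_rate(advance=20_000, publisher_net=40_000))  # 0.5 (never earns out)
```

Once royalties pass the advance, the effective rate equals the contract rate, which is why an advance that earns out leaves the per-sale earnings in the charts unchanged.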

29 responses to “Data Guy on the Author Earnings Methodology”

  1. […] via Data Guy on the Author Earnings Methodology | Hugh Howey. […]

  2. QUOTE: “Our Author Earnings charts say ‘Daily $ Revenue to Authors’ because that’s what actually matters to 99.9% of traditionally-published authors” END QUOTE

    With Amazon’s new pricing tool (in beta) Amazon is helping authors determine pricing strategies to make the most daily income. I’ve priced a few of my books at their higher suggested price to see if it pans out. As long as Amazon is implementing tools to help authors reach their readers, self publishers could gain market share.

  3. I would love more numbers. I would love it if Amazon showed not just reviews, but number of sales. How many times have I been interested in a book that has very few reviews? Lots. If I could see it had sold a thousand copies and had no bad reviews, I would be more likely to take a shot at it.
    I would also like Amazon to poll its own writers and find out how many people are earning over a certain amount a year writing for them. Is it a thousand? Ten thousand? Does that mean Amazon self-employs more people than Walmart?
    There might be a hundred writers making a good living in the world of the Big 5, and a few hundred more making an okay living. I would like to see Amazon’s numbers to show once and for all that self-publishing supports more people.

  4. It may be obvious to most readers of this blog, but it’s worth pointing out that ebooks continue to earn royalties forever. So while a typical print book used to sell for the first 3-6 months or so and then disappear from bookstore shelves, a book with an ebook version may take 5 to 10 or more years to earn out, but it is fairly certain to do so eventually. Contrast that with the past, when it was widely reported that something like 7 out of 10 books did not earn out their advances, which altered the effective percentage those authors received.

    As William says, traditional publishers still think in terms of a “produce model,” in which the only sales figures that matter are those from the first year or so, when the book is “fresh.” Ebooks, in contrast, are like the items in a hardware store: even some little bin of obscure screws or bolts that only sells once in a while will eventually pay for itself.

    1. To be fair, there is a certain amount of truth to the view of most fiction being similar to a produce model. By far the most sales happen in the first few months of a book’s release. If you want confirmation, just look at the statistics from Michael J. Sullivan

      Notice how the sales fall off and then stabilize at a fairly low level. Some books don’t see that tremendous drop because they start at a modest level and therefore don’t have far to fall. But notice that “long tail” of sales that just keeps bumping right along, selling week after week and month after month. Think of it as consistent, ongoing income, like Dean Wesley Smith described in today’s Think Like a Publisher post about the return on investment of a book. Notice the small sales figures he describes that just keep going and represent a continuing “dividend” on that investment.

  5. Man, I don’t understand this obsession you self-pub guys have with facts and whatnot. We all know Amazon is evil and is trying to kill literature, and that the tsunami of crap will doom us all unless we stop it by denying all these losers and their lame “books” that people may or may not want to read.

    Facts schmacks. Get over yourselves!

    Trad publishing #1!

    1. Thanks Jack. I really needed that wake-up call. I will immediately stop checking my ranks and sales on Amazon and go back to writing query letters. I was running low on scrap paper, and those rejection letters really come in handy for that.

    2. I desperately needed that laugh. You don’t even know. Thank you so much.

  6. This blog post is most welcome, because the advances of the very few authors at the top of traditional publishing are a frequent argument used to dismiss the Author Earnings conclusions. People like Mike Shatzkin who defend trad pub often use it. I guess it’s a means for them to reintroduce the Ponzi scheme of traditional publishing.

    1. A lot of Mike’s arguments are for the very few at the expense of the many.

    1. Really amazing and awesome to see PW covering this fairly.

  7. One other significant fact about advances:

    If an author doesn’t earn out their advance, they are far less likely to get another one (except for the mega-bestsellers who, as you say, are really just getting lump-sum payments rather than being paid royalties).

    So for anyone trying to make a career out of writing, failing to earn out an advance isn’t getting paid more; it’s the end of the road.

  8. Hey all,

    Thanks for the post. I’ve always felt Jeremy Greenfield’s remarks were not unfounded criticism, and it’s good to see Data Guy answer them this way.

    The biggest question that will always remain, though, is who Data Guy is, and how exactly you’ve been able to pull those numbers. I still feel like you ask people to take a leap of faith and trust how this study gets its numbers out of Amazon.
    Personally, I’m ready to take that leap of faith, and I do believe that your numbers reflect today’s author earnings, because I believe in your cause. But I can also understand why people who are persuaded of the contrary just won’t buy numbers when they cannot see solid, tangible proof of validity.

    I’m not saying Data Guy should reveal himself; I’m just saying it’s understandable if some people don’t believe these reports. Again, I’m not one of these people. At all.

    1. There are quite a few people with the skills necessary to pull this data. If you have the patience and enough help, you can get the same data with a browser and a pencil. All of our calculations are in the spreadsheets, and the variables can be tweaked at will.

      I’ve seen two major sources of criticism, and both are companies that sell inferior data for a lot of money. It’s easy to understand why they are upset.

      1. Alright, you made me look closely at the excel sheet for the past hour :)

        I support the points made by DG in this article. I’ve tweaked the interpolation of daily sales according to book rank, and the percentage trad authors get from a sale, and it does not make a big difference, if one at all.

        I’ll have an email exchange with Jeremy Greenfield to understand what his criticism is about exactly. I get the point that companies selling data could be upset and fighting this with bullshit arguments, but I don’t think they’d fight it like that if they did not truly believe this data was wrong. Because let’s face it, they’re just making this data more popular. If I knew for a fact this data was right, and that bothered me, I’d certainly not be writing an article to the world proclaiming this data was wrong, I’d just try to never talk about it.

        So I guess I want to see for myself what the other side’s arguments are.

        1. Ricardo: His criticism is that we are giving away data that is better than anything any other industry analyst has provided, bar none. Our data is more eye-opening and useful than Nielsen’s data.

          The parties vociferously denouncing our work make money selling shoddy self-selected surveys. It’s really as simple as that.

    2. Data Guy has explained how he gets the data from Amazon. He’s explained it in more detail than most readers can even absorb. The trouble with this is that it is not possible to go beyond what he has already explained. The only thing more he could do would be to post large amounts of the computer code used in his crawler. Obviously very few people would understand the code.

      I don’t believe it is terribly complex, but it is definitely technical in a way that is beyond the knowledge of the vast majority of people. Those who do understand the code could verify that it does what he says it does. (I’m not suggesting he do this, by the way.) Or he could put in notes every few lines explaining what the following lines do. He could break that down into exactly how they do it, which would then require trying to educate every reader in the fundamentals of code logic and how they build on each other to make those lines of code work. We’re talking about at least a large textbook full of writing that would take many months to complete.

      This is not only unrealistic, it’s a complete waste of time. Who would read that? But my point is that, short of doing that, Data Guy has already done the only thing a technical guy can do in this situation: he has said that it’s a crawler (or spider) of some kind, that the data is publicly accessible, and that the crawler simply visits the Amazon site and pulls the information needed for the calculations, which have also been described in detail.

      As I mentioned, I doubt if the software is very complex, because the logic required to do this is not very complex. Cross-referencing and compiling the collected data is somewhat more complex, but it isn’t genius-level stuff. I’m sure Data Guy would agree with me that thousands of developers out there could produce the same results if it was something they were interested in. Data Guy happens to be interested in it, so he wrote the code out of curiosity to see what he could find out. The results were so fascinating that he continued to refine and develop the code until it accounted for more and more possibilities. The current results are almost completely unassailable.

      The dirty little secret here is not that Data Guy might be an evil genius trying to trick the world, it’s that traditional publishing or any other interested party could easily contract a few developers to write something similar and view the results. The reason they don’t is because they KNOW the data is accurate. (Heck, they might have already done this privately, then buried the results.)

      Now don’t get me wrong, in the future they might hire some shady group of devs to purposely produce FAKE results, then put their propaganda machine behind promoting the fake results. They are absolutely not above that. In fact, I think I might go on record as predicting it right now. They really are irredeemable as a group. It’s not surprising, but it is disappointing that they are actually that bad.

      1. Data guy has explained how he gets the data from amazon… I doubt if the software is very complex, because the logic required to do this is not very complex… I’m sure Data Guy would agree with me…

        Agree 100%. People seem to get unnecessarily hung up on the technical method of data collection. But there’s nothing here that can’t be done with a notepad and a pencil. This ain’t exactly rocket science :)

        Edward W. Robertson has independently done some similar analysis in various genres without using any software code at all. His manual tallies of bestseller count by publisher type match the Author Earnings results quite well.


        You don’t need to trust some anonymous “Data Guy” or his software spider to see how ebook market share divides up.

        The answers are staring you in the face whenever you check the bestseller lists.

        1. Yeah, just had a look at the spreadsheets. It’s not that technical indeed. A statistician would not be a fan of your daily book sales interpolation method based on friends’ testimonials, but well, that’s another story.

          In any case, thanks for pointing this out to me and making me look closely at the numbers :)

          1. Heh :)

            Our sales-to-rank calculations are based on a little more than “friends’ testimonials,” though. Many dozens of authors contributed data and we found it all remarkably consistent.

            I think one of the unfortunate ironies of the Author Earnings effort is that anyone who already has a self-published book selling decently on Amazon can take one look at our sales-to-rank graph and immediately say, “Yep. Those are precisely the numbers I see with my own books.” But the folks who could benefit most from this information — the ones who don’t have any self-published books of their own to compare with — are the most likely to doubt it :)

  9. Dear Hugh and Data Guy,

    Hi – just so you know where I am coming from, I’m a hybrid author who isn’t doing as well at Amazon with my backlist as I wish, and I am all for hybrid/indie authors. So please understand that this note is not a challenge or an argument. I’m bad at math and mathematical concepts, so I might be totally off on this, but I’m confused about the report.

    Branded top authors such as Lee Child, James Patterson, Nora Roberts, Steve Berry, James Rollins and on and on report that Amazon accounts for 15% or less of their sales since they are in every store from airports to supermarkets.

    So how do you factor that into your reports? Or don’t you? And does it matter – or doesn’t it?

    Since indie books are not for sale in any of those airport-to-supermarket places, Amazon accounts for 60%-100% of their sales, and Amazon supports indies and gives them a push, doesn’t it make sense that indie authors are getting 1/3 or more of the pie chart?

    How can we look at the charts and make sense of them given the millions of books selling outside of Amazon?

    1. Hi MJ,

      We’ve been working to get a handle on the print side, too — by looking at BookScan as a source. (Nielsen BookScan is only 65%-70% accurate, but it’s better than nothing).

      Sales for a small tier of mega-bestsellers like Patterson, King, Evanovich, Roberts, etc. skew toward brick & mortar print and away from ebooks and online because of the broad brick-and-mortar visibility you mention in airports, supermarkets, etc., and especially because of paid co-op placement in bookstores, which they benefit from disproportionately (because publishers concentrate marketing spend on their biggest-name tentpole authors).

      The vast majority of traditionally published and hybrid fiction authors see a very different mix — one that skews much more heavily toward ebooks and online print sales.

      Traditionally-published mid-list authors we’ve spoken with report that, according to their royalty statements, 60%-65% of their sales are coming from ebooks now.

      Right now, brick-and-mortar print sales make up no more than 33% – 36% of all trade publishing unit sales, averaged across all genres. Online print and audiobook sales (at Amazon and elsewhere) make up another 14% – 17% of trade publishing’s unit sales, while the remaining 45% – 52% are ebooks.

      For genres like non-fiction, children’s, and literary the brick-and-mortar print % will be higher, but for romance, sci-fi/fantasy, etc. less than a third(!) of sales are currently coming from brick-and-mortar print.

      So I guess my takeaway would be to evaluate each publishing opportunity based on who our peers are in terms of typical sales numbers, as well as genre. If I’m Grisham or Evanovich, and my publisher will guarantee front-rack airport and supermarket placement, buy giant wall-of-my-books displays at the front of bookstores, and commit a huge dedicated marketing spend, then brick & mortar print will be a major factor in my earnings. Otherwise, there probably ain’t much there there, compared to the loss of royalties on the ebook side.

      As always, YMMV :)

    2. Amazon is said to account for 50% of print book sales. And 60% of ebook sales. So I don’t get the 15% claims. I think they don’t understand how to read their royalty reports, and who can blame them?

      1. Hugh – were you responding to me? I’ve seen several royalty reports at that level, and while Amazon may sell 50% of all print books, for those A+ branded-level authors Amazon was definitely not responsible for 50% of their sales.

        1. Hachette Author

          M. J. Rose,

          This explains very nicely why those “A+ branded level authors” see a very different proportion of print sales than the rest of us:

        2. Douglas Preston said in an interview this week that Amazon accounts for 40% of his book sales.

  10. Hachette Author

    From the above-referenced blog post:

    Now, the biggest bestsellers in the industry — say, James Patterson, or Doug Preston, or Richard Russo, or Scott Turow — sell the majority of their books in paper. After all, they’ve won the distribution lottery and their books are available in every airport kiosk, Wal-Mart, drugstore, and supermarket across the land. So their interest in retarding the growth of digital — where the same distribution is available to everyone — and in preserving the position of paper is identical to that of their publishers. It stands to reason they would fight to maintain the system that has made them so rich. But if you’re a legacy-published author whose sales are increasingly digital, you need to understand that the legacy strategy of pricing ebooks high is costing you money. Is that really something you want to help perpetuate? Yes, it works for James Patterson, but what is it costing you?
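Comment 4’s point, that an ebook with a steady long tail will eventually earn out while a print-style sales curve that decays to nothing may not, can be illustrated with a toy model. In this sketch, monthly sales decay exponentially from their launch level toward a constant floor that stands in for the long tail, and `months_to_earn_out` returns the first month in which cumulative royalties cover the advance, or `None` if that never happens within the horizon. The exponential shape and every number in it are assumptions for illustration, not data from Michael J. Sullivan, Dean Wesley Smith, or the AE reports.

```python
import math


def monthly_units(month, launch_units, decay, floor_units):
    """Units sold in a given month (1-based): exponential decay toward a long-tail floor."""
    return floor_units + (launch_units - floor_units) * math.exp(-decay * (month - 1))


def months_to_earn_out(advance, royalty_per_unit, launch_units, decay,
                       floor_units, max_months=240):
    """First month in which cumulative royalties reach the advance, or None."""
    earned = 0.0
    for month in range(1, max_months + 1):
        earned += monthly_units(month, launch_units, decay, floor_units) * royalty_per_unit
        if earned >= advance:
            return month
    return None


# Hypothetical book: $1,000 advance, $1 royalty per copy, 100 copies in month one.
print(months_to_earn_out(1_000, 1.0, 100, decay=1.0, floor_units=0))   # None: no tail
print(months_to_earn_out(1_000, 1.0, 100, decay=1.0, floor_units=10))  # 86: steady tail
```

Without a floor, total lifetime sales in this model converge to about 158 copies, so the advance is never covered; a floor of just 10 copies a month earns it out in about seven years, which is the “dividend” effect the comments describe.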
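The replies under comment 8 mention two mechanical steps: estimating a book’s daily sales from its rank by interpolating between known data points, and tallying bestsellers by publisher type, as in Edward W. Robertson’s manual counts. A minimal Python sketch of both, assuming interpolation that is linear in log(rank) versus log(sales); the AE spreadsheets’ own interpolation may work differently, and the calibration points and publisher labels below are hypothetical placeholders, not AE data.

```python
import bisect
import math
from collections import Counter, defaultdict

# Hypothetical (rank, daily sales) calibration points, sorted by rank. NOT the AE curve.
CALIBRATION = [(1, 7000.0), (10, 2500.0), (100, 600.0), (1000, 90.0), (10000, 8.0)]
RANKS = [rank for rank, _ in CALIBRATION]


def daily_sales_at_rank(rank):
    """Estimate daily sales by interpolating linearly in log-log space."""
    if not RANKS[0] <= rank <= RANKS[-1]:
        raise ValueError("rank outside calibrated range")
    i = bisect.bisect_left(RANKS, rank)
    if RANKS[i] == rank:  # exact calibration point
        return CALIBRATION[i][1]
    (r0, s0), (r1, s1) = CALIBRATION[i - 1], CALIBRATION[i]
    t = (math.log(rank) - math.log(r0)) / (math.log(r1) - math.log(r0))
    return math.exp(math.log(s0) + t * (math.log(s1) - math.log(s0)))


def tally_by_publisher_type(bestsellers):
    """bestsellers: iterable of (rank, publisher_type).

    Returns (title counts, estimated daily units) per publisher type.
    """
    counts = Counter()
    units = defaultdict(float)
    for rank, publisher_type in bestsellers:
        counts[publisher_type] += 1
        units[publisher_type] += daily_sales_at_rank(rank)
    return counts, dict(units)


# Hypothetical sample list; labels are placeholders.
sample = [(1, "big5"), (2, "indie"), (15, "indie"), (40, "amazon_imprint"), (300, "big5")]
counts, units = tally_by_publisher_type(sample)
print(counts)
print({label: round(u) for label, u in units.items()})
```

Interpolating in log-log space suits rank data if sales fall off roughly as a power of rank, which plots as a straight line on log-log axes; straight-line interpolation on the raw numbers would overestimate sales between widely spaced calibration points.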
