After some oblique references via tweet, Microsoft immediately responded to our request for comment on our recent Xbox data analytics project. That response led us to make the following corrections and clarifications to our piece.
Microsoft has given us reason to believe that the usage data provided by the Xbox API consists of incomplete estimates of total Xbox Live usage and does not provide a complete record of recent usage sessions by the sampled gamertags. While the data provided seemed reliable in our spot tests, Microsoft tells us that the API was “intended to give every Xbox gamer an approximation of time spent in a game so they have the opportunity to compare it to other gamers on the service.”
That error in the underlying data led us to massively underestimate total usage times for the apps and games in our usage sample. The figures in our graphs and charts for the average number of minutes played, or the percentage of users who played a game during that 4.5-month period, appear to be an order of magnitude lower than the actual usage times and percentages per app. The “My games and apps” section of the Xbox One was used by 71 percent of Xbox One players in our sample, according to Microsoft, not the roughly 6.3 percent our data shows.
“Specifically, from our full review of Xbox Live usage data, we know that players are highly engaged with backward-compatible game titles,” Microsoft said by way of additional example. “That’s why we continue to support this much-loved feature and the games that use it.”
It is still unclear to us if and how the usage approximations provided by the API affect the relative usage rankings in our sample. If the API’s approximations undercounted all apps and games at the same rate (or even roughly the same rate), those relative rankings would still be largely valid for a random sample. The same would hold for the numbers and charts that describe different apps or categories (like Netflix or backward compatibility) as percentages of total sampled play time.
(Multiplying our raw reported use of backward compatibility by an “underreporting” factor of about 11, to match the relative gap shown in the “My games and apps” example above, brings Microsoft’s reported numbers and our sampled numbers on that score much more in line…)
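As a rough illustration of the arithmetic behind that parenthetical, the sketch below derives the factor from the two “My games and apps” percentages quoted above and checks that a uniform undercount leaves rankings and percentage shares unchanged. The uniform-rate assumption is ours and has not been confirmed by Microsoft, and the app names and minute counts are hypothetical placeholders, not figures from our data set.

```python
import math

# Percentages quoted in the text for the "My games and apps" section.
MICROSOFT_SHARE = 71.0  # percent of sampled players, per Microsoft
API_SHARE = 6.3         # percent of sampled players, per our API-derived data

# Ratio by which the API-derived figure falls short of Microsoft's figure.
factor = MICROSOFT_SHARE / API_SHARE


def shares(minutes):
    """Return each entry's percentage of the total minutes."""
    total = sum(minutes.values())
    return {name: 100.0 * value / total for name, value in minutes.items()}


def ranking(minutes):
    """Return names ordered from most to least minutes."""
    return sorted(minutes, key=minutes.get, reverse=True)


# HYPOTHETICAL placeholder values, used only to illustrate the point.
observed = {"app_a": 120.0, "app_b": 45.0, "app_c": 15.0}

# ASSUMPTION: every app is undercounted by the same factor.
scaled = {name: value * factor for name, value in observed.items()}

same_ranking = ranking(observed) == ranking(scaled)
same_shares = all(
    math.isclose(shares(observed)[name], shares(scaled)[name])
    for name in observed
)

print(f"underreporting factor: {factor:.1f}")
print(f"ranking preserved: {same_ranking}, shares preserved: {same_shares}")
```

If the undercount instead varied widely from app to app, neither check would be guaranteed to hold, which is why the open question about the API’s approximations matters for the relative comparisons.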
Microsoft’s response to us indicates that the API usage data allowed users to “compare [their usage] with other gamers on the service,” suggesting to us that this kind of relative sample comparison may still be reliable and worthwhile. But Microsoft has not provided us with enough information to settle this question of relative comparison reliability to our satisfaction, saying only that it considers the current analysis “grossly inaccurate and misleading due to an incomplete set of data and drawing conclusions about actual use from data that approximates use.”
Microsoft also cited “the sample size” and “the data of many users who choose not to share this type of data with other users” to question the reliability of our usage data. While we addressed these potential skews in our original piece, we don’t believe they are sufficient to explain the magnitude of the usage data errors described above, without other underlying API usage data issues.
What we don’t correct
Microsoft has given us no appreciable reason to doubt the fundamental reliability of the separate “ownership” sample in our report, which measures which games appear on Xbox.com’s public achievement lists (even for games where the player has no achievements). However, the company points out that “the data source used in that section returns data about user profiles that were active when a game was played; it gives no information about the physical or digital ownership of the game.”
We also warned about this in our original piece, noting that “owners” was used as shorthand for all sampled users who had the game listed on their Xbox Live account. We also noted in our Restrictions section that the “ownership data” cannot account for the distribution of used discs or single copies of a game used by multiple gamertags, and therefore cannot be used to extrapolate reliable sales data. That said, in the future we will refer to the data generated by this source as a “player” report rather than an “ownership” report.
Microsoft has been very kind in working with Ars to try to correct and clarify the sources of errors in our data. We’ve tried to be as open as possible about our data collection and interpretation methods, and we regret any misconceptions created by errors in our underlying data or our analysis.
“We recognize the work it took to put your story together,” a Microsoft spokesperson told Ars. “It’s a challenge to collect the things you can collect from the outside without direct access to proprietary tools and data, and turn it into a 10,000+ word story. We appreciate that effort… We appreciate the work and Ars Technica’s effort to share more information about the Xbox community, and we’re constantly looking for ways to do this that also protect the interests of gamers and our partners.”