
The rating gap in the top AI assistant apps

Every leading AI assistant shows a near-perfect lifetime App Store rating. Among recent reviewers, several sit more than a star lower. Here is the cited population data, and what the gap can and cannot tell you.

Across the 6 AI assistant apps tracked here, the unweighted mean of the lifetime ratings on the US App Store is 4.76, over a combined 11,947,964 ratings. Among the people who reviewed most recently, several of these apps sit well below that headline. The distance between the two numbers is the subject of this brief.

Lifetime rating versus recent reviewers

The lifetime figure is population truth: it comes from Apple's full ratings histogram across every rating the app has received. The recent figure is the average rating in a sample of the most recent reviews we captured, and it carries the bias described below. Developer-reply share, the fraction of sampled reviews that received a developer reply, is also from that sample.

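| App | Lifetime rating | Total ratings | Recent sample average | n (reviews) | Dev reply share |
|---|---|---|---|---|---|
| ChatGPT | 4.8 | 8,035,094 | 4.24 | 140 | 0% |
| Google Gemini | 4.7 | 1,872,557 | 3.76 | 139 | 4% |
| Grok - AI Chat & Video | 4.9 | 1,248,166 | 3.73 | 110 | 3% |
| Perplexity - AI Search & Chat | 4.8 | 478,175 | 4.17 | 110 | 2% |
| Claude by Anthropic | 4.7 | 176,925 | 2.99 | 140 | 0% |
| Meta AI | 4.66 | 137,047 | 3.1 | 40 | 10% |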
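
The headline figures in the opening paragraph can be reproduced from the lifetime columns of the table. The sketch below is a minimal Python check with the rows typed in by hand. The function name `headline` is illustrative, and the ratings-weighted mean is included only for comparison; it is not a figure this brief reports.

```python
# Figures copied from the table: (app, lifetime rating, total ratings).
LIFETIME = [
    ("ChatGPT", 4.8, 8_035_094),
    ("Google Gemini", 4.7, 1_872_557),
    ("Grok - AI Chat & Video", 4.9, 1_248_166),
    ("Perplexity - AI Search & Chat", 4.8, 478_175),
    ("Claude by Anthropic", 4.7, 176_925),
    ("Meta AI", 4.66, 137_047),
]


def headline(rows):
    """Return (total ratings, unweighted mean, ratings-weighted mean)."""
    total = sum(count for _, _, count in rows)
    # Unweighted: every app counts once, regardless of rating volume.
    unweighted = sum(rating for _, rating, _ in rows) / len(rows)
    # Weighted: every individual rating counts once (comparison only).
    weighted = sum(rating * count for _, rating, count in rows) / total
    return total, round(unweighted, 2), round(weighted, 2)


total, unweighted, weighted = headline(LIFETIME)
print(f"{total:,} ratings; unweighted mean {unweighted}; weighted mean {weighted}")
```

Counting each app equally gives 4.76 over 11,947,964 ratings. Weighting by rating volume gives about 4.79, because ChatGPT accounts for roughly two thirds of all ratings.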

Where recent reviewers diverge most

  • Claude by Anthropic: lifetime 4.7 to recent 2.99 (gap -1.71, n=140)
  • Meta AI: lifetime 4.66 to recent 3.1 (gap -1.56, n=40)
  • Grok - AI Chat & Video: lifetime 4.9 to recent 3.73 (gap -1.17, n=110)
  • Google Gemini: lifetime 4.7 to recent 3.76 (gap -0.94, n=139)
  • Perplexity - AI Search & Chat: lifetime 4.8 to recent 4.17 (gap -0.63, n=110)
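
Each gap in the list is the recent-sample average minus the lifetime rating. The sketch below recomputes and ranks the gaps from the table values. The names `ROWS` and `ranked_gaps` are illustrative, not part of any published tooling.

```python
# Figures copied from the table: (app, lifetime rating, recent average, sample size n).
ROWS = [
    ("ChatGPT", 4.8, 4.24, 140),
    ("Google Gemini", 4.7, 3.76, 139),
    ("Grok - AI Chat & Video", 4.9, 3.73, 110),
    ("Perplexity - AI Search & Chat", 4.8, 4.17, 110),
    ("Claude by Anthropic", 4.7, 2.99, 140),
    ("Meta AI", 4.66, 3.1, 40),
]


def ranked_gaps(rows):
    """Return (app, gap, n) tuples, most negative gap first."""
    gaps = [(app, round(recent - lifetime, 2), n) for app, lifetime, recent, n in rows]
    return sorted(gaps, key=lambda item: item[1])


for app, gap, n in ranked_gaps(ROWS):
    print(f"{app}: gap {gap:+.2f} (n={n})")
```

The ranking matches the list above. ChatGPT, which the list omits, has the smallest gap of the six at -0.56.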

What the gap does and does not mean

A low recent-reviewer average against a high lifetime average means satisfaction among people who reviewed recently runs below the headline score. By itself it does not establish that the app got worse. Recent reviewers self-select toward the dissatisfied, since a person who hits a bug is far more likely to leave a review than a person who is content with the app. The gap therefore blends genuine sentiment change with reviewer selection bias, and this data cannot cleanly separate the two. We keep the lifetime figure as population truth and the recent figure as a biased sample, and we never present one as the other.

Method

Population ratings and the per-star distribution come from Apple's ratings histogram, read from the latest snapshot we hold for each app. Recent averages and developer-reply share come from a captured sample of most-recent reviews, with the sample size shown in every row. No trend is inferred from review dates, because the captured feed is not a clean chronological series. The figures are arithmetic over public App Store data, and no model touches the review text.
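
The arithmetic described in this section can be sketched as follows. This is an illustration under stated assumptions, not the lab's actual pipeline: the function names, the `rating` and `has_dev_reply` field names, and all input numbers in the example are hypothetical and chosen only to make the calculation easy to follow.

```python
def mean_from_histogram(counts):
    """Mean star rating from a per-star histogram, e.g. {5: 90, 4: 5, ...}."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram contains no ratings")
    return sum(star * n for star, n in counts.items()) / total


def summarize_sample(reviews):
    """Return (n, average rating, developer-reply share) for a review sample."""
    n = len(reviews)
    if n == 0:
        raise ValueError("sample contains no reviews")
    average = sum(r["rating"] for r in reviews) / n
    reply_share = sum(1 for r in reviews if r["has_dev_reply"]) / n
    return n, average, reply_share


# Hypothetical inputs, NOT real App Store data.
example_histogram = {5: 90, 4: 5, 3: 2, 2: 1, 1: 2}
example_reviews = [
    {"rating": 5, "has_dev_reply": False},
    {"rating": 1, "has_dev_reply": True},
    {"rating": 3, "has_dev_reply": False},
    {"rating": 1, "has_dev_reply": False},
]

lifetime_mean = mean_from_histogram(example_histogram)
sample_n, sample_avg, sample_reply_share = summarize_sample(example_reviews)
print(lifetime_mean, sample_n, sample_avg, sample_reply_share)
```

Keeping the two functions separate mirrors the distinction the brief draws: the first summarizes the full population histogram, while the second summarizes a sample and always returns its size alongside the average.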

Independent research from the Nativerse lab. Population data from Apple's public ratings histogram; recent sentiment from a captured review sample. Figures are cited, not invented.