The rating gap in the top AI assistant apps

Across the 6 AI assistant apps tracked here, the lifetime ratings on the US App Store average 4.76 (a simple mean of the six app-level ratings, not weighted by volume), drawn from a combined 11,947,964 ratings. In the sample of most recent reviews, every app averages below its own lifetime rating, and several sit well below it. The distance between the two numbers is the subject of this brief.

Lifetime rating versus recent reviewers

The lifetime figure is population truth: it comes from Apple's full ratings histogram across every rating the app has received. The recent figure is a sample of the most recent reviews we captured, and it carries the bias described below. Developer-reply share is also from that sample.

App	Lifetime rating	Total ratings	Recent sample mean	Sample size (n)	Dev reply share
ChatGPT	4.8	8,035,094	4.24	140	0%
Google Gemini	4.7	1,872,557	3.76	139	4%
Grok - AI Chat & Video	4.9	1,248,166	3.73	110	3%
Perplexity - AI Search & Chat	4.8	478,175	4.17	110	2%
Claude by Anthropic	4.7	176,925	2.99	140	0%
Meta AI	4.66	137,047	3.1	40	10%
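
As a check on the headline numbers, the sketch below recomputes them from the table rows. It assumes the 4.76 figure is a simple mean of the six app-level lifetime ratings, and it also prints a ratings-weighted mean for comparison. The names APPS and headline_figures are illustrative and not part of any published pipeline.

```python
# Lifetime rating and total rating count per app, copied from the table above.
APPS = {
    "ChatGPT": (4.8, 8_035_094),
    "Google Gemini": (4.7, 1_872_557),
    "Grok - AI Chat & Video": (4.9, 1_248_166),
    "Perplexity - AI Search & Chat": (4.8, 478_175),
    "Claude by Anthropic": (4.7, 176_925),
    "Meta AI": (4.66, 137_047),
}


def headline_figures(apps):
    """Return (unweighted mean, ratings-weighted mean, total ratings)."""
    total = sum(count for _, count in apps.values())
    # Simple mean: every app counts once, regardless of rating volume.
    unweighted = sum(rating for rating, _ in apps.values()) / len(apps)
    # Weighted mean: every individual rating counts once.
    weighted = sum(rating * count for rating, count in apps.values()) / total
    return unweighted, weighted, total


unweighted, weighted, total = headline_figures(APPS)
print(f"unweighted mean: {unweighted:.2f}")  # 4.76
print(f"weighted mean:   {weighted:.2f}")    # 4.79
print(f"total ratings:   {total:,}")         # 11,947,964
```

The weighted figure comes out slightly higher because ChatGPT, with roughly two thirds of all ratings, sits at 4.8.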

Where recent reviewers diverge most

The gap is the recent-sample mean minus the lifetime rating. The five largest gaps are listed, largest first; ChatGPT has the smallest gap of the six at -0.56 (lifetime 4.8, recent 4.24, n=140).

Claude by Anthropic: lifetime 4.7 to recent 2.99 (gap -1.71, n=140)
Meta AI: lifetime 4.66 to recent 3.1 (gap -1.56, n=40)
Grok - AI Chat & Video: lifetime 4.9 to recent 3.73 (gap -1.17, n=110)
Google Gemini: lifetime 4.7 to recent 3.76 (gap -0.94, n=139)
Perplexity - AI Search & Chat: lifetime 4.8 to recent 4.17 (gap -0.63, n=110)
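
The gaps above are plain subtraction, recent-sample mean minus lifetime rating. A minimal sketch that recomputes and ranks them for all six rows of the table follows; the names ROWS and rating_gaps are illustrative.

```python
# (lifetime rating, recent-sample mean, sample size n), copied from the table.
ROWS = {
    "ChatGPT": (4.8, 4.24, 140),
    "Google Gemini": (4.7, 3.76, 139),
    "Grok - AI Chat & Video": (4.9, 3.73, 110),
    "Perplexity - AI Search & Chat": (4.8, 4.17, 110),
    "Claude by Anthropic": (4.7, 2.99, 140),
    "Meta AI": (4.66, 3.1, 40),
}


def rating_gaps(rows):
    """Return (app, gap, n) tuples, most negative gap first."""
    gaps = [
        (app, round(recent - lifetime, 2), n)
        for app, (lifetime, recent, n) in rows.items()
    ]
    return sorted(gaps, key=lambda item: item[1])


for app, gap, n in rating_gaps(ROWS):
    print(f"{app}: gap {gap:+.2f} (n={n})")
```

The sample size is carried alongside each gap because a gap computed from n=40 deserves less weight than one computed from n=140.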

What the gap does and does not mean

A low recent-reviewer average against a high lifetime average means satisfaction among people who reviewed recently runs below the headline score. By itself it does not establish that the app got worse. Recent reviewers self-select toward the dissatisfied, since a person who hits a bug is more likely to leave a review than a person who is content with the app. The gap therefore blends genuine sentiment change with reviewer selection bias, and this data cannot cleanly separate the two. We keep the lifetime figure as population truth and the recent figure as a biased sample, and we never present one as the other.

Method

Population ratings and the per-star distribution come from Apple's ratings histogram, read from the latest snapshot we hold for each app. Recent averages and developer-reply share come from a captured sample of most-recent reviews, with the sample size shown in every row. No trend is inferred from review dates, because the captured feed is not a clean chronological series. The figures are arithmetic over public App Store data, and no model touches the review text.
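
The arithmetic described above can be sketched as follows. This is an illustration under stated assumptions, not the lab's actual code: the histogram is assumed to map each star value to a count, each captured review is assumed to reduce to a (stars, developer_replied) pair, and the toy inputs are made up for the example rather than taken from any app.

```python
from typing import Iterable, Mapping, Tuple


def mean_from_histogram(counts: Mapping[int, int]) -> float:
    """Population mean rating from a per-star histogram (star -> count)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("histogram contains no ratings")
    return sum(star * n for star, n in counts.items()) / total


def sample_stats(reviews: Iterable[Tuple[int, bool]]) -> Tuple[float, float, int]:
    """Return (mean stars, developer-reply share, n) for a review sample."""
    rows = list(reviews)
    if not rows:
        raise ValueError("sample contains no reviews")
    n = len(rows)
    mean_stars = sum(stars for stars, _ in rows) / n
    reply_share = sum(1 for _, replied in rows if replied) / n
    return mean_stars, reply_share, n


# Made-up toy inputs, for illustration only (not real App Store data).
toy_histogram = {5: 80, 4: 10, 3: 4, 2: 2, 1: 4}
toy_sample = [(5, False), (1, True), (2, False), (4, False)]

print(mean_from_histogram(toy_histogram))  # 4.6
print(sample_stats(toy_sample))            # (3.0, 0.25, 4)
```

Neither function looks at review dates or review text, which matches the method's stance of inferring no trend and applying no model.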

Independent research from the Nativerse lab. Population data from Apple's public ratings histogram; recent sentiment from a captured review sample. Figures are cited, not invented.