No items found.

# How To Calculate Cohort Retention in SQL

‍Losing users sucks. Losing customers really sucks. If you’re a startup you know that Retention is King. You should always be measuring and improving your user retention, so you can keep more users over time. In this post we’ll show you how to calculate user retention on your own data in SQL.

### Defining retention

If Gloria used the product on Monday, and used the product again on Tuesday, she is a retained user. If Bill used the product on Monday, but not on Tuesday, he is a lapsed user. Retention for Monday is the number of retained users divided by the number of total users. If Gloria and Bill were the only two users on Monday, then retention for Monday is 50%.

### Calculating basic user retention

The key to calculating retention is counting users who were active at time #1, then counting how many were active at time #2. An easy way to do this in SQL is to left join your user activity table to itself like so:

select *
from activity
left join activity as future_activity on
activity.user_id = future_activity.user_id
and activity.date = future_activity.date - interval '1 day'

Now, for every row of user activity, we have — in that same row — their activity 1 day in the future. This gives us an ideal table for calculating retention with some simple counts:

select
activity.date,
count(distinct activity.user_id) as active_users,
count(distinct future_activity.user_id) as retained_users,
count(distinct future_activity.user_id) /
count(distinct activity.user_id)::float as retention
from activity
left join activity as future_activity on
activity.user_id = future_activity.user_id
and activity.date = future_activity.date - interval '1 day'
group by 1

In our data analysis tool we get this chart:

For extra credit, change the 1-day retention to 7-day or 30-day to capture a sense of longer-term user engagement.

### Calculating retention of new vs. existing users

Often retention is quite different for users who just signed up as compared to loyal long-term users. To calculate new-user retention, simply join in your users table and only look at activity rows that occurred on the user’s join date:

select
users.date as date,
count(distinct activity.user_id) as new_users,
count(distinct future_activity.user_id) as retained_users,
count(distinct future_activity.user_id) /
count(distinct activity.user_id)::float as retention
from activity
-- Limits activity to activity from new users
activity.user_id = users.id
and users.date = activity.date
left join activity as future_activity on
activity.user_id = future_activity.user_id
and activity.date = future_activity.date - interval '1 day'
group by 1

We see that while overall retention is 46%, new user retention is only 5.8%! Now we see why it’s so useful to split out new users. Improving new-user retention should clearly be a priority.

To look at returning-user retention, simply change:

users.date = activity.date

to:

users.date != activity.date

This effectively excludes activity from users who joined that day. Our query now looks like:

select
activity.date as date,
count(distinct activity.user_id) as new_users,
count(distinct future_activity.user_id) as retained_users,
count(distinct future_activity.user_id) /
count(distinct activity.user_id)::float as retention
from activity
-- Limits activity to activity from existing users
activity.user_id = users.id
and users.date != activity.date
left join activity as future_activity on
activity.user_id = future_activity.user_id
and activity.date = future_activity.date - interval '1 day'
group by 1

As expected, existing-user retention is higher than the overall average: 66% vs. 46%.

### Calculating retention in cohorts

It can be very helpful to compare the retention of users who joined in week A with those who joined in week B. This lets us see if our product changes are improving our retention rate. Ideally we’d end up with a chart like this:

We’ll start by defining a few handy subqueries to simplify the problem. new_user_activity restricts user activity to new users:

with new_user_activity as (
select activity.* from activity
users.id = activity.user_id
and users.date = activity.date
)

cohort_active_user_count calculates the total number of active users — the denominator in our retention calculation — in each daily cohort:

, cohort_active_user_count as (
select
date, count(distinct user_id) as count
from new_user_activity
group by 1
)

On top of that, we’ll make a few smaller changes to the main query:

Calculate the retention period — the number days retained — as  future_activity.date - new_user_activity.date and group by it.With this grouping, we lose the simple count of active users in the cohort. Fortunately we thought of this and made our cohort_active_user_count subquery, which we can join in and use as the denominator.Finally, we’ll wrap the query in an outer query that makes nice-looking output, excludes bogus rows, and sorts.

select date, 'Day '|| to_char(period, 'DD') as period,
new_users, retained_users, retention
from (
select
new_user_activity.date as date,
(future_activity.date
- new_user_activity.date) as period,
max(cohort_size.count) as new_users, -- all equal in group
count(distinct future_activity.user_id) as retained_users,
count(distinct future_activity.user_id) /
max(cohort_size.count)::float as retention
from new_user_activity
left join activity as future_activity on
new_user_activity.user_id = future_activity.user_id
and new_user_activity.date < future_activity.date
and (new_user_activity.date + interval '10 days')
>= future_activity.date
left join cohort_active_user_count as cohort_size on
new_user_activity.date = cohort_size.date
group by 1, 2) t
where period is not null
order by date, period

Notice also the range join, one of our favorite SQL tricks, to get multiple days of retention in one chart.

This results in the table:

With Periscope, we can automatically pivot the result then color by percentile:

### Making retention more specific

Remember how different new-user and existing-user retention were? You can see the same variations across lots of user segments. It’s worth breaking out retention by:

DemographicsUser acquision channelPaying vs. non-paying usersDifferent kinds of activity — viewing, creating, buying, etc.

If you sign up for a free trial of Periscope, you can try this whole process for yourself! Periscope is a tool for easily creating charts and dashboards from your SQL database. You can use it to calculate lots of metrics like retention, and share them with your teammates, customers, developers and more!