Advanced SQL
Valuing User Subscriptions Over Time with SQL
We recently launched the instant-classic underwater warfare game Scope: Battle for the High Seas, and our investors are excited to see how the game’s subscriptions are growing. We are using a payment processor to handle collections of subscriptions and have access to a few tables. While this makes analyzing the data possible, it is initially not in the best format to do so. Let’s investigate how we can normalize our date ranges to compute the growth of subscriptions since our game launched.

Starting from the raw data
Our primary table is a subscription table that has the company id, plan id, and the start and end dates of each plan:

This data structure allows us to easily analyze some date slices of our data, for example we can look at the number of plans started on a given date:

But it doesn’t allow us to see inside of our start and end ranges, such as how many plans were active on a specific date. In order to do this, we must first get our data into a format friendly for aggregating across our ranges.
Getting the Relevant Dates
We first want to generate date series for each of our ranges, and we can reference these posts for best practices on doing so: Generate Series in Redshift and MySQL and Use generate_series to get continuous results
In our case we will opt for Redshift syntax and use a window function to get our date series. Since we only want a limited series of data, we can preform a check using a nested select to make sure that our dates only cover our periods with a plan.
with
dates as (
select
(
getdate()::date - row_number() over(order by true)
)::date as plan_date
from
subscriptions
)
, plan_dates as (
select
plan_date
from
dates
where
plan_date >= (
select
min(plan_start)
from
subscriptions
)
)
Accounting for Null Dates
In our subscriptions table we currently have the dates where they have started and ended. We like to pre-process our data to ensure data consistency when other analysts re-use our work. To ensure that active subscriptions are included in our analysis, we’ll want to give our end dates a value as well. We can accomplish this by using a coalesce statement to evaluate the plan_end value and inserting the current date in the event that it is null:
, cleaned_subs as (
select
company_id
, plan_id
, plan_start
, coalesce(plan_end, getdate()::date) as plan_end
from
subscriptions
)
Joining on Inequalities
Now that we have our cleaned subscription data in one temp table and our relevant dates in another, we can join them together. We typically join two tables by looking for equal attributes, but we are certainly not limited to the equality operator. For our analysis we want our result set to have a set of rows for every date that a plan was active.
To do this, we will left join plan_dates to cleaned_subs where the plan_date is greater than or equal to the plan_start and plan_date is less than or equal to the plan_end.
, joined_data as (
select
*
from
plan_dates
left join cleaned_subs on
plan_dates.plan_date >= cleaned_subs.plan_start
and plan_dates.plan_date <= cleaned_subs.plan_end
)
Now that we’ve got our data merged together, we are able to see a record in the result set tying every subscription id to every day it was active.
Aggregating Our Data
Now that our data is joined across date ranges, we can perform the next step in our analysis and aggregate the data! We will use the count function to determine the number of each type of plan that we have by day.
, aggregated_data as (
select
plan_date
, plan_id
, count(*) as active_subs
from
joined_data
group by
1
, 2
)
With that, we are able to see the growth of plans over time:

This is great information, but won’t tell the full story to our investors. What we really want to display is how Scope is growing and how its users are signing up and upgrading over time. To get this picture of our data, we can join our aggregated data with our plan data to understand the value within each plan:
select
plan_date
, plan_id
, plan
, (active_subs * amount) as mrr
from
aggregated_data
join plans on
aggregated_data.plan_id = plans.id
Now we are able to see that our users are in fact moving from our Starter plan to a Pro Plan over time and that the growth in revenue is promising!

Pushing joins beyond their normal use cases opens the door to analyze your data beyond its initial state. Let us know how else you take advantage of joins by tweeting us @Periscopedata!