Cohort Analysis with Tableau
Top performing SaaS companies rely on cohort analysis in order to understand customer churn. This post describes a standard approach to cohort analysis that is available in our SaaS metrics solution. For more information on our SaaS metrics solution, please click here.
Technically, cohorts or groups can be represented any number of ways. However, a routine cohort in the SaaS or recurring subscription space is to group customers by start week/month. Grouping a customer by start period is a way to minimize or fix external variables as these customers come of age and incubate in a similar environment.
Tableau is one of the preferred BI tools in our SaaS metric solution. It provides excellent visualization capabilities for what is usually presented in the form of boring cross tab reports.
The data volume is manageable, the underlying data model is well modeled, and I need to explore multiple visualization techniques. This is squarely in the Tableau wheelhouse.
To start, I have been asked to address two specific stakeholder questions.
- How does cohort perform in the first 3 months?
- At what point does customer churn stabilize?
I set up connectivity to our Redshift demo DB and off we go. First, I add the fact table, a customer dimension and a time (month) dimension. The fact is aggregated to the month grain, and the time dimension will provide the start month.
Add a second time dimension and alias for current month, this will be used for our over time analysis.
Luckily our solution is well modeled, and aliasing out the time dimensions is all that is needed to slice and dice via start and other dates.
*Note: We will make extensive use of date fields in the rows, columns (dimensional analysis) of our cohort and crosstab analysis. You can calculate some of these dates in tableau with truncate functions etc., however it can be difficult working with these fields as dimensions. (i.e. cannot use a calculated field as a dimension). As with most BI tools there are probably ways to make it work, but as a best practice I would calculating these fields in a DB view or in Tableau’s dynamic SQL layer prior to building the analysis. Luckily Tableau makes it very easy to step back into the metadata layer and adjust this later if the columns aren’t cooperating, this is another of Tableau’s great strengths.
Now that our metadata layer is set, let’s begin building the analysis with this goal in mind.
I would eventually like to end up with a cross tab that tracks the percentage of customers that remain active each week group by start week.
|1st Subscription Week||2nd Subscription Week||3rd Subscription Week|
|Start Group 1||100%||95%||85%|
|Start Group 2||99%||94%||80%|
|Start Group 3||100%||96%||88%|
To Start, I’ll add the Start Year Month from the “start date” time alias to customers whose start date is in 2013:
Add the Month Year from the Active Month alias as a column and drop any metric on to make sure your time dimensions are working together, in this case we are testing Active Count.
Upon researching the topic I ran across a great resource:
This resource is especially useful if you need to build specific dates or metrics and provides context around how some of these groupings are applied.
The article is terrific in regards to grouping and set analysis but a bit light on the metric calculations, and no matter how many variations I tried I could get the calculation to behave the way I wanted.
I am unable to generate the % active value from each row via table calculations, the basic table calculations just don’t seem to allow for this. It does look like there is a way First(), Index() to loop through the first value of each line or potentially via advanced “compute using” parameters, but since I have the base elements of the calculation I am ready to move forward with those.
After some trial and error, I construct the following metrics:
Running Max Active: RUNNING_MAX(SUM([Activecount]))
Running Cancellations: RUNNING_SUM(SUM([Canceledcount]))
Running % Active: 100-((RUNNING_SUM(SUM([Canceledcount]))/RUNNING_MAX(SUM([Activecount])))*100)
Below is the % Active metric that I am after along with the individual components of the calculation so that we can make sure the metric is being calculated accurately.
*Note: Since I am filtering on the week start I don’t need to worry about new customers skewing my count. I grab the max number of customers for that period (which will always be the first week) and back out cancellations moving forward.
Next drop the debugging fields, format to a percentage value and adjust formula accordingly:
It needs help visually, it’s hard to discern, lots of numbers. This is where Tableau shines right?
Drag % Cohort Churn into the Color Object, Set to Red-Green Diverging and reverse the range:
Now we are entering Tableau country and have something that is a bit more visually appealing.
To distill this even further, a cohort heat map would look terrific on an executive dashboard. This view presents obvious and systematic churn problems in Q3 2013.
However, this isn’t what the was asked, remember our objectives:
- How does cohort perform in the first 3 months?
- When does churn stabilize?
It’s difficult to compare this data across active months while staggered over period so we will need to group by the number of months subscribed rather than over time.
Add the active count as the total will ensure the crosstab is grouping appropriately. (i.e. there are 2463 active accounts in month ‘0’)
Confirmed when the time dimension is removed:
Next as with many analytic tools tableau doesn’t like adding a measure as a dimension, you’ll need to convert the ‘Activemonths’ field as a dimension.
Can’t convert calculated fields, so again you may need a view or custom SQL with date difference or any other calculation. If I calculate within Tableau I won’t have the option Convert to Dimension seen below.
Drag and drop ‘Activemonths’ as a column and apply the calculations from the previous chart and we can begin to deliver the requested analysis.
Per our data set, I would present stakeholders both views (over time & per active month) and explain that the interruption in Q3 2013 is heavily skewing results. As you can see in the visualization above the periods of highest churn can be found anywhere from month 2 to month 7 and isn’t presenting a clear picture.
Moving forward I would suggest overlaying MRR data in a similar fashion and continuing to analyze the events in Q3 2013 to better understand and improve customer churn.
Tableau allows us to quickly swap visualizations, since this is over time it is relatively easy to present this in a line chart format. This format clearly illustrates that churn has dramatically improved across customers signing up in Q4 2013.
You can take this any number of ways, Tableau makes it easy to add trend lines, forecasts, projections and alternative metrics like MRR or LTV to understand more regarding the types of customer and revenue churning out.