Learn how to predict customer churn by building a predictive model

This post will provide a step-by-step guide on churn prediction using Machine Learning

In this tutorial we share how to do churn prediction using Machine Learning. You’ll be able to create a predictive model that helps your team predict and identify customer turnover.

Image source: https://www.displayr.com/

Customer attrition, also known as turnover, defection or churn, is the loss of clients or customers. As you can imagine, it is a critical metric for companies, such as SaaS businesses, that base their value proposition on a subscription model. In this post we’ll explain why it pays to track and predict customer turnover. We’ll also share the steps needed to develop a predictive model that helps you identify whether a customer will churn based on the data.

What is customer churn?

Customer churn is one of the most important business metrics. That’s because the cost of retaining an existing customer is significantly less than the cost of acquiring a new one. The latter is often referred to as Customer Acquisition Cost (CAC). Companies track it to check whether they have a viable business model that keeps generating profits while maintaining a low CAC.

Historically, customer attrition analysis was performed by big companies such as telephone services, internet providers and insurance firms. Nowadays it is often used by SaaS businesses and others adopting a subscription-based model. According to a ProfitWell analysis, the average monthly revenue churn rate could be anywhere from 1% to 17%. In addition, most studies report that the median monthly churn rate is in the 5-10% range.

What is the Average Churn Rate for SaaS?

Analyzing and predicting customer attrition is extremely important for SaaS companies, mainly because monthly recurring revenue is their main source of income. It is crucial to track the recurring profit lost to churn, the customer acquisition cost and the customer lifetime value, which together define how valuable a customer is.

An important benchmark for SaaS businesses is the Mythical 5%, which states that an acceptable churn rate is in the 5-7% range annually. Simple math supports the logic behind this statement. For instance, for a SaaS business with 1000 customers, a 5% annual churn means a total loss of 50 customers. In contrast, a 5% monthly churn compounds month after month to a loss of about 460 customers over a year, almost half of the customer base!
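To make the arithmetic explicit, here is a minimal Python sketch of the compounding. The function name customers_lost is ours, and it assumes that the same churn rate applies every month and that no new customers are added during the year.

```python
def customers_lost(customers: int, monthly_churn: float, months: int = 12) -> float:
    """Customers lost after `months` when the same share churns every month."""
    remaining = customers * (1 - monthly_churn) ** months
    return customers - remaining


# 5% annual churn: a single 5% loss over the whole year.
annual_case = 1000 * 0.05

# 5% monthly churn: the 5% loss is applied twelve times to a shrinking base.
monthly_case = customers_lost(1000, 0.05)

print(f"5% annual churn:  {annual_case:.0f} customers lost")
print(f"5% monthly churn: {monthly_case:.0f} customers lost")
```

The monthly case does not reach 12 x 50 = 600 because each month's 5% is taken from a smaller remaining base.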

Telecom providers, financial services and insurance firms often have customer service teams dedicated to winning back defecting clients. That’s because recovering long-term customers can be more valuable to a company than recruiting new clients.

Voluntary vs involuntary turnover

Companies usually make a distinction between voluntary churn and involuntary churn. Voluntary churn occurs when the customer decides to switch to another company or service provider. Involuntary churn, on the other hand, occurs due to circumstances outside the relationship with the company, such as a customer’s death or relocation to a long-term care facility or a distant location. In most applications, involuntary churn is excluded from analytical models.

Analysts tend to concentrate on voluntary churn, because it typically occurs due to factors companies can control, such as how billing interactions are handled or how after-sales support is provided.

Customer churn prediction using machine learning

Predictive analytics uses churn prediction models to forecast customer churn by assessing each customer’s propensity to churn. Since these models generate a small, prioritized list of potential defectors, they are effective at focusing customer retention programs on the part of the customer base that is most vulnerable to churn.

In the following sections I’ll lead you through the step-by-step creation of a predictive model that will help your team identify customers who are likely to churn.

How to get your churn prediction using Machine Learning

Setting the Environment: churn prediction with Kaggle

For this post we prepared an example available on Kaggle. Kaggle is an open data science platform whose notebooks are based on an environment called Jupyter. Using this environment, data scientists can collaborate, inspect and transform data, produce visualizations and run experiments. What you see on the page is often referred to as a Jupyter Notebook, or just a Notebook, and it’s a common data science environment. Here we can explore the data and execute code in different languages such as Scala, JavaScript and R. In our case, we are going to use Python to run our experiments and plot graphs.

A Jupyter notebook consists of cells of code that we can run by selecting a cell and pressing the run button, or by using Ctrl+Enter.

Exploratory Data Analysis (EDA) of the example data

The dataset used for this analysis consists of customer data from a financial services institution. The data is anonymized and publicly available on the Kaggle platform. It consists of 14 columns and 10000 rows.

We started by looking at the cardinality of each column:

  • high cardinality columns are columns with values that are very uncommon or unique (surname, balance).
  • low cardinality columns have very few unique values and their values are typically status flags, boolean or major classifications such as gender.
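A quick way to check cardinality is to count the distinct values per column with pandas. The sketch below uses a handful of made-up rows in the shape of the example dataset; the values are hypothetical and only the column names come from the data described in this post.

```python
import pandas as pd

# Hypothetical sample rows; in the notebook this would be the full dataset.
sample = pd.DataFrame({
    "Surname": ["Smith", "Jones", "Brown", "Taylor", "Wilson", "Evans"],
    "Balance": [0.0, 83807.86, 159660.80, 0.0, 125510.82, 113755.78],
    "Gender": ["Female", "Female", "Male", "Male", "Female", "Male"],
    "IsActiveMember": [1, 1, 0, 0, 1, 0],
})

# Number of distinct values per column, highest cardinality first.
cardinality = sample.nunique().sort_values(ascending=False)
print(cardinality)
```

Columns near the top of this list (close to one distinct value per row) are the high cardinality ones, while flags and classifications end up at the bottom.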

The departing customers have already been identified by the owner of the dataset and flagged as 1 in the column Exited. If you have not yet identified which of your clients have historically churned, you should do that before you continue.

Using this column we can plot a pie chart to illustrate the level of customer attrition in the data.
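As a rough sketch of that step, the share of churned customers can be computed from the Exited column with pandas. The ten flags below are a made-up stand-in for the real column, and the labels Retained and Churned are our own naming.

```python
import pandas as pd

# Hypothetical stand-in for the Exited column (1 = customer left).
exited = pd.Series([0, 0, 1, 0, 0, 0, 1, 0, 0, 0], name="Exited")

# Share of each class, with readable labels for the chart.
shares = exited.value_counts(normalize=True).rename({0: "Retained", 1: "Churned"})
print(shares)

# In a notebook with matplotlib installed, the pie chart can be drawn with:
# shares.plot.pie(autopct="%.1f%%")
```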

EDA: Distribution analysis of Churn and Non-Churn customers

As part of the Exploratory Data Analysis process, in the next step we analyzed the distributions of the other variables for the Leaving (Churn) and Remaining (Non-Churn) customers. This is extremely useful: it provides insight into the data and shows whether it contains outliers or whether the dataset is unbalanced. We can now start to formulate hypotheses. Categorical data, such as gender or nationality, is shown as a pie chart, while numerical data, such as credit score or balance, is shown as a bar chart.

EDA: Identifying interactions using a Correlation Matrix

A correlation matrix is used to visualize the correlation between every pair of columns in the dataset.

As we can see from the first row, the Exited column has a positive correlation with Age and Balance, and a negative correlation with IsActiveMember and NumOfProducts.

This analysis helps us formulate a hypothesis about which features are the most important for our problem.
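The row of the correlation matrix that matters most to us is the one for Exited. Here is a minimal sketch of how it can be extracted with pandas. The data is synthetic: we generate it so that older and inactive customers churn more often, purely to have something to correlate, and the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000

# Synthetic customers (assumption: not the real Kaggle data).
age = rng.integers(18, 80, n)
is_active = rng.integers(0, 2, n)
churn_prob = 0.05 + 0.4 * (age > 50) + 0.2 * (is_active == 0)
exited = (rng.random(n) < churn_prob).astype(int)

df = pd.DataFrame({"Age": age, "IsActiveMember": is_active, "Exited": exited})

# Pearson correlation of every feature with the target, sorted for readability.
corr_with_exited = df.corr()["Exited"].drop("Exited").sort_values()
print(corr_with_exited)
```

Keep in mind that this coefficient only captures linear relationships, so a value close to zero does not prove that a feature is useless.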

EDA: Principal Component Analysis (PCA)

PCA is an unsupervised learning technique for identifying patterns and clusters and for reducing the dimensionality of a dataset.

Our dataset has 14 columns, of which 3 are for identification (RowNumber, CustomerId, Surname) and 1 is the value we want to predict (Exited), so we are left with 10 feature columns.

Visualizing data in 10 dimensions is quite difficult. Therefore, we reduce this complexity by lowering the dimensionality of the dataset using PCA.

From the visualizations we can see that there is no clear linear separation between the Churn and Non-Churn customers. This suggests that linear classifiers are unlikely to perform well here, so we exclude them from our experiment.
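For reference, projecting a feature matrix onto two principal components takes only a few lines with scikit-learn. This is a sketch under assumptions: the random matrix below stands in for the numeric feature columns, and the choice to standardize the features first is ours, since PCA is sensitive to the scale of each column.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical stand-in for the numeric feature columns (200 customers, 6 features).
features = rng.normal(size=(200, 6))

# Standardize so that columns with large values (e.g. a balance) do not dominate.
scaled = StandardScaler().fit_transform(features)

# Keep the two directions that explain the most variance.
pca = PCA(n_components=2)
projected = pca.fit_transform(scaled)

print(projected.shape)
print(pca.explained_variance_ratio_)
```

In the notebook, the two columns of projected would be drawn as a scatter plot with each point colored by its Exited value.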

Prepare the dataset 

In this phase we will prepare our dataset for training. Generally speaking, in every ML or DL project we need to select the relevant features and then create a training set and an evaluation set, so that the model can learn patterns from one and be checked against the other. Here we will select the columns needed for the training process and split our dataset into 2 sets: a training set and a testing set.

The training data will be used by the model to fit its parameters during the training (learning) process. The test data will be used to evaluate the performance of the model on unseen data.
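The preparation step could look roughly like the sketch below. It is not the exact notebook code: the rows are randomly generated, the Geography values, the 80/20 split ratio and the stratification are our assumptions, and only a few of the columns are shown.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 100

# Randomly generated rows in the shape of the example dataset (hypothetical values).
df = pd.DataFrame({
    "RowNumber": np.arange(1, n + 1),
    "Geography": rng.choice(["France", "Spain", "Germany"], n),
    "Age": rng.integers(18, 80, n),
    "Balance": rng.uniform(0, 200_000, n).round(2),
    "Exited": rng.integers(0, 2, n),
})

# Drop the identifier and the target, then one-hot encode the categorical column.
features = pd.get_dummies(df.drop(columns=["RowNumber", "Exited"]), columns=["Geography"])
target = df["Exited"]

# Hold out 20% of the rows, keeping the churn ratio similar in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.2, random_state=42, stratify=target
)
print(X_train.shape, X_test.shape)
```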

Customer churn model training and evaluation

For this problem we will use a gradient boosting library called XGBoost.

XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as GBDT, GBM) that solves many data science problems quickly and accurately. The same code can run on distributed environments such as Hadoop, SGE and MPI, and can solve problems that contain billions of examples.

In this section we define the model hyperparameters and convert the datasets into the format used by XGBoost.

We run the training process for 60 epochs and evaluate the model against the testing set using the following metrics:

Precision

This metric evaluates how precise a model is in predicting positive labels. It answers the question: out of all the times the model predicted positive, how often was it correct?

Precision = True Positives / (True Positives + False Positives)

Recall

Often called sensitivity, recall is the percentage of actual positives that the model correctly identified as positive.

Recall = True Positives / (True Positives + False Negatives)

Accuracy

Accuracy is an evaluation metric that measures the share of all predictions that a model gets right.

Accuracy = (True Positives + True Negatives) / Total number of predictions
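All three metrics can be computed from the counts of true and false positives and negatives. Here is a small dependency-free sketch. The function name and the ten example labels are ours, with 1 meaning churn.

```python
def classification_metrics(y_true, y_pred):
    """Return (precision, recall, accuracy) for binary labels where 1 is positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # churners we caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # churners we missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly left alone

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Made-up example: 4 real churners, 4 predicted churners, 3 of them correct.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, accuracy = classification_metrics(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

On an unbalanced dataset accuracy alone can look good even when few churners are caught, which is why precision and recall are reported next to it.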

Good job! Our initial model has an accuracy score of 87.30%, precision of 83.01% and a sensitivity of 73.25%. Great news!

Feature importance

One of the characteristics of XGBoost is that it can tell us which features in the dataset are the most important.

We can do this by plotting the feature importance. That way, we can verify whether our hypothesis from the data analysis section is correct.

We predicted that Age, Balance, IsActiveMember and NumOfProducts were the most important columns. We were right about Balance and Age and partially right about NumOfProducts, but we did not see any evidence of the importance of CreditScore or Tenure.

Using these findings we can iterate over the data preparation, training and evaluation steps to optimize the performance of our model. In this case we’re working with a small amount of data. However, in a production environment we could have hundreds of columns describing a client. Therefore, identifying which columns actually describe the problem will be key for the development of a reliable and accurate model.

Production

Now that we have our predictive model and can successfully identify if a customer will churn, the next step is to run it in production.

In order to do this we’re going to run the model on a daily schedule using the latest available data. Remember to apply the same transformations used to generate the training dataset.

By analyzing the prediction output of the model more closely, we can get more information out of it than a plain churn / not-churn label.

In the last cell we printed out the prediction matrix from the model. Every element of the array preds holds the probabilities of a customer being not-churn or churn. In the example above, the first customer has a 96.23% probability of being not-churn and a 3.76% probability of being churn, so we can state that this customer is a non-churning customer. The fourth customer, in contrast, has a 20.06% probability of being not-churn and a 79.93% probability of being churn, indicating that he/she is a churning customer.

Using this data we can also identify situations where we are not really sure whether a customer will churn or not, like the last customer on the list, where the not-churn and churn probabilities are quite close to each other (about 59% not-churn, 40% churn). During this stage we should also identify a threshold value that we can use to determine if a customer is churn or not-churn.
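One simple way to turn these probabilities into labels is a threshold with an uncertainty band around it. The sketch below uses the three customers discussed above. The function name, the threshold of 0.5 and the margin of 0.15 are arbitrary values chosen for illustration, and in practice they should be tuned against the cost of contacting a customer versus losing one.

```python
def label_customers(probabilities, threshold=0.5, margin=0.15):
    """Label each (P(not-churn), P(churn)) pair as churn, not-churn or uncertain."""
    labels = []
    for _not_churn, churn in probabilities:
        if abs(churn - threshold) < margin:
            # Too close to the threshold to call either way.
            labels.append("uncertain")
        elif churn >= threshold:
            labels.append("churn")
        else:
            labels.append("not-churn")
    return labels


preds = [
    (0.9623, 0.0376),  # first customer
    (0.2006, 0.7993),  # fourth customer
    (0.59, 0.40),      # last customer
]
labels = label_customers(preds)
print(labels)  # ['not-churn', 'churn', 'uncertain']
```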

Final thoughts on churn prediction with Machine Learning

In production we can generate a daily report of customers who are likely to churn. This report can be sent directly to the customer service team, who can then contact the customers on the list to better understand their needs or propose new offers, different products or whatever ‘win-back strategy’ is in place.

To sum up, in this post we showcased churn prediction with Machine Learning by creating a predictive model to identify customer churn. We specifically used a dataset from a financial services firm. However, regardless of what industry you’re in, or your strategy to mitigate customer churn, you can stay proactive and anticipate your customers’ next move based on this type of analysis. As with anything in life and business, time is of the essence.

Davide Andreazzini

Hello, it's me Davide … I'm a Full Stack Dev but most of all I'm a tinkerer, a hacker, and I love coding. I've been coding Web Applications for 10 years and in the last 2 I started to get interested in Big Data Applications and applied Data Science. Here are some of the technologies that I like to work with: JavaScript, Python, Scala, Clojure, Hadoop, Impala, Apache Spark, TensorFlow, Keras, MongoDB, RethinkDB, Redis, Docker, Kubernetes … I like to stay updated on new technologies and, in my free time, I play guitar and I'm always tinkering on projects involving microcontrollers and boards like Arduino and Raspberry Pi.
