Building an RFM model on customer purchase behavior to estimate Customer Lifetime Value (CLV)

Himanshu Bhardwaj
May 5, 2020


Segmentation of customers on the basis of their recency, frequency, and monetary scores.

To increase their market share, firms these days focus more on customer loyalty and profitability. To build a successful long-term customer relationship, a firm needs to identify the true value and loyalty of its customers so that it can adopt a more targeted and personalized approach to marketing. In this article, customers are segmented by estimating their customer lifetime value (CLV) from their past purchase behavior. To estimate CLV, the RFM (recency, frequency, and monetary) marketing analysis method is used. To demonstrate how the RFM model works, an online retail data set is used to calculate each customer's recency, frequency, and monetary scores. The k-means method is then used for unsupervised clustering of the data set based on these scores.

CLV models can be classified in several ways. One such classification, proposed by Gupta et al. (2006), describes six modeling approaches:

  1. RFM Models, which are based on recency, frequency, and monetary value.
  2. Probability Models, such as the Pareto/NBD model and Markov chains.
  3. Econometric Models, which share the underlying philosophy of probability models such as Pareto/NBD and model customer acquisition, customer retention, and customer margin and expansion.
  4. Persistence Models, which are based on modeling the behavior of CLV's components, that is, acquisition, retention, and cross-selling.
  5. Computer Science Models, such as neural networks and decision trees, which emphasize predictive ability rather than the theory-based (e.g., utility theory), easy-to-interpret models favored in marketing.
  6. Diffusion/Growth Models, which forecast the acquisition of future customers.

In this article, the RFM model is used to segment the customers of an online retail firm, whose data set is described below.

About the data set

An online retail sales data set is used for customer segmentation. It contains customers' purchase details over a period of one year. A few entries of the data set are shown below.

Online Retail shopping data set of customers

Calculation of the recency, frequency, and monetary scores of the customers from the data set

Let us first discuss what is meant by the recency, frequency, and monetary scores of a customer:

(1) Recency: the period since the last purchase; a lower value corresponds to a higher probability of the customer making a repeat purchase;

(2) Frequency: number of purchases made within a certain period; higher frequency indicates greater loyalty;

(3) Monetary: the money spent during a certain period; a higher value indicates that the company should focus more on that customer.
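As a minimal sketch (not the author's code), the three measures can be computed from raw transactions in plain Python. The `Transaction` fields are hypothetical, because the data set's column names are not reproduced here. The sketch also assumes that frequency is the number of distinct invoices, that monetary value is the sum of quantity times unit price, and that recency is counted in days before a chosen reference date (for example, the day after the last transaction in the data set).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Transaction:
    # Hypothetical field names; the article's actual columns are not shown.
    customer_id: str
    invoice_id: str
    invoice_date: date
    quantity: int
    unit_price: float


def compute_rfm(transactions, reference_date):
    """Return {customer_id: (recency_days, frequency, monetary)}."""
    last_purchase = {}
    invoices = {}
    spend = {}
    for t in transactions:
        cid = t.customer_id
        # Recency: keep the most recent purchase date seen for each customer.
        if cid not in last_purchase or t.invoice_date > last_purchase[cid]:
            last_purchase[cid] = t.invoice_date
        # Frequency: count distinct invoices (assumed: one invoice = one purchase).
        invoices.setdefault(cid, set()).add(t.invoice_id)
        # Monetary: total amount spent over the period.
        spend[cid] = spend.get(cid, 0.0) + t.quantity * t.unit_price
    return {
        cid: ((reference_date - last_purchase[cid]).days, len(invoices[cid]), spend[cid])
        for cid in last_purchase
    }


# Usage with three hypothetical transactions.
txns = [
    Transaction("C1", "INV1", date(2011, 1, 10), 2, 5.0),
    Transaction("C1", "INV2", date(2011, 3, 1), 1, 20.0),
    Transaction("C2", "INV3", date(2011, 2, 15), 10, 1.5),
]
rfm = compute_rfm(txns, reference_date=date(2011, 3, 31))
print(rfm)  # {'C1': (30, 2, 30.0), 'C2': (44, 1, 15.0)}
```

Counting distinct invoices rather than rows keeps a single basket with many product lines from inflating a customer's frequency.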

After calculating the recency, frequency, and monetary scores of the customers from the data set, the following table is prepared.

Data table of RFM score

K-means unsupervised clustering to segment the customers

To segment the customers based on their RFM scores, the unsupervised k-means method is employed, and the customers are grouped into five clusters. The results of the clustering are shown below.

Table showing centroids of the clusters
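The clustering step can be sketched with Lloyd's k-means algorithm in plain Python. This is an illustration under stated assumptions, not the author's implementation: the RFM columns are min-max scaled first so that monetary values (in currency units) do not dominate recency (in days), the initial centroids are drawn at random with a fixed seed, and the customer values are hypothetical. In practice a library implementation such as scikit-learn's `KMeans` would typically be used instead.

```python
import random


def min_max_scale(rows):
    """Scale each column of `rows` to [0, 1]; constant columns become 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments did not change, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical customers: (recency in days, frequency, monetary).
rfm = {
    "C1": (5, 12, 900.0), "C2": (7, 10, 850.0),
    "C3": (30, 4, 200.0), "C4": (35, 5, 260.0),
    "C5": (90, 2, 80.0), "C6": (95, 1, 60.0),
    "C7": (200, 1, 20.0), "C8": (210, 1, 25.0),
    "C9": (10, 20, 3000.0), "C10": (12, 18, 2800.0),
}
ids = list(rfm)
scaled = min_max_scale([rfm[c] for c in ids])
labels, centroids = kmeans(scaled, k=5, seed=42)  # five clusters, as in the article
for cid, lab in zip(ids, labels):
    print(cid, lab)
```

Because k-means only finds a local optimum, the result can depend on the initial centroids; running it with several seeds and keeping the solution with the lowest within-cluster sum of squares is a common safeguard.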

Calculation of CLV

Finally, after obtaining the clusters based on the customers' recency, frequency, and monetary scores, the CLV of each cluster can be calculated with the following formula:

CLV = NR * w1 + NF * w2 + NM * w3

where NR, NF, and NM are the normalized recency, frequency, and monetary values of the cluster centers, and w1, w2, and w3 are the corresponding weights. Different firms assign different values to these weights based on their experience; the weights can also be estimated with the Analytic Hierarchy Process (AHP). Based on their CLV scores, the clusters can then be labeled as very low, low, mid, high, and very high value classes.
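The scoring step can be sketched as follows, under several assumptions that the article does not spell out. The cluster centers are min-max normalized across clusters, and recency is normalized in reverse because, as defined above, a lower recency is better. The weights come from a hypothetical AHP pairwise comparison matrix, reduced with the row geometric-mean method, which is a common approximation to AHP's principal-eigenvector weights (the two agree when the judgments are perfectly consistent). All names and numbers here are illustrative.

```python
import math

LABELS = ["very low", "low", "mid", "high", "very high"]


def ahp_weights(pairwise):
    """Approximate AHP weights with the row geometric-mean method.

    pairwise[i][j] is how much more important criterion i is than j,
    with pairwise[j][i] == 1 / pairwise[i][j].
    """
    n = len(pairwise)
    gmeans = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(gmeans)
    return [g / total for g in gmeans]


def normalize(values, reverse=False):
    """Min-max normalize to [0, 1]; reverse=True maps the smallest value to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant dimension cannot separate clusters
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if reverse else scaled


def cluster_clv(centers, weights):
    """CLV = NR * w1 + NF * w2 + NM * w3 for each (R, F, M) cluster center."""
    w1, w2, w3 = weights
    rec, freq, mon = zip(*centers)
    nr = normalize(rec, reverse=True)  # more recent (smaller recency) scores higher
    nf = normalize(freq)
    nm = normalize(mon)
    return [r * w1 + f * w2 + m * w3 for r, f, m in zip(nr, nf, nm)]


def label_clusters(clv_scores):
    """Rank clusters by CLV and attach one label each (expects five clusters)."""
    assert len(clv_scores) == len(LABELS)
    order = sorted(range(len(clv_scores)), key=lambda i: clv_scores[i])
    labels = [None] * len(clv_scores)
    for rank, idx in enumerate(order):
        labels[idx] = LABELS[rank]
    return labels


# Hypothetical AHP judgments: monetary > frequency > recency.
pairwise = [
    [1, 1 / 2, 1 / 3],  # recency compared with R, F, M
    [2, 1, 1 / 2],      # frequency compared with R, F, M
    [3, 2, 1],          # monetary compared with R, F, M
]
# Hypothetical cluster centers in raw units: (days, purchases, amount spent).
centers = [(10, 20, 3000.0), (6, 11, 875.0), (32, 4.5, 230.0), (92, 1.5, 70.0), (205, 1, 22.5)]
weights = ahp_weights(pairwise)
scores = cluster_clv(centers, weights)
for center, score, label in zip(centers, scores, label_clusters(scores)):
    print(center, round(score, 3), label)
```

Reversing recency keeps all three terms pointing the same way, so positive weights always reward more recent, more frequent, and higher-spending clusters.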


