Data mining is not just for technical people.
And you might have to cluster your data even if you’re just segmenting your clients for your next marketing campaign. Or maybe you’re just a student who’d like to find out the basics of Weka (data mining software).
Here’s a brief data mining tutorial for non-techies to help you get started with clustering:
Where can you get Weka?
The safest option is its official website. Download Weka from there (it won’t run without Java).
And it’s free. 😁
Where do you find the right dataset?
Weka doesn’t work with just any dataset. And the algorithms you’re going to choose won’t fit all datasets.
So, if you want to use a specific algorithm, it’s best to just create your own set of data over which you have full control. Aim for more than 1,000 rows so your results are more reliable.
But here’s a source where you could find some decent datasets:
tomslee.net/airbnb-data-collection-get-the-data
(Drop me a line if you know more.) 😉
And if you’re looking for a case study (in plain English) with few technical elements so you can get an idea of how clustering really works: 🎉
Case study – Bank clients segmentation through clustering
Disclaimer: part of the case study is missing because I did it for a college project and the results can’t be disclosed.
Study objectives
- Highlight the use of Weka for basic data mining processes
- Discover the most representative segment of a bank’s (fictional) clients
- Find out how a bank’s (fictional) services can be improved starting with the data regarding clients’ age, job, marital status, education, account balance, housing, and loans through an online marketing campaign that could bring new clients
Introduction
Data mining is the process through which valid and previously unknown information is extracted from a specific set of data and is then used to make an important business decision.
Briefly put, data mining is a method that allows YOU to find similar behavioral patterns, trends, or tendencies from an existing data set.
The main goal of the entire process is DISCOVERY.
From this point of view, I’ve chosen to find out the most significant clients of a bank (fictional) through clustering.
For this study, I picked a type of application often used in marketing and retail: identifying significant client profiles and behavior patterns.
As a field of applicability, I’ve chosen banking. In this case, the main goal was to identify relevant clients (who are also loyal) and use their profile to create new digital marketing campaigns.
Typically, data mining can also be used to identify loyal clients or errors in the use of banking services, to discover new behavior, predict the way in which a service will be used, or estimate possible client administration costs.
The main target (and result) was to attract new clients based on analyzed profiles and behavior patterns. Thus, the desired profile of the bank’s possible clients will be created from the data on existing loyal clients.
As a result, we’ll be able to create a digital marketing campaign that will target exactly this market segment. And you might be looking to create alternative campaigns for the other significant client segments as well.
The link between objectives and strategic marketing
- Highlight the use of Weka for basic data mining processes
This objective makes it easier to apply an innovative method to a dataset owned by a marketing department, so the team can create new marketing campaigns faster and more efficiently than with a traditional method. Using such data mining tools or methods for marketing operations can offer a competitive advantage.
- Discover the most representative segment of a bank’s (fictional) clients
Using data mining software (like Weka) we can extract the profile of a significant or loyal client/customer. From this profile, we’ll build the online marketing campaigns.
Starting with the information offered by clients, personalized campaigns can be created. Clients’ response towards these is likely to be a positive one and people will be more interested in these than they would in a general, non-personalized campaign.
The success rate of a campaign will thus be considerably higher than if we had used a traditional method of segmentation.
Consequently, the marketing strategy chosen for this case study is to use an innovative method that reduces costs (since Weka is a free tool 😍), cuts the time spent on segmentation, and increases the success rate of the marketing campaigns built with it.
- Find out how a bank’s (fictional) services can be improved starting with the data regarding clients’ age, job, marital status, education, account balance, housing, and loans through an online marketing campaign that could bring new clients
In the case of companies or marketing departments that are using data mining or the client/market segmentation strategy for the first time, a reorientation of the general marketing strategy is needed.
Therefore, data mining is an easy way of determining which of a client’s attributes can be used to create and start a new digital marketing campaign.
You’ll also find out through which of these attributes you’ll get more success and a better response from your audience.
For example, through the dataset chosen in this case, you can test whether a campaign based on the clients’ job is more efficient than one that targets their age (or the other way around).
There are multiple opportunities and they can be diversified and tested until the right campaign model is found.
Work methodology
Dataset
A .csv dataset that I can’t disclose. 🙁
Criteria for selecting a set of data
Any Weka project must start with a correctly built and error-free dataset.
Missing information can cause serious mistakes in the final results and thus jeopardize the marketing campaign we want to create.
For a closer data analysis, you can sort and check all the information in Excel (or any other editor) before you add it to Weka.
For instance, you can sort the data by age to verify how diverse your list is in terms of the ages of the people in it. This helps keep the Weka analysis objective, so the final campaigns aren’t built on a skewed list.
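If you’d rather not check everything by hand, here’s a minimal Python sketch of the same idea: look for empty cells and check the age range before loading the file into Weka. The sample rows and column names are made up for illustration (the real dataset stays private), so treat this as a starting point rather than the exact check I ran.

```python
import csv
import io

# Hypothetical sample standing in for the real (undisclosed) client file.
SAMPLE_CSV = """age,job,marital,education,balance,housing,loan
58,management,married,tertiary,2143,yes,no
33,,single,secondary,2,yes,yes
41,technician,married,unknown,,no,no
18,student,single,primary,608,no,no
"""


def find_missing(rows):
    """Return (file_line, column) pairs for every empty cell."""
    problems = []
    for line_number, row in enumerate(rows, start=2):  # line 1 is the header
        for column, value in row.items():
            if value is None or value.strip() == "":
                problems.append((line_number, column))
    return problems


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = find_missing(rows)

# Sorting by age makes it easy to eyeball the range and spot odd values.
ages = sorted(int(row["age"]) for row in rows)

print("Empty cells:", missing)
print("Age range:", ages[0], "to", ages[-1])
```

To run it on your own file, swap `io.StringIO(SAMPLE_CSV)` for `open("your_file.csv", newline="")` (the file name here is just a placeholder).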
After choosing a database, analyze it to see if it matches your project’s requirements and your objectives.
So the right dataset for this study had to contain a large number of people and relevant data on them that could be used for a marketing campaign. Among the necessary data were demographic characteristics, personal interests, and the relationship between the client and the seller (in this case the fictional bank).
The profile of the chosen dataset
The dataset I used contains client attributes such as age, job, marital status, education level, account balance, and other info regarding their housing and bank loans.
This way, I ensured that the people in the dataset have diverse profiles/characteristics. They are between 18 and 95 years old; they range from students to retired people; they are single, married, or divorced; they have a primary, secondary, tertiary, or unknown education; and their account balances vary, including clients with debts or with no money in their accounts.
Process and algorithm
The process
Data mining is the process of extracting, transforming, and analyzing the information in a set of data regardless of its size.
For this case study, the data mining process was used to gather info regarding a fictional bank’s clients. This type of analysis will then be used to plan a digital marketing campaign and facilitate other general business decisions.
Data mining can help identify errors, patterns, and data correlations to predict approximate but useful results. This information can then be used to generate new results, profit, and other benefits, to reduce costs and risks, or to improve the seller-client relationship.
Using exact client data we can customize campaigns that will allow us to increase our profit, satisfy our clients, and avoid losing large sums of money on useless marketing campaigns that don’t target a specific buyer persona.
The data mining algorithm
I used Simple K-Means Clustering as an unsupervised learning algorithm that allows us to discover new data correlations. (Note: It does so much more than just that. But I’ll stick to the basics for now.)
After choosing the algorithm, I set the number of wanted clusters to 3 (or needed clusters, if you have a specific target in mind), the maximum number of iterations to 500, and the distance metric to EuclideanDistance.
Note: Again, clustering is so much more than just these metrics. And this is a good thing. If you’re looking into learning data mining on an advanced level you’ll see how these functions, classifiers (etc.) can help you get more accurate results.
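If you’re curious what those three settings actually do, here’s a bare-bones Python sketch of k-means. It’s my own simplified illustration, not Weka’s code: the `k_means` function, the seed value, and the sample points (age, balance in thousands) are all made up, and Weka also handles things like text attributes and scaling for you, which this sketch skips.

```python
import math
import random


def euclidean(a, b):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, k=3, max_iterations=500, seed=10):
    """Group points into k clusters; returns (centroids, assignment)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly picked points
    assignment = None
    for _ in range(max_iterations):
        # Step 1: give every point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clusters are stable
        assignment = new_assignment
        # Step 2: move each centroid to the average of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, assignment


# Hypothetical clients: (age, account balance in thousands).
points = [
    (22, 1.0), (25, 2.0), (24, 1.5),
    (45, 20.0), (47, 22.0), (44, 19.0),
    (70, 5.0), (72, 6.0), (68, 4.0),
]

centroids, assignment = k_means(points, k=3, max_iterations=500)
for point, cluster in zip(points, assignment):
    print(point, "-> cluster", cluster)
```

The number of clusters decides how many groups you get, the maximum number of iterations is just a safety limit on the loop, and the distance metric decides what "nearest" means.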
The clustering results were then shown in a table of the final cluster centroids, with one row per attribute and one column per cluster.
The first column (Full data) comes before the clusters we’ll use to interpret the data. It isn’t a cluster itself: it summarizes the full set of data.
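To see how a centroid table like this is put together, here’s a small Python sketch. It assumes the usual convention of taking the average for numeric attributes and the most frequent value for text attributes. The clients and their cluster numbers are invented for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical clients with the cluster each one was assigned to.
clients = [
    {"age": 30, "job": "technician", "cluster": 0},
    {"age": 34, "job": "technician", "cluster": 0},
    {"age": 38, "job": "technician", "cluster": 0},
    {"age": 52, "job": "management", "cluster": 1},
    {"age": 58, "job": "retired", "cluster": 1},
    {"age": 56, "job": "management", "cluster": 1},
    {"age": 22, "job": "student", "cluster": 2},
    {"age": 26, "job": "student", "cluster": 2},
]


def centroid(rows):
    """Average age and most frequent job for a group of clients."""
    return {
        "age": round(mean(r["age"] for r in rows), 1),
        "job": Counter(r["job"] for r in rows).most_common(1)[0][0],
    }


# The "Full data" column summarizes everyone; the others summarize one cluster each.
table = {"Full data": centroid(clients)}
for c in sorted({r["cluster"] for r in clients}):
    table[f"Cluster {c}"] = centroid([r for r in clients if r["cluster"] == c])

for column, values in table.items():
    print(column, values)
```

Comparing each cluster column against "Full data" is what tells you how a segment differs from your average client.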
The data mining software
Weka is software that supports and uses a series of machine learning algorithms to complete data mining tasks.
You can call these algorithms from your own Java code or from the command line, or you can apply the chosen algorithm directly to your set of data through Weka’s interface (like I did for this case study).
Other than clustering, you can perform pre-processing, classification, regression, association, and visualization operations through Weka as well.
Weka by itself is a project whose goal is to provide a collection of machine learning and data processing algorithms.
Results and interpretation
Disclaimer: this part is incomplete as the results will remain private but I’ll highlight some of the missing information
I got 3 clusters. Cluster 0 with 21,149 representatives. Cluster 1 with 7,268 representatives. Cluster 2 with 16,794 representatives.
For the 0, 1, and 2 clusters:
The significant cluster 0 represents 47% of all clustered instances, cluster 1 represents only 16%, and cluster 2 is 37% of the total.
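If you want to double-check percentages like these yourself, the arithmetic is simple. Here’s a short Python sketch that uses the three cluster sizes from above (the variable names are mine):

```python
# Cluster sizes reported in this case study.
cluster_sizes = {0: 21149, 1: 7268, 2: 16794}

total = sum(cluster_sizes.values())

# Share of all clustered instances, rounded to whole percentages.
shares = {c: round(100 * n / total) for c, n in cluster_sizes.items()}

# Clusters ordered from largest to smallest.
ranking = sorted(cluster_sizes, key=cluster_sizes.get, reverse=True)

print("Total instances:", total)
for c in ranking:
    print(f"Cluster {c}: {cluster_sizes[c]} clients ({shares[c]}%)")
```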
From all 3 clusters, cluster 0 is the most significant one. Note: here you should characterize that specific segment
The second most significant cluster is 2. Note: characterize that segment
The last cluster, 1, is still significant although less representative than the other clusters. Note: characterize that segment
These clusters will guide the marketer towards identifying the ideal profile of a company’s clients (in this situation a fictional bank).
Optional: If you want to expand or multiply segmentation across multiple customer categories, you can add more clusters. For instance, 7 clusters (compared to 3 used in this study).
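One common way to compare different numbers of clusters is to look at the within-cluster sum of squared errors, which tells you how tightly packed the clusters are. Here’s a simplified Python sketch on a single made-up attribute (age). The `k_means_1d` helper and its way of picking starting points are my own shortcuts for illustration, not how Weka does it.

```python
def k_means_1d(values, k, max_iterations=500):
    """Tiny one-attribute k-means; starting centroids are spread over the sorted values."""
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(max_iterations):
        groups = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            groups[nearest].append(v)
        updated = [
            sum(g) / len(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return centroids, groups


def within_cluster_sse(centroids, groups):
    """Sum of squared distances between each value and its cluster's centroid."""
    return sum((v - c) ** 2 for c, g in zip(centroids, groups) for v in g)


# Hypothetical client ages.
ages = [19, 21, 23, 35, 37, 39, 61, 63, 65]

sse = {}
for k in (1, 2, 3):
    centroids, groups = k_means_1d(ages, k)
    sse[k] = within_cluster_sse(centroids, groups)
    print(k, "clusters -> error", round(sse[k], 1))
```

The error usually keeps shrinking as you add clusters, so a lower number alone doesn’t mean a better segmentation. What you’re looking for is the point where adding another cluster stops making a big difference, while the segments are still few enough to build campaigns around.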
The concept of segmenting a bank’s (fictional) clients implies grouping them through clustering in different but relevant categories for your ideal client’s profile. Future promotional campaigns will be created through these results.
But when else can you use this method?
Such segmentation through the creation of several clusters can be used when we want to make a campaign for clients who are not yet loyal or significant but have the potential to become so.
Consequently, a group of clients who shows interest in the services the most significant clients are already using can be attracted through a similar personalized campaign.
And the efforts are minimal.
You can even create a simple email marketing campaign to convince them to use more advanced services they might need but don’t yet know about.
In fact, the data mining process can highlight the right benefits you can present to your clients to get them to ENGAGE.
This data is highly valuable when a company wants to better understand its clients, competition, market, positioning, etc.
Ways of visualizing your data and results
The examined data shouldn’t be kept just in cluster form. It can also be analyzed visually, through charts that make data interpretation much easier, faster, and more efficient.
Visualization is an easier method of getting an overall look at your clients and their profiles.
With visualization, you can also follow your clients’ distribution according to one or more of their attributes. Note: I tested this by organizing them according to their age in ascending order
You can also correlate 2 attributes to visualize the significant data you need to create a digital marketing campaign. Note: for this part, I also took a look at the clients’ profile according to their education and current job
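Weka draws these charts for you, but the idea behind them is just counting. Here’s a rough Python sketch that counts clients per age band and per education/job pair. The clients are invented and the text "chart" is only there to show the principle.

```python
from collections import Counter

# Hypothetical clients.
clients = [
    {"age": 23, "education": "secondary", "job": "student"},
    {"age": 27, "education": "tertiary", "job": "technician"},
    {"age": 34, "education": "tertiary", "job": "management"},
    {"age": 38, "education": "tertiary", "job": "management"},
    {"age": 45, "education": "secondary", "job": "technician"},
    {"age": 61, "education": "primary", "job": "retired"},
]


def age_band(age):
    """Turn an age into a ten-year band such as '30-39'."""
    decade = age // 10 * 10
    return f"{decade}-{decade + 9}"


# One attribute: how many clients fall into each age band.
age_histogram = Counter(age_band(c["age"]) for c in clients)

# Two attributes: how often each education/job combination appears.
pairs = Counter((c["education"], c["job"]) for c in clients)

for band in sorted(age_histogram):
    print(band.ljust(6), "#" * age_histogram[band])

for (education, job), count in pairs.most_common():
    print(f"{education} + {job}: {count}")
```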
So, if clustering alone becomes too difficult to use or doesn’t offer any relevant information, you can just simply visualize your data.
However, a set of data that belongs to a marketing department will never be used for just one campaign. With Weka, you can go back to the same data at any time, for free, and pull new segments out of it.
Some of your other available marketing opportunities that can result from Weka data analysis and data mining are:
- Evaluating past activity and services to improve them in the future or to create new services and products that the clients (or potential customers) want
- Discovering the number of clients who are interested only in one service to present new opportunities to them
- Finding the number of competitors and monitoring them or their clients, if you’re building a database that includes them
- Creating personalized competitions with appealing prizes for the current clients or to bring new leads
- Creating a newsletter (or any other email marketing campaign) and segmenting your email lists so that you send only relevant messages to your clients (or any lead who wants to receive these offers)
- Creating relevant messages to post publicly on social media networks
- Re-evaluating the current prices by comparing them to your clients’ purchasing power
- Creating customer loyalty campaigns/programs
- Predicting the clients’ wishes/needs and the services they’re going to use in the future
Conclusions
A marketing campaign just can’t be successful if it’s not based on a set of real client data.
Using Weka and data mining to segment your clients (or potential clients) is thus VITAL.
This technology facilitates the analysis of datasets regardless of their size, either through your own Java code or directly in Weka’s interface (like for this project). Weka and data mining answer a constant problem that even the biggest enterprises stumble upon: attracting and keeping clients.
Moreover, using such a system allows companies to better understand their clients, the market in which they’re active, their competitors, and other factors that can impact their own business and profitability.
Before, market research was an indispensable stage through which companies got to know their customers or the market where they wanted to sell their products.
Now, however, there are many other resources you can use for this. Some even more efficient thanks to their speed and low cost.
Like data mining. 😍
And what exactly is data mining useful for?
In particular cases, data mining is useful when it comes to identifying repetitive behavior or trends to help evaluate client actions or the information they’re sharing virtually via social media, forums, or discussion groups.
Without the data mining process, companies would find it harder to find out what clients truly think of their products/services, the reasons why a service is not successful, or why its sales have increased/decreased during a certain time span or after a change has been made.
Currently, datasets are no longer useful just to understand certain events or adjustments. They are now used to help executives make correct and efficient decisions for their business and marketing endeavors.
These can affect the company’s future and its projects positively or negatively. Depending on how you’re managing them.
How do you think data mining will improve marketing in the future? 🤖
Share your thoughts and let me know if you’re already using data mining for your marketing duties.