Profile Analysis
In this project we cover clustering, an unsupervised learning technique that groups similar data points together based on their characteristics. The goal is to find similarities within a dataset, placing similar data points in the same group while keeping dissimilar ones apart.
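To make the grouping idea concrete, here is a minimal sketch using scikit-learn's `KMeans`. The two synthetic groups of points, the random seed, and the choice of two clusters are assumptions made for illustration only; they are not taken from the project's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic groups (hypothetical data, not the mall dataset).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[20.0, 20.0], scale=1.0, size=(25, 2))
group_b = rng.normal(loc=[80.0, 80.0], scale=1.0, size=(25, 2))
points = np.vstack([group_a, group_b])

# Ask K-Means for two clusters; n_init and random_state are fixed for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(points)

# Points drawn from the same synthetic group should end up with the same label.
print(labels[:25])
print(labels[25:])
```

The label numbers themselves are arbitrary. What matters is that points that are close together share a label, while the two distant groups get different ones.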
Think of this project from a business perspective. Based on customer profiles, a business can identify distinct clusters and tailor its experience, offers, services, and products to each one.
Below, I briefly explain the data analytics methodology, the tasks involved in the project, the data and code sources, and the conclusions.
Data Analytics Methodology
Transforming data into insights: the six steps of data analytics are ask, prepare, process, analyze, share, and act.
This project does not cover all of these steps, but listing them up front shows how a data analysis process is organized.
Tasks in this project
- Understand the problem statement
- Import libraries and data sets
- Perform exploratory analysis and data visualization
- Clustering with K-Means and DBSCAN
- Conclusions
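The two algorithms named in the clustering task behave differently: K-Means assigns every point to one of a fixed number of clusters, while DBSCAN groups points by density and labels isolated points as noise (`-1`). The sketch below illustrates the DBSCAN side on synthetic data. The points and the `eps` and `min_samples` values are assumptions chosen for this toy example, not the settings used in the project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense synthetic blobs plus one far-away point (hypothetical data).
rng = np.random.default_rng(1)
dense_a = rng.normal(loc=[0.0, 0.0], scale=0.2, size=(30, 2))
dense_b = rng.normal(loc=[5.0, 5.0], scale=0.2, size=(30, 2))
outlier = np.array([[20.0, -20.0]])
points = np.vstack([dense_a, dense_b, outlier])

# eps is the neighborhood radius; min_samples is the number of neighbors
# (including the point itself) needed for a point to count as a core point.
db = DBSCAN(eps=1.0, min_samples=5)
labels = db.fit_predict(points)

# DBSCAN marks noise with the label -1, so exclude it when counting clusters.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print("clusters:", n_clusters, "noise points:", n_noise)
```

Unlike K-Means, DBSCAN is not told how many clusters to find. The number of clusters follows from the density parameters, so on real data these usually need tuning.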
Data source
For this project we use the "Mall Customer Segmentation Data" dataset, available on Kaggle. Click here to access.
This dataset was created only for learning customer segmentation concepts. The original text calls this "market basket analysis", although that term more commonly refers to association-rule mining over purchase transactions. Five features are available:
Feature | Type | Description |
---|---|---|
Customer ID | Integer | Unique ID assigned to the customer |
Gender | Categorical | Gender of the customer |
Age | Integer | Age of the customer |
Annual Income (k$) | Integer | Annual income of the customer, in thousands of dollars |
Spending Score | Integer | Score assigned by the mall based on customer behavior and spending nature (1-100) |
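As a rough sketch of how this table maps onto a pandas DataFrame, the snippet below reads a CSV and checks that the expected columns exist. The column names are copied from the table above and may not match the real file's header exactly. The helper `load_customers` is hypothetical, and the three rows are made-up sample values standing in for the downloaded file.

```python
import io
import pandas as pd

# Column names are assumptions based on the feature table; the real CSV
# header may differ slightly (spacing, suffixes), so adjust as needed.
EXPECTED_COLUMNS = [
    "Customer ID", "Gender", "Age", "Annual Income (k$)", "Spending Score",
]

def load_customers(source):
    """Read a CSV (path or file-like object) and verify the expected columns."""
    df = pd.read_csv(source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df

# Tiny made-up sample used in place of the real file.
sample_csv = io.StringIO(
    "Customer ID,Gender,Age,Annual Income (k$),Spending Score\n"
    "1,Female,34,52,61\n"
    "2,Male,45,78,17\n"
    "3,Female,23,30,88\n"
)
customers = load_customers(sample_csv)

print(customers.dtypes)
# The spending score is described as lying between 1 and 100.
print(customers["Spending Score"].between(1, 100).all())
```

With the real dataset, a file path would be passed to `load_customers` in place of the in-memory sample.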
Code source
To develop the analysis, I used the Colab notebook available on Kaggle. Click here to access the file and see the results.
#Data
import pandas as pd
import numpy as np
#Data Visualization
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns
#Clustering Models
from sklearn.cluster import DBSCAN, KMeans
#Ignore Warnings
import warnings
warnings.filterwarnings('ignore')
#List the files available in the Kaggle input directory
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))