Profile Analysis

October 2023, by Leon Schedlin Czarlinski

Data

Python

Clustering

In this project we cover clustering, an unsupervised learning technique that groups data points based on the similarity of their characteristics. The goal of clustering is to find structure in a dataset by placing similar data points in the same group while keeping dissimilar data points in separate groups.
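To make the idea concrete, here is a minimal sketch using scikit-learn's `KMeans` on six made-up 2-D points that form two obvious groups. The points and the choice of `n_clusters=2` are assumptions for illustration only; they do not come from the project's dataset.

```python
import numpy as np
from sklearn.cluster import KMeans

# Six made-up 2-D points forming two obvious groups (illustration only)
X = np.array([
    [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],  # group near (1.2, 1.5)
    [8.0, 8.0], [8.5, 8.5], [9.0, 8.0],  # group near (8.5, 8.2)
])

# K-Means assigns each point to the nearest of k centroids, then moves
# each centroid to the mean of its points, repeating until stable
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

print("Cluster label per point:", labels)
print("Cluster centers:\n", kmeans.cluster_centers_)
```

No labels were given to the algorithm; it recovers the two groups purely from the distances between points, which is what makes clustering "unsupervised".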

Consider this project from a business perspective: by grouping customers into clusters based on their profiles, a business can tailor the experience, offers, services, products, and more to each cluster.


Below, I describe the data analytics methodology, the tasks involved in the project, the data and code sources, and the conclusions.

Data Analytics Methodology

Transforming data into insights: the six steps of data analytics are ask, prepare, process, analyze, share, and act.

Figure: visualization of the six-step methodology for transforming data into insights.

This project does not cover all six steps; they are listed here to show the complete workflow of a data analysis process.

Tasks in this project

Understand the problem statement
Import libraries and data sets
Perform exploratory analysis and data visualization
Clustering with K-Means and DBSCAN
Conclusions
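K-Means and DBSCAN group points in different ways, and a tiny example helps show the contrast. The sketch below uses made-up points (two dense groups plus one isolated point) and hand-picked parameters (`n_clusters=2`, `eps=0.5`, `min_samples=3`); these values are assumptions for illustration, not the settings used in the analysis.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans

# Made-up 2-D points: two dense groups plus one isolated point
X = np.array([
    [1.0, 1.0], [1.2, 1.1], [0.9, 1.3], [1.1, 0.8],  # dense group A
    [5.0, 5.0], [5.2, 5.1], [4.9, 5.3], [5.1, 4.8],  # dense group B
    [9.0, 1.0],                                      # isolated point
])

# K-Means needs the number of clusters up front and assigns every point,
# including the isolated one, to some cluster
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# DBSCAN grows clusters from dense neighborhoods: a point is "core" if at
# least min_samples points (itself included) lie within distance eps.
# Points that are not reachable from any core point are labeled -1 (noise)
db_labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

print("K-Means:", km_labels)
print("DBSCAN: ", db_labels)
```

DBSCAN finds the two dense groups without being told how many clusters exist and flags the isolated point as noise (`-1`), whereas K-Means forces that point into one of its two clusters.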

Data source

For this project we use the "Mall Customer Segmentation Data" dataset, available on Kaggle.

This dataset was created for learning customer segmentation concepts. It contains five features:

Feature	Type	Description
Customer ID	Integer	Unique ID assigned to the customer
Gender	Categorical	Gender of the customer
Age	Integer	Age of the customer
Annual Income (k$)	Integer	Annual income of the customer, in thousands of US dollars
Spending Score (1-100)	Integer	Score from 1 to 100 assigned by the mall based on customer behavior and spending habits
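A minimal sketch of loading data with this schema in pandas and running a first exploratory summary. To keep it self-contained, it reads a few hand-typed rows from a string instead of the real file; the rows are illustrative, and the CSV header assumes the column names listed in the table above (the actual Kaggle file may name them slightly differently). The shorter names `AnnualIncome` and `SpendingScore` are hypothetical choices for this example.

```python
import io

import pandas as pd

# Hand-typed illustrative rows; the header assumes the Kaggle column names
CSV_TEXT = """CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)
1,Male,19,15,39
2,Male,21,15,81
3,Female,20,16,6
4,Female,23,16,77
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Hypothetical shorter column names, easier to type in later steps
df = df.rename(columns={
    "Annual Income (k$)": "AnnualIncome",
    "Spending Score (1-100)": "SpendingScore",
})

# Numeric features that are natural candidates for segmentation
features = df[["AnnualIncome", "SpendingScore"]]

# Quick exploratory summary: mean spending score per gender
spend_by_gender = df.groupby("Gender")["SpendingScore"].mean()

print(df.dtypes)
print(spend_by_gender)
```

With the real file, the only change would be replacing `io.StringIO(CSV_TEXT)` with the path to the downloaded CSV.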

Code source

To develop the analysis, I used a Colab notebook, which is available on Kaggle together with its results.


  # Data manipulation
  import pandas as pd
  import numpy as np
  
  # Data visualization
  import matplotlib.pyplot as plt
  from matplotlib import style
  import seaborn as sns
  
  # Clustering models
  from sklearn.cluster import DBSCAN, KMeans
  
  # Ignore warnings
  import warnings
  warnings.filterwarnings('ignore')
  
  # List the input files available in the Kaggle environment
  import os
  for dirname, _, filenames in os.walk('/kaggle/input'):
      for filename in filenames:
          print(os.path.join(dirname, filename))
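With the libraries imported, a natural next step is to cluster customers on income and spending. The sketch below is one possible approach, not the notebook's actual code: it standardizes the two features with `StandardScaler` (an added assumption, since K-Means is distance-based and the features use different units) and fits `KMeans`. The helper `segment_customers`, the assumed column names, and the six toy customers are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed column names, matching the feature table above
FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def segment_customers(df: pd.DataFrame, n_clusters: int) -> pd.Series:
    """Hypothetical helper: return a K-Means cluster label for each row of df."""
    # Put both features on a comparable scale (mean 0, std 1)
    X = StandardScaler().fit_transform(df[FEATURES])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    return pd.Series(model.fit_predict(X), index=df.index, name="Cluster")


# Made-up customers: three low-income/high-spending, three high-income/low-spending
toy = pd.DataFrame({
    "Annual Income (k$)": [15, 18, 20, 110, 115, 120],
    "Spending Score (1-100)": [85, 90, 80, 10, 15, 5],
})
toy["Cluster"] = segment_customers(toy, n_clusters=2)
print(toy)
```

Returning a `Series` aligned on the DataFrame index makes it easy to attach the labels as a new column and then profile each cluster, for example with `groupby("Cluster").mean()`.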