Marketplace Sales Data Analysis

Here we will do Exploratory Data Analysis on marketplace sales data.

About the Dataset of online sales marketplace data analysis

Marketplace sales dataset description

Column Info

Variable info of Sales dataset

Dataset Source

Dataset can be downloaded from Kaggle website Online Sales Dataset – Popular Marketplace Data

Libraries Importing

#Import the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Dataset Loading

#load the dataset
dt = pd.read_csv('Online Sales Data.csv')
dt
Load the sales dataset

Data exploring and cleaning

Before EDA, first we will explore the dataset and perform the cleaning process.

dt.head()
First 5 rows of the data

Here we are checking column names of the data.

dt.columns
Columns names of dataset

dt.info()
Dataset variables data types

from datetime import datetime
dt['Date'] = pd.to_datetime(dt['Date'])

Data column has object data type. We will convert to datetime.

Important info related data types

Data types information

convert_disc = {'Transaction ID' : 'int32', 'Product Category' : 'string', 'Product Name' : 'string', 'Units Sold' : 'int16', 
                'Region' : 'string', 'Payment Method' : 'category'  }

dt = dt.astype(convert_disc)
dt.info()                
data types conversion

dt.describe()

Description of each column

It tells us description of each integer variable.

dt.isnull().sum()

null values analysis

dt.duplicated().sum()

No duplicate value

Exploratory Data Analysis on online marketplace sales data

Univariate Analysis

(1) Which payment method is mostly used?

pm = sns.countplot(data = dt, x = dt['Payment Method'])
plt.figure(figsize = (12,6))
pm.set_xlabel('Payment method used', fontsize = 15)
pm.set_ylabel('Count of records', fontsize = 15)
pm.set_title('Univariate analysis of Payment Method', fontsize= 14)

Univariate analysis of Payment method column of dataset

region = dt['Region'].value_counts() 
plt.pie(region.values, 
        labels=region.index, 
        autopct='%1.1f%%') 
plt.show() 

Pie chart of region column of online marketplace sales data

(2) Top 10 selling products

# Top-selling products by frequency
top_products = dt['Product Name'].value_counts().head(10)
top_products.plot(kind='bar', figsize=(16, 6))
plt.title('Top 10 Selling Products by Frequency')
plt.xlabel('Product Name')
plt.ylabel('Count')
plt.xticks(rotation=45)
plt.show()

Top 10 Selling products

(3) Distribution of Quantities (Total Number of Product Units) sold

plt.figure(figsize=(10, 6))
sns.histplot(dt['Units Sold'], bins=10, kde=True)
plt.title('Distribution of Quantities (Number of Product Units) Sold')
plt.xlabel('Quantity (Number of Units)')
plt.ylabel('Frequency')
plt.show()

# Summary statistics for quantity
quantity_stats = dt['Units Sold'].describe()
print(quantity_stats)

Distribution of Quantities (Total Number of Product Units) sold

(4) Distribution of each product unit price

# Distribution of unit prices
plt.figure(figsize=(10, 6))
sns.histplot(dt['Unit Price'], bins=10, kde=True)
plt.title('Distribution of Unit Prices')
plt.xlabel('Unit Price')
plt.ylabel('Frequency')
plt.show()

# Summary statistics for unit price
unit_price_stats = dt['Unit Price'].describe()
print(unit_price_stats)

Distribution of each product unit price

Bivariate Analysis

(5) How much overall revenue (sales) do you get each month throughout time?

# Extract day, month, year, and weekday
dt['Day'] = dt['Date'].dt.day
dt['Month'] = dt['Date'].dt.month
dt['Year'] = dt['Date'].dt.year
dt['Weekday'] = dt['Date'].dt.weekday

dt.set_index('Date')['Total Revenue'].resample('ME').sum().plot(kind='line', figsize=(12, 6))
plt.title('Monthly sales over time')
plt.xlabel('Date')
plt.ylabel('Total sales')
plt.show()

Monthly sales over time

plt.figure(figsize=(10, 6))
sns.countplot(x=dt['Product Category'], order=dt['Product Category'].value_counts().index)
plt.title('Frequency of Product Categories')
plt.xlabel('Count')
plt.ylabel('Category')
plt.show()

Countplot of product categories

(6) For every product category, what is the total income (Product unit * Unit Price)?

pt_category_revenue = dt.groupby('Product Category')['Total Revenue'].sum().sort_values()
pt_category_revenue.plot(kind='barh', figsize=(10, 6))
plt.title('Total Revenue per each Product Category')
plt.xlabel('Total Revenue')
plt.ylabel('Product Category')
plt.show()

Total revenue per each product category

(7) What is the name of the product that brought in the most money?

# Top-selling products by total revenue
top_products_revenue = dt.groupby('Product Name')['Total Revenue'].sum().sort_values(ascending=False).head(10)
top_products_revenue.plot(kind='bar', figsize=(10, 6))
plt.title('Top 10 Selling Products by Total Revenue')
plt.xlabel('Product Name')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.show()

Top selling products of marketplace data nalysis

(8) Which region has the largest overall revenue generation?

region_revenue = dt.groupby('Region')['Total Revenue'].sum().sort_values()
region_revenue.plot(kind='barh', figsize=(10, 6))
plt.title('Total Revenue per Region')
plt.xlabel('Total Revenue')
plt.ylabel('Region')
plt.show()

Total revenue per each region of dataset

(9) How is the entire revenue divided among the various payment options (Units Sold * Unit Price)?

payment_revenue = dt.groupby('Payment Method', observed=True)['Total Revenue'].sum().sort_values()
payment_revenue.plot(kind='barh', figsize=(10, 6))
plt.title('Total Revenue (Units sold * Unit Price) by Payment Method')
plt.xlabel('Total Revenue')
plt.ylabel('Payment Method')
plt.show()

Sales by each payment method

Multivariate Analysis

(10) What is the distribution of sales (total revenue) for each product category across regions, and how do product categories perform in various regions?


# Heatmap for visualizing sales distribution geographically
pivot_table_region = dt.pivot_table(values='Total Revenue', index='Product Category', columns='Region', aggfunc='sum', fill_value=0)
plt.figure(figsize=(12, 8))
sns.heatmap(pivot_table_region, annot=True, fmt='.0f', cmap='Blues')
plt.title('Total Revenue by Category and Region')
plt.xlabel('Region')
plt.ylabel('Category')
plt.show()

Total revenue generated by each category and region of  dataset

(11) What was each Product Category’s Total Revenue throughout time (Units Sold * Unit Price)?

# Line charts with different lines for each category to see trends over time
dt.set_index('Date').groupby('Product Category')['Total Revenue'].resample('ME').sum().unstack().plot(kind='line', figsize=(12, 8))
plt.title('Monthly Sales Over Time by Category')
plt.xlabel('Date')
plt.ylabel('Total Revenue')
plt.legend(title='Category')
plt.show()

Monthly sales over time by each category

(12) Which payment methods have the most sales (revenue) within each product category?

# Stacked bar charts to show sales distribution across payment methods for each category
pivot_table_cat_pay = dt.pivot_table(values='Total Revenue', index='Product Category', columns='Payment Method', aggfunc='sum', fill_value=0, observed = False)
pivot_table_cat_pay.plot(kind='bar', stacked=True, figsize=(12, 8))
plt.title('Total Sales by Product Category and Payment Method')
plt.xlabel('Product Category')
plt.ylabel('Total Sales')
plt.xticks(rotation=45)
plt.legend(title='Payment Method')
plt.show()

Sales in each product category by  payment method

(13) What regional differences exist in the payment method preferences?

# Stacked bar charts to compare payment method preferences across regions
pivot_table_reg_payment = dt.pivot_table(values='Total Revenue', index='Region', columns='Payment Method', aggfunc='sum', fill_value=0, observed = True)
pivot_table_reg_payment.plot(kind='bar', stacked=True, figsize=(12, 8))
plt.title('Total Revenue by Region and Payment Method')
plt.xlabel('Region')
plt.ylabel('Total Revenue')
plt.xticks(rotation=45)
plt.legend(title='Payment Method')
plt.show()

Total revenue in each region and which payment method used

I tried my best to finish this project. If you are interesting more data analysis EDA project you can check Customer Segmentation Data Analysis.

Scroll to Top