Sales Reporting
(Link)1. Overview
Assume we have multiple CSV files in the same folder. In general, they have the same layout and have the column headers are in the first row.
Imports the libraries we need.
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline
from itertools import combinations
from collections import Counter
path = os.path.abspath('D:\DKTA\Tai lieu\github\project\Sales-Reporting\Sales-Reporting-main\data'
files = os.listdir(path)
df = pd.DataFrame()
for file in files:
if file.endswith('.csv'):
filepath = path + '\\' + file
df = df.append(pd.read_csv(filepath), ignore_index=True)
df.head()
We have DataFrame: df with 6 columns: Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address
With code: df.shape will return the shape of df: 186850 rows and 6 columns
2. Process data
In the dataset have 545 values null of each columns. We drop mising values
Cast a pandas object to a specified dtype and add new column: day, month, sales, city
# Cast a pandas object to a specified dtype
df = df.astype({'Product':'category',
'Quantity_Ordered':'int',
'Price_Each':'float',
'Quantity_Ordered':'int'})
df['Order_Date'] = pd.to_datetime(df['Order_Date'])
df['day'] = pd.DatetimeIndex(df['Order_Date']).day
df['month'] = pd.DatetimeIndex(df['Order_Date']).month
# Add new column: calculate total, city
df['Sales'] = df['Quantity_Ordered'] * df['Price_Each']
df['City'] = df['Purchase_Address'].apply(lambda x: x.split(',')[1])
df.head()
Creat dataframe groupby Product
all_products = df.groupby('Product').sum()['Quantity_Ordered']
prices = df.groupby('Product').mean()['Price_Each']
total = df.groupby('Product').sum()['Sales']
df_quantity_price = pd.concat([all_products, prices, total], axis=1)
df_quantity_price.sort_values(by='Quantity_Ordered', ascending=False, inplace=True)
df_quantity_price
3. Visualization
- Visualization Quantity Ordered of each product with Price and Sales
# What product sold the most? price of product and revenue of each product
fig, ax1 = plt.subplots(2, figsize=(12, 8))
x = df_quantity_price.index
y1 = df_quantity_price['Quantity_Ordered']
y2 = df_quantity_price['Sales']
y3 = df_quantity_price['Price_Each']
ax1[0].bar(x, y1, color='g')
ax1[0].set_ylabel('Quantity Ordered', color='g', size=14)
ax1[0].set_xticklabels(x, rotation=45, size=12, ha='right')
ax1[0].set(xticklabels=[])
ax1[0].grid(False)
ax2 = ax1[0].twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.set(xticklabels=[])
ax2.grid(False)
ax1[1].bar(x, y1, color='g')
ax1[1].set_xlabel('Products')
ax1[1].set_ylabel('Quantity Ordered', color='g', size=14)
ax1[1].set_xticklabels(x, rotation=45, size=12, ha='right')
ax1[1].grid(False)
ax3 = ax1[1].twinx()
ax3.plot(x, y3, 'r')
ax3.set_ylabel('Price Each', color='r', size=14)
ax3.grid(False)
plt.show()
+ Low-priced products have higher sales.
+ Price and sales have linear relationships.
+ Macbook pro has the highest sales followed by iPhone and Thinkpad Laptop .
- Revenue increases gradually towards the end of the year but sales suffered an unexpectedly sharp fall in September. Sales reached the highest in December, possibly due to shopping demand, year-end gift giving on Christmas
# Plot sum of products and total revenue by time
quantity_by_month = df.groupby('month').sum()['Quantity_Ordered']
total_by_month = df.groupby('month').sum()['Sales']
df_month = pd.concat([quantity_by_month, total_by_month], axis=1)
df_month
# What was the best month for sales? How much was earned that month?
fig, ax1 = plt.subplots( figsize=(12, 4))
x = df_month.index
y1 = df_month['Quantity_Ordered']
y2 = df_month['Sales']
ax1.bar(x, y1, color='g')
ax1.set_ylabel('Quantity Ordered', color='g', size=14)
ax1.grid(False)
ax2 = ax1.twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.grid(False)
# Plot sum of products and total revenue by city
quantity_by_city = df.groupby('City').sum()['Quantity_Ordered']
total_by_city = df.groupby('City').sum()['Sales']
df_city = pd.concat([quantity_by_city, total_by_city], axis=1)
df_city
# What city has the best sales?
fig, ax1 = plt.subplots( figsize=(12, 4))
x = df_city.index
y1 = df_city['Quantity_Ordered']
y2 = df_city['Sales']
ax1.bar(x, y1, color='g')
ax1.set_ylabel('Quantity Ordered', color='g', size=14)
ax1.set_xlabel('city', size=14)
ax1.grid(False)
ax2 = ax1.twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.grid(False)