Sales Reporting

(Link)

1. Overview

Assume we have multiple CSV files in the same folder. In general, they have the same layout and have the column headers are in the first row.

Imports the libraries we need.

                
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
%matplotlib inline

from itertools import combinations
from collections import Counter

path = os.path.abspath('D:\DKTA\Tai lieu\github\project\Sales-Reporting\Sales-Reporting-main\data'
files = os.listdir(path)

df = pd.DataFrame()
for file in files:
    if file.endswith('.csv'):
        filepath = path + '\\' + file
        df = df.append(pd.read_csv(filepath), ignore_index=True)
df.head()
                
            

We have DataFrame: df with 6 columns: Order ID, Product, Quantity Ordered, Price Each, Order Date, Purchase Address

With code: df.shape will return the shape of df: 186850 rows and 6 columns

2. Process data

In the dataset have 545 values null of each columns. We drop mising values

Cast a pandas object to a specified dtype and add new column: day, month, sales, city

                
# Cast a pandas object to a specified dtype
df = df.astype({'Product':'category',
                'Quantity_Ordered':'int',
                'Price_Each':'float',
                'Quantity_Ordered':'int'})

df['Order_Date'] = pd.to_datetime(df['Order_Date'])

df['day'] = pd.DatetimeIndex(df['Order_Date']).day
df['month'] = pd.DatetimeIndex(df['Order_Date']).month

# Add new column: calculate total, city
df['Sales'] = df['Quantity_Ordered'] * df['Price_Each']
df['City'] = df['Purchase_Address'].apply(lambda x: x.split(',')[1])
df.head()
                
            

Creat dataframe groupby Product

                
all_products = df.groupby('Product').sum()['Quantity_Ordered']
prices = df.groupby('Product').mean()['Price_Each']
total = df.groupby('Product').sum()['Sales']

df_quantity_price = pd.concat([all_products, prices, total], axis=1)
df_quantity_price.sort_values(by='Quantity_Ordered', ascending=False, inplace=True)
df_quantity_price
                
            

3. Visualization

- Visualization Quantity Ordered of each product with Price and Sales

                
# What product sold the most? price of product and revenue of each product

fig, ax1 = plt.subplots(2, figsize=(12, 8))

x = df_quantity_price.index
y1 = df_quantity_price['Quantity_Ordered']
y2 = df_quantity_price['Sales']
y3 = df_quantity_price['Price_Each']

ax1[0].bar(x, y1, color='g')
ax1[0].set_ylabel('Quantity Ordered', color='g', size=14)
ax1[0].set_xticklabels(x, rotation=45, size=12, ha='right')
ax1[0].set(xticklabels=[])
ax1[0].grid(False)

ax2 = ax1[0].twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.set(xticklabels=[])
ax2.grid(False)

ax1[1].bar(x, y1, color='g')
ax1[1].set_xlabel('Products')
ax1[1].set_ylabel('Quantity Ordered', color='g', size=14)
ax1[1].set_xticklabels(x, rotation=45, size=12, ha='right')
ax1[1].grid(False)

ax3 = ax1[1].twinx()
ax3.plot(x, y3, 'r')
ax3.set_ylabel('Price Each', color='r', size=14)
ax3.grid(False)

plt.show()
                
            

+ Low-priced products have higher sales.

+ Price and sales have linear relationships.

+ Macbook pro has the highest sales followed by iPhone and Thinkpad Laptop .

- Revenue increases gradually towards the end of the year but sales suffered an unexpectedly sharp fall in September. Sales reached the highest in December, possibly due to shopping demand, year-end gift giving on Christmas

                
# Plot sum of products and total revenue by time
quantity_by_month = df.groupby('month').sum()['Quantity_Ordered']
total_by_month = df.groupby('month').sum()['Sales']

df_month = pd.concat([quantity_by_month, total_by_month], axis=1)

df_month
# What was the best month for sales? How much was earned that month?
fig, ax1 = plt.subplots( figsize=(12, 4))

x = df_month.index
y1 = df_month['Quantity_Ordered']
y2 = df_month['Sales']

ax1.bar(x, y1, color='g')
ax1.set_ylabel('Quantity Ordered', color='g', size=14)
ax1.grid(False)

ax2 = ax1.twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.grid(False)
                
            

                
# Plot sum of products and total revenue by city
quantity_by_city = df.groupby('City').sum()['Quantity_Ordered']
total_by_city = df.groupby('City').sum()['Sales']

df_city = pd.concat([quantity_by_city, total_by_city], axis=1)
df_city
# What city has the best sales?
fig, ax1 = plt.subplots( figsize=(12, 4))

x = df_city.index
y1 = df_city['Quantity_Ordered']
y2 = df_city['Sales']

ax1.bar(x, y1, color='g')
ax1.set_ylabel('Quantity Ordered', color='g', size=14)
ax1.set_xlabel('city', size=14)
ax1.grid(False)

ax2 = ax1.twinx()
ax2.plot(x, y2, 'r')
ax2.set_ylabel('Sales', color='r', size=14)
ax2.grid(False)