Vendor Performance Analysis

Author

Your Name

Published

June 30, 2025

Abstract

This report provides a comprehensive analysis of vendor performance using sales, purchase, and inventory data. It covers exploratory data analysis, key performance indicators, vendor and brand comparisons, inventory turnover, capital lock-in, and statistical testing of profit margins. The insights aim to support data-driven decisions for procurement, pricing, and inventory management.

Keywords

Vendor Analysis, Sales Performance, Inventory Turnover, Profit Margin, Data Analysis, Procurement, Python, EDA, Statistical Testing

1 Understanding Dataset

1.1 Importing Libraries

Code
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
import warnings
import sqlite3
from scipy.stats import ttest_ind
import scipy.stats as stats
sns.set(style="whitegrid")
warnings.filterwarnings('ignore')

1.2 Loading the Dataset

Code
# creating database connection
conn = sqlite3.connect('D:\Vendor_Analysis_Project\End to End Vendor Analysis Project\database_file\inventory.db')

# List all tables in the database
tables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", conn)
print("Available tables in the database:")
print(tables)

# If the table exists, fetch data; otherwise, print a warning
if not tables.empty and 'vendor_sales_summary' in tables['name'].values:
    df = pd.read_sql_query("SELECT * FROM vendor_sales_summary", conn)
    display(df.head())
else:
    print("Table 'vendor_sales_summary' does not exist in the database. Please check the table names above.")
Available tables in the database:
                   name
0       begin_inventory
1         end_inventory
2             purchases
3       purchase_prices
4                 sales
5        vendor_invoice
6  vendor_sales_summary
VendorNumber VendorName Brand Description ActualPrice Volume TotalPurchaseQuantity TotalPurchaseDollars TotalSalesQuantity TotalSalesDollars TotalSalesPrice TotalExciseTax FreightCost GrossProfit ProfitMargin StockTurnover SalesToPurchaseRatio
0 1128 BROWN-FORMAN CORP 1233 Jack Daniels No 7 Black 36.99 1750.0 145080 3811251.60 142049.0 5.101920e+06 672819.31 260999.20 68601.68 1290667.91 25.297693 0.979108 1.338647
1 4425 MARTIGNETTI COMPANIES 3405 Tito's Handmade Vodka 28.99 1750.0 164038 3804041.22 160247.0 4.819073e+06 561512.37 294438.66 144929.24 1015032.27 21.062810 0.976890 1.266830
2 17035 PERNOD RICARD USA 8068 Absolut 80 Proof 24.99 1750.0 187407 3418303.68 187140.0 4.538121e+06 461140.15 343854.07 123780.22 1119816.92 24.675786 0.998575 1.327594
3 3960 DIAGEO NORTH AMERICA INC 4261 Capt Morgan Spiced Rum 22.99 1750.0 201682 3261197.94 200412.0 4.475973e+06 420050.01 368242.80 257032.07 1214774.94 27.139908 0.993703 1.372493
4 3960 DIAGEO NORTH AMERICA INC 3545 Ketel One Vodka 29.99 1750.0 138109 3023206.01 135838.0 4.223108e+06 545778.28 249587.83 257032.07 1199901.61 28.412764 0.983556 1.396897

2 Exploratory Data Analysis

  • Previously, we examined the various tables in the datbase to identify key variables,understand their relationships, and determine which ones should be included in the final analysis.
  • In this pase of EDA, we will analyze the resultant table to gain insights into the distributuion of each column.This will help us understand data patterns,identify anomalies,and ensure data quality before proceeding with further analysis.

2.1 Check the datatype of the column

Code
# Check the datatype of the column
df.dtypes
VendorNumber               int64
VendorName                object
Brand                      int64
Description               object
ActualPrice              float64
Volume                   float64
TotalPurchaseQuantity      int64
TotalPurchaseDollars     float64
TotalSalesQuantity       float64
TotalSalesDollars        float64
TotalSalesPrice          float64
TotalExciseTax           float64
FreightCost              float64
GrossProfit              float64
ProfitMargin             float64
StockTurnover            float64
SalesToPurchaseRatio     float64
dtype: object

2.2 Check the summary datset

Code
# summary statistics
df.describe().T
count mean std min 25% 50% 75% max
VendorNumber 10692.0 10650.649458 18753.519148 2.000000 3951.000000 7153.000000 9552.000000 2.013590e+05
Brand 10692.0 18039.228769 12662.187074 58.000000 5793.500000 18761.500000 25514.250000 9.063100e+04
ActualPrice 10692.0 35.643671 148.246016 0.490000 10.990000 15.990000 28.990000 7.499990e+03
Volume 10692.0 847.360550 664.309212 50.000000 750.000000 750.000000 750.000000 2.000000e+04
TotalPurchaseQuantity 10692.0 3140.886831 11095.086769 1.000000 36.000000 262.000000 1975.750000 3.376600e+05
TotalPurchaseDollars 10692.0 30106.693372 123067.799627 0.710000 453.457500 3655.465000 20738.245000 3.811252e+06
TotalSalesQuantity 10692.0 3077.482136 10952.851391 0.000000 33.000000 261.000000 1929.250000 3.349390e+05
TotalSalesDollars 10692.0 42239.074419 167655.265984 0.000000 729.220000 5298.045000 28396.915000 5.101920e+06
TotalSalesPrice 10692.0 18793.783627 44952.773386 0.000000 289.710000 2857.800000 16059.562500 6.728193e+05
TotalExciseTax 10692.0 1774.226259 10975.582240 0.000000 4.800000 46.570000 418.650000 3.682428e+05
FreightCost 10692.0 61433.763214 60938.458032 0.090000 14069.870000 50293.620000 79528.990000 2.570321e+05
GrossProfit 10692.0 12158.773084 46214.338722 -52002.780000 52.920000 1399.640000 8660.200000 1.290668e+06
ProfitMargin 10692.0 -15.620770 443.555329 -23730.638953 13.324515 30.405457 39.956135 9.971666e+01
StockTurnover 10692.0 1.706793 6.020460 0.000000 0.807229 0.981529 1.039342 2.745000e+02
SalesToPurchaseRatio 10692.0 2.504390 8.459067 0.000000 1.153729 1.436894 1.665449 3.529286e+02

2.3 Checking the mode value of numerical column

Code
#  Mode values for each numerical column
mode_values=df.mode().iloc[0]
print("\nMode values:\n\n",mode_values)

Mode values:

 VendorNumber                            4425.0
VendorName               MARTIGNETTI COMPANIES
Brand                                      809
Description                   Southern Comfort
ActualPrice                               9.99
Volume                                   750.0
TotalPurchaseQuantity                     12.0
TotalPurchaseDollars                     95.28
TotalSalesQuantity                        12.0
TotalSalesDollars                          0.0
TotalSalesPrice                            0.0
TotalExciseTax                             0.0
FreightCost                          144929.24
GrossProfit                                0.0
ProfitMargin                               0.0
StockTurnover                              1.0
SalesToPurchaseRatio                       0.0
Name: 0, dtype: object

2.4 Distribution Plot of the Numerical Column

Code
#Distribution plot for Numerical columns
numerical_cols=df.select_dtypes(include=np.number).columns

plt.figure(figsize=(15,10))
for i ,col in enumerate(numerical_cols):
    plt.subplot(4,4,i+1) # Adjust grid layout as needed
    sns.histplot(df[col],kde=True,bins=30)
    plt.title(col)
plt.tight_layout()
plt.show()    

2.5 Outliers Detection of Numerical columns

Code
# Outliers Detection with Boxplots
plt.figure(figsize=(15,10))
for i,col in enumerate(numerical_cols):
    plt.subplot(4,4,i+1)
    sns.boxplot(y=df[col])
    plt.title(col)
plt.tight_layout()
plt.show()

2.6 Summary Statistics Insights:

Negative & Zero Values:

  • Gross Profit: Minimum value is -52,002.78, indicating losses. Some products or transactions may be selling at a loss due to high costs or selling at discounts lower than the purchase price..
  • Profit Margin: Has a minimum of -∞, which suggests cases where revenue is zero or even lower than costs.
  • Total Sales Quantity & Sales Dollars: Minimum values are 0, meaning some products were purchased but never sold. These could be slow-moving or obsolete stock.

Outliers Indicated by High Standard Deviations:

  • Purchase & Actual Prices: The max values (5,681.81 & 7,499.99) are significantly higher than the mean (24.39 & 35.64), indicating potential premium products.
  • Freight Cost: Huge variation, from 0.09 to 257,032.07, suggests logistics inefficiencies or bulk shipments.
  • Stock Turnover: Ranges from 0 to 274.5, implying some products sell extremely fast while others remain in stock indefinitely. Value more than 1 indicates that Sold quantity for that product is higher than purchased quantity due to either sales are being fulfilled from older stock.

2.7 Filtering the Data by removing inconsistencies

Code
# Let's filter the data by incosistencies by removing all the data that is not required.
df=pd.read_sql_query("""Select *
from vendor_sales_summary
where GrossProfit > 0
and ProfitMargin > 0
and TotalSalesQuantity > 0 """,conn)
Code
df
VendorNumber VendorName Brand Description ActualPrice Volume TotalPurchaseQuantity TotalPurchaseDollars TotalSalesQuantity TotalSalesDollars TotalSalesPrice TotalExciseTax FreightCost GrossProfit ProfitMargin StockTurnover SalesToPurchaseRatio
0 1128 BROWN-FORMAN CORP 1233 Jack Daniels No 7 Black 36.99 1750.0 145080 3811251.60 142049.0 5.101920e+06 672819.31 260999.20 68601.68 1290667.91 25.297693 0.979108 1.338647
1 4425 MARTIGNETTI COMPANIES 3405 Tito's Handmade Vodka 28.99 1750.0 164038 3804041.22 160247.0 4.819073e+06 561512.37 294438.66 144929.24 1015032.27 21.062810 0.976890 1.266830
2 17035 PERNOD RICARD USA 8068 Absolut 80 Proof 24.99 1750.0 187407 3418303.68 187140.0 4.538121e+06 461140.15 343854.07 123780.22 1119816.92 24.675786 0.998575 1.327594
3 3960 DIAGEO NORTH AMERICA INC 4261 Capt Morgan Spiced Rum 22.99 1750.0 201682 3261197.94 200412.0 4.475973e+06 420050.01 368242.80 257032.07 1214774.94 27.139908 0.993703 1.372493
4 3960 DIAGEO NORTH AMERICA INC 3545 Ketel One Vodka 29.99 1750.0 138109 3023206.01 135838.0 4.223108e+06 545778.28 249587.83 257032.07 1199901.61 28.412764 0.983556 1.396897
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
8560 9815 WINE GROUP INC 8527 Concannon Glen Ellen Wh Zin 4.99 750.0 2 2.64 5.0 1.595000e+01 10.96 0.55 27100.41 13.31 83.448276 2.500000 6.041667
8561 8004 SAZERAC CO INC 5683 Dr McGillicuddy's Apple Pie 0.49 50.0 6 2.34 134.0 6.566000e+01 1.47 7.04 50293.62 63.32 96.436186 22.333333 28.059829
8562 3924 HEAVEN HILL DISTILLERIES 9123 Deep Eddy Vodka 0.99 50.0 2 1.48 2.0 1.980000e+00 0.99 0.10 14069.87 0.50 25.252525 1.000000 1.337838
8563 3960 DIAGEO NORTH AMERICA INC 6127 The Club Strawbry Margarita 1.99 200.0 1 1.47 72.0 1.432800e+02 77.61 15.12 257032.07 141.81 98.974037 72.000000 97.469388
8564 7245 PROXIMO SPIRITS INC. 3065 Three Olives Grape Vodka 0.99 50.0 1 0.71 86.0 8.514000e+01 33.66 4.46 38994.78 84.43 99.166079 86.000000 119.915493

8565 rows × 17 columns

2.8 Count Plots for Categorical Columns

Code
# Count Plots for Categorical Columns
categorical_cols=["VendorName","Description"]

plt.figure(figsize=(12,5))
for i,col in enumerate(categorical_cols):
    plt.subplot(1,2,i+1)
    sns.countplot(y=df[col], order=df[col].value_counts().index[:10])  # Top 10 categories
    plt.title(f"Count Plot of {col}")
plt.tight_layout()
plt.show()

3 Correlation Heatmap

Code
# Correlation  Heatmap
plt.figure(figsize=(12,8))
correlation_matrix=df[numerical_cols].corr()
sns.heatmap(correlation_matrix,annot=True,fmt=".2f", cmap="coolwarm",linewidths=0.5)
plt.title("Correaltion Heatmap")
plt.show()

3.1 Correlation Insights

  • PurchasePrice has weak correlations with TotalSalesDollars (-0.012) and GrossProfit (-0.016), suggesting that price variations do not significantly impact sales revenue or profit.
  • Strong correlation between total purchase quantity and total sales quantity (0.999), confirming efficient inventory turnover.
  • Negative correlation between profit margin & total sales price (-0.179) suggests that as sales price increases, margins decrease, possibly due to competitive pricing pressures.
  • StockTurnover has weak negative correlations with both GrossProfit (-0.038) and ProfitMargin (-0.055), indicating that faster turnover does not necessarily result in higher profitability.

4 Data Analysis

4.0.1 Identify Brands that needs Promotional or Pricing Adjustments which exhibit lower sales performance but higher profit margins.

Code
brand_performance=df.groupby('Description').agg({
    'TotalSalesDollars':'sum',
    'ProfitMargin':'mean'}).reset_index()

brand_performance.sort_values('ProfitMargin')
Description TotalSalesDollars ProfitMargin
5485 Pepperjack Barossa Red 191.92 0.020842
2954 Flint & Steel Svgn Bl Napa V 119.92 0.033356
2179 Croft Tawny Porto 191.84 0.041701
2561 Douglass Hill Merlot 143.76 0.083472
5385 Parducci 13 True Grit Chard 24927.81 0.121190
... ... ... ...
4568 M Chiarlo Gavi Wh 1208.90 99.393664
657 Beniotome Sesame Shochu 4768.41 99.534226
6449 Skinnygirl Tangerine Vodka 2368.42 99.544844
2411 DiSaronno Amaretto 4781.16 99.553246
5528 Pezzi King Svgn Bl Dry Creek 2221.29 99.604734

7707 rows × 3 columns

Code
print("Brands with Low Sales but High Profit Margins:")
# threshold for low_sales (bottom 15 %) and "high margin" (top 15%)
low_sales_threshold=brand_performance['TotalSalesDollars'].quantile(0.15)
high_margin_threshold=brand_performance['ProfitMargin'].quantile(0.85)

# Filter brands with low sales but high profit margins
target_brands=brand_performance[
    (brand_performance['TotalSalesDollars'] <= low_sales_threshold) &
    (brand_performance['ProfitMargin'] >= high_margin_threshold)
]
display(target_brands.sort_values('TotalSalesDollars'))
Brands with Low Sales but High Profit Margins:
Description TotalSalesDollars ProfitMargin
6199 Santa Rita Organic Svgn Bl 9.99 66.466466
2369 Debauchery Pnt Nr 11.58 65.975820
2070 Concannon Glen Ellen Wh Zin 15.95 83.448276
2188 Crown Royal Apple 27.86 89.806174
6237 Sauza Sprklg Wild Berry Marg 27.96 82.153076
... ... ... ...
5074 Nanbu Bijin Southern Beauty 535.68 76.747312
2271 Dad's Hat Rye Whiskey 538.89 81.851584
57 A Bichot Clos Marechaudes 539.94 67.740860
6245 Sbragia Home Ranch Merlot 549.75 66.444748
3326 Goulee Cos d'Estournel 10 558.87 69.434752

198 rows × 3 columns

Code
# filtering all the brand which have sales < 1000 in TotalSalesDollars column
brand_performance = brand_performance[brand_performance['TotalSalesDollars']<10000] # for better visualization

4.0.2 Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)

Code
# Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=brand_performance,x='TotalSalesDollars',y='ProfitMargin',color='blue',label='All Brands',alpha=0.2)
sns.scatterplot(data=target_brands,x='TotalSalesDollars',y='ProfitMargin',color='red',label='Target Brands')

# line to show the 'low_sale_threshold' and 'high_margin_threshold'
plt.axhline(high_margin_threshold,linestyle='--',color='black',label="High Margin Threshold")
plt.axvline(low_sales_threshold,linestyle='--',color='black',label="Low Sales Threshold")

plt.xlabel("Total Sales ($)")
plt.ylabel("Profit Margin(%)")
plt.title("Brands for Promotional or Pricing Adjustment")
plt.legend()
plt.grid(True)
plt.show()

4.0.3 Which vendors and brands demonstrate the highest sales performance

Code
def format_dollars(value):
    if value >= 1_000_000:
        return f"{value/1_000_000:.2f}M"
    elif value >= 1_000:
        return f"{value/1_000:.2f}K"
    else :
        return str(value)
        
Code
# Top Vendors & Brands by Sales Performance
top_vendors=df.groupby("VendorName")["TotalSalesDollars"].sum().nlargest(10)
top_brands=df.groupby("Description")["TotalSalesDollars"].sum().nlargest(10)
top_vendors
VendorName
DIAGEO NORTH AMERICA INC      6.799010e+07
MARTIGNETTI COMPANIES         3.933036e+07
PERNOD RICARD USA             3.206320e+07
JIM BEAM BRANDS COMPANY       3.142302e+07
BACARDI USA INC               2.485482e+07
CONSTELLATION BRANDS INC      2.421875e+07
E & J GALLO WINERY            1.839990e+07
BROWN-FORMAN CORP             1.824723e+07
ULTRA BEVERAGE COMPANY LLP    1.650254e+07
M S WALKER INC                1.470646e+07
Name: TotalSalesDollars, dtype: float64
Code
top_brands
Description
Jack Daniels No 7 Black    7964746.76
Tito's Handmade Vodka      7399657.58
Grey Goose Vodka           7209608.06
Capt Morgan Spiced Rum     6356320.62
Absolut 80 Proof           6244752.03
Jameson Irish Whiskey      5715759.69
Ketel One Vodka            5070083.56
Baileys Irish Cream        4150122.07
Kahlua                     3604858.66
Tanqueray                  3456697.90
Name: TotalSalesDollars, dtype: float64
Code
top_vendors.apply(lambda x:format_dollars(x))
VendorName
DIAGEO NORTH AMERICA INC      67.99M
MARTIGNETTI COMPANIES         39.33M
PERNOD RICARD USA             32.06M
JIM BEAM BRANDS COMPANY       31.42M
BACARDI USA INC               24.85M
CONSTELLATION BRANDS INC      24.22M
E & J GALLO WINERY            18.40M
BROWN-FORMAN CORP             18.25M
ULTRA BEVERAGE COMPANY LLP    16.50M
M S WALKER INC                14.71M
Name: TotalSalesDollars, dtype: object
Code
top_brands.apply(lambda x:format_dollars(x))
Description
Jack Daniels No 7 Black    7.96M
Tito's Handmade Vodka      7.40M
Grey Goose Vodka           7.21M
Capt Morgan Spiced Rum     6.36M
Absolut 80 Proof           6.24M
Jameson Irish Whiskey      5.72M
Ketel One Vodka            5.07M
Baileys Irish Cream        4.15M
Kahlua                     3.60M
Tanqueray                  3.46M
Name: TotalSalesDollars, dtype: object

4.0.4 Top 10 Vendors and Brands

Code

plt.figure(figsize=(15,5))

# plot for Top Vendors
plt.subplot(1,2,1)
ax1=sns.barplot(y=top_vendors.index,x=top_vendors.values,palette="Blues_r")
plt.title("Top 10 Vendors by Sales")
plt.grid(False)

for bar in ax1.patches:
    ax1.text(bar.get_width() + (bar.get_width() * 0.02), 
             bar.get_y() + bar.get_height() / 2, 
             format_dollars(bar.get_width()), 
             ha='left', va='center', fontsize=10, color='black')



# plot for Top Brands
plt.subplot(1,2,2)
ax2=sns.barplot(y=top_brands.index.astype(str),x=top_brands.values,palette="Reds_r")
plt.title("Top 10 Brands by Sales")

for bar in ax2.patches:
    ax2.text(bar.get_width() + (bar.get_width() * 0.02), 
             bar.get_y() + bar.get_height() / 2, 
             format_dollars(bar.get_width()), 
             ha='left', va='center', fontsize=10, color='black')

plt.tight_layout()
plt.grid(False)
plt.show()

4.0.5 Which vendors contribute the most to total Purchase dollars?

Code
# Rank Vendors by Total Purchase Dollars
vendor_performance=df.groupby("VendorName").agg({
    "TotalPurchaseDollars":"sum",
    "GrossProfit":"sum",
    "TotalSalesDollars":"sum"
}).reset_index()

# Calculate Contribution % to Overall Procurement
vendor_performance["Purchase_Contribution%"]= (vendor_performance["TotalPurchaseDollars"]/vendor_performance["TotalPurchaseDollars"].sum()) * 100

# Rank Vendors by Total Purchase Dollars & Profitability
vendor_performance=round(vendor_performance.sort_values(by="TotalPurchaseDollars",ascending=False),2)

# Display Top 10 Vendors
top_vendors=vendor_performance.head(10)
top_vendors['TotalSalesDollars']=top_vendors['TotalSalesDollars'].apply(format_dollars)
top_vendors['TotalPurchaseDollars']=top_vendors['TotalPurchaseDollars'].apply(format_dollars)
top_vendors['GrossProfit']=top_vendors['GrossProfit'].apply(format_dollars)
top_vendors
VendorName TotalPurchaseDollars GrossProfit TotalSalesDollars Purchase_Contribution%
25 DIAGEO NORTH AMERICA INC 50.10M 17.89M 67.99M 16.30
57 MARTIGNETTI COMPANIES 25.50M 13.83M 39.33M 8.30
68 PERNOD RICARD USA 23.85M 8.21M 32.06M 7.76
46 JIM BEAM BRANDS COMPANY 23.49M 7.93M 31.42M 7.64
6 BACARDI USA INC 17.43M 7.42M 24.85M 5.67
20 CONSTELLATION BRANDS INC 15.27M 8.95M 24.22M 4.97
11 BROWN-FORMAN CORP 13.24M 5.01M 18.25M 4.31
30 E & J GALLO WINERY 12.07M 6.33M 18.40M 3.93
106 ULTRA BEVERAGE COMPANY LLP 11.17M 5.34M 16.50M 3.63
53 M S WALKER INC 9.76M 4.94M 14.71M 3.18

4.0.6 A Pareto Chart to show bar chart of “Purchase contributions%’ column along with the cummulative sum

Code
# Created A Pareto Chart to show bar chart of "Purchase contributions%' column along with the cummulative sum

# Created a new 'Cummulative Contributions" columns
top_vendors['Cumulative_Contributions%']=top_vendors['Purchase_Contribution%'].cumsum()

fig,ax1=plt.subplots(figsize=(15,10))

# Barplot for Purchase Contributions%
sns.barplot(x=top_vendors['VendorName'],y=top_vendors['Purchase_Contribution%'],palette="mako",ax=ax1)

for i , value in enumerate(top_vendors['Purchase_Contribution%']):
    ax1.text(i,value-1,str(value)+'%',ha='center',fontsize=10,color='white')

# Line Plot for Cumulative Contribution%
ax2=ax1.twinx()
ax2.plot(top_vendors['VendorName'],top_vendors['Cumulative_Contributions%'],color='red',marker='o',linestyle='dashed',label='Cumulative %')

ax1.set_xticklabels(top_vendors['VendorName'],rotation=90)
ax1.set_ylabel('Purchase Contirbution %',color='blue')
ax2.set_ylabel('Cumulative Contributions %',color='red')
ax1.set_xlabel('Vendors')
ax1.set_title('Pareto Chart: Vendor Contribution to Total Purchases')

ax2.axhline(y=100,color='gray',linestyle='dashed',alpha=0.7)
ax2.legend(loc='upper right')

ax1.grid(False)
ax2.grid(False)
plt.show()

4.0.7 How much of total procurement is dependent on the top vendors?

Code
# Total Purchase Contribution by top 10 vendors
print(f"Total Purchase Contribution of top 10 vendors is {round(top_vendors['Purchase_Contribution%'].sum(),2)}% .")

vendors=list(top_vendors['VendorName'].values)
purchase_contributions=list(top_vendors['Purchase_Contribution%'].values)
total_contribution=sum(purchase_contributions)
remaining_contribution=100 - total_contribution

# Append "Other Vendors" Category
vendors.append("Other Vendors")
purchase_contributions.append(remaining_contribution)

# Donut Chart 
fig,ax=plt.subplots(figsize=(8,8))
wedges,texts,autotexts=ax.pie(purchase_contributions,labels=vendors,autopct='%1.1f%%',startangle=140,pctdistance=0.85,colors=plt.cm.Paired.colors)

# Add Total Contribution annotation in the Center
centre_circle=plt.Circle((0,0),0.70,fc='white')
fig.gca().add_artist(centre_circle)

# Add Total Contributions annotation in the center
plt.text(0,0,f"Top 10 Total:\n{total_contribution:.2f}%",fontsize=14,fontweight='bold',ha='center',va='center')

plt.title("Top 10 Vendor's Purchase Contribution (%)")
plt.show()
Total Purchase Contribution of top 10 vendors is 65.69% .

The remaining vendors contribute only 34.31%, meaning they are not utilized effectively or may not be as competitive. If vendor dependency is too high, consider identifying new suppliers to reduce risk.

4.0.8 Does purchasing in bulk reduce the unit price, and what is the optimal purchase volume for cost savings?

Code
# Calculate Unit Purchase Price Per Vendor & Volume Group
df["UnitPurchasePrice"]=df["TotalPurchaseDollars"]/df['TotalPurchaseQuantity']

# Group by Order Sizes (e.g., Small,Medium,Large Purchases)
df['OrderSize']=pd.qcut(df['TotalPurchaseQuantity'],q=3,labels=["Small","Medium","Large"])

# Analyze the Cost Savings Per Order Size
bulk_purchase_analysis=df.groupby("OrderSize")["UnitPurchasePrice"].mean().reset_index()
print(bulk_purchase_analysis)
  OrderSize  UnitPurchasePrice
0     Small          39.057543
1    Medium          15.486414
2     Large          10.777625
Code
# Boxplot 
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x="OrderSize", y="UnitPurchasePrice", palette="Set2")
plt.title("Impact of Bulk Purchasing on Unit Price")
plt.xlabel("Order Size")
plt.ylabel("Average Unit Purchase Price")
plt.show()

  • Vendors buying in bulk (Large Order Size) get the lowest unit price ($10.78 per unit), meaning higher margins if they can manage inventory efficiently.
  • The price difference between Small and Large orders is substantial (~72% reduction in unit cost)
  • This suggests that bulk pricing strategies successfully encourage vendors to purchase in larger volumes, leading to higher overall sales despite lower per-unit revenue.

4.0.9 Which vendors have low inventory turnover, indicating excess stock and slow-moving products?

Code
# Identify low Inventory Turnover Vendors
low_turnover_vendors = df[df["StockTurnover"] < 1].groupby("VendorName")["StockTurnover"].mean().reset_index()

#Sort by Lowest Turnover
low_turnover_vendors=low_turnover_vendors.sort_values(by="StockTurnover",ascending=True)
low_turnover_vendors.head(10)
VendorName StockTurnover
0 ALISA CARR BEVERAGES 0.615385
36 HIGHLAND WINE MERCHANTS LLC 0.708333
60 PARK STREET IMPORTS LLC 0.751306
19 Circa Wines 0.755676
26 Dunn Wine Brokers 0.766022
15 CENTEUR IMPORTS LLC 0.773953
78 SMOKY QUARTZ DISTILLERY LLC 0.783835
90 TAMWORTH DISTILLING 0.797078
91 THE IMPORTED GRAPE LLC 0.807569
101 WALPOLE MTN VIEW WINERY 0.820548
  • Slow-moving inventory increases holding costs (warehouse rent, insurance, depreciation)
  • Identifying vendors with low inventory turnover is critical for business efficiency, cost reduction, and profitability.

4.0.10 How much Capital is locked in unsold inventory per vendor,and which vendors contribute the most to it?

Code
# Calculate Unsold Inventory Value
df["UnsoldInventoryValue"] = (df["TotalPurchaseQuantity"] - df["TotalSalesQuantity"]) * df["UnitPurchasePrice"]
print('Total Unsold Capital:', format_dollars(df["UnsoldInventoryValue"].sum()))

# Aggregate Capital Locked per Vendor
inventory_value_per_vendor = df.groupby("VendorName")["UnsoldInventoryValue"].sum().reset_index()

# Sort Vendors with the Highest Locked Capital
inventory_value_per_vendor = inventory_value_per_vendor.sort_values(by="UnsoldInventoryValue", ascending=False)
inventory_value_per_vendor['UnsoldInventoryValue'] = inventory_value_per_vendor['UnsoldInventoryValue'].apply(format_dollars)
inventory_value_per_vendor.head(10)
Total Unsold Capital: 2.71M
VendorName UnsoldInventoryValue
25 DIAGEO NORTH AMERICA INC 722.21K
46 JIM BEAM BRANDS COMPANY 554.67K
68 PERNOD RICARD USA 470.63K
116 WILLIAM GRANT & SONS INC 401.96K
30 E & J GALLO WINERY 228.28K
79 SAZERAC CO INC 198.44K
11 BROWN-FORMAN CORP 177.73K
20 CONSTELLATION BRANDS INC 133.62K
61 MOET HENNESSY USA INC 126.48K
77 REMY COINTREAU USA INC 118.60K

5 Hypothesis Testing

5.0.1 What is the 95% confidence interval for Profit Margins of Top-Performing and Low-Performing Vendors.

Code
# Define top and low vendors based on Total Sales Dollars (Top 25% & Bottom 25%)
top_threshold=df['TotalSalesDollars'].quantile(0.75)
low_threshold=df['TotalSalesDollars'].quantile(0.25)

top_vendors = df[df["TotalSalesDollars"] >= top_threshold]["ProfitMargin"].dropna()
low_vendors = df[df["TotalSalesDollars"] <= low_threshold]["ProfitMargin"].dropna()

# Function to compute confidence interval
def confidence_interval(data, confidence=0.95):
    mean_val = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(len(data))  # Standard error
    t_critical = stats.t.ppf((1 + confidence) / 2, df=len(data) - 1)
    margin_of_error = t_critical * std_err
    return mean_val, mean_val - margin_of_error, mean_val + margin_of_error

# Compute confidence intervals
top_mean, top_lower, top_upper = confidence_interval(top_vendors)
low_mean, low_lower, low_upper = confidence_interval(low_vendors)

print(f"Top Vendors 95% CI: ({top_lower:.2f}, {top_upper:.2f}), Mean: {top_mean:.2f}")
print(f"Low Vendors 95% CI: ({low_lower:.2f}, {low_upper:.2f}), Mean: {low_mean:.2f}")

plt.figure(figsize=(12, 6))

# Top Vendors Plot
sns.histplot(top_vendors, kde=True, color="blue", bins=30, alpha=0.5, label="Top Vendors")
plt.axvline(top_lower, color="blue", linestyle="--", label=f"Top Lower: {top_lower:.2f}")
plt.axvline(top_upper, color="blue", linestyle="--", label=f"Top Upper: {top_upper:.2f}")
plt.axvline(top_mean, color="blue", linestyle="-", label=f"Top Mean: {top_mean:.2f}")

# Low Vendors Plot
sns.histplot(low_vendors, kde=True, color="red", bins=30, alpha=0.5, label="Low Vendors")
plt.axvline(low_lower, color="red", linestyle="--", label=f"Low Lower: {low_lower:.2f}")
plt.axvline(low_upper, color="red", linestyle="--", label=f"Low Upper: {low_upper:.2f}")
plt.axvline(low_mean, color="red", linestyle="-", label=f"Low Mean: {low_mean:.2f}")

# Finalize Plot
plt.title("Confidence Interval Comparison: Top vs. Low Vendors (Profit Margin)")
plt.xlabel("Profit Margin (%)")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True)
plt.show()

Top Vendors 95% CI: (30.74, 31.61), Mean: 31.17
Low Vendors 95% CI: (40.48, 42.62), Mean: 41.55

  • The confidence interval for low-performing vendors (40.48% to 42.62%) is significantly higher than that of top-performing vendors (30.74% to 31.61%).
  • This suggests that vendors with lower sales tend to maintain higher profit margins, potentially due to premium pricing or lower operational costs.
  • For High-Performing Vendors: If they aim to improve profitability, they could explore selective price adjustments, cost optimization, or bundling strategies.
  • For Low-Performing Vendors: Despite higher margins, their low sales volume might indicate a need for better marketing, competitive pricing, or improved distribution strategies.

5.0.2 Is there a significant difference in profit margins between top-performing and low-performing vendors?

Hypothesis:

H₀ (Null Hypothesis): There is no significant difference in the mean profit margins of top-performing and low-performing vendors.

H₁ (Alternative Hypothesis): The mean profit margins of top-performing and low-performing vendors are significantly different.

Code
top_threshold = df["TotalSalesDollars"].quantile(0.75)
low_threshold = df["TotalSalesDollars"].quantile(0.25)

top_vendors = df[df["TotalSalesDollars"] >= top_threshold]["ProfitMargin"].dropna()
low_vendors = df[df["TotalSalesDollars"] <= low_threshold]["ProfitMargin"].dropna()

# Perform Two-Sample T-Test
t_stat, p_value = ttest_ind(top_vendors, low_vendors, equal_var=False)

# Print results
print(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.")
else:
    print("Fail to Reject H₀: No significant difference in profit margins.")
T-Statistic: -17.6440, P-Value: 0.0000
Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.
  • A p-value this small suggests that the difference is not just statistically significant but also practically meaningful.

  • The two vendor groups operate very differently in terms of profitability.