Vendor Performance Analysis

Author

Your Name

Published

June 30, 2025

Abstract

This report provides a comprehensive analysis of vendor performance using sales, purchase, and inventory data. It covers exploratory data analysis, key performance indicators, vendor and brand comparisons, inventory turnover, capital lock-in, and statistical testing of profit margins. The insights aim to support data-driven decisions for procurement, pricing, and inventory management.

Keywords

Vendor Analysis, Sales Performance, Inventory Turnover, Profit Margin, Data Analysis, Procurement, Python, EDA, Statistical Testing

1 Understanding Dataset

1.1 Importing Libraries

Code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns 
import warnings
import sqlite3
from scipy.stats import ttest_ind
import scipy.stats as stats
sns.set(style="whitegrid")
warnings.filterwarnings('ignore')

1.2 Loading the Dataset

Code

# creating database connection
conn = sqlite3.connect('D:\Vendor_Analysis_Project\End to End Vendor Analysis Project\database_file\inventory.db')

# List all tables in the database
tables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", conn)
print("Available tables in the database:")
print(tables)

# If the table exists, fetch data; otherwise, print a warning
if not tables.empty and 'vendor_sales_summary' in tables['name'].values:
    df = pd.read_sql_query("SELECT * FROM vendor_sales_summary", conn)
    display(df.head())
else:
    print("Table 'vendor_sales_summary' does not exist in the database. Please check the table names above.")

Available tables in the database:
                   name
0       begin_inventory
1         end_inventory
2             purchases
3       purchase_prices
4                 sales
5        vendor_invoice
6  vendor_sales_summary

	VendorNumber	VendorName	Brand	Description	ActualPrice	Volume	TotalPurchaseQuantity	TotalPurchaseDollars	TotalSalesQuantity	TotalSalesDollars	TotalSalesPrice	TotalExciseTax	FreightCost	GrossProfit	ProfitMargin	StockTurnover	SalesToPurchaseRatio
0	1128	BROWN-FORMAN CORP	1233	Jack Daniels No 7 Black	36.99	1750.0	145080	3811251.60	142049.0	5.101920e+06	672819.31	260999.20	68601.68	1290667.91	25.297693	0.979108	1.338647
1	4425	MARTIGNETTI COMPANIES	3405	Tito's Handmade Vodka	28.99	1750.0	164038	3804041.22	160247.0	4.819073e+06	561512.37	294438.66	144929.24	1015032.27	21.062810	0.976890	1.266830
2	17035	PERNOD RICARD USA	8068	Absolut 80 Proof	24.99	1750.0	187407	3418303.68	187140.0	4.538121e+06	461140.15	343854.07	123780.22	1119816.92	24.675786	0.998575	1.327594
3	3960	DIAGEO NORTH AMERICA INC	4261	Capt Morgan Spiced Rum	22.99	1750.0	201682	3261197.94	200412.0	4.475973e+06	420050.01	368242.80	257032.07	1214774.94	27.139908	0.993703	1.372493
4	3960	DIAGEO NORTH AMERICA INC	3545	Ketel One Vodka	29.99	1750.0	138109	3023206.01	135838.0	4.223108e+06	545778.28	249587.83	257032.07	1199901.61	28.412764	0.983556	1.396897

2 Exploratory Data Analysis

Previously, we examined the various tables in the datbase to identify key variables,understand their relationships, and determine which ones should be included in the final analysis.
In this pase of EDA, we will analyze the resultant table to gain insights into the distributuion of each column.This will help us understand data patterns,identify anomalies,and ensure data quality before proceeding with further analysis.

2.1 Check the datatype of the column

Code

# Check the datatype of the column
df.dtypes

VendorNumber               int64
VendorName                object
Brand                      int64
Description               object
ActualPrice              float64
Volume                   float64
TotalPurchaseQuantity      int64
TotalPurchaseDollars     float64
TotalSalesQuantity       float64
TotalSalesDollars        float64
TotalSalesPrice          float64
TotalExciseTax           float64
FreightCost              float64
GrossProfit              float64
ProfitMargin             float64
StockTurnover            float64
SalesToPurchaseRatio     float64
dtype: object

2.2 Check the summary datset

Code

# summary statistics
df.describe().T

	count	mean	std	min	25%	50%	75%	max
VendorNumber	10692.0	10650.649458	18753.519148	2.000000	3951.000000	7153.000000	9552.000000	2.013590e+05
Brand	10692.0	18039.228769	12662.187074	58.000000	5793.500000	18761.500000	25514.250000	9.063100e+04
ActualPrice	10692.0	35.643671	148.246016	0.490000	10.990000	15.990000	28.990000	7.499990e+03
Volume	10692.0	847.360550	664.309212	50.000000	750.000000	750.000000	750.000000	2.000000e+04
TotalPurchaseQuantity	10692.0	3140.886831	11095.086769	1.000000	36.000000	262.000000	1975.750000	3.376600e+05
TotalPurchaseDollars	10692.0	30106.693372	123067.799627	0.710000	453.457500	3655.465000	20738.245000	3.811252e+06
TotalSalesQuantity	10692.0	3077.482136	10952.851391	0.000000	33.000000	261.000000	1929.250000	3.349390e+05
TotalSalesDollars	10692.0	42239.074419	167655.265984	0.000000	729.220000	5298.045000	28396.915000	5.101920e+06
TotalSalesPrice	10692.0	18793.783627	44952.773386	0.000000	289.710000	2857.800000	16059.562500	6.728193e+05
TotalExciseTax	10692.0	1774.226259	10975.582240	0.000000	4.800000	46.570000	418.650000	3.682428e+05
FreightCost	10692.0	61433.763214	60938.458032	0.090000	14069.870000	50293.620000	79528.990000	2.570321e+05
GrossProfit	10692.0	12158.773084	46214.338722	-52002.780000	52.920000	1399.640000	8660.200000	1.290668e+06
ProfitMargin	10692.0	-15.620770	443.555329	-23730.638953	13.324515	30.405457	39.956135	9.971666e+01
StockTurnover	10692.0	1.706793	6.020460	0.000000	0.807229	0.981529	1.039342	2.745000e+02
SalesToPurchaseRatio	10692.0	2.504390	8.459067	0.000000	1.153729	1.436894	1.665449	3.529286e+02

2.3 Checking the mode value of numerical column

Code

#  Mode values for each numerical column
mode_values=df.mode().iloc[0]
print("\nMode values:\n\n",mode_values)


Mode values:

 VendorNumber                            4425.0
VendorName               MARTIGNETTI COMPANIES
Brand                                      809
Description                   Southern Comfort
ActualPrice                               9.99
Volume                                   750.0
TotalPurchaseQuantity                     12.0
TotalPurchaseDollars                     95.28
TotalSalesQuantity                        12.0
TotalSalesDollars                          0.0
TotalSalesPrice                            0.0
TotalExciseTax                             0.0
FreightCost                          144929.24
GrossProfit                                0.0
ProfitMargin                               0.0
StockTurnover                              1.0
SalesToPurchaseRatio                       0.0
Name: 0, dtype: object

2.4 Distribution Plot of the Numerical Column

Code

#Distribution plot for Numerical columns
numerical_cols=df.select_dtypes(include=np.number).columns

plt.figure(figsize=(15,10))
for i ,col in enumerate(numerical_cols):
    plt.subplot(4,4,i+1) # Adjust grid layout as needed
    sns.histplot(df[col],kde=True,bins=30)
    plt.title(col)
plt.tight_layout()
plt.show()

2.5 Outliers Detection of Numerical columns

Code

# Outliers Detection with Boxplots
plt.figure(figsize=(15,10))
for i,col in enumerate(numerical_cols):
    plt.subplot(4,4,i+1)
    sns.boxplot(y=df[col])
    plt.title(col)
plt.tight_layout()
plt.show()

2.6 Summary Statistics Insights:

Negative & Zero Values:

Gross Profit: Minimum value is -52,002.78, indicating losses. Some products or transactions may be selling at a loss due to high costs or selling at discounts lower than the purchase price..
Profit Margin: Has a minimum of -∞, which suggests cases where revenue is zero or even lower than costs.
Total Sales Quantity & Sales Dollars: Minimum values are 0, meaning some products were purchased but never sold. These could be slow-moving or obsolete stock.

Outliers Indicated by High Standard Deviations:

Purchase & Actual Prices: The max values (5,681.81 & 7,499.99) are significantly higher than the mean (24.39 & 35.64), indicating potential premium products.
Freight Cost: Huge variation, from 0.09 to 257,032.07, suggests logistics inefficiencies or bulk shipments.
Stock Turnover: Ranges from 0 to 274.5, implying some products sell extremely fast while others remain in stock indefinitely. Value more than 1 indicates that Sold quantity for that product is higher than purchased quantity due to either sales are being fulfilled from older stock.

2.7 Filtering the Data by removing inconsistencies

Code

# Let's filter the data by incosistencies by removing all the data that is not required.
df=pd.read_sql_query("""Select *
from vendor_sales_summary
where GrossProfit > 0
and ProfitMargin > 0
and TotalSalesQuantity > 0 """,conn)

Code

df

	VendorNumber	VendorName	Brand	Description	ActualPrice	Volume	TotalPurchaseQuantity	TotalPurchaseDollars	TotalSalesQuantity	TotalSalesDollars	TotalSalesPrice	TotalExciseTax	FreightCost	GrossProfit	ProfitMargin	StockTurnover	SalesToPurchaseRatio
0	1128	BROWN-FORMAN CORP	1233	Jack Daniels No 7 Black	36.99	1750.0	145080	3811251.60	142049.0	5.101920e+06	672819.31	260999.20	68601.68	1290667.91	25.297693	0.979108	1.338647
1	4425	MARTIGNETTI COMPANIES	3405	Tito's Handmade Vodka	28.99	1750.0	164038	3804041.22	160247.0	4.819073e+06	561512.37	294438.66	144929.24	1015032.27	21.062810	0.976890	1.266830
2	17035	PERNOD RICARD USA	8068	Absolut 80 Proof	24.99	1750.0	187407	3418303.68	187140.0	4.538121e+06	461140.15	343854.07	123780.22	1119816.92	24.675786	0.998575	1.327594
3	3960	DIAGEO NORTH AMERICA INC	4261	Capt Morgan Spiced Rum	22.99	1750.0	201682	3261197.94	200412.0	4.475973e+06	420050.01	368242.80	257032.07	1214774.94	27.139908	0.993703	1.372493
4	3960	DIAGEO NORTH AMERICA INC	3545	Ketel One Vodka	29.99	1750.0	138109	3023206.01	135838.0	4.223108e+06	545778.28	249587.83	257032.07	1199901.61	28.412764	0.983556	1.396897
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
8560	9815	WINE GROUP INC	8527	Concannon Glen Ellen Wh Zin	4.99	750.0	2	2.64	5.0	1.595000e+01	10.96	0.55	27100.41	13.31	83.448276	2.500000	6.041667
8561	8004	SAZERAC CO INC	5683	Dr McGillicuddy's Apple Pie	0.49	50.0	6	2.34	134.0	6.566000e+01	1.47	7.04	50293.62	63.32	96.436186	22.333333	28.059829
8562	3924	HEAVEN HILL DISTILLERIES	9123	Deep Eddy Vodka	0.99	50.0	2	1.48	2.0	1.980000e+00	0.99	0.10	14069.87	0.50	25.252525	1.000000	1.337838
8563	3960	DIAGEO NORTH AMERICA INC	6127	The Club Strawbry Margarita	1.99	200.0	1	1.47	72.0	1.432800e+02	77.61	15.12	257032.07	141.81	98.974037	72.000000	97.469388
8564	7245	PROXIMO SPIRITS INC.	3065	Three Olives Grape Vodka	0.99	50.0	1	0.71	86.0	8.514000e+01	33.66	4.46	38994.78	84.43	99.166079	86.000000	119.915493

8565 rows × 17 columns

2.8 Count Plots for Categorical Columns

Code

# Count Plots for Categorical Columns
categorical_cols=["VendorName","Description"]

plt.figure(figsize=(12,5))
for i,col in enumerate(categorical_cols):
    plt.subplot(1,2,i+1)
    sns.countplot(y=df[col], order=df[col].value_counts().index[:10])  # Top 10 categories
    plt.title(f"Count Plot of {col}")
plt.tight_layout()
plt.show()

3 Correlation Heatmap

Code

# Correlation  Heatmap
plt.figure(figsize=(12,8))
correlation_matrix=df[numerical_cols].corr()
sns.heatmap(correlation_matrix,annot=True,fmt=".2f", cmap="coolwarm",linewidths=0.5)
plt.title("Correaltion Heatmap")
plt.show()

3.1 Correlation Insights

PurchasePrice has weak correlations with TotalSalesDollars (-0.012) and GrossProfit (-0.016), suggesting that price variations do not significantly impact sales revenue or profit.
Strong correlation between total purchase quantity and total sales quantity (0.999), confirming efficient inventory turnover.
Negative correlation between profit margin & total sales price (-0.179) suggests that as sales price increases, margins decrease, possibly due to competitive pricing pressures.
StockTurnover has weak negative correlations with both GrossProfit (-0.038) and ProfitMargin (-0.055), indicating that faster turnover does not necessarily result in higher profitability.

4 Data Analysis

4.0.1 Identify Brands that needs Promotional or Pricing Adjustments which exhibit lower sales performance but higher profit margins.

Code

brand_performance=df.groupby('Description').agg({
    'TotalSalesDollars':'sum',
    'ProfitMargin':'mean'}).reset_index()

brand_performance.sort_values('ProfitMargin')

	Description	TotalSalesDollars	ProfitMargin
5485	Pepperjack Barossa Red	191.92	0.020842
2954	Flint & Steel Svgn Bl Napa V	119.92	0.033356
2179	Croft Tawny Porto	191.84	0.041701
2561	Douglass Hill Merlot	143.76	0.083472
5385	Parducci 13 True Grit Chard	24927.81	0.121190
...	...	...	...
4568	M Chiarlo Gavi Wh	1208.90	99.393664
657	Beniotome Sesame Shochu	4768.41	99.534226
6449	Skinnygirl Tangerine Vodka	2368.42	99.544844
2411	DiSaronno Amaretto	4781.16	99.553246
5528	Pezzi King Svgn Bl Dry Creek	2221.29	99.604734

7707 rows × 3 columns

Code

print("Brands with Low Sales but High Profit Margins:")
# threshold for low_sales (bottom 15 %) and "high margin" (top 15%)
low_sales_threshold=brand_performance['TotalSalesDollars'].quantile(0.15)
high_margin_threshold=brand_performance['ProfitMargin'].quantile(0.85)

# Filter brands with low sales but high profit margins
target_brands=brand_performance[
    (brand_performance['TotalSalesDollars'] <= low_sales_threshold) &
    (brand_performance['ProfitMargin'] >= high_margin_threshold)
]
display(target_brands.sort_values('TotalSalesDollars'))

Brands with Low Sales but High Profit Margins:

	Description	TotalSalesDollars	ProfitMargin
6199	Santa Rita Organic Svgn Bl	9.99	66.466466
2369	Debauchery Pnt Nr	11.58	65.975820
2070	Concannon Glen Ellen Wh Zin	15.95	83.448276
2188	Crown Royal Apple	27.86	89.806174
6237	Sauza Sprklg Wild Berry Marg	27.96	82.153076
...	...	...	...
5074	Nanbu Bijin Southern Beauty	535.68	76.747312
2271	Dad's Hat Rye Whiskey	538.89	81.851584
57	A Bichot Clos Marechaudes	539.94	67.740860
6245	Sbragia Home Ranch Merlot	549.75	66.444748
3326	Goulee Cos d'Estournel 10	558.87	69.434752

198 rows × 3 columns

Code

# filtering all the brand which have sales < 1000 in TotalSalesDollars column
brand_performance = brand_performance[brand_performance['TotalSalesDollars']<10000] # for better visualization

4.0.2 Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)

Code

# Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)
plt.figure(figsize=(10, 6))
sns.scatterplot(data=brand_performance,x='TotalSalesDollars',y='ProfitMargin',color='blue',label='All Brands',alpha=0.2)
sns.scatterplot(data=target_brands,x='TotalSalesDollars',y='ProfitMargin',color='red',label='Target Brands')

# line to show the 'low_sale_threshold' and 'high_margin_threshold'
plt.axhline(high_margin_threshold,linestyle='--',color='black',label="High Margin Threshold")
plt.axvline(low_sales_threshold,linestyle='--',color='black',label="Low Sales Threshold")

plt.xlabel("Total Sales ($)")
plt.ylabel("Profit Margin(%)")
plt.title("Brands for Promotional or Pricing Adjustment")
plt.legend()
plt.grid(True)
plt.show()

4.0.3 Which vendors and brands demonstrate the highest sales performance

Code

def format_dollars(value):
    if value >= 1_000_000:
        return f"{value/1_000_000:.2f}M"
    elif value >= 1_000:
        return f"{value/1_000:.2f}K"
    else :
        return str(value)

Code

# Top Vendors & Brands by Sales Performance
top_vendors=df.groupby("VendorName")["TotalSalesDollars"].sum().nlargest(10)
top_brands=df.groupby("Description")["TotalSalesDollars"].sum().nlargest(10)
top_vendors

VendorName
DIAGEO NORTH AMERICA INC      6.799010e+07
MARTIGNETTI COMPANIES         3.933036e+07
PERNOD RICARD USA             3.206320e+07
JIM BEAM BRANDS COMPANY       3.142302e+07
BACARDI USA INC               2.485482e+07
CONSTELLATION BRANDS INC      2.421875e+07
E & J GALLO WINERY            1.839990e+07
BROWN-FORMAN CORP             1.824723e+07
ULTRA BEVERAGE COMPANY LLP    1.650254e+07
M S WALKER INC                1.470646e+07
Name: TotalSalesDollars, dtype: float64

Code

top_brands

Description
Jack Daniels No 7 Black    7964746.76
Tito's Handmade Vodka      7399657.58
Grey Goose Vodka           7209608.06
Capt Morgan Spiced Rum     6356320.62
Absolut 80 Proof           6244752.03
Jameson Irish Whiskey      5715759.69
Ketel One Vodka            5070083.56
Baileys Irish Cream        4150122.07
Kahlua                     3604858.66
Tanqueray                  3456697.90
Name: TotalSalesDollars, dtype: float64

Code

top_vendors.apply(lambda x:format_dollars(x))

VendorName
DIAGEO NORTH AMERICA INC      67.99M
MARTIGNETTI COMPANIES         39.33M
PERNOD RICARD USA             32.06M
JIM BEAM BRANDS COMPANY       31.42M
BACARDI USA INC               24.85M
CONSTELLATION BRANDS INC      24.22M
E & J GALLO WINERY            18.40M
BROWN-FORMAN CORP             18.25M
ULTRA BEVERAGE COMPANY LLP    16.50M
M S WALKER INC                14.71M
Name: TotalSalesDollars, dtype: object

Code

top_brands.apply(lambda x:format_dollars(x))

Description
Jack Daniels No 7 Black    7.96M
Tito's Handmade Vodka      7.40M
Grey Goose Vodka           7.21M
Capt Morgan Spiced Rum     6.36M
Absolut 80 Proof           6.24M
Jameson Irish Whiskey      5.72M
Ketel One Vodka            5.07M
Baileys Irish Cream        4.15M
Kahlua                     3.60M
Tanqueray                  3.46M
Name: TotalSalesDollars, dtype: object

4.0.4 Top 10 Vendors and Brands

Code


plt.figure(figsize=(15,5))

# plot for Top Vendors
plt.subplot(1,2,1)
ax1=sns.barplot(y=top_vendors.index,x=top_vendors.values,palette="Blues_r")
plt.title("Top 10 Vendors by Sales")
plt.grid(False)

for bar in ax1.patches:
    ax1.text(bar.get_width() + (bar.get_width() * 0.02), 
             bar.get_y() + bar.get_height() / 2, 
             format_dollars(bar.get_width()), 
             ha='left', va='center', fontsize=10, color='black')



# plot for Top Brands
plt.subplot(1,2,2)
ax2=sns.barplot(y=top_brands.index.astype(str),x=top_brands.values,palette="Reds_r")
plt.title("Top 10 Brands by Sales")

for bar in ax2.patches:
    ax2.text(bar.get_width() + (bar.get_width() * 0.02), 
             bar.get_y() + bar.get_height() / 2, 
             format_dollars(bar.get_width()), 
             ha='left', va='center', fontsize=10, color='black')

plt.tight_layout()
plt.grid(False)
plt.show()

4.0.5 Which vendors contribute the most to total Purchase dollars?

Code

# Rank Vendors by Total Purchase Dollars
vendor_performance=df.groupby("VendorName").agg({
    "TotalPurchaseDollars":"sum",
    "GrossProfit":"sum",
    "TotalSalesDollars":"sum"
}).reset_index()

# Calculate Contribution % to Overall Procurement
vendor_performance["Purchase_Contribution%"]= (vendor_performance["TotalPurchaseDollars"]/vendor_performance["TotalPurchaseDollars"].sum()) * 100

# Rank Vendors by Total Purchase Dollars & Profitability
vendor_performance=round(vendor_performance.sort_values(by="TotalPurchaseDollars",ascending=False),2)

# Display Top 10 Vendors
top_vendors=vendor_performance.head(10)
top_vendors['TotalSalesDollars']=top_vendors['TotalSalesDollars'].apply(format_dollars)
top_vendors['TotalPurchaseDollars']=top_vendors['TotalPurchaseDollars'].apply(format_dollars)
top_vendors['GrossProfit']=top_vendors['GrossProfit'].apply(format_dollars)
top_vendors

	VendorName	TotalPurchaseDollars	GrossProfit	TotalSalesDollars	Purchase_Contribution%
25	DIAGEO NORTH AMERICA INC	50.10M	17.89M	67.99M	16.30
57	MARTIGNETTI COMPANIES	25.50M	13.83M	39.33M	8.30
68	PERNOD RICARD USA	23.85M	8.21M	32.06M	7.76
46	JIM BEAM BRANDS COMPANY	23.49M	7.93M	31.42M	7.64
6	BACARDI USA INC	17.43M	7.42M	24.85M	5.67
20	CONSTELLATION BRANDS INC	15.27M	8.95M	24.22M	4.97
11	BROWN-FORMAN CORP	13.24M	5.01M	18.25M	4.31
30	E & J GALLO WINERY	12.07M	6.33M	18.40M	3.93
106	ULTRA BEVERAGE COMPANY LLP	11.17M	5.34M	16.50M	3.63
53	M S WALKER INC	9.76M	4.94M	14.71M	3.18

4.0.6 A Pareto Chart to show bar chart of “Purchase contributions%’ column along with the cummulative sum

Code

# Created A Pareto Chart to show bar chart of "Purchase contributions%' column along with the cummulative sum

# Created a new 'Cummulative Contributions" columns
top_vendors['Cumulative_Contributions%']=top_vendors['Purchase_Contribution%'].cumsum()

fig,ax1=plt.subplots(figsize=(15,10))

# Barplot for Purchase Contributions%
sns.barplot(x=top_vendors['VendorName'],y=top_vendors['Purchase_Contribution%'],palette="mako",ax=ax1)

for i , value in enumerate(top_vendors['Purchase_Contribution%']):
    ax1.text(i,value-1,str(value)+'%',ha='center',fontsize=10,color='white')

# Line Plot for Cumulative Contribution%
ax2=ax1.twinx()
ax2.plot(top_vendors['VendorName'],top_vendors['Cumulative_Contributions%'],color='red',marker='o',linestyle='dashed',label='Cumulative %')

ax1.set_xticklabels(top_vendors['VendorName'],rotation=90)
ax1.set_ylabel('Purchase Contirbution %',color='blue')
ax2.set_ylabel('Cumulative Contributions %',color='red')
ax1.set_xlabel('Vendors')
ax1.set_title('Pareto Chart: Vendor Contribution to Total Purchases')

ax2.axhline(y=100,color='gray',linestyle='dashed',alpha=0.7)
ax2.legend(loc='upper right')

ax1.grid(False)
ax2.grid(False)
plt.show()

4.0.7 How much of total procurement is dependent on the top vendors?

Code

# Total Purchase Contribution by top 10 vendors
print(f"Total Purchase Contribution of top 10 vendors is {round(top_vendors['Purchase_Contribution%'].sum(),2)}% .")

vendors=list(top_vendors['VendorName'].values)
purchase_contributions=list(top_vendors['Purchase_Contribution%'].values)
total_contribution=sum(purchase_contributions)
remaining_contribution=100 - total_contribution

# Append "Other Vendors" Category
vendors.append("Other Vendors")
purchase_contributions.append(remaining_contribution)

# Donut Chart 
fig,ax=plt.subplots(figsize=(8,8))
wedges,texts,autotexts=ax.pie(purchase_contributions,labels=vendors,autopct='%1.1f%%',startangle=140,pctdistance=0.85,colors=plt.cm.Paired.colors)

# Add Total Contribution annotation in the Center
centre_circle=plt.Circle((0,0),0.70,fc='white')
fig.gca().add_artist(centre_circle)

# Add Total Contributions annotation in the center
plt.text(0,0,f"Top 10 Total:\n{total_contribution:.2f}%",fontsize=14,fontweight='bold',ha='center',va='center')

plt.title("Top 10 Vendor's Purchase Contribution (%)")
plt.show()

Total Purchase Contribution of top 10 vendors is 65.69% .

The remaining vendors contribute only 34.31%, meaning they are not utilized effectively or may not be as competitive. If vendor dependency is too high, consider identifying new suppliers to reduce risk.

4.0.8 Does purchasing in bulk reduce the unit price, and what is the optimal purchase volume for cost savings?

Code

# Calculate Unit Purchase Price Per Vendor & Volume Group
df["UnitPurchasePrice"]=df["TotalPurchaseDollars"]/df['TotalPurchaseQuantity']

# Group by Order Sizes (e.g., Small,Medium,Large Purchases)
df['OrderSize']=pd.qcut(df['TotalPurchaseQuantity'],q=3,labels=["Small","Medium","Large"])

# Analyze the Cost Savings Per Order Size
bulk_purchase_analysis=df.groupby("OrderSize")["UnitPurchasePrice"].mean().reset_index()
print(bulk_purchase_analysis)

  OrderSize  UnitPurchasePrice
0     Small          39.057543
1    Medium          15.486414
2     Large          10.777625

Code

# Boxplot 
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x="OrderSize", y="UnitPurchasePrice", palette="Set2")
plt.title("Impact of Bulk Purchasing on Unit Price")
plt.xlabel("Order Size")
plt.ylabel("Average Unit Purchase Price")
plt.show()

Vendors buying in bulk (Large Order Size) get the lowest unit price ($10.78 per unit), meaning higher margins if they can manage inventory efficiently.
The price difference between Small and Large orders is substantial (~72% reduction in unit cost)
This suggests that bulk pricing strategies successfully encourage vendors to purchase in larger volumes, leading to higher overall sales despite lower per-unit revenue.

4.0.9 Which vendors have low inventory turnover, indicating excess stock and slow-moving products?

Code

# Identify low Inventory Turnover Vendors
low_turnover_vendors = df[df["StockTurnover"] < 1].groupby("VendorName")["StockTurnover"].mean().reset_index()

#Sort by Lowest Turnover
low_turnover_vendors=low_turnover_vendors.sort_values(by="StockTurnover",ascending=True)
low_turnover_vendors.head(10)

	VendorName	StockTurnover
0	ALISA CARR BEVERAGES	0.615385
36	HIGHLAND WINE MERCHANTS LLC	0.708333
60	PARK STREET IMPORTS LLC	0.751306
19	Circa Wines	0.755676
26	Dunn Wine Brokers	0.766022
15	CENTEUR IMPORTS LLC	0.773953
78	SMOKY QUARTZ DISTILLERY LLC	0.783835
90	TAMWORTH DISTILLING	0.797078
91	THE IMPORTED GRAPE LLC	0.807569
101	WALPOLE MTN VIEW WINERY	0.820548

Slow-moving inventory increases holding costs (warehouse rent, insurance, depreciation)
Identifying vendors with low inventory turnover is critical for business efficiency, cost reduction, and profitability.

4.0.10 How much Capital is locked in unsold inventory per vendor,and which vendors contribute the most to it?

Code

# Calculate Unsold Inventory Value
df["UnsoldInventoryValue"] = (df["TotalPurchaseQuantity"] - df["TotalSalesQuantity"]) * df["UnitPurchasePrice"]
print('Total Unsold Capital:', format_dollars(df["UnsoldInventoryValue"].sum()))

# Aggregate Capital Locked per Vendor
inventory_value_per_vendor = df.groupby("VendorName")["UnsoldInventoryValue"].sum().reset_index()

# Sort Vendors with the Highest Locked Capital
inventory_value_per_vendor = inventory_value_per_vendor.sort_values(by="UnsoldInventoryValue", ascending=False)
inventory_value_per_vendor['UnsoldInventoryValue'] = inventory_value_per_vendor['UnsoldInventoryValue'].apply(format_dollars)
inventory_value_per_vendor.head(10)

Total Unsold Capital: 2.71M

	VendorName	UnsoldInventoryValue
25	DIAGEO NORTH AMERICA INC	722.21K
46	JIM BEAM BRANDS COMPANY	554.67K
68	PERNOD RICARD USA	470.63K
116	WILLIAM GRANT & SONS INC	401.96K
30	E & J GALLO WINERY	228.28K
79	SAZERAC CO INC	198.44K
11	BROWN-FORMAN CORP	177.73K
20	CONSTELLATION BRANDS INC	133.62K
61	MOET HENNESSY USA INC	126.48K
77	REMY COINTREAU USA INC	118.60K

5 Hypothesis Testing

5.0.1 What is the 95% confidence interval for Profit Margins of Top-Performing and Low-Performing Vendors.

Code

# Define top and low vendors based on Total Sales Dollars (Top 25% & Bottom 25%)
top_threshold=df['TotalSalesDollars'].quantile(0.75)
low_threshold=df['TotalSalesDollars'].quantile(0.25)

top_vendors = df[df["TotalSalesDollars"] >= top_threshold]["ProfitMargin"].dropna()
low_vendors = df[df["TotalSalesDollars"] <= low_threshold]["ProfitMargin"].dropna()

# Function to compute confidence interval
def confidence_interval(data, confidence=0.95):
    mean_val = np.mean(data)
    std_err = np.std(data, ddof=1) / np.sqrt(len(data))  # Standard error
    t_critical = stats.t.ppf((1 + confidence) / 2, df=len(data) - 1)
    margin_of_error = t_critical * std_err
    return mean_val, mean_val - margin_of_error, mean_val + margin_of_error

# Compute confidence intervals
top_mean, top_lower, top_upper = confidence_interval(top_vendors)
low_mean, low_lower, low_upper = confidence_interval(low_vendors)

print(f"Top Vendors 95% CI: ({top_lower:.2f}, {top_upper:.2f}), Mean: {top_mean:.2f}")
print(f"Low Vendors 95% CI: ({low_lower:.2f}, {low_upper:.2f}), Mean: {low_mean:.2f}")

plt.figure(figsize=(12, 6))

# Top Vendors Plot
sns.histplot(top_vendors, kde=True, color="blue", bins=30, alpha=0.5, label="Top Vendors")
plt.axvline(top_lower, color="blue", linestyle="--", label=f"Top Lower: {top_lower:.2f}")
plt.axvline(top_upper, color="blue", linestyle="--", label=f"Top Upper: {top_upper:.2f}")
plt.axvline(top_mean, color="blue", linestyle="-", label=f"Top Mean: {top_mean:.2f}")

# Low Vendors Plot
sns.histplot(low_vendors, kde=True, color="red", bins=30, alpha=0.5, label="Low Vendors")
plt.axvline(low_lower, color="red", linestyle="--", label=f"Low Lower: {low_lower:.2f}")
plt.axvline(low_upper, color="red", linestyle="--", label=f"Low Upper: {low_upper:.2f}")
plt.axvline(low_mean, color="red", linestyle="-", label=f"Low Mean: {low_mean:.2f}")

# Finalize Plot
plt.title("Confidence Interval Comparison: Top vs. Low Vendors (Profit Margin)")
plt.xlabel("Profit Margin (%)")
plt.ylabel("Frequency")
plt.legend()
plt.grid(True)
plt.show()

Top Vendors 95% CI: (30.74, 31.61), Mean: 31.17
Low Vendors 95% CI: (40.48, 42.62), Mean: 41.55

The confidence interval for low-performing vendors (40.48% to 42.62%) is significantly higher than that of top-performing vendors (30.74% to 31.61%).
This suggests that vendors with lower sales tend to maintain higher profit margins, potentially due to premium pricing or lower operational costs.
For High-Performing Vendors: If they aim to improve profitability, they could explore selective price adjustments, cost optimization, or bundling strategies.
For Low-Performing Vendors: Despite higher margins, their low sales volume might indicate a need for better marketing, competitive pricing, or improved distribution strategies.

5.0.2 Is there a significant difference in profit margins between top-performing and low-performing vendors?

Hypothesis:

H₀ (Null Hypothesis): There is no significant difference in the mean profit margins of top-performing and low-performing vendors.

H₁ (Alternative Hypothesis): The mean profit margins of top-performing and low-performing vendors are significantly different.

Code

top_threshold = df["TotalSalesDollars"].quantile(0.75)
low_threshold = df["TotalSalesDollars"].quantile(0.25)

top_vendors = df[df["TotalSalesDollars"] >= top_threshold]["ProfitMargin"].dropna()
low_vendors = df[df["TotalSalesDollars"] <= low_threshold]["ProfitMargin"].dropna()

# Perform Two-Sample T-Test
t_stat, p_value = ttest_ind(top_vendors, low_vendors, equal_var=False)

# Print results
print(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")
if p_value < 0.05:
    print("Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.")
else:
    print("Fail to Reject H₀: No significant difference in profit margins.")

T-Statistic: -17.6440, P-Value: 0.0000
Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.

A p-value this small suggests that the difference is not just statistically significant but also practically meaningful.
The two vendor groups operate very differently in terms of profitability.