This report provides a comprehensive analysis of vendor performance using sales, purchase, and inventory data. It covers exploratory data analysis, key performance indicators, vendor and brand comparisons, inventory turnover, capital lock-in, and statistical testing of profit margins. The insights aim to support data-driven decisions for procurement, pricing, and inventory management.
import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport seaborn as sns import warningsimport sqlite3from scipy.stats import ttest_indimport scipy.stats as statssns.set(style="whitegrid")warnings.filterwarnings('ignore')
1.2 Loading the Dataset
Code
# creating database connectionconn = sqlite3.connect('D:\Vendor_Analysis_Project\End to End Vendor Analysis Project\database_file\inventory.db')# List all tables in the databasetables = pd.read_sql_query("SELECT name FROM sqlite_master WHERE type='table';", conn)print("Available tables in the database:")print(tables)# If the table exists, fetch data; otherwise, print a warningifnot tables.empty and'vendor_sales_summary'in tables['name'].values: df = pd.read_sql_query("SELECT * FROM vendor_sales_summary", conn) display(df.head())else:print("Table 'vendor_sales_summary' does not exist in the database. Please check the table names above.")
Available tables in the database:
name
0 begin_inventory
1 end_inventory
2 purchases
3 purchase_prices
4 sales
5 vendor_invoice
6 vendor_sales_summary
VendorNumber
VendorName
Brand
Description
ActualPrice
Volume
TotalPurchaseQuantity
TotalPurchaseDollars
TotalSalesQuantity
TotalSalesDollars
TotalSalesPrice
TotalExciseTax
FreightCost
GrossProfit
ProfitMargin
StockTurnover
SalesToPurchaseRatio
0
1128
BROWN-FORMAN CORP
1233
Jack Daniels No 7 Black
36.99
1750.0
145080
3811251.60
142049.0
5.101920e+06
672819.31
260999.20
68601.68
1290667.91
25.297693
0.979108
1.338647
1
4425
MARTIGNETTI COMPANIES
3405
Tito's Handmade Vodka
28.99
1750.0
164038
3804041.22
160247.0
4.819073e+06
561512.37
294438.66
144929.24
1015032.27
21.062810
0.976890
1.266830
2
17035
PERNOD RICARD USA
8068
Absolut 80 Proof
24.99
1750.0
187407
3418303.68
187140.0
4.538121e+06
461140.15
343854.07
123780.22
1119816.92
24.675786
0.998575
1.327594
3
3960
DIAGEO NORTH AMERICA INC
4261
Capt Morgan Spiced Rum
22.99
1750.0
201682
3261197.94
200412.0
4.475973e+06
420050.01
368242.80
257032.07
1214774.94
27.139908
0.993703
1.372493
4
3960
DIAGEO NORTH AMERICA INC
3545
Ketel One Vodka
29.99
1750.0
138109
3023206.01
135838.0
4.223108e+06
545778.28
249587.83
257032.07
1199901.61
28.412764
0.983556
1.396897
2Exploratory Data Analysis
Previously, we examined the various tables in the datbase to identify key variables,understand their relationships, and determine which ones should be included in the final analysis.
In this pase of EDA, we will analyze the resultant table to gain insights into the distributuion of each column.This will help us understand data patterns,identify anomalies,and ensure data quality before proceeding with further analysis.
#Distribution plot for Numerical columnsnumerical_cols=df.select_dtypes(include=np.number).columnsplt.figure(figsize=(15,10))for i ,col inenumerate(numerical_cols): plt.subplot(4,4,i+1) # Adjust grid layout as needed sns.histplot(df[col],kde=True,bins=30) plt.title(col)plt.tight_layout()plt.show()
2.5Outliers Detection of Numerical columns
Code
# Outliers Detection with Boxplotsplt.figure(figsize=(15,10))for i,col inenumerate(numerical_cols): plt.subplot(4,4,i+1) sns.boxplot(y=df[col]) plt.title(col)plt.tight_layout()plt.show()
2.6 Summary Statistics Insights:
Negative & Zero Values:
Gross Profit: Minimum value is -52,002.78, indicating losses. Some products or transactions may be selling at a loss due to high costs or selling at discounts lower than the purchase price..
Profit Margin: Has a minimum of -∞, which suggests cases where revenue is zero or even lower than costs.
Total Sales Quantity & Sales Dollars: Minimum values are 0, meaning some products were purchased but never sold. These could be slow-moving or obsolete stock.
Outliers Indicated by High Standard Deviations:
Purchase & Actual Prices: The max values (5,681.81 & 7,499.99) are significantly higher than the mean (24.39 & 35.64), indicating potential premium products.
Freight Cost: Huge variation, from 0.09 to 257,032.07, suggests logistics inefficiencies or bulk shipments.
Stock Turnover: Ranges from 0 to 274.5, implying some products sell extremely fast while others remain in stock indefinitely. Value more than 1 indicates that Sold quantity for that product is higher than purchased quantity due to either sales are being fulfilled from older stock.
2.7Filtering the Data by removing inconsistencies
Code
# Let's filter the data by incosistencies by removing all the data that is not required.df=pd.read_sql_query("""Select *from vendor_sales_summarywhere GrossProfit > 0and ProfitMargin > 0and TotalSalesQuantity > 0 """,conn)
Code
df
VendorNumber
VendorName
Brand
Description
ActualPrice
Volume
TotalPurchaseQuantity
TotalPurchaseDollars
TotalSalesQuantity
TotalSalesDollars
TotalSalesPrice
TotalExciseTax
FreightCost
GrossProfit
ProfitMargin
StockTurnover
SalesToPurchaseRatio
0
1128
BROWN-FORMAN CORP
1233
Jack Daniels No 7 Black
36.99
1750.0
145080
3811251.60
142049.0
5.101920e+06
672819.31
260999.20
68601.68
1290667.91
25.297693
0.979108
1.338647
1
4425
MARTIGNETTI COMPANIES
3405
Tito's Handmade Vodka
28.99
1750.0
164038
3804041.22
160247.0
4.819073e+06
561512.37
294438.66
144929.24
1015032.27
21.062810
0.976890
1.266830
2
17035
PERNOD RICARD USA
8068
Absolut 80 Proof
24.99
1750.0
187407
3418303.68
187140.0
4.538121e+06
461140.15
343854.07
123780.22
1119816.92
24.675786
0.998575
1.327594
3
3960
DIAGEO NORTH AMERICA INC
4261
Capt Morgan Spiced Rum
22.99
1750.0
201682
3261197.94
200412.0
4.475973e+06
420050.01
368242.80
257032.07
1214774.94
27.139908
0.993703
1.372493
4
3960
DIAGEO NORTH AMERICA INC
3545
Ketel One Vodka
29.99
1750.0
138109
3023206.01
135838.0
4.223108e+06
545778.28
249587.83
257032.07
1199901.61
28.412764
0.983556
1.396897
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
8560
9815
WINE GROUP INC
8527
Concannon Glen Ellen Wh Zin
4.99
750.0
2
2.64
5.0
1.595000e+01
10.96
0.55
27100.41
13.31
83.448276
2.500000
6.041667
8561
8004
SAZERAC CO INC
5683
Dr McGillicuddy's Apple Pie
0.49
50.0
6
2.34
134.0
6.566000e+01
1.47
7.04
50293.62
63.32
96.436186
22.333333
28.059829
8562
3924
HEAVEN HILL DISTILLERIES
9123
Deep Eddy Vodka
0.99
50.0
2
1.48
2.0
1.980000e+00
0.99
0.10
14069.87
0.50
25.252525
1.000000
1.337838
8563
3960
DIAGEO NORTH AMERICA INC
6127
The Club Strawbry Margarita
1.99
200.0
1
1.47
72.0
1.432800e+02
77.61
15.12
257032.07
141.81
98.974037
72.000000
97.469388
8564
7245
PROXIMO SPIRITS INC.
3065
Three Olives Grape Vodka
0.99
50.0
1
0.71
86.0
8.514000e+01
33.66
4.46
38994.78
84.43
99.166079
86.000000
119.915493
8565 rows × 17 columns
2.8Count Plots for Categorical Columns
Code
# Count Plots for Categorical Columnscategorical_cols=["VendorName","Description"]plt.figure(figsize=(12,5))for i,col inenumerate(categorical_cols): plt.subplot(1,2,i+1) sns.countplot(y=df[col], order=df[col].value_counts().index[:10]) # Top 10 categories plt.title(f"Count Plot of {col}")plt.tight_layout()plt.show()
PurchasePrice has weak correlations with TotalSalesDollars (-0.012) and GrossProfit (-0.016), suggesting that price variations do not significantly impact sales revenue or profit.
Strong correlation between total purchase quantity and total sales quantity (0.999), confirming efficient inventory turnover.
Negative correlation between profit margin & total sales price (-0.179) suggests that as sales price increases, margins decrease, possibly due to competitive pricing pressures.
StockTurnover has weak negative correlations with both GrossProfit (-0.038) and ProfitMargin (-0.055), indicating that faster turnover does not necessarily result in higher profitability.
4Data Analysis
4.0.1Identify Brands that needs Promotional or Pricing Adjustments which exhibit lower sales performance but higher profit margins.
print("Brands with Low Sales but High Profit Margins:")# threshold for low_sales (bottom 15 %) and "high margin" (top 15%)low_sales_threshold=brand_performance['TotalSalesDollars'].quantile(0.15)high_margin_threshold=brand_performance['ProfitMargin'].quantile(0.85)# Filter brands with low sales but high profit marginstarget_brands=brand_performance[ (brand_performance['TotalSalesDollars'] <= low_sales_threshold) & (brand_performance['ProfitMargin'] >= high_margin_threshold)]display(target_brands.sort_values('TotalSalesDollars'))
Brands with Low Sales but High Profit Margins:
Description
TotalSalesDollars
ProfitMargin
6199
Santa Rita Organic Svgn Bl
9.99
66.466466
2369
Debauchery Pnt Nr
11.58
65.975820
2070
Concannon Glen Ellen Wh Zin
15.95
83.448276
2188
Crown Royal Apple
27.86
89.806174
6237
Sauza Sprklg Wild Berry Marg
27.96
82.153076
...
...
...
...
5074
Nanbu Bijin Southern Beauty
535.68
76.747312
2271
Dad's Hat Rye Whiskey
538.89
81.851584
57
A Bichot Clos Marechaudes
539.94
67.740860
6245
Sbragia Home Ranch Merlot
549.75
66.444748
3326
Goulee Cos d'Estournel 10
558.87
69.434752
198 rows × 3 columns
Code
# filtering all the brand which have sales < 1000 in TotalSalesDollars columnbrand_performance = brand_performance[brand_performance['TotalSalesDollars']<10000] # for better visualization
4.0.2Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)
Code
# Scatterplot to find Brand values b/w Total Sales and Profit Margin(%)plt.figure(figsize=(10, 6))sns.scatterplot(data=brand_performance,x='TotalSalesDollars',y='ProfitMargin',color='blue',label='All Brands',alpha=0.2)sns.scatterplot(data=target_brands,x='TotalSalesDollars',y='ProfitMargin',color='red',label='Target Brands')# line to show the 'low_sale_threshold' and 'high_margin_threshold'plt.axhline(high_margin_threshold,linestyle='--',color='black',label="High Margin Threshold")plt.axvline(low_sales_threshold,linestyle='--',color='black',label="Low Sales Threshold")plt.xlabel("Total Sales ($)")plt.ylabel("Profit Margin(%)")plt.title("Brands for Promotional or Pricing Adjustment")plt.legend()plt.grid(True)plt.show()
4.0.3Which vendors and brands demonstrate the highest sales performance
Code
def format_dollars(value):if value >=1_000_000:returnf"{value/1_000_000:.2f}M"elif value >=1_000:returnf"{value/1_000:.2f}K"else :returnstr(value)
Code
# Top Vendors & Brands by Sales Performancetop_vendors=df.groupby("VendorName")["TotalSalesDollars"].sum().nlargest(10)top_brands=df.groupby("Description")["TotalSalesDollars"].sum().nlargest(10)top_vendors
VendorName
DIAGEO NORTH AMERICA INC 6.799010e+07
MARTIGNETTI COMPANIES 3.933036e+07
PERNOD RICARD USA 3.206320e+07
JIM BEAM BRANDS COMPANY 3.142302e+07
BACARDI USA INC 2.485482e+07
CONSTELLATION BRANDS INC 2.421875e+07
E & J GALLO WINERY 1.839990e+07
BROWN-FORMAN CORP 1.824723e+07
ULTRA BEVERAGE COMPANY LLP 1.650254e+07
M S WALKER INC 1.470646e+07
Name: TotalSalesDollars, dtype: float64
Code
top_brands
Description
Jack Daniels No 7 Black 7964746.76
Tito's Handmade Vodka 7399657.58
Grey Goose Vodka 7209608.06
Capt Morgan Spiced Rum 6356320.62
Absolut 80 Proof 6244752.03
Jameson Irish Whiskey 5715759.69
Ketel One Vodka 5070083.56
Baileys Irish Cream 4150122.07
Kahlua 3604858.66
Tanqueray 3456697.90
Name: TotalSalesDollars, dtype: float64
Code
top_vendors.apply(lambda x:format_dollars(x))
VendorName
DIAGEO NORTH AMERICA INC 67.99M
MARTIGNETTI COMPANIES 39.33M
PERNOD RICARD USA 32.06M
JIM BEAM BRANDS COMPANY 31.42M
BACARDI USA INC 24.85M
CONSTELLATION BRANDS INC 24.22M
E & J GALLO WINERY 18.40M
BROWN-FORMAN CORP 18.25M
ULTRA BEVERAGE COMPANY LLP 16.50M
M S WALKER INC 14.71M
Name: TotalSalesDollars, dtype: object
Code
top_brands.apply(lambda x:format_dollars(x))
Description
Jack Daniels No 7 Black 7.96M
Tito's Handmade Vodka 7.40M
Grey Goose Vodka 7.21M
Capt Morgan Spiced Rum 6.36M
Absolut 80 Proof 6.24M
Jameson Irish Whiskey 5.72M
Ketel One Vodka 5.07M
Baileys Irish Cream 4.15M
Kahlua 3.60M
Tanqueray 3.46M
Name: TotalSalesDollars, dtype: object
4.0.4Top 10 Vendors and Brands
Code
plt.figure(figsize=(15,5))# plot for Top Vendorsplt.subplot(1,2,1)ax1=sns.barplot(y=top_vendors.index,x=top_vendors.values,palette="Blues_r")plt.title("Top 10 Vendors by Sales")plt.grid(False)for bar in ax1.patches: ax1.text(bar.get_width() + (bar.get_width() *0.02), bar.get_y() + bar.get_height() /2, format_dollars(bar.get_width()), ha='left', va='center', fontsize=10, color='black')# plot for Top Brandsplt.subplot(1,2,2)ax2=sns.barplot(y=top_brands.index.astype(str),x=top_brands.values,palette="Reds_r")plt.title("Top 10 Brands by Sales")for bar in ax2.patches: ax2.text(bar.get_width() + (bar.get_width() *0.02), bar.get_y() + bar.get_height() /2, format_dollars(bar.get_width()), ha='left', va='center', fontsize=10, color='black')plt.tight_layout()plt.grid(False)plt.show()
4.0.5Which vendors contribute the most to total Purchase dollars?
Code
# Rank Vendors by Total Purchase Dollarsvendor_performance=df.groupby("VendorName").agg({"TotalPurchaseDollars":"sum","GrossProfit":"sum","TotalSalesDollars":"sum"}).reset_index()# Calculate Contribution % to Overall Procurementvendor_performance["Purchase_Contribution%"]= (vendor_performance["TotalPurchaseDollars"]/vendor_performance["TotalPurchaseDollars"].sum()) *100# Rank Vendors by Total Purchase Dollars & Profitabilityvendor_performance=round(vendor_performance.sort_values(by="TotalPurchaseDollars",ascending=False),2)# Display Top 10 Vendorstop_vendors=vendor_performance.head(10)top_vendors['TotalSalesDollars']=top_vendors['TotalSalesDollars'].apply(format_dollars)top_vendors['TotalPurchaseDollars']=top_vendors['TotalPurchaseDollars'].apply(format_dollars)top_vendors['GrossProfit']=top_vendors['GrossProfit'].apply(format_dollars)top_vendors
VendorName
TotalPurchaseDollars
GrossProfit
TotalSalesDollars
Purchase_Contribution%
25
DIAGEO NORTH AMERICA INC
50.10M
17.89M
67.99M
16.30
57
MARTIGNETTI COMPANIES
25.50M
13.83M
39.33M
8.30
68
PERNOD RICARD USA
23.85M
8.21M
32.06M
7.76
46
JIM BEAM BRANDS COMPANY
23.49M
7.93M
31.42M
7.64
6
BACARDI USA INC
17.43M
7.42M
24.85M
5.67
20
CONSTELLATION BRANDS INC
15.27M
8.95M
24.22M
4.97
11
BROWN-FORMAN CORP
13.24M
5.01M
18.25M
4.31
30
E & J GALLO WINERY
12.07M
6.33M
18.40M
3.93
106
ULTRA BEVERAGE COMPANY LLP
11.17M
5.34M
16.50M
3.63
53
M S WALKER INC
9.76M
4.94M
14.71M
3.18
4.0.6A Pareto Chart to show bar chart of “Purchase contributions%’ column along with the cummulative sum
Code
# Created A Pareto Chart to show bar chart of "Purchase contributions%' column along with the cummulative sum# Created a new 'Cummulative Contributions" columnstop_vendors['Cumulative_Contributions%']=top_vendors['Purchase_Contribution%'].cumsum()fig,ax1=plt.subplots(figsize=(15,10))# Barplot for Purchase Contributions%sns.barplot(x=top_vendors['VendorName'],y=top_vendors['Purchase_Contribution%'],palette="mako",ax=ax1)for i , value inenumerate(top_vendors['Purchase_Contribution%']): ax1.text(i,value-1,str(value)+'%',ha='center',fontsize=10,color='white')# Line Plot for Cumulative Contribution%ax2=ax1.twinx()ax2.plot(top_vendors['VendorName'],top_vendors['Cumulative_Contributions%'],color='red',marker='o',linestyle='dashed',label='Cumulative %')ax1.set_xticklabels(top_vendors['VendorName'],rotation=90)ax1.set_ylabel('Purchase Contirbution %',color='blue')ax2.set_ylabel('Cumulative Contributions %',color='red')ax1.set_xlabel('Vendors')ax1.set_title('Pareto Chart: Vendor Contribution to Total Purchases')ax2.axhline(y=100,color='gray',linestyle='dashed',alpha=0.7)ax2.legend(loc='upper right')ax1.grid(False)ax2.grid(False)plt.show()
4.0.7How much of total procurement is dependent on the top vendors?
Code
# Total Purchase Contribution by top 10 vendorsprint(f"Total Purchase Contribution of top 10 vendors is {round(top_vendors['Purchase_Contribution%'].sum(),2)}% .")vendors=list(top_vendors['VendorName'].values)purchase_contributions=list(top_vendors['Purchase_Contribution%'].values)total_contribution=sum(purchase_contributions)remaining_contribution=100- total_contribution# Append "Other Vendors" Categoryvendors.append("Other Vendors")purchase_contributions.append(remaining_contribution)# Donut Chart fig,ax=plt.subplots(figsize=(8,8))wedges,texts,autotexts=ax.pie(purchase_contributions,labels=vendors,autopct='%1.1f%%',startangle=140,pctdistance=0.85,colors=plt.cm.Paired.colors)# Add Total Contribution annotation in the Centercentre_circle=plt.Circle((0,0),0.70,fc='white')fig.gca().add_artist(centre_circle)# Add Total Contributions annotation in the centerplt.text(0,0,f"Top 10 Total:\n{total_contribution:.2f}%",fontsize=14,fontweight='bold',ha='center',va='center')plt.title("Top 10 Vendor's Purchase Contribution (%)")plt.show()
Total Purchase Contribution of top 10 vendors is 65.69% .
The remaining vendors contribute only 34.31%, meaning they are not utilized effectively or may not be as competitive. If vendor dependency is too high, consider identifying new suppliers to reduce risk.
4.0.8Does purchasing in bulk reduce the unit price, and what is the optimal purchase volume for cost savings?
Code
# Calculate Unit Purchase Price Per Vendor & Volume Groupdf["UnitPurchasePrice"]=df["TotalPurchaseDollars"]/df['TotalPurchaseQuantity']# Group by Order Sizes (e.g., Small,Medium,Large Purchases)df['OrderSize']=pd.qcut(df['TotalPurchaseQuantity'],q=3,labels=["Small","Medium","Large"])# Analyze the Cost Savings Per Order Sizebulk_purchase_analysis=df.groupby("OrderSize")["UnitPurchasePrice"].mean().reset_index()print(bulk_purchase_analysis)
OrderSize UnitPurchasePrice
0 Small 39.057543
1 Medium 15.486414
2 Large 10.777625
Code
# Boxplot plt.figure(figsize=(10, 6))sns.boxplot(data=df, x="OrderSize", y="UnitPurchasePrice", palette="Set2")plt.title("Impact of Bulk Purchasing on Unit Price")plt.xlabel("Order Size")plt.ylabel("Average Unit Purchase Price")plt.show()
Vendors buying in bulk (Large Order Size) get the lowest unit price ($10.78 per unit), meaning higher margins if they can manage inventory efficiently.
The price difference between Small and Large orders is substantial (~72% reduction in unit cost)
This suggests that bulk pricing strategies successfully encourage vendors to purchase in larger volumes, leading to higher overall sales despite lower per-unit revenue.
4.0.9Which vendors have low inventory turnover, indicating excess stock and slow-moving products?
The confidence interval for low-performing vendors (40.48% to 42.62%) is significantly higher than that of top-performing vendors (30.74% to 31.61%).
This suggests that vendors with lower sales tend to maintain higher profit margins, potentially due to premium pricing or lower operational costs.
For High-Performing Vendors: If they aim to improve profitability, they could explore selective price adjustments, cost optimization, or bundling strategies.
For Low-Performing Vendors: Despite higher margins, their low sales volume might indicate a need for better marketing, competitive pricing, or improved distribution strategies.
5.0.2Is there a significant difference in profit margins between top-performing and low-performing vendors?
Hypothesis:
H₀ (Null Hypothesis): There is no significant difference in the mean profit margins of top-performing and low-performing vendors.
H₁ (Alternative Hypothesis): The mean profit margins of top-performing and low-performing vendors are significantly different.
Code
top_threshold = df["TotalSalesDollars"].quantile(0.75)low_threshold = df["TotalSalesDollars"].quantile(0.25)top_vendors = df[df["TotalSalesDollars"] >= top_threshold]["ProfitMargin"].dropna()low_vendors = df[df["TotalSalesDollars"] <= low_threshold]["ProfitMargin"].dropna()# Perform Two-Sample T-Testt_stat, p_value = ttest_ind(top_vendors, low_vendors, equal_var=False)# Print resultsprint(f"T-Statistic: {t_stat:.4f}, P-Value: {p_value:.4f}")if p_value <0.05:print("Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.")else:print("Fail to Reject H₀: No significant difference in profit margins.")
T-Statistic: -17.6440, P-Value: 0.0000
Reject H₀: There is a significant difference in profit margins between top and low-performing vendors.
A p-value this small suggests that the difference is not just statistically significant but also practically meaningful.
The two vendor groups operate very differently in terms of profitability.