Overall Car Performance Analysis Using R and Python

 By Shivathmica

This analysis uses the mtcars dataset to investigate the performance of various cars.
The dataset covers 32 distinct automobiles and records their weight, size, horsepower, fuel economy, and other performance specifications.

1. Linear Regression:
Linear regression is a regression model that describes the relationship between variables with a straight line.
In this example, regression analysis was used to find out whether a car's weight affects the miles per gallon (mpg) it can travel.
Code:
library(readxl)
# Load the data; the first row of the sheet holds the column names
mtcars <- read_excel("C:/Users/HP/Desktop/mtcars.xlsx", col_names = TRUE)
x <- mtcars$wt    # predictor: weight
y <- mtcars$mpg   # response: miles per gallon
n <- nrow(mtcars)
xmean <- mean(x)
ymean <- mean(y)
# Least-squares estimates for mpg = b0 + b1 * wt, computed by hand
xiyi <- x * y
numerator <- sum(xiyi) - n * xmean * ymean
denominator <- sum(x^2) - n * xmean^2
b1 <- numerator / denominator   # slope
b0 <- ymean - b1 * xmean        # intercept
b1
b0
# Note: this model regresses wt on mpg, the reverse of the hand calculation above
model_mtcars <- lm(wt ~ mpg, data = mtcars)
model_mtcars
summary(model_mtcars)
# Scatter plot of mpg against wt; abline() must be called after plot(), not inside it
plot(mtcars$wt, mtcars$mpg, col = "blue", main = "Linear Regression",
     cex = 1.3, pch = 16,
     xlab = "Weight of cars", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars))    # fitted line for mpg ~ wt
df.residual(model_mtcars)              # residual degrees of freedom (n - 2)
pred_mtcars <- predict(model_mtcars)   # fitted values
pred_mtcars
Resmtcars <- resid(model_mtcars)       # residuals
Resmtcars

Output:


  • The fitted model regresses "wt" on "mpg", so the dependent variable here is the car's weight. Its predicted value falls by around 0.14086 units for every one-unit rise in "mpg". Because the model has a single predictor, nothing is held constant. This is the reverse of the direction in the question posed above (mpg as the response), but the negative slope tells the same story: more fuel-efficient cars tend to be lighter.
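
The direction of a regression matters: swapping the response and the predictor gives a different slope, not simply its reciprocal. The sketch below illustrates this with the same sum formulas used in the manual R calculation. The function name slope_intercept and the six data pairs are made up for illustration and are not rows from mtcars.

```python
def slope_intercept(x, y):
    """Least-squares line y = b0 + b1 * x, using the same sums as the R code."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_mean * y_mean
    denominator = sum(xi ** 2 for xi in x) - n * x_mean ** 2
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical (weight, mpg) pairs for illustration only; not taken from mtcars.
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [30.0, 26.0, 23.0, 19.0, 17.0, 14.0]

_, slope_mpg_on_wt = slope_intercept(weight, mpg)  # mpg as the response
_, slope_wt_on_mpg = slope_intercept(mpg, weight)  # weight as the response

print("mpg ~ wt slope:", round(slope_mpg_on_wt, 4))
print("wt ~ mpg slope:", round(slope_wt_on_mpg, 4))
# The product of the two slopes equals the squared correlation, so it lies in [0, 1].
print("product of slopes:", round(slope_mpg_on_wt * slope_wt_on_mpg, 4))
```

Both slopes share the same sign, which is why either model shows that heavier cars are less fuel-efficient, but only the mpg ~ wt slope answers "how much mpg is lost per unit of weight".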

ANALYSIS USING PYTHON:

Code:
import pandas as pd
from sklearn.linear_model import LinearRegression
df = pd.read_excel('C:/Users/HP/Desktop/mtcars.xlsx') # Load the data into a pandas DataFrame
X = df[['wt']] # predictor variable
y = df['mpg'] # response variable
model = LinearRegression() # Create an instance of the LinearRegression class
model.fit(X, y) # Fit the model to the data
print(model.coef_) # Print the coefficients of the model
print(model.intercept_)

Output:


The coefficient -5.79981683 indicates that, on average, mpg (miles per gallon) drops by roughly 5.80 for every one-unit increase in wt. In other words, a car's fuel economy tends to decline as its weight rises.
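
The R section also looks at fitted values, residuals, and residual degrees of freedom, which the scikit-learn snippet does not print. As a rough sketch of what those quantities are, the plain-Python example below computes them for a one-predictor model. The helper fit_line and the data values are hypothetical and are not taken from mtcars.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical (weight, mpg) pairs for illustration only; not taken from mtcars.
weight = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
mpg = [30.0, 26.0, 23.0, 19.0, 17.0, 14.0]

b0, b1 = fit_line(weight, mpg)

# Fitted values play the role of predict(model) in R.
fitted = [b0 + b1 * w for w in weight]
# Residuals play the role of resid(model): observed minus fitted.
residuals = [m - f for m, f in zip(mpg, fitted)]
# Residual degrees of freedom: observations minus the two estimated coefficients.
df_residual = len(weight) - 2

# R-squared: share of the variation in mpg explained by the line.
mpg_mean = sum(mpg) / len(mpg)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((m - mpg_mean) ** 2 for m in mpg)
r_squared = 1 - ss_res / ss_tot

print("intercept:", round(b0, 4), "slope:", round(b1, 4))
print("residual df:", df_residual)
print("R-squared:", round(r_squared, 4))
```

With an intercept in the model, the residuals sum to zero up to rounding error, which is a quick sanity check on any hand-rolled fit.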


2. Cluster Analysis (k-means clustering):
Cluster analysis was applied to this dataset to identify distinct groups of vehicles based on their performance and characteristics.

Code:
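library(readxl)
library(factoextra)
mtcars <- read_excel("C:/Users/HP/Desktop/mtcars.xlsx", col_names = TRUE)
# read_excel() returns a tibble, which does not keep row names, so convert first
mtcars <- as.data.frame(mtcars)
rownames(mtcars) <- mtcars[, 1]   # the first column is assumed to hold the car names
mtcars <- mtcars[, -1]
# k-means starts from random centers; an arbitrary seed keeps the cluster numbering reproducible
set.seed(123)
fit <- kmeans(mtcars, centers = 3)
fviz_cluster(fit, data = mtcars, geom = "point", stand = FALSE,
             ellipse.type = "convex", ellipse.level = 0.95,
             ggtheme = theme_classic())
# Access the cluster centers
cluster_centers <- fit$centers
# Print the cluster centers
print(cluster_centers)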

Output:


Based on the central characteristics of each cluster, the groups can be described as follows:
Cluster 1: Heavy, strong vehicles with moderate fuel efficiency.
Cluster 2: Automobiles with automatic transmissions that are lighter and more fuel-efficient.
Cluster 3: Heavy, powerful vehicles with manual transmissions and rather poor fuel efficiency.
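
To make the idea of "cluster centers" concrete, here is a minimal sketch of the standard k-means iteration (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points, and repeat until nothing changes. This is an illustration only, not the internal code of kmeans() in R or KMeans in scikit-learn. The function names and the (mpg, hp) pairs are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # converged: centers no longer move
            break
        centers = new_centers
    return centers, labels


# Hypothetical (mpg, hp) pairs for illustration only; not taken from mtcars.
points = [
    (30.0, 65.0), (32.0, 66.0), (27.0, 90.0),
    (21.0, 110.0), (19.0, 120.0), (22.0, 105.0),
    (15.0, 200.0), (14.0, 245.0), (13.0, 230.0),
]

centers, labels = kmeans(points, k=3)
for center in centers:
    print(tuple(round(c, 2) for c in center))
print(labels)
```

Because the starting centers are random, the cluster numbers (1, 2, 3) carry no meaning of their own and can change between runs unless a seed is fixed.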

ANALYSIS USING PYTHON:

Code:

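import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
df = pd.read_excel('C:/Users/HP/Desktop/mtcars.xlsx')
# Use all eleven numeric columns as clustering features
X = df[['mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']]
# random_state is an arbitrary seed that keeps the cluster labels reproducible
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)
print(kmeans.cluster_centers_)
# Plot only the first two features (mpg and cyl), colored by cluster label
plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=kmeans.labels_, cmap='rainbow')
# Mark the cluster centers in black
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')
plt.xlabel('mpg')
plt.ylabel('cyl')
plt.show()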

Output:

  • The first cluster, denoted by blue dots, has low mpg and high cyl, indicating that the vehicles in this cluster have both a large number of cylinders and low fuel efficiency.
  • The second cluster, denoted by green dots, has moderate mpg and moderate cyl, indicating that the vehicles in this cluster have moderate fuel economy and a moderate number of cylinders.
  • The third cluster, denoted by red dots, has high mpg and low cyl, indicating that the vehicles in this cluster have great fuel efficiency and few cylinders.
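
One caveat worth illustrating: both snippets above cluster the raw columns, and k-means relies on Euclidean distance, so columns with large numeric ranges (such as disp and hp) can dominate columns with small ranges (such as mpg or cyl). The sketch below shows the effect on two columns and how z-score standardization evens it out. The helper names and data values are hypothetical and are not taken from mtcars.

```python
from statistics import mean, stdev


def standardize(column):
    """Rescale a column to mean 0 and sample standard deviation 1."""
    m, s = mean(column), stdev(column)
    return [(value - m) / s for value in column]


def disp_share(car_a, car_b):
    """Fraction of the squared distance between two cars that comes from disp."""
    mpg_part = (car_a[0] - car_b[0]) ** 2
    disp_part = (car_a[1] - car_b[1]) ** 2
    return disp_part / (mpg_part + disp_part)


# Hypothetical mpg and disp values for illustration only; not taken from mtcars.
mpg = [30.0, 15.0, 22.0, 18.0, 26.0]
disp = [80.0, 400.0, 160.0, 300.0, 120.0]

raw = list(zip(mpg, disp))
scaled = list(zip(standardize(mpg), standardize(disp)))

share_raw = disp_share(raw[0], raw[1])
share_scaled = disp_share(scaled[0], scaled[1])

print("disp share of distance, raw columns:", round(share_raw, 3))
print("disp share of distance, standardized:", round(share_scaled, 3))
```

In practice the same rescaling is usually done with scale() in R or StandardScaler from sklearn.preprocessing in Python before the data is passed to k-means. Whether to standardize is a modeling choice, and it would likely change the clusters reported above.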


