Overall Car Performance Analysis Using R and Python

By Shivathmica

The mtcars dataset is used here to investigate the performance of various cars.

The dataset covers 32 distinct automobiles and records details such as their weight, engine size, horsepower, fuel economy, and other performance specifications.

1. Linear Regression:

Linear regression is a regression model that uses a straight line to describe the relationship between variables.

In this example, regression analysis is used to find out whether a car's weight has an impact on the miles per gallon (mpg) it can travel.

Code:

library(readxl)

mtcars <- read_excel("C:/Users/HP/Desktop/mtcars.xlsx", col_names = TRUE)

x<-mtcars$wt

y<-mtcars$mpg

n<- nrow (mtcars)

xmean <- mean(mtcars$wt)

ymean <- mean(mtcars$mpg)

xiyi <- x * y

numerator <- sum(xiyi) - n* xmean * ymean

denominator <- sum(x^2)- n* (xmean^2)

# Slope (b1) and intercept (b0) of the least-squares line mpg = b0 + b1 * wt

b1 <- numerator / denominator

b0 <- ymean - b1 * xmean

# Note: this formula treats wt as the response and mpg as the predictor

model_mtcars <- lm(mtcars$wt ~ mtcars$mpg)

model_mtcars

# Call:

# lm(formula = mtcars$wt ~ mtcars$mpg)

# Coefficients:

# (Intercept) mtcars$mpg

summary(model_mtcars)

plot(mtcars$wt, mtcars$mpg, col="blue", main="Linear Regression",

cex=1.3, pch=16,

xlab = "Weight of cars", ylab = "Miles per gallon")

# Overlay the fitted line of mpg on wt once the scatter plot exists

abline(lm(mtcars$mpg ~ mtcars$wt))

df.residual(model_mtcars)

pred_mtcars <- predict(model_mtcars)

pred_mtcars

Resmtcars <- resid(model_mtcars)

Resmtcars

Output:

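The fitted slope is about -0.14086. Because the model is specified as wt ~ mpg, weight ("wt") is the dependent variable: the predicted weight falls by around 0.14086 units for every one-unit rise in "mpg." This is a simple regression with a single predictor, so no other variable is being held constant. In other words, more fuel-efficient cars tend to be lighter, which is consistent with a negative relationship between a car's weight and its fuel economy.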
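
The R listing computes the slope and intercept by hand from sums before calling lm(), and it uses the formula in both directions (wt ~ mpg for the model, mpg ~ wt for the plotted line). As a rough sketch of why the direction matters, the same closed-form calculation can be written in Python. The data values and the function name least_squares are illustrative assumptions and are not taken from mtcars.

```python
def least_squares(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Same sums as the manual R calculation: sum(x*y) - n*xmean*ymean, etc.
    numerator = sum(xi * yi for xi, yi in zip(x, y)) - n * x_mean * y_mean
    denominator = sum(xi ** 2 for xi in x) - n * x_mean ** 2
    b1 = numerator / denominator
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Hypothetical illustrative values, NOT taken from the mtcars data.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [30.0, 26.0, 22.0, 19.0, 15.0]

b0_mpg, b1_mpg = least_squares(weight, mpg)  # mpg as the response
b0_wt, b1_wt = least_squares(mpg, weight)    # weight as the response

print(f"mpg ~ weight: intercept={b0_mpg:.3f}, slope={b1_mpg:.3f}")
print(f"weight ~ mpg: intercept={b0_wt:.3f}, slope={b1_wt:.3f}")
```

Both slopes are negative, but they are on different scales and answer different questions: the first is the change in mpg per unit of weight, the second the change in weight per unit of mpg.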

ANALYSIS USING PYTHON:

Code:

import pandas as pd

from sklearn.linear_model import LinearRegression

df = pd.read_excel('C:/Users/HP/Desktop/mtcars.xlsx') # Load the data into a pandas DataFrame

X = df[['wt']] # predictor variable

y = df['mpg'] # response variable

model = LinearRegression() # Create an instance of the LinearRegression class

model.fit(X, y) # Fit the model to the data

print(model.coef_) # Print the coefficients of the model

print(model.intercept_)

Output:

The coefficient -5.79981683 indicates that, on average, the predicted mpg (miles per gallon) drops by roughly 5.7998 for every one-unit increase in weight (in mtcars, wt is recorded in units of 1,000 lbs). In other words, a car's fuel economy tends to decline as its weight rises.
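
To make the meaning of the slope concrete, here is a small sketch of what R's predict(), resid(), and df.residual() return for a one-predictor model, written in plain Python. The data and the helper names fit_line and predict_mpg are assumptions chosen for illustration, not the mtcars values.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Hypothetical illustrative values, NOT taken from the mtcars data.
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
mpg = [30.0, 26.0, 22.0, 19.0, 15.0]

intercept, slope = fit_line(weight, mpg)


def predict_mpg(w):
    """Predicted mpg for a car of weight w under the fitted line."""
    return intercept + slope * w


fitted = [predict_mpg(w) for w in weight]                 # fitted values
residuals = [obs - fit for obs, fit in zip(mpg, fitted)]  # observed minus fitted
residual_df = len(weight) - 2                             # n minus two estimated coefficients

print("fitted:", [round(v, 2) for v in fitted])
print("residuals:", [round(r, 2) for r in residuals])
print("residual degrees of freedom:", residual_df)
# One extra unit of weight changes the prediction by exactly the slope.
print("change per unit of weight:", round(predict_mpg(3.0) - predict_mpg(2.0), 3))
```

With an intercept in the model the residuals sum to zero (up to rounding), which is a quick sanity check on any fit.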

2. Cluster Analysis (k-means clustering):

Using this dataset, cluster analysis was performed to identify distinct groups of vehicles based on their performance and characteristics.
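
For readers unfamiliar with k-means, the idea can be sketched in a few lines of plain Python: assign each observation to its nearest centre, move each centre to the mean of its members, and repeat until nothing changes. This is a minimal illustration of the standard iteration (Lloyd's algorithm), not the implementation used by R's kmeans() or scikit-learn. The points and the function names are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means: returns (labels, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k randomly chosen observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centre.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[j])  # leave an empty cluster's centre in place
        if new_centers == centers:  # converged: centres stopped moving
            break
        centers = new_centers
    return labels, centers


# Hypothetical (weight, mpg) pairs forming two obvious groups; NOT mtcars data.
points = [(2.0, 30.0), (2.2, 28.0), (1.9, 32.0),
          (4.8, 14.0), (5.0, 12.0), (5.2, 13.0)]

labels, centers = kmeans(points, k=2)
print(labels)
print(centers)
```

The cluster numbers themselves are arbitrary: a different seed may swap which group is called 0 and which is called 1, even though the grouping is the same.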

Code:

library(readxl)

library(factoextra)

mtcars <- read_excel("C:/Users/HP/Desktop/mtcars.xlsx", col_names = TRUE)

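mtcars <- as.data.frame(mtcars) # read_excel returns a tibble, which does not support row names

rownames(mtcars) <- mtcars[,1] # use the first column as row names

mtcars <- mtcars[,-1]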

fit <- kmeans(mtcars, centers = 3)

fviz_cluster(fit, data = mtcars, geom = "point", stand = FALSE,

ellipse.type = "convex", ellipse.level = 0.95,

ggtheme = theme_classic())

# Access the cluster centers

cluster_centers <- fit$centers

# Print the cluster centers

print(cluster_centers)

Output:

Based on the central characteristics of each cluster (in mtcars, am = 0 denotes an automatic transmission and am = 1 a manual one):

Cluster 1: Heavy, powerful vehicles with moderate fuel efficiency.

Cluster 2: Lighter, more fuel-efficient automobiles with automatic transmissions.

Cluster 3: Heavy, powerful vehicles with manual transmissions and rather poor fuel efficiency.
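
Descriptions like these are read off the cluster centres, which are simply the per-cluster means of each variable. A minimal sketch of that calculation follows. The rows, the labels, and the function name cluster_centers are assumptions for illustration and do not reproduce the mtcars results.

```python
from collections import defaultdict


def cluster_centers(rows, labels):
    """Average each feature over the rows that share a cluster label."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    centers = {}
    for label, members in grouped.items():
        features = members[0].keys()
        centers[label] = {
            f: sum(m[f] for m in members) / len(members) for f in features
        }
    return centers


# Hypothetical rows and cluster labels, NOT taken from the mtcars data.
rows = [
    {"mpg": 30.0, "wt": 2.0, "hp": 70.0},
    {"mpg": 28.0, "wt": 2.2, "hp": 90.0},
    {"mpg": 14.0, "wt": 5.0, "hp": 210.0},
    {"mpg": 12.0, "wt": 5.2, "hp": 230.0},
]
labels = [1, 1, 2, 2]

centers = cluster_centers(rows, labels)
for label in sorted(centers):
    print(label, centers[label])
```

Because k-means starts from random centres, the cluster numbering can change from run to run unless a seed is fixed, so the centres should be re-read each time the analysis is rerun.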

ANALYSIS USING PYTHON:

Code:

import pandas as pd

from sklearn.cluster import KMeans

import matplotlib.pyplot as plt

df = pd.read_excel('C:/Users/HP/Desktop/mtcars.xlsx')

X = df[['mpg', 'cyl', 'disp', 'hp', 'drat', 'wt', 'qsec', 'vs', 'am', 'gear', 'carb']]

kmeans = KMeans(n_clusters=3)

kmeans.fit(X)

print(kmeans.cluster_centers_)

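# Plot the first two columns (mpg and cyl), coloured by cluster, with centres in black

plt.scatter(X.iloc[:, 0], X.iloc[:, 1], c=kmeans.labels_, cmap='rainbow')

plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], color='black')

plt.xlabel('mpg')

plt.ylabel('cyl')

plt.show()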

Output:

The first cluster, denoted by blue dots, has low mpg and high cyl, indicating that the vehicles in this cluster have a large number of cylinders and low fuel efficiency.
The second cluster, denoted by green dots, has moderate mpg and moderate cyl, indicating that the vehicles in this cluster have moderate fuel economy and a moderate number of cylinders.
The third cluster, denoted by red dots, has high mpg and low cyl, indicating that the vehicles in this cluster have high fuel efficiency and few cylinders.

