# package loading
library(dplyr)
library(tidyverse)
library(ggplot2)
library(scales)
library(lubridate)
library(plotly)
library(htmlwidgets)
library(tseries)
library(forcats)
library(forecast)
library(DT)
library(readr)
Bangalore, known for its rapid urbanization and expanding population, faces significant challenges with traffic congestion. This project aims to analyze traffic patterns in Bangalore with a focus on how weekdays and climatic changes influence average traffic speed and congestion levels. By leveraging historical traffic data and weather conditions, the study examines the interplay between climate variables (such as rainfall, temperature, and humidity) and traffic flow dynamics.
The project seeks to:
Evaluate the impact of climate changes on average traffic speed and congestion during weekdays.
Identify critical weather conditions that exacerbate traffic delays.
Predict future average traffic patterns over a month using ARIMA (Auto-Regressive Integrated Moving Average) modeling.
The analysis integrates statistical and machine learning techniques to understand variability in traffic across different weather scenarios and weekdays. ARIMA modeling will be employed to forecast traffic trends, providing actionable insights for urban planners and policymakers to devise traffic management strategies.
This study not only contributes to understanding the implications of climate change on urban traffic but also supports the development of adaptive measures to mitigate congestion, ensuring smoother traffic flow in Bangalore’s growing metropolis.
The data is a secondary dataset downloaded from the kaggle.com website.
Converting the respective columns into date, categorical, and numerical types
Example dataset:
datam=read.csv("Banglore_traffic_Dataset.csv")
data=datam
data$Date=as.Date(datam$Date,format="%Y-%m-%d")
data$Average.Speed=data$Average.Speed %>% round(2)
data$Travel.Time.Index=data$Travel.Time.Index %>% round(2)
data$Congestion.Level=data$Congestion.Level %>% round(2)
data$Road.Capacity.Utilization=data$Road.Capacity.Utilization %>% round(2)
data$Environmental.Impact=data$Environmental.Impact %>% round(2)
data$Public.Transport.Usage=data$Public.Transport.Usage %>% round(2)
data$Traffic.Signal.Compliance=data$Traffic.Signal.Compliance %>% round(2)
data$Parking.Usage=data$Parking.Usage %>% round(2)
data$Area.Name=data$Area.Name %>% as.factor()
data$Road.Intersection.Name=data$Road.Intersection.Name %>% as.factor()
data$Weather.Conditions=data$Weather.Conditions %>% as.factor()
data$Roadwork.and.Construction.Activity=data$Roadwork.and.Construction.Activity %>% as.factor()
data$Days=weekdays(data$Date) %>% as.factor()
data$month=month(data$Date)|>as.factor()
levels(data$month)=month.abb
data$year=year(data$Date)
data %>% datatable(filter="top")
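The column-by-column conversions above can also be written more compactly. The following is a minimal sketch on a small hypothetical data frame (`toy` and its values are made up for illustration; only the column names follow the dataset), assuming dplyr 1.0 or later for `across()`:

```r
library(dplyr)

# Hypothetical rows that mimic a few columns of the traffic dataset
toy <- data.frame(
  Date = c("2022-01-03", "2022-01-04"),
  Average.Speed = c(41.237, 38.561),
  Congestion.Level = c(88.8888, 100),
  Weather.Conditions = c("Clear", "Rain"),
  stringsAsFactors = FALSE
)

toy_clean <- toy %>%
  mutate(
    Date = as.Date(Date, format = "%Y-%m-%d"),     # text to Date
    across(where(is.numeric), ~ round(.x, 2)),     # round every numeric column
    across(Weather.Conditions, as.factor),         # text to factor
    Days = factor(weekdays(Date))                  # weekday names (locale dependent)
  )

str(toy_clean)
```

Using `across()` keeps the rounding rule in one place, so a newly added numeric column is handled without another line of code.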
In Bangalore, traffic volume can change with several factors, such as the day of the week, the month, and the weather.
Comparing traffic volume and congestion level across several factors
Grouping the data by day of the week and creating a bar chart to visualize the result
graph1=data %>% group_by(Days) %>% summarise(Average_Traffic=mean(Traffic.Volume)) %>%
ggplot(aes(x=factor(Days,levels = c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday")),y=Average_Traffic,fill=Days))+
geom_bar(stat="identity")+geom_text(aes(label=round(Average_Traffic,0)),vjust=2)+labs(
title="WEEKDAYS Vs TRAFFIC",
y="TRAFFIC VOLUME(AVERAGE)",
x="WEEKDAYS",
caption = "Bar chart of average traffic volume in Bangalore by day of the week")+
theme(plot.title = element_text(face = "bold",hjust=0.5),
legend.position = "none",
plot.caption = element_text(face="bold"))
ggplotly(graph1)
The chart above shows that average traffic volume is highest on Wednesday.
Grouping the data by month and creating a bar chart to visualize the result
graph2=data %>% group_by(month) %>% summarise(Average_Traffic=mean(Traffic.Volume)) %>%
ggplot(aes(x=factor(month,levels=month.abb),y=Average_Traffic,fill=month))+
geom_bar(stat="identity")+geom_text(aes(label=round(Average_Traffic,0)),vjust=1.5)+labs(
title="MONTH Vs TRAFFIC",
y="TRAFFIC VOLUME(AVERAGE)",
x="Month",
caption = "Bar chart of average traffic volume in Bangalore by month")+
theme(plot.title = element_text(face = "bold",hjust=0.5),
legend.position = "none",
plot.caption = element_text(face="bold"))
ggplotly(graph2)
The chart above shows that average traffic volume is highest in June.
graph3=data %>% ggplot(aes(x=Average.Speed,y=Congestion.Level))+geom_point(shape=1)
ggplotly(graph3)
##                  Average.Speed Congestion.Level
## Average.Speed             1.00            -0.36
## Congestion.Level         -0.36             1.00
The correlation between average speed and congestion level is -0.36, a moderate negative relationship: congestion level tends to decrease as average speed increases.
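The call that produced the correlation matrix is not shown in the report. A matrix of this shape can be obtained with base R's `cor()`; the sketch below uses a small hypothetical data frame (`toy`, values invented for illustration) rather than the real dataset:

```r
# Hypothetical speed and congestion readings, for illustration only
toy <- data.frame(
  Average.Speed    = c(22.5, 30.1, 35.8, 41.2, 48.9, 55.0),
  Congestion.Level = c(98.0, 91.5, 93.0, 80.2, 84.7, 70.3)
)

# Pearson correlation matrix of the two columns, rounded to two decimals
cor_matrix <- round(cor(toy[, c("Average.Speed", "Congestion.Level")]), 2)
cor_matrix
```

The off-diagonal entry is the correlation coefficient; a negative value means the two variables tend to move in opposite directions, which by itself does not establish cause and effect.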
k=data %>% group_by(year,month) %>% summarise(Speed=mean(Average.Speed),Congestion=mean(Congestion.Level))
graph4=k %>% ggplot(aes(x=Speed,y=Congestion))+geom_smooth(se=F)+geom_point()+labs(title = "Speed Vs Congestion")+theme(plot.title = element_text(hjust = 0.5,face="bold"))
graph4 %>% ggplotly()
##                 Speed Congestion
## Speed       1.0000000 -0.3476322
## Congestion -0.3476322  1.0000000
For the monthly averages, the correlation between average speed and congestion is about -0.35, again indicating that congestion tends to decrease as average speed increases.
graph5=data %>% ggplot(aes(x=Average.Speed,y=Traffic.Volume,colour =Weather.Conditions))+
facet_wrap(~Weather.Conditions)+geom_point()+
labs(title ="Average Speed Vs Traffic Volume",
subtitle = "Based on Weather Condition",
x="Average Speed",
y=" Traffic Volume")+
theme(legend.position = "none",
plot.title = element_text(face="bold",hjust=0.5),
plot.subtitle = element_text(hjust=0.5)
)
ggplotly(graph5)
The graph above suggests that traffic volume tends to increase as average speed decreases. Weather is another factor: it affects average speed, which in turn is associated with higher traffic volume.
Converting the data into a time series and forecasting future traffic volume with an ARIMA model
data1=data %>%
subset(select=c(year,month,Traffic.Volume,Environmental.Impact,Public.Transport.Usage)) %>%
group_by(year,month) %>%summarise(Traffic_volume=sum(Traffic.Volume),
Environmental_Impact=sum(Environmental.Impact),
Public_Transport_usage=sum(Public.Transport.Usage))
data1$sno=1:32
data1 %>% datatable(filter="top")
ts_data=data1[1:31,] %>% subset(select=c(year,month,Traffic_volume))
ts_data$sn01=1:31
datatime=ts(ts_data$Traffic_volume,start=min(ts_data$sn01),end=max(ts_data$sn01),frequency = 12)
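One detail of the `ts()` call above is worth checking. When both `start` and `end` are given together with `frequency = 12`, `ts()` spans every twelfth of a unit between them and recycles the data if there are fewer values than time points. The sketch below illustrates this with hypothetical values; the calendar start of January 2022 is an assumption taken from the forecast plot's axis label:

```r
# 31 hypothetical monthly totals, for illustration only
monthly_totals <- seq(100, by = 10, length.out = 31)

# Same arguments as in the report: index 1 to 31 with 12 observations per unit
by_index <- ts(monthly_totals, start = 1, end = 31, frequency = 12)

# Alternative: give only a calendar start and let ts() infer the end
by_calendar <- ts(monthly_totals, start = c(2022, 1), frequency = 12)

length(by_index)     # 361: the 31 values are recycled to fill the span
length(by_calendar)  # 31: one observation per month
end(by_calendar)     # 2024 7, i.e. July 2024
```

This may explain the lag order of 7 in the ADF output that follows, since `adf.test()` picks its default lag from the series length, and 7 corresponds to roughly 361 observations rather than 31.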
Using the Augmented Dickey-Fuller test to check the stationarity of the data
##
## Augmented Dickey-Fuller Test
##
## data: datatime
## Dickey-Fuller = -11.841, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary
The p-value (0.01) is less than 0.05, so the null hypothesis of a unit root is rejected and the series is treated as stationary.
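The test call itself is not shown in the report; the output format matches `adf.test()` from the tseries package. The following sketch, on simulated series rather than the traffic data, shows how the test is typically run and read:

```r
library(tseries)

set.seed(42)
stationary_series <- rnorm(200)          # white noise: stationary by construction
random_walk <- cumsum(rnorm(200))        # cumulative sum: has a unit root

# adf.test() warns when the statistic falls outside its p-value table,
# so the warnings are suppressed here to keep the output readable
adf_stationary <- suppressWarnings(adf.test(stationary_series))
adf_walk <- suppressWarnings(adf.test(random_walk))

adf_stationary$p.value   # small p-value: reject the unit-root null
adf_walk$p.value         # compare with the stationary case
```

Note that `adf.test()` interpolates p-values from a table, so a printed value of 0.01 means "0.01 or smaller" rather than exactly 0.01.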
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 10283.13
## ARIMA(0,0,0) with non-zero mean : 10388.96
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 10302.25
## ARIMA(0,0,1)(0,0,1)[12] with non-zero mean : 10319.77
## ARIMA(0,0,0) with zero mean : 12537.87
## ARIMA(2,0,2)(0,0,1)[12] with non-zero mean : 10325.84
## ARIMA(2,0,2)(1,0,0)[12] with non-zero mean : 10280.52
## ARIMA(2,0,2) with non-zero mean : 10355
## ARIMA(2,0,2)(2,0,0)[12] with non-zero mean : 10275.61
## ARIMA(2,0,2)(2,0,1)[12] with non-zero mean : 10274.92
## ARIMA(2,0,2)(2,0,2)[12] with non-zero mean : 10280.32
## ARIMA(2,0,2)(1,0,2)[12] with non-zero mean : 10280.77
## ARIMA(1,0,2)(2,0,1)[12] with non-zero mean : 10314.81
## ARIMA(2,0,1)(2,0,1)[12] with non-zero mean : 10317.97
## ARIMA(3,0,2)(2,0,1)[12] with non-zero mean : Inf
## ARIMA(2,0,3)(2,0,1)[12] with non-zero mean : 10314.71
## ARIMA(1,0,1)(2,0,1)[12] with non-zero mean : 10316.81
## ARIMA(1,0,3)(2,0,1)[12] with non-zero mean : 10309.55
## ARIMA(3,0,1)(2,0,1)[12] with non-zero mean : 10288.06
## ARIMA(3,0,3)(2,0,1)[12] with non-zero mean : 10278.29
## ARIMA(2,0,2)(2,0,1)[12] with zero mean : Inf
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(2,0,2)(2,0,1)[12] with non-zero mean : Inf
## ARIMA(2,0,2)(2,0,0)[12] with non-zero mean : Inf
## ARIMA(3,0,3)(2,0,1)[12] with non-zero mean : Inf
## ARIMA(2,0,2)(2,0,2)[12] with non-zero mean : Inf
## ARIMA(2,0,2)(1,0,0)[12] with non-zero mean : Inf
## ARIMA(2,0,2)(1,0,2)[12] with non-zero mean : Inf
## ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : Inf
## ARIMA(3,0,1)(2,0,1)[12] with non-zero mean : Inf
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 10305.86
##
## Best model: ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
## Series: datatime
## ARIMA(1,0,0)(1,0,0)[12] with non-zero mean
##
## Coefficients:
##          ar1    sar1        mean
##       0.0554  0.5145  8344790.70
## s.e.  0.0608  0.0529    40882.84
##
## sigma^2 = 1.37e+11: log likelihood = -5141.15
## AIC=10290.31 AICc=10290.42 BIC=10305.86
The best model for the data is ARIMA(1,0,0)(1,0,0)[12] with non-zero mean, with BIC = 10305.86.
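The fitting call is hidden in the report. The trace format, and the fact that the score printed for the best model equals its BIC, suggest a call to `auto.arima()` from the forecast package with `ic = "bic"` and `trace = TRUE`; that reading is an assumption. A minimal sketch on the built-in `AirPassengers` series (not the traffic data):

```r
library(forecast)

# Built-in monthly series used purely as a stand-in for the traffic data
fit <- auto.arima(AirPassengers, ic = "bic", trace = TRUE)

summary(fit)   # coefficients, sigma^2, AIC / AICc / BIC
fit$bic        # the criterion used for model selection in this sketch
```

In the report, an equivalent call on `datatime` would create the `model` object that the forecast step below expects.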
myf=forecast(model,level=95,h=15)
myfdata=myf %>% as.data.frame()
myfdata %>% round(3) %>% datatable(filter = "top")
plot(myf,xlab="Month (starting from 2022 Jan)",ylab="Traffic volume",main="Predicted Traffic Volume (up to 2025 June)")
If no action is taken, traffic volume is expected to increase rapidly in the future.
The analysis shows that average traffic volume in Bangalore is highest on Wednesdays and peaks in June, indicating specific patterns in traffic behavior. Higher average speed is associated with lower congestion, both across individual observations (correlation -0.36) and across monthly averages (correlation about -0.35), which underlines the importance of smooth traffic flow. Weather also plays a significant role, as adverse conditions can reduce average speed, which is associated with higher traffic volume. If no measures are taken, traffic volume is projected to rise rapidly in the future, highlighting the urgent need for effective traffic management strategies to mitigate these challenges.