Time Series Forecasting Tutorial

Forecasting the Spread of Covid in NYC

(Last updated: May 25, 2020)


Intro

COVID has been the topic of the day for the last 5 months, and yet, there is so much we don’t know about this virus. Many data scientists are working hard to uncover new patterns and forecast the impact of the virus. I also decided to make my small contribution and put together this tutorial about time series forecasting. Here, I will try to predict the spread of COVID in New York City. The same techniques can be used to forecast any time series, be it sales, demand, call volume, etc.

In this tutorial, I will be working with the data containing net new cases of COVID-19 in New York City to show you how to:

  1. Create a Time Series Object
  2. Plot a time series
  3. Transform and stationarize a time series
  4. Interpret ACF and PACF plots
  5. Forecast a time series using some simple and more sophisticated forecasting models
  6. Evaluate model performance

The data is publicly available on the NYC Health public website.


Data

In this tutorial, I will be working with the data containing net new cases of COVID-19 in New York City, available on the NYC Health public website. Let's visualize it and take a look.

library(ggplot2)

#Plot daily confirmed cases over time
ggplot(data=CasesConfirmed, aes(x=DATE, y=CASE_COUNT))+
  geom_line(size = 0.8, color="#3CAEA3")+
  ylab("Confirmed Cases")+
  xlab("")+
  ggtitle("Confirmed Cases of COVID-19 in NYC, February 29 - May 5")+ 
  theme(plot.title = element_text(hjust = 0.5))

It looks like the first case of coronavirus in NYC was recorded on February 29. The last observation in our dataset is from May 5. Note that the number of new cases remained pretty flat for the first week of March. It then spiked rapidly to a little over 6,000 new cases a day at the beginning of April and started winding down after that.

Since the data shows no movement in new cases of coronavirus in the 1st week of March, I will remove the observations before March 8, 2020.
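
A one-liner like the following should do it; this is a sketch assuming the data frame and the DATE column are named as in the plotting code above.

#Drop the flat early observations (before March 8, 2020)
CasesConfirmed <- subset(CasesConfirmed, DATE >= as.Date("2020-03-08"))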


Creating a Time Series Object

Many visualization and forecasting functions in R only work with time series objects. I highly recommend converting your data into a time series object to unlock all that functionality. We can accomplish that with the ts() function. ts() requires specifying the frequency of the time series, which refers to the number of observations before the seasonal pattern repeats. For example, if you are working with monthly sales data, then your frequency would be 12. Our data contains daily observations. Usually, for daily data, the frequency would be either 365 or 7. Since our data covers only a couple of months, there is not enough history to model an annual cycle, so we will set the frequency argument to 7 to capture the weekly pattern.

If you are working with high-frequency data, it may get a little tricky. For example, the data that is observed every minute may exhibit several seasonal patterns at the same time (e.g., hourly (60), daily (60x24), and weekly (60x24x7)). I recommend reading about complex seasonality in Rob Hyndman and George Athanasopoulos’ online textbook Forecasting: Principles and Practice.

Here is how you use ts() in R.
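
A call along these lines produces the time series object printed below; this is a sketch assuming the daily counts live in the CASE_COUNT column, as in the plotting code earlier.

#Convert daily case counts into a time series object with weekly frequency
CasesConfirmedTs <- ts(CasesConfirmed$CASE_COUNT, frequency = 7)
CasesConfirmedTs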

## Time Series:
## Start = c(1, 1) 
## End = c(9, 5) 
## Frequency = 7 
##  [1]   21   57   70  153  355  618  642 1028 2116 2447 2950 3678 3985 2600 2545
## [16] 3505 4390 4734 4924 4991 3386 3437 6015 5170 5026 5645 5518 3752 3616 6212
## [31] 5924 5406 4871 4240 3565 2717 3211 4055 3800 3453 3500 2080 2285 3714 3005
## [46] 3419 2786 2399 1516  997 2237 2652 2282 1949 1708 1003  721 1279 1083  604
## [61]    6

This is the structure of a time series object. Note that the ts() function also lets you specify your start and end periods. For example, if you are working with monthly sales data that starts on January 1, 2015, you should set the start argument to c(2015, 1) and frequency to 12.
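
For example, a sketch with a hypothetical monthly sales vector would look like this:

#Hypothetical monthly sales data starting in January 2015
sales.ts <- ts(sales, start = c(2015, 1), frequency = 12)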


Decomposing Time Series

Once our data is transformed into a time series object, we can start analyzing it. This usually involves understanding the components of a time series that affect how it fluctuates over time. Breaking a series down into these components is called decomposition. In this section, we will cover it only at a high level, but we’ll do a deeper dive later on in this tutorial.

A time series usually has one or more of the following components:

  1. Trend: upward or downward movement in observations over time
  2. Seasonality: a repeating pattern that occurs at fixed, known time intervals
  3. Cycle: longer-term upward or downward changes that may follow a business or economic cycle
  4. Autocorrelation: correlation between past and future values of a time series
  5. Random variance: fluctuations that are free of any visible, repeating, and detectable patterns over time

We can quickly gain insight into the time series composition by calling the decompose() function and plotting the results, as shown below.
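
A minimal call might look like this, reusing the CasesConfirmedTs object we created above:

#Decompose the series and plot the trend, seasonal, and remainder components
autoplot(decompose(CasesConfirmedTs))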

The decomposition plot shows that our time series has a weekly seasonality and an upward trend in weeks 1 through 4, followed by a downward trend, starting at week 5. With this information, we can build a couple of simple forecasting models commonly used in the real world.


Forecasting with Simple Models

In this section, I will cover the naive and exponential smoothing methods. Later on, we will cover more complex models such as ARIMA and seasonal ARIMA.

Note that there are many other forecasting methods available, in addition to what I will cover in this tutorial. I strongly recommend that you familiarize yourself with those methods to understand what may be the best fit for your data. You can start with the same Hyndman and Athanasopoulos’ textbook I mentioned above.

Before modeling, we will create a training set using the first 46 observations, leaving the remaining 15 for testing.

#Create a training set
train <- subset(CasesConfirmedTs, end=length(CasesConfirmedTs)-15)

Naive Method

The naive method is the simplest forecasting method: it uses the most recent observation as the forecast. We can fit the naive model and produce a forecast using naive(). For seasonal data, we need to use snaive(), and for data with a trend, rwf() with drift=TRUE.

Now, let’s see how naive models work.

#Fit and forecast seasonal naive model
naive.fit <- snaive(train, h=15)
#Fit and forecast random walk model with a trend
drift.fit <- rwf(train, h=15, drift=TRUE)

Note that the same function that is used to fit the model also produces the forecast and prediction intervals. Now, let's take a look at how accurate the naive forecasts are by visualizing them against the actuals.

#Plot Forecast vs. Actual for both models
autoplot(CasesConfirmedTs, main = "Naive Forecasts of COVID-19 Cases in NYC", ylab = "Confirmed Cases", xlab = "Week", color="#3CAEA3", size = 0.8) +
  autolayer(naive.fit, series="Naive with Seasonality", PI=FALSE, size = 0.8)+
  autolayer(drift.fit, series = "Naive with Trend", PI=FALSE, size = 0.8)+
  scale_color_manual(values = c("Naive with Seasonality" = "#ED553B", "Naive with Trend"="#20639B"))+
  guides(color=guide_legend(title="Daily forecasts"))+
  theme(plot.title = element_text(hjust = 0.5))

Compare the actual observations with the forecasts produced by the naive models. These models show a pretty high forecast error and clearly do not provide a good fit for our data, which was expected.

Exponential Smoothing with Trend and Seasonality

The next type of forecasting model we will cover is called exponential smoothing. Exponential smoothing models use weighted averages of past observations with higher weights for more recent observations and lower weights for older observations. Exponential smoothing models are very commonly used because they are easy to deal with and can accommodate complex data containing trends and seasonality.

These models also let us specify the nature of seasonal variations. If a time series fluctuates consistently over time, then the additive method should be selected. The multiplicative method should be used if the fluctuations increase or decrease over time.

Our data does show several larger spikes around weeks 3, 4, and 5, but the pattern does not seem consistent. Just to make sure we are not missing anything, we will fit both additive and multiplicative models, using the Holt-Winters method for data with trend and seasonality.

#Fit and forecast the Exponential Smoothing model using Holt-Winters multiplicative method
es.fit.m <- hw(train, seasonal="multiplicative", h=15)
#Fit and forecast the Exponential Smoothing model using the Holt-Winters additive method
es.fit.a <- hw(train, seasonal="additive", h=15)

Now, let's plot the forecasts against the actuals to see how well the models did.

#Plot Forecast vs. Actual for both exponential smoothing models
autoplot(CasesConfirmedTs, main = "Exponential Smoothing Forecasts of COVID-19 Cases in NYC", ylab = "Confirmed Cases", xlab = "Week", color="#3CAEA3", size = 0.8) +
  autolayer(es.fit.a, series="Holt-Winters Additive", PI=FALSE, size = 0.8)+
  autolayer(es.fit.m, series="Holt-Winters Multiplicative", PI=FALSE, size = 0.8)+
  scale_color_manual(values = c("Holt-Winters Additive" = "#ED553B", "Holt-Winters Multiplicative"="#20639B"))+
    guides(color=guide_legend(title="Daily forecasts"))+
  theme(plot.title = element_text(hjust = 0.5))

Both of these models produced much more accurate forecasts compared to the naive models. It also appears that the additive method is slightly better than the multiplicative method.


Model Performance Evaluation

Sometimes it’s hard to tell which forecast is better by just looking at the plot. To ensure that we correctly identify and select the best performing model, we need to calculate the forecast error of each model and compare them.

The two most commonly used forecasting model performance measures are mean absolute error (MAE) and root mean squared error (RMSE). MAE is calculated by averaging the absolute forecast errors. RMSE is calculated by taking the square root of the average of the squared forecast errors. The model with the smaller MAE or RMSE on the test set performs better.
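
In code terms, for a hypothetical vector of forecast errors e (actuals minus forecasts), the two measures boil down to:

#MAE: average of absolute errors; RMSE: root of average squared errors
mae  <- mean(abs(e))
rmse <- sqrt(mean(e^2))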

I will go ahead and calculate MAE and RMSE for both of our exponential smoothing models and see which one is better. Luckily, we don’t have to calculate each of these measures separately because the accuracy() function calculates all the forecasting error measures for us.

#Accuracy measures for Holt-Winters additive model
accuracy(es.fit.a, CasesConfirmedTs)

                     ME      RMSE      MAE          MPE      MAPE      MASE      ACF1 Theil’s U
Training set  -107.1783  606.1282 495.0830     79.66516  253.2089 0.3878714 0.0746227        NA
Test set      -725.2197  890.2929 725.2197  -2369.21052 2369.2105 0.5681713 0.5078152  2.113676

#Accuracy measures for Holt-Winters multiplicative model
accuracy(es.fit.m, CasesConfirmedTs)

                     ME      RMSE      MAE          MPE       MAPE      MASE       ACF1 Theil’s U
Training set  -66.79757  443.7662 362.7547     9.648421   80.04052 0.2841991 -0.0101242        NA
Test set     -981.61246 1107.5152 981.6125 -2631.750181 2631.75018 0.7690415  0.5660774  2.499631

As expected, the additive model shows smaller MAE and RMSE in the testing set; therefore, it’s a better fit.


Forecasting with ARIMA Models

Our next big topic is ARIMA models. There is a lot of ground to cover in this section because ARIMA models require a deeper understanding of the time series components.

Let’s begin.

ARIMA, or the Autoregressive Integrated Moving Average model, is another widely used forecasting approach, which consists of the following components or terms:

  1. Autoregressive (AR) term uses correlations between observations of the same variable at different times or lags to make predictions. These correlations are usually referred to as autocorrelations.

  2. Integrated (I) term refers to differencing, which is a way to stabilize the mean of a time series.

  3. Moving Average (MA) term uses past forecast errors to predict future values.

To fit an ARIMA model, we need to analyze each of the model’s terms and select the appropriate order. ARIMA models follow a specific order that can be written as ARIMA(p, d, q) where:

  1. p is the order of the AR term

  2. d is the differencing order

  3. q is the order of the MA term

Later on, I will describe each of these terms in detail and show you how to determine the right order, but first, we need to cover a few important concepts.


Stationary vs. Non-stationary Time Series

The first concept we will go over is stationarity in a time series. A time series is considered stationary if its mean, variance, and autocorrelation structure do not change over time.

Why is this concept so important? Because it helps determine the right order of ARIMA models. To do that, we need to identify every component that makes the time series change its behavior over time and strip it away until the series becomes stationary.


Stabilizing Variance

A stationary time series maintains constant variance over time. If your time series fluctuations increase or decrease with time, the series is heteroscedastic, meaning it exhibits non-constant variance.

The Box-Cox transformation is a common method used to transform a time series exhibiting non-constant variance. It uses the lambda parameter to inform the level of the transformation. The lambda parameter is nothing but an exponent, varying from -5 to 5. The optimal lambda is the one that results in the best approximation of a normal distribution curve.

Below are the most common transformations:

Lambda   Transformation
  -3     Y^-3
  -2     Y^-2
  -1     Y^-1 (inverse)
-0.5     Y^-0.5
   0     log(Y)
 0.5     Y^0.5 (square root)
   1     Y (no transformation)
   2     Y^2
   3     Y^3

I will go ahead and calculate the optimal lambda for our time series.

lambda <- BoxCox.lambda(CasesConfirmedTs)
lambda
## [1] 0.8032734

The optimal lambda for our data is 0.8. Note that it’s very close to 1, which means no transformation, so our time series is more or less normally distributed. However, we will still proceed with the transformation.

#Box-Cox transformation
CasesConfirmedTs2 <- BoxCox(CasesConfirmedTs, lambda = lambda)

cases<-cbind("Cases Confirmed before Transformation"=CasesConfirmedTs, "Cases Confirmed after Transformation" = CasesConfirmedTs2)

cases %>% autoplot(facets=TRUE, main = "Before Transformation vs. After Transformation", xlab="Week", color="#3CAEA3", size=0.8)+
  theme(plot.title = element_text(hjust = 0.5))

Compare the time series before the transformation and after the transformation. If you look closely, you can see that the largest spike around week 5 got a little smoothed out. Again, since our lambda is close to 1, we do not expect to see drastic changes in the transformed series.


Non-Seasonal Differencing

If the Box-Cox transformation helps stabilize the variance, non-seasonal differencing helps stabilize the mean of a time series by removing non-seasonal trends. It does so by calculating the increments, or changes, in the time series.

We can take daily stock prices as an example. Let’s say a series of daily stock prices exhibits an upward trend. If we difference it at lag=1 (1 day), we will get daily price changes, which should remove the trend. If the trend persists, we may need to difference the time series again.

In R, this can be accomplished by the diff() command. This function takes two arguments. The first is the lag, which is the number of periods, and the second is differences, which is the order of the difference (i.e., how many times diff() is called).
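
As a quick toy illustration, here is diff() applied to a short series with a trend; differencing once yields the increments, and differencing twice flattens the trend completely:

#First differences of a quadratic series
diff(c(1, 4, 9, 16, 25), lag = 1, differences = 1)
## [1] 3 5 7 9

#Second differences remove the remaining trend
diff(c(1, 4, 9, 16, 25), lag = 1, differences = 2)
## [1] 2 2 2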

Now, let’s see how it works.

#Time series after differencing
CasesConfirmedTs3 <- diff(CasesConfirmedTs2, lag= 1)
CasesConfirmedTs3
## Time Series:
## Start = c(1, 2) 
## End = c(9, 5) 
## Frequency = 7 
##  [1]   17.669173    5.747368   33.021905   68.407683   78.092875    6.753247
##  [7]  102.977139  256.997392   72.321347  106.346540  147.839604   60.565005
## [13] -281.983708  -11.734295  198.587833  173.646506   65.573668   35.813558
## [19]   12.563744 -311.568160   10.293114  489.454607 -154.774834  -26.854787
## [25]  114.423056  -23.265926 -336.052570  -27.036767  489.011048  -51.901077
## [31]  -94.626140  -99.627622 -120.331643 -132.714643 -174.077226  102.526173
## [37]  168.335022  -50.061040  -69.202627    9.450651 -298.933371   45.178347
## [43]  296.476228 -143.590148   84.565879 -130.217412  -82.458754 -199.194460
## [49] -127.707673  291.622140   89.459002  -79.610034  -73.850390  -54.998429
## [55] -171.080690  -74.682144  143.819083  -48.750647 -127.675879 -208.088127

This is how our observations look after we differenced the data once. Now, let's plot the differenced series to see if we were able to remove the trend.

autoplot(CasesConfirmedTs3, main = "Daily Changes in COVID-19 Cases in NYC", xlab = "Week", ylab = "Changes in Cases Confirmed", size=0.8, color = "#3CAEA3")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

According to this plot, it appears there is still some trend remaining in our data. This means that we need to difference the series again.

#second differencing
CasesConfirmedTs3 <- diff(CasesConfirmedTs3)

#plot the results
autoplot(CasesConfirmedTs3, main = "Daily Changes in COVID-19 Cases in NYC", xlab = "Week", ylab = "Changes in Cases Confirmed", size=0.8, color = "#3CAEA3")+
  theme(plot.title = element_text(hjust = 0.5))+
  geom_smooth(method = "lm")
## `geom_smooth()` using formula 'y ~ x'

This looks much better! Now, our series fluctuates around a straight horizontal line.
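
As a side note, instead of eyeballing the trend line, we could ask the forecast package how many non-seasonal differences the series needs; ndiffs() estimates this with a unit root test (its seasonal counterpart, nsdiffs(), appears in the next section):

#Estimate the required order of non-seasonal differencing
ndiffs(CasesConfirmedTs2)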


Seasonal Differencing

Unlike non-seasonal differencing, seasonal differencing stabilizes the mean of a time series by removing seasonal trends. It works exactly the same way as non-seasonal differencing, except that the lag argument in the diff() function needs to be set to the seasonal frequency of the time series.

In addition to the decompose() function that we covered at the beginning of this tutorial, there is another great tool for spotting seasonal patterns: the ACF and PACF plots. We will cover ACF and PACF in greater detail in subsequent sections. For now, I will only show you how to use them to identify seasonality.

Below is an example of ACF and PACF plots.

#Generate ACF and PACF plots (acf2() comes from the astsa package)
acf2(CasesConfirmedTs3, max.lag = 54)

The trick is to look at the spikes that are repeating at the same interval. If you notice a repeating pattern, that means your time series exhibits seasonal behavior with the frequency equal to the interval at which the spikes reoccur. For the purpose of this exercise, it does not matter whether the ACF or the PACF plot exhibits seasonality. One of them or both should detect seasonality if it’s present in the data.

Looking at the ACF plot of our data, we can see spikes recurring every 7 days for the first three weeks, which confirms that our data exhibits weekly seasonality.

To determine the seasonal differencing order (i.e., how many times, if any, we need to seasonally difference our data), we can use the nsdiffs() function from the forecast package.

CasesConfirmedTs3 %>% nsdiffs()
## [1] 1

According to nsdiffs(), our data needs first order seasonal differencing to remove the seasonal trend. We can do it by using the same diff() function, but this time setting the lag to 7.

Let's see how it works.

#1st order seasonal differencing
CasesConfirmedTs4 <- diff(CasesConfirmedTs3, lag=7)

#plot the resulting series
autoplot(CasesConfirmedTs4, main = "", xlab = "Week", size = 0.8, color = "#3CAEA3")

This is how our time series looks after the second order non-seasonal and first order seasonal differencing. It certainly appears more stationary now that we removed the general trend and seasonality.


Autoregressive Models

Next, I will explain the AR term of the ARIMA model. AR stands for autoregressive. These models are similar to linear regression, but instead of different independent variables, they use past values of the same variable. Let’s look at the stock prices again. Yesterday’s price (lag = 1) and the day before yesterday’s price (lag = 2) may be good indicators of the price today (lag = 0), so we would use 2 lags to predict today's price.

The simplest autoregressive model is the first order AR model, or AR(1), which uses one lag to produce a forecast. To fit an AR(1) model, we can use the arima() function by specifying order = c(1,0,0).
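
As a sketch, fitting AR(1) to the stationarized series from the previous section would look like this:

#Fit a first order autoregressive model, AR(1)
ar1.fit <- arima(CasesConfirmedTs4, order = c(1, 0, 0))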


Moving Average Models

Instead of using correlations between observations at different times, ARIMA’s Moving Average (MA) model uses past forecast errors to make predictions. The Moving Average process can be thought of as a weighted moving average of the past few forecast errors. How many forecast errors to include would depend on the order of the MA model.

The first order Moving Average model, or MA(1), uses the forecast error, or residual, of the current observation and that of the previous observation, multiplied by some coefficient. To fit MA(1), we can use the arima() function by specifying order = c(0,0,1).
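
And the MA(1) counterpart, on the same stationarized series:

#Fit a first order moving average model, MA(1)
ma1.fit <- arima(CasesConfirmedTs4, order = c(0, 0, 1))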


Determine ARIMA Model Order with ACF and PACF

ACF, or the autocorrelation function, visualizes correlation scores (from -1 to 1) between observations at different times. The PACF plot does the same but uses partial correlations. Partial correlations measure the correlation between an observation and a lagged observation after removing the variance already explained by the earlier lags.

We can use the same stock price example to explain partial correlations. We know that today’s stock price correlates with the price yesterday and the day before yesterday. According to the same logic, it is very likely that yesterday’s price is also correlated with the day before yesterday’s price. In that case, PACF will show a spike at lag 1 and, perhaps, lag 2 in case a significant portion of the variance that remained unexplained by yesterday’s price can be explained by the price the day before yesterday.

Let’s do a practice exercise and try to determine ARIMA’s order by analyzing ACF and PACF. We can reference the table below that describes behaviors of ACF and PACF plots that are characteristic to AR, MA, and ARMA (both AR and MA terms present) models.

        AR(p)                 MA(q)                 ARMA(p,q)
ACF     Tails off             Cuts off after lag q  Tails off
PACF    Cuts off after lag p  Tails off             Tails off

According to this table, we should rely on PACF to determine the order of the AR term, and ACF to determine the order of the MA term. ARMA models are more complicated and hard to spot from ACF/PACF plots.

Now, let’s practice.

acf2(CasesConfirmedTs, max.lag = 14)

These ACF and PACF plots were generated using our original time series (before any transformations). The plots suggest an AR(1) model because the ACF is tailing off, and the PACF cuts off after the first lag. Note that if the spikes tail off very slowly (like in this example), it usually means that the time series is not stationary. In that case, you need to stationarize the time series and repeat this exercise.

Seasonal ARIMA Models

ARIMA can also accommodate time series with seasonality using a seasonal ARIMA model. Seasonal ARIMA has a slightly different structure. It incorporates both non-seasonal and seasonal terms in the following order, ARIMA(p,d,q)x(P,D,Q)S where:

  1. P = seasonal AR order (SAR)

  2. D = seasonal differencing

  3. Q = seasonal MA order (SMA)

  4. S = frequency

The table below describes ACF and PACF behavior characteristic to ARIMA’s seasonal terms, SAR, SMA, and SARMA. It works the same way as above, but instead of the first few lags, we need to be looking at the spikes at seasonal lags.

        SAR(P)S                     SMA(Q)S                     SARMA(P,Q)S
ACF     Tails off at seasonal lags  Cuts off after lag QS       Tails off at seasonal lags
PACF    Cuts off after lag PS       Tails off at seasonal lags  Tails off at seasonal lags

acf2(CasesConfirmedTs3, max.lag = 28)

In this example, we can see a spike at every weekly lag in the ACF and only one spike at weekly lag 1 in the PACF, which suggests SAR(1). You can also see two spikes at daily lags 1 and 2 in both the ACF and PACF plots. This time, it seems that the PACF is tailing off and the ACF cuts off, which suggests an MA(2) term.

Sometimes these plots are tricky to interpret, and it’s totally ok. Do not spend too much time trying to over-analyze it. It’s faster to build a few models with different orders and then compare the results.


Forecasting with ARIMA Models

Now that we have learned all the necessary concepts, we are ready to forecast with ARIMA. Let’s go ahead and generate ACF and PACF for our time series, applying all the transformations we made along the way, and determine the model order.

CasesConfirmedTs%>%
  BoxCox(lambda=lambda)%>% #Box-Cox transformation to stabilize the variance
  diff(lag=7)%>% #first order seasonal differencing to remove seasonal trends
  diff(lag=1, differences = 2)%>% #non-seasonal differencing of the 2nd order to remove non-seasonal trends
  ggtsdisplay()

Non-seasonal ARIMA Terms:

To identify significant non-seasonal terms, we need to look at the lags before the 1st seasonal lag. In this case, it appears that the ACF cuts off after lag 1, and the PACF tails off, which suggests an MA(1) term.

Seasonal ARIMA Terms:

We can see 1 seasonal spike at lag=7 in ACF and no seasonal spikes in PACF, which suggests an SMA(1) term.

Now, we need to put all the terms together to create a model. Let’s take a moment to review each term and how we derived it.

Non-seasonal Terms:

  1. AR term order = 0: PACF tailed off before the first seasonal lag, and ACF cut off after the first non-seasonal lag, suggesting no AR terms

  2. Differencing order = 2: recall that we differenced our time series twice to remove the trend

  3. MA term order = 1: PACF tailed off before the first seasonal lag, and ACF cut off after the first non-seasonal lag, suggesting MA(1) term

Seasonal Terms:

  1. SAR term order = 0: PACF showed no significant seasonal lags

  2. Seasonal Differencing = 1: recall that we differenced our time series once to remove the seasonal trend

  3. SMA term order = 1: ACF showed one significant seasonal lag, suggesting SMA(1)

  4. Frequency = 7

This is what the model looks like after we put all the terms together: ARIMA(0,2,1)x(0,1,1)[7]

Now, let’s go ahead and fit the model. Remember to use the Box-Cox transformed time series from before any differencing, since we are already specifying the differencing orders in the model.

#Create a training set from the Box-Cox transformed time series
train2 <- subset(CasesConfirmedTs2, end=length(CasesConfirmedTs2)-15)

#Fit ARIMA(0,2,1)x(0,1,1)[7]
sarima.fit1 <- sarima(train2, 0,2,1,0,1,1,7)

When we use sarima() (from the astsa package) to fit an ARIMA model, it automatically returns the model fit diagnostics. Since ARIMA is similar to the linear regression model, we need to make sure that the residuals are normally distributed and free from any trends. To do that, we need to analyze the Standardized Residuals plot and the Normal Q-Q plot. It appears that our residuals slightly diverge from normality on one end. It does not look significant to me, but we can always check with a statistical test for normality. One such test is the Shapiro-Wilk normality test, which compares the sample distribution to a normal one to assess whether the data shows any severe deviation from normality.

shapiro.test(resid(sarima.fit1$fit))
## 
##  Shapiro-Wilk normality test
## 
## data:  resid(sarima.fit1$fit)
## W = 0.95971, p-value = 0.1117

The test returned a p-value larger than 0.05, which means that our residuals are not significantly different from a normal distribution.

Secondly, we need to examine the ACF of the residuals to make sure there are no spikes above or below the blue significance bounds. If we find any, it means there is still some useful information we are not accounting for in our model. In our case, most correlations in the ACF are not significant. There is one on the borderline, but it’s nothing to worry about.

Finally, we need to look at the Ljung-Box plot to make sure that most of the p-values are outside of the significance area (0.05). If no p-values fall below 0.05, then all significant autocorrelations have been accounted for. Our data shows two p-values in the significance area, but most of them look fine.

It appears that our model passed all the tests, and we can proceed to creating a forecast. However, let’s say we did not pass the residuals test, what do we do then?

In that case, we could adjust the model by adding new terms and analyzing the effect on the residuals. For this demonstration, I will fit another ARIMA model and add one more MA term to the model’s order.

#Fit ARIMA(0,2,2)x(0,1,1)[7]
sarima.fit2 <- sarima(train2, 0,2,2,0,1,1,7)

This model does not show any drastic changes in the residuals plots. There are slight improvements to the QQ plot and the Ljung-Box statistic plot. Since both of these models pass the residuals test, how do we know which one is better?

We can compare ARIMA models by using AIC, AICc, or BIC. All three of them estimate prediction error; the model with the lowest value is a better fit. AICc is commonly used with time series because it corrects AIC for the small sample sizes that are typical in forecasting.

#Summary of sarima.fit1
sarima.fit1
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1     sma1   constant
##       -0.9898  -0.7475    29.9408
## s.e.   0.1426   0.2994  9268.5960
## 
## sigma^2 estimated as 17364:  log likelihood = -241.46,  aic = 490.92
## 
## $degrees_of_freedom
## [1] 34
## 
## $ttable
##          Estimate        SE t.value p.value
## ma1       -0.9898    0.1426 -6.9403  0.0000
## sma1      -0.7475    0.2994 -2.4968  0.0175
## constant  29.9408 9268.5960  0.0032  0.9974
## 
## $AIC
## [1] 11.41663
## 
## $AICc
## [1] 11.43094
## 
## $BIC
## [1] 11.56648
#Summary of sarima.fit2
sarima.fit2
## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##           ma1     ma2     sma1   constant
##       -1.1847  0.1956  -0.6852    29.9411
## s.e.   0.3143  0.2652   0.2836  9268.6438
## 
## sigma^2 estimated as 17642:  log likelihood = -241.31,  aic = 492.61
## 
## $degrees_of_freedom
## [1] 33
## 
## $ttable
##          Estimate        SE t.value p.value
## ma1       -1.1847    0.3143 -3.7696  0.0006
## ma2        0.1956    0.2652  0.7375  0.4660
## sma1      -0.6852    0.2836 -2.4157  0.0214
## constant  29.9411 9268.6438  0.0032  0.9974
## 
## $AIC
## [1] 11.45607
## 
## $AICc
## [1] 11.48055
## 
## $BIC
## [1] 11.64338

We can see that our first model is a better fit for the data because it shows smaller AIC, AICc, and BIC. Also, the second MA term added to the second model does not seem to be significant (p-value > 0.05).

Let’s still create forecasts for both of the models and compare forecast errors as well.

#Forecast ARIMA(0,2,1)x(0,1,1)[7]
sarima.forecast1 <- sarima.for(train2, n.ahead=15, 0,2,1,0,1,1,7)

#Forecast ARIMA(0,2,2)x(0,1,1)[7]
sarima.forecast2 <- sarima.for(train2, n.ahead=15, 0,2,2,0,1,1,7)

Both forecasts look pretty good. Let’s put them against the actuals.

autoplot(CasesConfirmedTs2, main = "Seasonal ARIMA Forecasts", ylab = "Box-Cox Transformed Confirmed Cases", xlab = "Week", size = 0.8, color = "#3CAEA3") +
   autolayer(sarima.forecast1$pred, series = "ARIMA(0,2,1)x(0,1,1)[7]", size = 0.8)+
   autolayer(sarima.forecast2$pred, series = "ARIMA(0,2,2)x(0,1,1)[7]", size = 0.8)+
  scale_color_manual(values = c("ARIMA(0,2,1)x(0,1,1)[7]" = "#ED553B", "ARIMA(0,2,2)x(0,1,1)[7]"="#20639B"))+
  guides(color=guide_legend(title="Daily forecasts"))+
  theme(plot.title = element_text(hjust = 0.5))

Although both forecasts are pretty close, I think it’s clear that our first model, ARIMA(0,2,1)x(0,1,1)[7], produced a more accurate forecast.

However, let’s still calculate MAE and RMSE to see the difference in error measures between the two models.

#ARIMA(0,2,1)x(0,1,1)[7]
accuracy(sarima.forecast1$pred, CasesConfirmedTs)

                ME     RMSE      MAE       MPE     MAPE      ACF1 Theil’s U
Test set  1098.371 1251.613 1119.708 -110.9809 244.6294 0.5088685  1.424362

#ARIMA(0,2,2)x(0,1,1)[7]
accuracy(sarima.forecast2$pred, CasesConfirmedTs)

                ME     RMSE      MAE      MPE     MAPE      ACF1 Theil’s U
Test set  1174.491 1309.104 1178.084 42.58439 102.4735 0.4977672  1.515906

Compare MAE and RMSE of the two models. Our first ARIMA model shows a significantly lower forecast error.

Auto.arima

There is another way for us to find the optimal model order for ARIMA, one that does not require such exhaustive analysis. Supposedly, by just calling auto.arima(), we should get the best ARIMA order for our data. It may be a good starting point, but I would not recommend relying on the algorithm entirely.

As an example, let’s see what auto.arima thinks the best model order would be for our data.

#auto.arima
auto.arima <- auto.arima(train2)
auto.arima 
## Series: train2 
## ARIMA(0,1,0)(0,1,1)[7] 
## 
## Coefficients:
##          sma1
##       -0.8271
## s.e.   0.4524
## 
## sigma^2 estimated as 17087:  log likelihood=-242.3
## AIC=488.59   AICc=488.94   BIC=491.87

Auto.arima says that the best order is ARIMA(0,1,0)(0,1,1)[7]. Let’s check the residuals and then compare how this model performs against the models we specified manually.

#check residuals
checkresiduals(auto.arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0)(0,1,1)[7]
## Q* = 9.1863, df = 8, p-value = 0.3268
## 
## Model df: 1.   Total lags used: 9

The model passed all the residual tests: the residuals are random, more or less normally distributed, and without any significant autocorrelations.

Now, let’s compare the models’ performance.

#Generate a forecast
sarima.forecast3 <- sarima.for(train2, n.ahead=15, 0,1,0,0,1,1,7)

Auto.arima’s forecast is not even close! This is why we learned to do it the hard way: so we could produce a better forecast than the algorithm.


Convert a Box-Cox Transformed Series back to Original Scale

Before we call it a day, remember that Holt-Winters model we built earlier in this tutorial that also produced a pretty good forecast? We should go back and compare it against our best ARIMA model.

Recall that at this point, our ARIMA model is based on the Box-Cox transformed time series, so it’s at a different scale compared to the Holt-Winters model. To be able to compare them, we need to have them on the same scale. Thus, we need to convert the ARIMA model predictions back to the original scale of our time series and then compare the two models.

#Converting ARIMA predictions back to the original scale
sarima.forecast1.trfmd <- (sarima.forecast1$pred*lambda+1)^(1/lambda) #use the same lambda used in the Box-Cox transformation
sarima.forecast1.trfmd
## Time Series:
## Start = c(7, 5) 
## End = c(9, 5) 
## Frequency = 7 
##  [1] 3260.6626 3017.4864 1737.9671 1489.7900 2523.9070 2289.4202 2147.7590
##  [8] 1942.2289 1663.9686  521.7019  288.4301 1064.4879  816.6991  652.2973
## [15]  446.1791
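
As an alternative to spelling out the formula, the forecast package provides InvBoxCox(), which should give the same result:

#Equivalent back-transformation using forecast's InvBoxCox()
InvBoxCox(sarima.forecast1$pred, lambda = lambda)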

Ok, now we are ready to compare the two models, so let’s plot the forecasts against the actuals.

autoplot(CasesConfirmedTs, main = "Holt-Winters vs. ARIMA Forecast", ylab = "Confirmed Cases", xlab = "Week", size=0.8, color="#3CAEA3") +
    autolayer(es.fit.a, series="Holt-Winters with Trend and Seasonality", PI=FALSE, size=0.8)+
    autolayer(sarima.forecast1.trfmd, series = "ARIMA(0,2,1)x(0,1,1)[7]", size=0.8)+
   scale_color_manual(values = c("Holt-Winters with Trend and Seasonality" = "#ED553B", "ARIMA(0,2,1)x(0,1,1)[7]"="#20639B"))+
  guides(color=guide_legend(title="Daily forecasts"))+
  theme(plot.title = element_text(hjust = 0.5))

We can quickly tell by looking at the plot that our ARIMA model is a much better fit for the data. Still, let us check the model accuracy measures.

accuracy(sarima.forecast1.trfmd, CasesConfirmedTs)

                 ME     RMSE      MAE      MPE     MAPE      ACF1 Theil’s U
Test set  -42.73234 352.6578 301.7063 -486.147 509.0567 0.5576645 0.5520743

Nice work! We have a model that can predict net new cases of coronavirus in NYC with an average error of plus-minus 300 cases. With the second wave upon us (at least that's what everyone says), this model may still be useful in forecasting the next COVID-19 outbreak.

In conclusion, I hope you found this tutorial helpful and will be able to use the techniques you learned here in your next forecasting project.

Good luck, and stay safe!

