Time Series Analysis

A time series is a sequence of data points, each associated with a time. In our example, we will work with a time series of daily temperatures in the city of Melbourne, Australia over a period of a few years. Let $x$ be the vector of the time series, and $x_i$ denote the temperature in Melbourne on day $i$. Here is a picture of the time series:

using Plots, Convex, ECOS, DelimitedFiles
const MOI = Convex.MOI

aux(str) = joinpath(@__DIR__, "aux_files", str) # path to auxiliary files

temps = readdlm(aux("melbourne_temps.txt"), ',')
n = size(temps, 1)
plot(
    1:n,
    temps[1:n],
    ylabel = "Temperature (°C)",
    label = "data",
    xlabel = "Time (days)",
    xticks = 0:365:n,
)
Example block output

We can quickly compute the mean of the time series to be $11.2$. If we were to always guess the mean as the temperature of Melbourne on a given day, the RMS error of our guesswork would be $4.1$. We'll try to lower this RMS error by coming up with better ways to model the temperature than guessing the mean.

A simple way to model this time series would be to find a smooth curve that approximates the yearly ups and downs. We can represent this model as a vector $s$ where $s_i$ denotes the temperature on the $i$-th day. To force this trend to repeat yearly, we simply want

\[s_i = s_{i + 365}\]

for each applicable $i$.

We also want our model to have two more properties:

  • The first is that the temperature on each day in our model should be relatively close to the actual temperature of that day.
  • The second is that our model needs to be smooth, so the change in temperature from day to day should be relatively small. The following objective would capture both properties:

\[\sum_{i = 1}^n (s_i - x_i)^2 + \lambda \sum_{i = 2}^n(s_i - s_{i - 1})^2\]

where $\lambda$ is the smoothing parameter. The larger $\lambda$ is, the smoother our model will be.

The following code uses Convex to find and plot the model:

yearly = Variable(n)
eq_constraints = [yearly[i] == yearly[i-365] for i in 365+1:n]

smoothing = 100
smooth_objective = sumsquares(yearly[1:n-1] - yearly[2:n])
problem = minimize(
    sumsquares(temps - yearly) + smoothing * smooth_objective,
    eq_constraints,
);
solve!(
    problem,
    MOI.OptimizerWithAttributes(ECOS.Optimizer, "maxit" => 200, "verbose" => 0),
)
residuals = temps - evaluate(yearly)

# Plot smooth fit
plot(1:n, temps[1:n], label = "data")
plot!(
    1:n,
    evaluate(yearly)[1:n],
    linewidth = 2,
    label = "smooth fit",
    ylabel = "Temperature (°C)",
    xticks = 0:365:n,
    xlabel = "Time (days)",
)
Example block output

We can also plot the residual temperatures, $r$, defined as $r = x - s$.

# Plot residuals for a few days
plot(1:100, residuals[1:100], ylabel = "Residuals", xlabel = "Time (days)")
Example block output
root_mean_square_error = sqrt(sum(x -> x^2, residuals) / length(residuals))
2.7488719721107198

Our smooth model has a RMS error of $2.7$, a significant improvement from just guessing the mean, but we can do better.

We now make the hypothesis that the residual temperature on a given day is some linear combination of the previous $5$ days. Such a model is called autoregressive. We are essentially trying to fit the residuals as a function of other parts of the data itself. We want to find a vector of coefficients $a$ such that

\[\text{r}(i) \approx \sum_{j = 1}^5 a_j \text{r}(i - j)\]

This can be done by simply minimizing the following sum of squares objective

\[\sum_{i = 6}^n \left(\text{r}(i) - \sum_{j = 1}^5 a_j \text{r}(i - j)\right)^2\]

The following Convex code solves this problem and plots our autoregressive model against the actual residual temperatures:

# Generate the residuals matrix
ar_len = 5

residuals_mat = Matrix{Float64}(undef, length(residuals) - ar_len, ar_len)
for i in 1:ar_len
    residuals_mat[:, i] = residuals[ar_len-i+1:n-i]
end

# Solve autoregressive problem
ar_coef = Variable(ar_len)
problem =
    minimize(sumsquares(residuals_mat * ar_coef - residuals[ar_len+1:end]))
solve!(
    problem,
    MOI.OptimizerWithAttributes(ECOS.Optimizer, "maxit" => 200, "verbose" => 0),
)
Problem statistics
  problem is DCP         : true
  number of variables    : 1 (5 scalar elements)
  number of constraints  : 0 (0 scalar elements)
  number of coefficients : 21_859
  number of atoms        : 3

Solution summary
  termination status : NUMERICAL_ERROR
  primal status      : OTHER_RESULT_STATUS
  dual status        : OTHER_RESULT_STATUS

Expression graph
  minimize
   └─ qol (convex; positive)
      ├─ + (affine; real)
      │  ├─ * (affine; real)
      │  │  ├─ …
      │  │  └─ …
      │  └─ 3643×1 Matrix{Float64}
      └─ [1;;]

Plot autoregressive fit of daily fluctuations for a few days:

ar_range = 1:145
day_range = ar_range .+ ar_len
plot(
    day_range,
    residuals[day_range],
    label = "fluctuations from smooth fit",
    ylabel = "Temperature difference (°C)",
)
plot!(
    day_range,
    residuals_mat[ar_range, :] * evaluate(ar_coef),
    label = "autoregressive estimate",
    xlabel = "Time (days)",
)
Example block output

Now, we can add our autoregressive model for the residual temperatures to our smooth model to get an better fitting model for the daily temperatures in the city of Melbourne:

total_estimate = evaluate(yearly)
total_estimate[ar_len+1:end] += residuals_mat * evaluate(ar_coef)
3643-element Vector{Float64}:
 15.663902485763881
 15.344761935602632
 15.354001303670024
 16.134089753263478
 18.438023944823033
 17.019598399706254
 15.289967968678694
 14.188537906222022
 16.39305927572535
 18.46930145484137
  ⋮
 13.84145223953454
 14.098373964752284
 11.868769604896553
 13.873343052384929
 14.409261498013906
 13.955306048471076
 13.796302824692958
 13.904183045303483
 15.167518253002223

We can plot the final fit of data across the whole time range:

plot(1:n, temps, label = "data", ylabel = "Temperature (°C)")
plot!(
    1:n,
    total_estimate,
    label = "estimate",
    xticks = 0:365:n,
    xlabel = "Time (days)",
)
Example block output

The RMS error of this final model is $\sim 2.3$:

root_mean_square_error =
    sqrt(sum(x -> x^2, total_estimate - temps) / length(temps))
2.3587029392911454

This page was generated using Literate.jl.