Progressive Hedging

This tutorial was generated using Literate.jl.

The purpose of this tutorial is to demonstrate the Progressive Hedging algorithm. It may be helpful to read Two-stage stochastic programs first.

Required packages

This tutorial requires the following packages:

using JuMP
import Distributions
import Ipopt
import Printf

Background

Progressive Hedging (PH) is a popular decomposition algorithm for stochastic programming. It decomposes a stochastic problem into scenario subproblems that are solved iteratively, with penalty terms driving solutions toward consensus.

In Progressive Hedging, each scenario subproblem includes a quadratic penalty term:

\[\min\limits_{x_s} \; f_s(x_s) + \frac{\rho}{2} \|x_s - \bar{x}\|^2 + w_s^\top x_s\]

where:

  • $x_s$ is the primal variable in scenario $s$
  • $f_s(x_s)$ is the original objective of scenario $s$
  • $\rho$ is the penalty parameter
  • $\bar{x}$ is the current consensus (average) solution
  • $w_s$ is the dual price (Lagrange multiplier) in scenario $s$

Progressive Hedging is an iterative algorithm. In each iteration, it solves all the penalized scenario subproblems, and then applies two updates:

  1. $\bar{x} = \mathbb{E}_s[x_s]$
  2. $w_s = w_s + \rho (x_s - \bar{x})$

The algorithm terminates when the primal residual $\max_s |x_s - \bar{x}|$ is below a tolerance $\varepsilon$, and the dual residual $\rho |\bar{x} - \bar{x}_{\text{old}}|$, which measures how much $\bar{x}$ changed between iterations, is also below the tolerance.
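To make the bookkeeping concrete, here is a toy sketch of the two updates and the residuals. No optimization is involved; the scenario solutions and probabilities are made-up numbers used purely for illustration:

```julia
# Made-up scenario data: probabilities and scenario solutions x_s.
probability = [0.5, 0.3, 0.2]
x = [1.0, 2.0, 3.0]
w = zeros(3)                      # dual prices w_s, initially zero
ρ, x̄_old = 1.0, 0.0
# 1. Consensus update: probability-weighted average of the x_s.
x̄ = sum(probability .* x)
# 2. Dual update: w_s = w_s + ρ (x_s - x̄).
w .+= ρ .* (x .- x̄)
# Residuals used in the termination test.
primal_residual = maximum(abs.(x .- x̄))
dual_residual = ρ * abs(x̄ - x̄_old)
```

Here `x̄` is `1.7`, the duals become `[-0.7, 0.3, 1.3]`, and the primal and dual residuals are `1.3` and `1.7` respectively, so this toy iteration is far from converged.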

The penalty parameter $\rho$ can optionally be updated between iterations. How best to do so is an open question, and there is a large literature on different update strategies.

In this tutorial we use parameters for $\rho$, $w$, and $\bar{x}$ to efficiently modify each scenario's subproblem between PH iterations.
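If you have not used JuMP's `Parameter` set before, here is a minimal sketch, separate from the main tutorial, of how a parameter lets us modify a model's objective in-place instead of rebuilding the model. It uses Ipopt because the objective is quadratic, as in the subproblems below:

```julia
using JuMP
import Ipopt

model = Model(Ipopt.Optimizer)
set_silent(model)
@variable(model, x)
# `p` is a fixed parameter, not a decision variable.
@variable(model, p in Parameter(2.0))
@objective(model, Min, (x - p)^2)
optimize!(model)
value(x)  # ≈ 2.0
# Updating the parameter modifies the existing model in-place:
set_parameter_value(p, 5.0)
optimize!(model)
value(x)  # ≈ 5.0
```

This is exactly the pattern the PH loop relies on: each iteration calls `set_parameter_value` for $\rho$, $\bar{x}$, and $w_s$, then re-solves the same model.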

Building a single scenario

The building block of Progressive Hedging is a separate JuMP model for each scenario. Here's an example, using the problem from Two-stage stochastic programs:

function build_subproblem(; demand::Float64)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= demand)
    @constraint(model, y <= x)
    @variable(model, ρ in Parameter(1))
    @variable(model, x̄ in Parameter(0))
    @variable(model, w in Parameter(0))
    @expression(model, f_s, 2 * x - 5 * y + 0.1 * (x - y))
    @objective(model, Min, f_s + ρ / 2 * (x - x̄)^2 + w * x)
    return model
end
build_subproblem (generic function with 1 method)

Using the build_subproblem function, we can create one JuMP model for each scenario:

N = 10
demands = rand(Distributions.TriangularDist(150.0, 250.0, 200.0), N);
subproblems = map(demands) do demand
    return (; model = build_subproblem(; demand), probability = 1 / N)
end;

The Progressive Hedging loop

We're almost ready for our optimization loop, but first, here's a helpful function for logging:

function print_iteration(iter, args...)
    if mod(iter, 10) == 0
        f(x) = Printf.@sprintf("%15.4e", x)
        println(lpad(iter, 9), " ", join(f.(args), " "))
    end
    return
end
print_iteration (generic function with 1 method)

Now we can implement our algorithm:

function solve_progressive_hedging(
    subproblems;
    iteration_limit::Int = 400,
    atol::Float64 = 1e-4,
    ρ::Float64 = 1.0,
)
    x̄_old, x̄ = 0.0, 0.0
    x, w = zeros(length(subproblems)), zeros(length(subproblems))
    println("iteration primal_residual   dual_residual")
    # For each iteration...
    for iter in 1:iteration_limit
        # For each subproblem...
        for (i, data) in enumerate(subproblems)
            # Update the parameters
            set_parameter_value(data.model[:ρ], ρ)
            set_parameter_value(data.model[:x̄], x̄)
            set_parameter_value(data.model[:w], w[i])
            # Solve the subproblem
            optimize!(data.model)
            assert_is_solved_and_feasible(data.model)
            # Store the primal solution
            x[i] = value(data.model[:x])
        end
        # Compute the consensus solution for the first-stage variables
        x̄ = sum(s.probability * x_s for (s, x_s) in zip(subproblems, x))
        # Compute the primal and dual residuals
        primal_residual = maximum(abs, x_s - x̄ for x_s in x)
        dual_residual = ρ * abs(x̄ - x̄_old)
        print_iteration(iter, primal_residual, dual_residual)
        # Check for convergence
        if primal_residual < atol && dual_residual < atol
            break
        end
        # Update
        x̄_old = x̄
        w .+= ρ .* (x .- x̄)
    end
    return x̄
end

x̄ = solve_progressive_hedging(subproblems);
iteration primal_residual   dual_residual
       10      8.1712e-14      3.0000e+00
       20      1.7053e-13      3.0000e+00
       30      3.6948e-13      3.0000e+00
       40      1.6342e-12      3.0000e+00
       50      1.2193e-11      3.0000e+00
       60      2.4386e+00      1.9800e+00
       70      2.7180e-01      9.6000e-01
       80      3.8742e-01      4.0695e-01
       90      5.4834e-10      6.0000e-02
      100      2.0456e-08      6.0000e-02
      110      3.7655e-02      3.5354e-02
      120      1.7421e-02      2.1378e-02
      130      7.3852e-03      1.2847e-02
      140      2.6223e-03      7.6746e-03
      150      5.1293e-04      4.5579e-03
      160      3.1033e-04      2.6913e-03
      170      5.4429e-04      1.5800e-03
      180      5.3274e-04      9.2214e-04
      190      4.3756e-04      5.3500e-04
      200      3.2951e-04      3.0847e-04
      210      2.3545e-04      1.7670e-04
      220      1.6237e-04      1.0053e-04

The consensus first-stage decision is:

199.0645508643155

Progressive Hedging with an adaptive penalty parameter

You can also make the penalty parameter $\rho$ adaptive. As noted above, how best to update $\rho$ is an open question. One approach is to increase $\rho$ if the primal residual is much larger than the dual residual, and to decrease $\rho$ if the dual residual is much larger than the primal residual.

function solve_adaptive_progressive_hedging(
    subproblems;
    iteration_limit::Int = 400,
    atol::Float64 = 1e-4,
    ρ::Float64 = 1.0,
    τ::Float64 = 1.3,
    μ::Float64 = 15.0,
)
    x̄_old, x̄ = 0.0, 0.0
    x, w = zeros(length(subproblems)), zeros(length(subproblems))
    println("iteration primal_residual   dual_residual")
    for iter in 1:iteration_limit
        for (i, data) in enumerate(subproblems)
            set_parameter_value(data.model[:ρ], ρ)
            set_parameter_value(data.model[:x̄], x̄)
            set_parameter_value(data.model[:w], w[i])
            optimize!(data.model)
            assert_is_solved_and_feasible(data.model)
            x[i] = value(data.model[:x])
        end
        x̄ = sum(s.probability * x_s for (s, x_s) in zip(subproblems, x))
        primal_residual = maximum(abs, x_s - x̄ for x_s in x)
        dual_residual = ρ * abs(x̄ - x̄_old)
        print_iteration(iter, primal_residual, dual_residual)
        if primal_residual < atol && dual_residual < atol
            break
        end
        w .+= ρ .* (x .- x̄)
        x̄_old = x̄
        # Adaptive ρ update
        if primal_residual > μ * dual_residual
            ρ *= τ
        elseif dual_residual > μ * primal_residual
            ρ /= τ
        end
    end
    return x̄
end

x̄ = solve_adaptive_progressive_hedging(subproblems);
iteration primal_residual   dual_residual
       10      1.0348e-10      3.0000e+00
       20      6.7467e-01      6.0000e-02
       30      1.1656e+00      1.3010e-02
       40      6.5131e-01      1.6583e-02
       50      5.1318e-01      1.5611e-02
       60      3.8045e-01      1.6417e-02
       70      2.3628e-01      1.2217e-02
       80      1.4027e-01      7.1245e-03
       90      8.3327e-02      1.4727e-03
      100      6.3937e-02      1.1929e-03
      110      3.7550e-02      2.3153e-03
      120      2.1968e-02      2.5947e-03
      130      1.2693e-02      3.1577e-03
      140      9.5116e-03      3.1448e-03
      150      7.1630e-03      2.7306e-03
      160      4.0891e-03      1.9131e-03
      170      2.3176e-03      1.3008e-03
      180      1.2957e-03      1.1368e-03
      190      9.4812e-04      9.4751e-04
      200      6.9116e-04      5.9868e-04
      210      3.7879e-04      3.8196e-04
      220      2.0506e-04      2.4104e-04
      230      1.4085e-04      1.9668e-04
      240      1.0011e-04      1.5558e-04

The consensus first-stage decision is:

199.06455693884953

Try tuning the values of τ and μ. Can you get the algorithm to converge in fewer iterations?
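As a hypothetical starting point for this exercise (assuming the `subproblems` vector and the `solve_adaptive_progressive_hedging` function defined above are still in scope), you could sweep a few candidate `(τ, μ)` pairs and compare the printed iteration logs; the particular values below are arbitrary:

```julia
# Arbitrary candidate values: τ controls how aggressively ρ changes,
# μ controls how imbalanced the residuals must be to trigger a change.
for (τ, μ) in [(1.1, 5.0), (1.5, 10.0), (2.0, 20.0)]
    println("τ = $τ, μ = $μ")
    solve_adaptive_progressive_hedging(subproblems; τ = τ, μ = μ)
end
```

Each call prints its own residual log, so the run that stops printing earliest converged in the fewest iterations.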