Progressive Hedging

This tutorial was generated using Literate.jl. Download the source as a .jl file.

The purpose of this tutorial is to demonstrate the Progressive Hedging algorithm. It may be helpful to read Two-stage stochastic programs first.

Required packages

This tutorial requires the following packages:

using JuMP
import Distributions
import Ipopt
import Printf

Background

Progressive Hedging (PH) is a popular decomposition algorithm for stochastic programming. It decomposes a stochastic problem into scenario subproblems that are solved iteratively, with penalty terms driving solutions toward consensus.

In Progressive Hedging, each scenario subproblem includes a quadratic penalty term:

\[\min\limits_{x_s}: f_s(x_s) + \frac{\rho}{2} ||x_s - \bar{x}||^2 + w_s^\top x_s\]

where:

  • $x_s$ is the primal variable in scenario $s$
  • $f_s(x)$ is the original scenario objective
  • $\rho$ is the penalty parameter
  • $\bar{x}$ is the current consensus (average) solution
  • $w_s$ is the dual price (Lagrangian multiplier) in scenario $s$

Progressive Hedging is an iterative algorithm. In each iteration, it solves all the penalized scenario subproblems, then it applies two updates:

  1. $\bar{x} = \mathbb{E}_s[x_s]$
  2. $w_s = w_s + \rho (x_s - \bar{x})$

The algorithm terminates when $|x_s - \bar{x}| \le \varepsilon$ for all scenarios (the primal residual) and $\bar{x}$ has changed by less than $\varepsilon$ between iterations (the dual residual).
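To make the updates concrete, here is a self-contained toy example of one iteration's bookkeeping, with four equally likely scenario solutions (the numbers are illustrative and unrelated to the model below):

```julia
# Toy data: four equally likely scenario solutions and zero initial prices.
x = [1.0, 2.0, 3.0, 4.0]
probability = fill(0.25, 4)
w = zeros(4)
ρ = 1.0

# Update 1: the consensus solution is the probability-weighted average.
x̄ = sum(probability .* x)                # 2.5

# Update 2: move each dual price in proportion to its disagreement.
w .+= ρ .* (x .- x̄)                      # w is now [-1.5, -0.5, 0.5, 1.5]

# The primal residual is the largest disagreement with the consensus.
primal_residual = maximum(abs, x .- x̄)   # 1.5
```

Scenarios below the consensus receive a negative price and scenarios above it a positive one, which pushes the next round of subproblem solutions toward agreement.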

The penalty parameter $\rho$ can optionally be updated between iterations. How best to do so is an open question, and there is a large literature on different update strategies.

In this tutorial we use parameters for $\rho$, $w$, and $\bar{x}$ to efficiently modify each scenario's subproblem between PH iterations.
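As a minimal sketch of how parameters work (separate from the PH model), a parameter is declared like a variable with `in Parameter(value)` and can be updated in place without rebuilding the model:

```julia
using JuMP

model = Model()
@variable(model, p in Parameter(1.0))
# Read the current value:
parameter_value(p)         # 1.0
# Change it in place; any objective or constraint using `p` sees the new value:
set_parameter_value(p, 2.5)
parameter_value(p)         # 2.5
```

This is much cheaper than rebuilding each subproblem every iteration, because the solver can reuse the existing model structure.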

Building a single scenario

The building block of Progressive Hedging is a separate JuMP model for each scenario. Here's an example, using the problem from Two-stage stochastic programs:

function build_subproblem(; demand::Float64)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, x >= 0)
    @variable(model, 0 <= y <= demand)
    @constraint(model, y <= x)
    @variable(model, ρ in Parameter(1))
    @variable(model, x̄ in Parameter(0))
    @variable(model, w in Parameter(0))
    @expression(model, f_s, 2 * x - 5 * y + 0.1 * (x - y))
    @objective(model, Min, f_s + ρ / 2 * (x - x̄)^2 + w * x)
    return model
end
build_subproblem (generic function with 1 method)

Using the build_subproblem function, we can create one JuMP model for each scenario:

N = 10
demands = rand(Distributions.TriangularDist(150.0, 250.0, 200.0), N);
subproblems = map(demands) do demand
    return (; model = build_subproblem(; demand), probability = 1 / N)
end;
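Each element of `subproblems` is a named tuple that pairs a scenario model with its probability. For example, you can inspect the first scenario:

```julia
data = first(subproblems)
data.probability   # 0.1
data.model[:x]     # the first-stage variable of this scenario's model
```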

The Progressive Hedging loop

We're almost ready for our optimization loop, but first, here's a helpful function for logging:

function print_iteration(iter, args...)
    if mod(iter, 10) == 0
        f(x) = Printf.@sprintf("%15.4e", x)
        println(lpad(iter, 9), " ", join(f.(args), " "))
    end
    return
end
print_iteration (generic function with 1 method)
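As a quick check of the helper (using the definition above), `print_iteration` prints a row only on every tenth iteration, with each value in `%15.4e` scientific notation:

```julia
print_iteration(5, 1e-3, 2e-4)    # prints nothing
print_iteration(10, 1e-3, 2e-4)   # prints one aligned row for iteration 10
```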

Now we can implement our algorithm:

function solve_progressive_hedging(
    subproblems;
    iteration_limit::Int = 400,
    atol::Float64 = 1e-4,
    ρ::Float64 = 1.0,
)
    x̄_old, x̄ = 0.0, 0.0
    x, w = zeros(length(subproblems)), zeros(length(subproblems))
    println("iteration primal_residual   dual_residual")
    # For each iteration...
    for iter in 1:iteration_limit
        # For each subproblem...
        for (i, data) in enumerate(subproblems)
            # Update the parameters
            set_parameter_value(data.model[:ρ], ρ)
            set_parameter_value(data.model[:x̄], x̄)
            set_parameter_value(data.model[:w], w[i])
            # Solve the subproblem
            optimize!(data.model)
            assert_is_solved_and_feasible(data.model)
            # Store the primal solution
            x[i] = value(data.model[:x])
        end
        # Compute the consensus solution for the first-stage variables
        x̄ = sum(s.probability * x_s for (s, x_s) in zip(subproblems, x))
        # Compute the primal and dual residuals
        primal_residual = maximum(abs, x_s - x̄ for x_s in x)
        dual_residual = ρ * abs(x̄ - x̄_old)
        print_iteration(iter, primal_residual, dual_residual)
        # Check for convergence
        if primal_residual < atol && dual_residual < atol
            break
        end
        # Update
        x̄_old = x̄
        w .+= ρ .* (x .- x̄)
    end
    return x̄
end

x̄ = solve_progressive_hedging(subproblems);
iteration primal_residual   dual_residual
       10      6.0396e-14      3.0000e+00
       20      1.1369e-13      3.0000e+00
       30      2.1316e-13      3.0000e+00
       40      8.3844e-13      3.0000e+00
       50      3.4959e-12      3.0000e+00
       60      5.6701e-11      3.0000e+00
       70      1.3299e+00      1.9475e+00
       80      1.8384e+00      4.5000e-01
       90      3.2751e-01      6.5166e-02
      100      9.1352e-02      4.5079e-02
      110      6.9887e-02      9.2377e-03
      120      4.8401e-02      4.3812e-03
      130      4.4359e-02      1.2097e-03
      140      9.1980e-03      9.2719e-03
      150      6.6613e-03      5.3219e-03
      160      4.6371e-03      3.0341e-03
      170      3.1379e-03      1.7172e-03
      180      2.0783e-03      9.6423e-04
      190      1.3532e-03      5.3670e-04
      200      8.6885e-04      2.9580e-04
      210      5.5131e-04      1.6119e-04
      220      3.4626e-04      8.6680e-05

The consensus first-stage decision is:

217.74810386692195

Progressive Hedging with an adaptive penalty parameter

You can also make the penalty parameter $\rho$ adaptive; as noted above, there is a large literature on different update strategies. One approach is to increase $\rho$ if the primal residual is much larger than the dual residual, and to decrease $\rho$ if the dual residual is much larger than the primal residual.

function solve_adaptive_progressive_hedging(
    subproblems;
    iteration_limit::Int = 400,
    atol::Float64 = 1e-4,
    ρ::Float64 = 1.0,
    τ::Float64 = 1.3,
    μ::Float64 = 15.0,
)
    x̄_old, x̄ = 0.0, 0.0
    x, w = zeros(length(subproblems)), zeros(length(subproblems))
    println("iteration primal_residual   dual_residual")
    for iter in 1:iteration_limit
        for (i, data) in enumerate(subproblems)
            set_parameter_value(data.model[:ρ], ρ)
            set_parameter_value(data.model[:x̄], x̄)
            set_parameter_value(data.model[:w], w[i])
            optimize!(data.model)
            assert_is_solved_and_feasible(data.model)
            x[i] = value(data.model[:x])
        end
        x̄ = sum(s.probability * x_s for (s, x_s) in zip(subproblems, x))
        primal_residual = maximum(abs, x_s - x̄ for x_s in x)
        dual_residual = ρ * abs(x̄ - x̄_old)
        print_iteration(iter, primal_residual, dual_residual)
        if primal_residual < atol && dual_residual < atol
            break
        end
        w .+= ρ .* (x .- x̄)
        x̄_old = x̄
        # Adaptive ρ update
        if primal_residual > μ * dual_residual
            ρ *= τ
        elseif dual_residual > μ * primal_residual
            ρ /= τ
        end
    end
    return x̄
end

x̄ = solve_adaptive_progressive_hedging(subproblems);
iteration primal_residual   dual_residual
       10      5.1202e-11      3.0000e+00
       20      5.8061e-01      7.8260e-02
       30      2.2247e-01      5.0624e-02
       40      9.3982e-02      2.4035e-02
       50      6.3957e-02      9.4033e-03
       60      4.5687e-02      8.1923e-03
       70      1.0570e-03      3.6890e-02
       80      1.0565e-03      2.8298e-02
       90      1.9532e-03      1.6617e-02
      100      1.9316e-03      9.7003e-03
      110      2.1736e-03      5.6042e-03
      120      2.0751e-03      4.2051e-03
      130      1.4299e-03      3.1477e-03
      140      9.9005e-04      1.7924e-03
      150      6.6724e-04      1.0131e-03
      160      5.8005e-04      7.3284e-04
      170      4.7402e-04      5.3957e-04
      180      3.0447e-04      2.9748e-04
      190      1.9326e-04      1.6217e-04
      200      1.5894e-04      8.6144e-05
      210      1.2752e-04      6.0165e-05

The consensus first-stage decision is:

217.74799866650977

Try tuning the values of τ and μ. Can you get the algorithm to converge in fewer iterations?
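For example, reusing the subproblems from above, you might try a larger multiplicative step $\tau$ and a tighter trigger $\mu$ (these particular values are illustrative; there is no guarantee they converge faster on every instance):

```julia
x̄ = solve_adaptive_progressive_hedging(subproblems; τ = 1.5, μ = 10.0);
```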