Scenario Generation for Energy Market Modeling: Finding the Right Balance

How do you plan Europe’s energy transition when the future is uncertain? Solar and wind power are inherently stochastic — their output varies with weather patterns that can’t be perfectly predicted. My Master’s thesis tackled this challenge by analyzing different methods for generating scenarios to model this uncertainty in long-term energy planning.

Download the full thesis (PDF)

The Challenge: Planning Under Uncertainty

Europe has committed to ambitious climate goals: an 80% reduction in CO₂ emissions by 2050 compared to 1990 levels. Achieving this requires massive investment in renewable energy sources — solar, wind, and hydropower. But these energy sources are fundamentally uncertain:

Solar power varies by season and time of day
Wind power (both onshore and offshore) fluctuates with weather patterns
Hydro run-of-the-river depends on precipitation and temperature
Load (electricity demand) changes throughout the day and year

Traditional deterministic optimization models make the simplifying assumption that we know exactly what will happen. But when you’re making billion-dollar infrastructure investments that will last for decades, ignoring uncertainty can lead to poor decisions.

This is where stochastic programming comes in. Instead of optimizing for one predicted future, we optimize across many possible scenarios, each representing a different realization of uncertain variables.

The EMPIRE Model

The European Model for Power system Investment with Renewable Energy (EMPIRE) is a two-stage stochastic programming model covering 31 European countries. It optimizes:

Investment decisions (first stage): What renewable energy capacity to build and where
Operational decisions (second stage): How to dispatch power generation to meet demand in each scenario
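
In generic notation, a two-stage stochastic program of this kind can be sketched as follows. The symbols are illustrative assumptions and are not taken from the EMPIRE formulation itself: $x$ stands for the first-stage investment decisions, $y_s$ for the operational decisions in scenario $s$, $\pi_s$ for the scenario probability, and $\xi_s$ for the realization of the uncertain data.

$\min_{x \in X} \; c^\top x + \sum_{s \in S} \pi_s \, Q(x, \xi_s), \qquad Q(x, \xi_s) = \min_{y_s \in Y(x, \xi_s)} q^\top y_s$

The investment $x$ is chosen once, before the uncertainty resolves, while each scenario gets its own dispatch $y_s$.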

The model accounts for:

Transmission lines between countries
Different types of generation: solar, wind onshore, wind offshore, hydro run-of-the-river
Energy storage capacity
Seasonal and hourly load variations
CO₂ emission constraints

But here’s the problem: the model needs scenarios as input. How do we generate scenarios that accurately represent the uncertainty we face?

Three Scenario Generation Methods

My thesis compared three different approaches to generating scenarios from historical data (2015-2019 for load and hydro, 1985-2015 for solar and wind):

1. Random Sampling

The simplest approach: randomly sample historical years and seasons.

Algorithm:

For each scenario, randomly select a year from the historical data
Divide the year into four seasons (winter, spring, summer, autumn)
From each season, randomly sample 168 consecutive hours
Identify the two peak load periods across the year and sample around them

This captures the natural variability in the data but makes no attempt to match statistical properties of the underlying distribution.
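
The seasonal sampling steps can be sketched roughly as below. This is a minimal illustration, not the thesis code: the `history` layout and the function name are hypothetical, the data is synthetic, the seasons are simplified to four equal blocks of hours, and the peak-load step is left out.

```python
import random

HOURS_PER_YEAR = 8760
SEASON_LENGTH = HOURS_PER_YEAR // 4  # simplification: four equal blocks of 2190 hours
WEEK = 168                           # consecutive hours sampled per season


def sample_scenario(history, rng):
    """Pick one historical year, then one random 168-hour window per season.

    history: dict mapping year -> list of 8760 hourly values (hypothetical layout).
    Returns the chosen year and a list of four 168-hour windows.
    """
    year = rng.choice(sorted(history))
    series = history[year]
    windows = []
    for season in range(4):
        # Largest offset still keeps the whole window inside the season block
        offset = rng.randrange(SEASON_LENGTH - WEEK + 1)
        start = season * SEASON_LENGTH + offset
        windows.append(series[start:start + WEEK])
    return year, windows


rng = random.Random(0)
# Synthetic stand-in for historical hourly data, for illustration only
history = {y: [rng.random() for _ in range(HOURS_PER_YEAR)] for y in range(2015, 2020)}
year, scenario = sample_scenario(history, rng)
print(year, [len(w) for w in scenario])
```

Sampling consecutive hours, rather than individual hours, keeps the short-term temporal structure of each series intact.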

2. Moment-Matching

An extension of random sampling that attempts to match the first four statistical moments of the historical distribution:

Mean ( $\mu$ ): Average value
Variance ( $\sigma^2$ ): Spread of the distribution
Skewness: Asymmetry of the distribution
Kurtosis: Tail behavior (how “fat” the tails are)

Algorithm:

Generate $N$ random samples (where $N = 50$ in the thesis)
Calculate the mean, variance, skewness, and kurtosis for each sample
Calculate the aggregate moments across all samples
Select the sample that best matches the historical distribution’s moments

This approach has been used successfully in portfolio optimization (Kaut and Wallace, 2007) but hadn’t been thoroughly tested for energy market modeling.
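
A minimal sketch of the selection step for a single time series is shown below. Several details are assumptions made for illustration: the unweighted sum of squared moment deviations as the distance metric, the helper names, and the synthetic gamma-distributed data. The thesis aggregates moments across several time-series, which this sketch does not attempt.

```python
import random
import statistics

WINDOW = 168  # hours per candidate sample


def four_moments(values):
    """Mean, variance, and third and fourth central moments of a sample."""
    mu = statistics.fmean(values)
    centred = [v - mu for v in values]
    return (
        mu,
        statistics.fmean(c ** 2 for c in centred),
        statistics.fmean(c ** 3 for c in centred),
        statistics.fmean(c ** 4 for c in centred),
    )


def moment_distance(sample, target_moments):
    """Sum of squared moment deviations (an assumed, unweighted metric)."""
    return sum((m - t) ** 2 for m, t in zip(four_moments(sample), target_moments))


def draw_candidates(history, n_candidates, rng):
    """Draw windows of consecutive hours at random start positions."""
    starts = [rng.randrange(len(history) - WINDOW + 1) for _ in range(n_candidates)]
    return [history[s:s + WINDOW] for s in starts]


rng = random.Random(1)
# Synthetic, right-skewed stand-in for one historical time series
history = [rng.gammavariate(2.0, 1.0) for _ in range(5000)]
target = four_moments(history)

candidates = draw_candidates(history, n_candidates=50, rng=rng)
chosen = min(candidates, key=lambda c: moment_distance(c, target))
print(moment_distance(chosen, target))
```

Because the four moments enter the distance unscaled, the higher-order terms can dominate whenever the series has large values, which is the scaling issue discussed later in this post.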

3. Moment-Load-Matching

A variant that applies moment-matching primarily to load data rather than all generators.

The motivation: load is not normalized the way the generators’ output is, so its larger scale could dominate the aggregated moments and bias the full moment-matching procedure. The variant also tests how moment-matching performs when the moments are aggregated over fewer time-series.

Measuring Quality: Stability Tests

How do you know if your scenario generation routine is good? The thesis used two complementary measures:

In-Sample Stability

Does the scenario generation routine give consistent results for different scenario trees?

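Given $K$ scenario trees $k_1, \dots, k_K$ produced by the same routine, the routine is in-sample stable if, for a small tolerance $\delta$ and all $i, j \in \{1, \dots, K\}$:

$|F(\mathbf{x}_i^*; k_i) - F(\mathbf{x}_j^*; k_j)| \leq \delta$

where $\mathbf{x}_i^*$ represents the optimal first-stage decisions for scenario tree $k_i$, and $F(\mathbf{x}; k)$ is the objective value of decisions $\mathbf{x}$ evaluated on scenario tree $k$.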

In other words: if I generate 20 different sets of scenarios using the same method, do I get roughly the same optimal objective value each time?
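
As a small illustration, the in-sample check reduces to comparing the optimal objective values across trees. The function names and the numbers below are made up for the example and are not thesis results.

```python
from itertools import combinations


def max_pairwise_gap(objective_values):
    """Largest |F_i - F_j| over all pairs of scenario trees."""
    return max(abs(a - b) for a, b in combinations(objective_values, 2))


def is_in_sample_stable(objective_values, delta):
    """True if every pair of trees agrees to within the tolerance delta."""
    return max_pairwise_gap(objective_values) <= delta


# Hypothetical optimal objective values from five scenario trees (made-up numbers)
values = [101.2, 100.8, 101.0, 100.9, 101.5]
print(max_pairwise_gap(values))
print(is_in_sample_stable(values, delta=1.0))
```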

Out-of-Sample Stability

How well do the investment decisions perform on the true underlying distribution?

For investment decisions $\mathbf{x}_i^*$ and $\mathbf{x}_j^*$ obtained from two different scenario trees, both evaluated on the true distribution $k$:

$|F(\mathbf{x}_i^*; k) - F(\mathbf{x}_j^*; k)| \leq \delta$

This is much harder to test because we don’t know the “true” distribution — we only have historical data as an approximation. The approach: use a large number of randomly sampled scenarios as a proxy for the true distribution.

The Gap Between In-Sample and Out-of-Sample

A key insight: if the gap between in-sample and out-of-sample objective values is large, it suggests the scenario generation routine is biased. It’s optimizing for its own generated scenarios rather than the true underlying uncertainty.
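
The gap can be illustrated on a toy capacity problem, which is far simpler than EMPIRE and only meant to show the mechanics. Everything here is an assumption for the sketch: the cost and penalty values, the normally distributed demand, and the use of a large random sample as a proxy for the true distribution.

```python
import random
import statistics

COST = 1.0      # hypothetical investment cost per unit of capacity
PENALTY = 4.0   # hypothetical penalty per unit of unmet demand


def objective(capacity, demands):
    """Investment cost plus expected shortfall penalty over the given scenarios."""
    shortfall = statistics.fmean(max(d - capacity, 0.0) for d in demands)
    return COST * capacity + PENALTY * shortfall


def best_capacity(demands):
    """The objective is piecewise linear and convex, so an optimum lies at a
    scenario value (or at zero)."""
    return min([0.0] + list(demands), key=lambda x: objective(x, demands))


rng = random.Random(42)
reference = [rng.gauss(100, 15) for _ in range(20000)]  # proxy for the "true" distribution
tree = rng.sample(reference, 10)                         # one small scenario tree

x_star = best_capacity(tree)
in_sample = objective(x_star, tree)            # value on the tree it was optimized for
out_of_sample = objective(x_star, reference)   # value on the proxy distribution
print(in_sample, out_of_sample, out_of_sample - in_sample)
```

Repeating this for many trees per generation method, and comparing the typical size of the difference, is the kind of comparison the stability tests make.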

Key Findings

The computational study tested two cases:

Case 1: All of Europe

31 countries
Maximum 10 scenarios per tree (memory constraints)
20 different scenario trees for each method

Case 2: Subset of Europe

Only Belgium, Germany, and France
Up to 200 scenarios tested
More detailed convergence analysis

Result 1: Random Sampling is Least Biased

When the in-sample and out-of-sample tests were compared, the Random routine consistently showed the smallest gap between the two objective values.

For both Moment-Matching and Moment-Load-Matching, the gap between in-sample and out-of-sample objective values was significantly larger than for the Random routine. This indicates bias: the moment-matching procedures selected scenarios that looked good according to statistical moments but represented the true stochastic behavior less well.

Result 2: 50 Scenarios Appear Sufficient

For the full European case with 10 scenarios, the standard deviation across different scenario trees remained high. But for the reduced case (Germany, France, Belgium):

With 50 scenarios: relative standard deviation of 0.33%
With 100 scenarios: gap reduced to 0.01 × 10⁻¹¹
With 200 scenarios: gap of 0.01 × 10⁻¹¹

The convergence from 10 to 50 scenarios showed the largest improvement. Beyond 50 scenarios, the marginal benefit decreased significantly.
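
For reference, a relative standard deviation across scenario trees can be computed as below, assuming it is defined as the sample standard deviation divided by the mean objective value. The objective values are invented for illustration and are not thesis results.

```python
import statistics


def relative_std(objective_values):
    """Sample standard deviation as a fraction of the mean objective value."""
    return statistics.stdev(objective_values) / statistics.fmean(objective_values)


# Made-up objective values for trees of increasing size (not thesis results)
by_tree_size = {
    10: [98.0, 103.5, 100.2, 96.8, 104.1],
    50: [100.3, 100.9, 99.8, 100.1, 100.6],
}
for size, values in by_tree_size.items():
    print(size, f"{100 * relative_std(values):.2f}%")
```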

Result 3: Moment-Matching Can Deviate from True Distribution

Analysis of the generated scenarios revealed an unexpected finding: the “Univariate” moment-matching procedure (matching moments for each generator separately) performed significantly better than the “Multivariate” approach (matching moments for all generators together).

For hydro run-of-the-river, the Univariate procedure almost perfectly matched the seasonal data, while the Multivariate procedure sampled a subset of values concentrated around 0.9-0.95 capacity factor — clearly not representative of the full distribution.

Why does this matter? It highlights that moment-matching can be sensitive to how it’s implemented. Matching aggregate moments across all time-series can inadvertently create scenarios that are statistically “correct” according to the chosen moments but miss important features of the distribution.

Theoretical Insights: Why Random Sampling Works

Several factors contribute to random sampling’s superior performance:

1. The Curse of Dimensionality

Energy market models have high-dimensional stochastic variables: solar generation in 31 countries, wind onshore in 31 countries, wind offshore (where available), hydro run-of-the-river, and load — all varying hourly.

Matching the first four moments provides only a partial characterization of such high-dimensional distributions. Important features like correlations between generators and temporal dependencies aren’t fully captured by univariate moments.

2. Moments Are Not Dimension-Free

The thesis highlighted a subtle issue: the four moments, when computed as central moments, carry different physical dimensions:

Mean: same dimension as the variable
Variance: dimension squared
Skewness (third central moment): dimension cubed
Kurtosis (fourth central moment): dimension to the fourth power

When aggregating moments across different generators (solar, wind, hydro, load), you’re effectively weighting them differently because they aren’t equally scaled. This can bias the matching procedure.

Standardized moments (dividing the third central moment by $\sigma^3$ and the fourth by $\sigma^4$) are dimension-free and would address this, but were not used in the thesis implementation.
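
The scaling effect is easy to reproduce. In the sketch below, two synthetic series have exactly the same shape but differ in scale by a factor of 1000. The variable names and the beta-distributed data are assumptions made for the example.

```python
import random
import statistics


def central_moment(values, order):
    """Mean of (x - mean)^order."""
    mu = statistics.fmean(values)
    return statistics.fmean((v - mu) ** order for v in values)


def standardized_moment(values, order):
    """Central moment divided by sigma^order, which removes the units."""
    sigma = central_moment(values, 2) ** 0.5
    return central_moment(values, order) / sigma ** order


rng = random.Random(7)
capacity_factor = [rng.betavariate(2, 5) for _ in range(2000)]  # synthetic, between 0 and 1
load_mw = [1000.0 * v for v in capacity_factor]                 # same shape, different scale

# Third central moments differ by a factor of 1000^3
print(central_moment(capacity_factor, 3), central_moment(load_mw, 3))
# Standardized third moments agree up to floating-point rounding
print(standardized_moment(capacity_factor, 3), standardized_moment(load_mw, 3))
```

In an unweighted sum of moment deviations, the larger-scale series would dominate the higher-order terms almost entirely.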

3. The Bias-Variance Tradeoff

Moment-matching attempts to reduce sampling variability by enforcing statistical constraints. But this introduces bias if the constraints are misspecified or incomplete.

Random sampling has higher variance (different scenario trees give different results) but is unbiased: given enough scenarios, it converges to the distribution underlying the historical data.

The stability tests revealed that for this application, the bias introduced by moment-matching outweighed its variance reduction benefits.
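
The tradeoff can be stated with the standard decomposition of mean squared error. The symbols are introduced here for illustration and do not come from the thesis: $\theta$ is any statistic of the true distribution (a mean capacity factor, for example) and $\hat{\theta}$ is its estimate computed from a generated scenario tree.

$\mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \underbrace{\big(\mathbb{E}[\hat{\theta}] - \theta\big)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]}_{\text{variance}}$

A method that shrinks the second term only helps if it does not inflate the first term by more.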

Data Insights: Complementary Energy Sources

The thesis included extensive analysis of the historical data, revealing important patterns:

Seasonal Patterns

Wind offshore and onshore: Generate less power in summer months relative to winter. This makes intuitive sense — weather patterns drive both temperature and wind.

Solar: Strong daily seasonality. More power during summer months, with a clear peak around midday.

Hydro run-of-the-river: Generates slightly more power in the first half of the year. Less effective in September-November due to falling temperatures affecting water flow. Hydro shows the highest average capacity factor of all generators but is also the most volatile.

Anti-Correlation Between Wind and Solar

Studies have shown anti-correlation between aggregated wind energy and solar irradiation in different European regions (Bett and Thornton, 2016; Miglietta et al., 2017). The monthly aggregated data showed:

Wind onshore and offshore: less power during summer months
Solar: more power during summer months

This natural complementarity is crucial for energy planning — it suggests that a balanced portfolio of renewables can reduce overall variability.
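
The effect of complementarity on variability can be illustrated with stylized monthly profiles. The numbers below are invented (simple cosine shapes with opposite phase), not the thesis data, and the equal-weight mix is an arbitrary choice.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Stylized monthly capacity factors (January..December), invented for illustration
months = range(12)
wind = [0.30 + 0.10 * math.cos(2 * math.pi * m / 12) for m in months]   # peaks in winter
solar = [0.12 - 0.08 * math.cos(2 * math.pi * m / 12) for m in months]  # peaks in summer
combined = [(w + s) / 2 for w, s in zip(wind, solar)]                   # equal-weight mix

print(pearson(wind, solar))
print(statistics.pstdev(wind), statistics.pstdev(solar), statistics.pstdev(combined))
```

With opposite seasonal phases, the mixed profile varies less over the year than either source on its own.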

Hourly vs. Monthly Aggregation

For hydro run-of-the-river in Finland, the hourly data showed a significant increase during the day. A likely explanation:

Hydropower includes pondage (small reservoirs), which allows some short-term regulation of output
Electricity demand is higher during the day, so output is shifted toward daytime hours

Such hourly seasonalities don’t apply equally to all countries.

The aggregated data for all 31 countries showed that capacity factors differ significantly between countries but maintain similar patterns.

Practical Implications

For Energy Planners

Don’t over-optimize on historical moments: The temptation to match statistical properties of historical data can lead to biased scenarios that don’t represent future uncertainty well.
50 scenarios is a good starting point: The marginal benefit beyond 50 scenarios decreased significantly, making this a practical number for computational efficiency.
Random sampling is robust: While sophisticated methods like moment-matching are theoretically appealing, simpler random sampling proved more reliable for this high-dimensional problem.

For Model Developers

Test both in-sample and out-of-sample stability: Looking at only in-sample performance can be misleading if the scenario generation is biased.
Use standardized moments if matching: The thesis suggested that using standardized moments (dimension-free) might reduce the bias observed in moment-matching.
Consider hybrid approaches: Future work could combine random sampling (to avoid bias) with moment-matching on a subset of scenarios (to reduce variance).

The Value of Stochastic Solutions

One key theoretical concept explored in the thesis is the Value of the Stochastic Solution (VSS):

$\text{VSS} = z_{\text{stoch}}(x_{\text{det}}) - z_{\text{stoch}}(x_{\text{stoch}})$

This measures the difference between:

Using deterministic investment decisions in the stochastic program
Optimizing investment decisions for the stochastic program

For a minimization problem, the VSS is always non-negative: evaluated on the same scenarios, the stochastic solution can’t be worse than the deterministic one. But how large is it?
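
A toy calculation shows how the VSS is obtained. This is a sketch on a simple capacity problem with hypothetical cost and penalty values and synthetic demand scenarios; it is unrelated to the EMPIRE results.

```python
import random
import statistics

COST = 1.0      # hypothetical investment cost per unit of capacity
PENALTY = 4.0   # hypothetical penalty per unit of unmet demand


def expected_cost(capacity, demands):
    """Investment cost plus expected shortfall penalty over all scenarios."""
    shortfall = statistics.fmean(max(d - capacity, 0.0) for d in demands)
    return COST * capacity + PENALTY * shortfall


rng = random.Random(3)
scenarios = [rng.gauss(100, 15) for _ in range(200)]  # synthetic demand scenarios

# Deterministic decision: optimize against the single mean-demand forecast.
# With PENALTY > COST, the best capacity for one known demand is that demand.
x_det = statistics.fmean(scenarios)

# Stochastic decision: optimize against all scenarios. The objective is
# piecewise linear and convex, so an optimum lies at one of the scenario values.
x_stoch = min(scenarios, key=lambda x: expected_cost(x, scenarios))

vss = expected_cost(x_det, scenarios) - expected_cost(x_stoch, scenarios)
print(x_det, x_stoch, vss)
```

Here the stochastic solution builds more capacity than the mean forecast suggests, because shortfalls are penalized more heavily than unused capacity.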

The thesis didn’t report specific VSS values, but the gap between in-sample and out-of-sample performance serves a similar purpose: it quantifies the cost of using biased scenarios.

For Moment-Matching and Moment-Load-Matching, the significant gaps indicated that these methods were essentially “overfitting” to their generated scenarios — similar to how a deterministic solution overfits to a single forecast.

Future Research Directions

The thesis concluded with several promising directions:

1. Hybrid Scenario Generation

Rather than choosing between random sampling and moment-matching, generate a mix: some scenarios randomly sampled, others following moment-matching constraints. This could combine the unbiasedness of random sampling with the variance reduction of moment-matching.

Interestingly, because the random routine already has two peak seasons in every scenario, it’s already partially a hybrid approach.

2. Temporal Aggregation

The computational challenge of using hourly data could be addressed by aggregating to daily, weekly, or monthly resolution. This would allow more scenarios to be generated without hitting memory constraints.

However, solar PV has strong daily seasonality, so daily aggregation might lose important features. The tradeoff between computational tractability and model fidelity needs careful consideration.
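
The loss of information is visible even on a stylized profile. The solar shape below is an invented half-sine between 06:00 and 18:00, used only to show what daily averaging removes.

```python
import math
import statistics


def daily_means(hourly):
    """Average consecutive blocks of 24 hourly values."""
    return [statistics.fmean(hourly[i:i + 24]) for i in range(0, len(hourly), 24)]


def solar_output(hour_of_day):
    """Stylized profile: zero at night, sinusoidal peak at midday."""
    return max(0.0, math.sin(math.pi * (hour_of_day - 6) / 12))


hourly = [solar_output(h % 24) for h in range(168)]  # one week of hourly values
daily = daily_means(hourly)

print(len(hourly), len(daily))
print(max(hourly), min(hourly), max(daily), min(daily))
```

The hourly series swings between zero and its midday peak, while the daily series is flat, so any constraint that depends on the timing of solar output within the day disappears from the model.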

3. Alternative Moment Formulations

Instead of using central moments, use standardized moments to make all metrics dimension-free:

$\text{Standardized skewness} = \frac{E[(X - \mu)^3]}{\sigma^3}$

$\text{Standardized kurtosis} = \frac{E[(X - \mu)^4]}{\sigma^4}$

This could reduce the bias observed when comparing generators of different scales.

Alternatively, standardize the moments so they are dimension-free, then multiply each by the standard deviation so that every moment carries the same dimension as the variable itself and the original scaling is restored in a more balanced way.

4. Incorporate Spatial and Temporal Correlations

The current moment-matching only considers univariate moments. Future work could match:

Cross-correlations between different generators
Autocorrelations within each time-series
Spatial correlations between neighboring countries

The challenge: the number of parameters to match grows rapidly, potentially making the optimization problem intractable.
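
A rough count illustrates the growth. Assuming, as an upper bound, five time-series (solar, wind onshore, wind offshore, hydro run-of-the-river, load) in each of the 31 countries, there are $n = 155$ series:

$\underbrace{4n}_{\text{univariate moments}} = 4 \cdot 155 = 620, \qquad \underbrace{\binom{n}{2}}_{\text{pairwise cross-correlations}} = \frac{155 \cdot 154}{2} = 11\,935$

Autocorrelations add further targets for every lag considered, so the matching problem quickly has far more targets than a handful of scenarios can satisfy.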

Conclusion: Simplicity Over Sophistication

The overarching lesson from this thesis is that sophisticated doesn’t always mean better. In high-dimensional stochastic optimization, simple random sampling from historical data outperformed more complex moment-matching approaches.

This doesn’t mean moment-matching is fundamentally flawed — it has proven successful in other domains like portfolio optimization. But for the specific application of energy market modeling with:

High dimensionality (31 countries × multiple generators × hourly resolution)
Multiple stochastic variables with complex correlations
Limited computational budget

…the bias introduced by partial characterization (only four moments) outweighed the variance reduction benefits.

As Europe continues its transition to renewable energy, accurately modeling uncertainty becomes increasingly critical. Investment decisions made today will shape the energy system for decades. Getting the scenarios right — representing the true range of possible futures — is essential for robust, resilient energy planning.

The findings suggest that pragmatic approaches grounded in historical data, while less theoretically elegant, may be more reliable than optimization-based methods that risk overfitting to incomplete statistical characterizations.

This post summarizes my Master’s thesis on scenario generation for the EMPIRE energy market model, completed in Spring 2020 at NTNU’s Department of Industrial Economics and Technology Management.