Running Steel Model Simulations Locally¶

This guide explains how to run the steel model simulation during local development and how to integrate new data sources into the simulation pipeline.

Quick Start (CLI)¶

After installing the Steel Model package, run a simulation from the command line:

run_simulation --start-year 2025 --end-year 2030 --output-dir ./simulation_outputs

Common options:

--start-year / --end-year: define the scenario horizon.
--config-file: load a saved configuration.
--log-level: control verbosity (INFO, DEBUG, etc.).

The CLI writes metrics, logs, and artefacts to the chosen output directory. Review the Configuration guide for a comprehensive list of parameters and environment variables.

Custom Data Overview¶

To experiment with bespoke datasets:

Prepare files that conform to the schemas referenced in the configuration guide.

Point the CLI at your resources, for example:

run_simulation \
  --plants-json ./my_data/plants.json \
  --demand-excel ./my_data/demand.xlsx \
  --output-dir ./custom_run

Inspect the generated reports (metrics.json, plots, logs) under your output directory.

For notebook or service integrations, see the Command-Line Entrypoints reference.

Prerequisites¶

Before running simulations, ensure you have:

Python 3.13 installed (via uv python install 3.13)
Virtual environment activated (source .venv/bin/activate)
All dependencies installed (uv sync)

Data Pipeline Architecture¶

The steel model follows a structured data flow from raw inputs to simulation execution:

1. Data Storage & Caching¶

S3 Storage: Raw data packages (core-data, geo-data) are stored in S3 buckets
Local Cache: Downloaded data is cached in $STEELO_HOME/data_cache/ to avoid repeated downloads
Preparation Cache: Processed data is cached in $STEELO_HOME/preparation_cache/ based on master Excel content hash
Django Models: In web mode, DataPackage models store the zip archives in Django’s media directory (not in $STEELO_HOME)
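
The cache locations above can be resolved from a script. The following is a minimal sketch, assuming the `~/.steelo` default listed under Environment Variables; the helper names `steelo_home` and `cache_dirs` are hypothetical and not part of the package.

```python
import os
from pathlib import Path


def steelo_home() -> Path:
    """Return the base directory, honouring $STEELO_HOME when it is set."""
    raw = os.environ.get("STEELO_HOME")
    # Assumption: fall back to ~/.steelo, the default named in this guide.
    return Path(raw).expanduser() if raw else Path.home() / ".steelo"


def cache_dirs() -> dict[str, Path]:
    """Map each CLI cache described above to its location."""
    home = steelo_home()
    return {
        "data_cache": home / "data_cache",
        "preparation_cache": home / "preparation_cache",
    }


for name, path in cache_dirs().items():
    # Read-only report: this sketch never creates or deletes anything.
    print(f"{name}: {path} (exists: {path.exists()})")
```

This only inspects the CLI locations; in web mode the archives live in Django's media directory instead.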

2. Data Transformation¶

The system transforms raw input data through two parallel paths:

CLI Path:

Raw data → Preprocessing → Files in $STEELO_HOME/preparation_cache/prep_<hash>/data/
Creates JSON repositories and processed CSV/Excel files
Symlinks created at project_root/data/ for backward compatibility

Django Path:

Raw data → DataPackage models → DataPreparation models
Stores processed data in Django’s media directory

3. Configuration & Execution¶

A SimulationConfig object is created with pointers to all required data files
The config is passed to SimulationRunner, which distributes it to all modules
No downstream module needs to know about the original data sources
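
The hand-off described above can be illustrated with stand-in classes. This is a sketch of the pattern only: `DemoConfig`, `DemoModule`, and `DemoRunner` are hypothetical and do not reproduce the real `SimulationConfig` or `SimulationRunner` APIs.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for SimulationConfig: years plus file pointers."""
    start_year: int
    end_year: int
    plants_json_path: Path


class DemoModule:
    """Hypothetical module that reads only from the config it is given."""

    def __init__(self, name: str, config: DemoConfig) -> None:
        self.name = name
        self.config = config

    def describe(self) -> str:
        return f"{self.name}: {self.config.start_year}-{self.config.end_year}"


class DemoRunner:
    """Hypothetical stand-in for SimulationRunner."""

    def __init__(self, config: DemoConfig) -> None:
        # Every module receives the same config object, so none of them
        # needs to know whether the data came from S3, a cache, or Django.
        self.modules = [DemoModule("plants", config), DemoModule("trade", config)]

    def run(self) -> list[str]:
        return [module.describe() for module in self.modules]


config = DemoConfig(2025, 2030, Path("data/plants.json"))
print(DemoRunner(config).run())
```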

Integrating New Data Sources (e.g., Master Excel)¶

When adding new data sources like master Excel files, follow this pattern:

Create an Adapter: Write a transformation module in src/steelo/adapters/ that:
- Takes the path to your Excel file as input
- Returns domain model instances as output
- Example: adapters/dataprocessing/master_excel_reader.py
Extend SimulationConfig: Add fields for your new data to the SimulationConfig class
Wire Through the System:
- Pass data via SimulationConfig → repositories or bus.env
- Access in your module via event/command handlers
Feature Flag: Add a flag in global_variables.py (default False) to enable/disable your feature:
```
USE_MASTER_EXCEL = False  # Enable when ready
```

This approach ensures your changes don’t break existing functionality and can be easily replaced when the system officially adopts the master input file.
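
The adapter-plus-flag pattern might look like the sketch below. Everything in it is illustrative: `DemoPlant`, `read_plants`, and `load_plants` are hypothetical names, and the row reader is injected so the example does not assume a particular Excel library or the real domain classes.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Callable, Iterable

USE_MASTER_EXCEL = False  # mirrors the flag shown above; enable when ready


@dataclass(frozen=True)
class DemoPlant:
    """Hypothetical domain model used only for this sketch."""
    plant_id: str
    capacity: float


RowReader = Callable[[Path], Iterable[dict[str, Any]]]


def read_plants(excel_path: Path, read_rows: RowReader) -> list[DemoPlant]:
    """Adapter: takes a file path, returns domain model instances."""
    return [
        DemoPlant(str(row["plant_id"]), float(row["capacity"]))
        for row in read_rows(excel_path)
    ]


def load_plants(
    excel_path: Path,
    read_rows: RowReader,
    fallback: list[DemoPlant],
    enabled: bool = USE_MASTER_EXCEL,
) -> list[DemoPlant]:
    """Use the new source only when the feature flag is on."""
    if not enabled:
        return fallback
    return read_plants(excel_path, read_rows)
```

Keeping the flag check in one place means existing behaviour is unchanged until the flag is switched on.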

Method 1: Programmatic Execution (Python/Notebook)¶

The programmatic approach gives you full control over the simulation configuration and is ideal for:

Jupyter notebook analysis
Custom simulation scenarios
Integration with other Python tools
Batch processing

Quick Example¶

from pathlib import Path
from steelo.simulation import SimulationConfig
from steelo.simulation_runner import create_simulation_runner
from steelo.domain import Year

config = SimulationConfig.from_data_directory(
    start_year=Year(2025),
    end_year=Year(2030),
    data_dir=Path("./data"),
    output_dir=Path("./test_outputs")
)

runner = create_simulation_runner(config)
results = runner.run()

# Access results
print(f"Final steel price: {results['price']}")
print(f"Total production: {results['production']}")

Custom Paths Example¶

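from pathlib import Path
from steelo.simulation import SimulationConfig
from steelo.domain import Year

config = SimulationConfig(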
    # Custom output paths
    output_dir=Path("./custom_outputs"),
    plots_dir=Path("./custom_outputs/plots"),
    
    # Custom input data
    plants_json_path=Path("./my_data/plants.json"),
    demand_center_xlsx=Path("./my_data/demand.xlsx"),
    cost_of_x_csv=Path("./my_data/cost_of_x.json"),
    
    # Time and parameters
    start_year=Year(2025),
    end_year=Year(2050),
    scrap_generation_scenario="high_recycling",
)

Technology Constraints Example¶

from steelo.domain import Year
from steelo.simulation import SimulationConfig
from steelo.simulation_types import get_default_technology_settings, TechnologySettings

# Create technology settings with specific constraints
tech_settings = get_default_technology_settings()

# Ban blast furnaces by setting allowed=False
tech_settings['BF'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)

# Allow hydrogen DRI only from 2030
tech_settings['DRIH2'] = TechnologySettings(
    allowed=True,
    from_year=2030,
    to_year=None
)

# Disable certain technologies
tech_settings['ESF'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)
tech_settings['MOE'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)

config = SimulationConfig(
    start_year=Year(2025),
    end_year=Year(2040),
    technology_settings=tech_settings,
)
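
To reason about what a given combination of `allowed`, `from_year`, and `to_year` means for a particular year, a small helper can be useful. The sketch below uses a hypothetical `DemoTechSetting` class and assumes that both bounds are inclusive and that `to_year=None` means open-ended; the package's actual semantics may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DemoTechSetting:
    """Hypothetical mirror of the three fields used in the example above."""
    allowed: bool
    from_year: int
    to_year: Optional[int] = None


def is_available(setting: DemoTechSetting, year: int) -> bool:
    """Assumed semantics: allowed, and from_year <= year <= to_year."""
    if not setting.allowed:
        return False
    if year < setting.from_year:
        return False
    # Assumption: to_year=None means there is no upper bound.
    return setting.to_year is None or year <= setting.to_year


hydrogen_dri = DemoTechSetting(allowed=True, from_year=2030)
print(is_available(hydrogen_dri, 2029), is_available(hydrogen_dri, 2030))
```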

For more examples, see examples/run_simulation_example.py.

Caching System¶

The CLI implements a content-based caching system that significantly speeds up repeated simulations:

How It Works¶

Content Hashing: The master Excel file is hashed using SHA256 to create a unique cache key
Cache Storage: Prepared data is stored in $STEELO_HOME/preparation_cache/prep_<hash>/
Fast Lookups: An index file tracks all cached preparations for instant lookups
Automatic Reuse: When running with the same master Excel, cached data is reused instantly
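
The hashing and lookup steps can be sketched with the standard library. This is an illustration under stated assumptions: the 8-character prefix is inferred from the `prep_a1b2c3d4` example in the directory listing, and the index layout (a flat JSON object mapping full hashes to directory names) is hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA256 of the file contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_dir_for(master_excel: Path, cache_root: Path, prefix_len: int = 8) -> Path:
    """Directory for this file's preparation (prefix length is an assumption)."""
    return cache_root / f"prep_{content_hash(master_excel)[:prefix_len]}"


def lookup(index_file: Path, master_excel: Path) -> Optional[str]:
    """Return the cached directory name for this file, or None on a miss."""
    if not index_file.exists():
        return None
    # Assumed index layout: {"<full sha256>": "<directory name>", ...}
    index = json.loads(index_file.read_text())
    return index.get(content_hash(master_excel))
```

Because the key depends only on file contents, renaming or moving the master Excel file still produces a cache hit, while any edit to it produces a miss.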

Cache Management Commands¶

# View cache statistics
steelo-cache stats

# List all cached preparations
steelo-cache list

# Clear all cached data
steelo-cache clear

# Clear old caches but keep recent ones
run_simulation --cache-clear --keep-recent 3

# Force fresh preparation (bypass cache)
run_simulation --force-refresh

# Disable caching entirely
run_simulation --no-cache

Cache Versioning¶

The cache system includes automatic version tracking. When the code that processes data changes, old caches are automatically invalidated. This ensures you always get correctly processed data without manual intervention.

If you suspect that cached data is outdated, keep the following in mind:

The cache version is automatically bumped when processing code changes
Old caches are invalidated when detected
Use --force-refresh to bypass all caching if needed
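
A version check of this kind could be implemented roughly as follows. This is a sketch, not the package's code: the `CACHE_VERSION` constant and the `cache_version` key in `metadata.json` are assumptions made for illustration.

```python
import json
from pathlib import Path

CACHE_VERSION = 2  # hypothetical; bumped whenever processing code changes


def is_cache_valid(prep_dir: Path, current_version: int = CACHE_VERSION) -> bool:
    """Treat a preparation as stale unless its recorded version matches."""
    metadata_file = prep_dir / "metadata.json"
    if not metadata_file.exists():
        return False
    try:
        metadata = json.loads(metadata_file.read_text())
    except json.JSONDecodeError:
        # Unreadable metadata is treated the same as a stale cache.
        return False
    if not isinstance(metadata, dict):
        return False
    # Assumed key name; a missing key also counts as stale.
    return metadata.get("cache_version") == current_version
```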

Directory Structure¶

$STEELO_HOME/
├── preparation_cache/
│   ├── index.json                    # Fast lookup index
│   ├── prep_a1b2c3d4/               # Cached preparation
│   │   ├── data/                    # Prepared data files
│   │   │   └── fixtures/            # JSON repositories
│   │   └── metadata.json            # Cache metadata
│   └── prep_e5f6a7b8/               # Another cached preparation
├── output/                          # Simulation outputs
│   ├── sim_20240726_143052/        # Timestamped simulation
│   └── latest -> sim_20240726...   # Symlink to latest
├── data -> preparation_cache/...    # Symlink to latest preparation
└── output_latest -> output/sim_...  # Symlink to latest output

Backward Compatibility¶

For backward compatibility with existing scripts, symlinks are automatically created:

project_root/data/ → Latest cached preparation
project_root/output/ → Latest simulation output

If these directories already exist, they are backed up to data_backup_<timestamp> and output_backup_<timestamp>.
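
The backup-then-link behaviour can be sketched with `pathlib`. The function name `link_with_backup` and the timestamp format are hypothetical, and the sketch assumes a platform where creating symlinks is permitted.

```python
import time
from pathlib import Path
from typing import Optional


def link_with_backup(link: Path, target: Path) -> Optional[Path]:
    """Point `link` at `target`, moving any real directory out of the way.

    Returns the backup path if a backup was made, otherwise None.
    """
    backup: Optional[Path] = None
    if link.is_symlink():
        # An old symlink is simply replaced; nothing needs backing up.
        link.unlink()
    elif link.exists():
        # Assumed timestamp format for the <timestamp> suffix.
        stamp = time.strftime("%Y%m%d_%H%M%S")
        backup = link.with_name(f"{link.name}_backup_{stamp}")
        link.rename(backup)
    link.symlink_to(target, target_is_directory=True)
    return backup
```

Checking `is_symlink()` before `exists()` matters, because `exists()` follows the link and would otherwise treat a previous symlink as a directory to back up.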

Method 2: Command-Line Interface (CLI)¶

The CLI approach is useful for automated runs, testing, and debugging.

Quick Start¶

For most cases, you only need one command:

# Run the simulation (automatically prepares data if needed)
run_simulation

The run_simulation command will automatically:

Download required data packages from S3 if not cached
Prepare all necessary data files
Use cached preparations when possible for faster startup
Run the actual simulation

Getting Fresh Data¶

If you need to force fresh data preparation (e.g., after fixing bugs or updating master Excel):

# Method 1: Force refresh during simulation
run_simulation --force-refresh

# Method 2: Clear cache and run
steelo-cache clear
run_simulation

# Method 3: Prepare data explicitly with force refresh
steelo-data-prepare --force-refresh
run_simulation

Advanced Usage¶

Clearing Cache¶

# Clear all caches (preparation cache and data cache)
steelo-cache clear

# Clear all caches but keep recent preparation caches
steelo-cache clear --keep-recent 3

Note: The steelo-cache clear command clears the preparation cache, downloaded data packages cache, and the data/ directory to ensure a completely fresh state.

Using Development Geo Data¶

# Use specific geo-data version via command line
steelo-data-prepare --geo-version 1.1.0-dev

# Or set via environment variable
export STEELO_GEO_VERSION=1.1.0-dev
steelo-data-prepare

Manual Data Management (Advanced)¶

Note: Manual data management is rarely needed. The run_simulation command handles all data preparation automatically.

For debugging or special cases requiring control over individual steps:

# Download specific packages
steelo-data-download --package core-data
steelo-data-download --package geo-data

# Prepare data with specific options
steelo-data-prepare --force-refresh

# Extract geo data separately
steelo-data-extract-geo

# Recreate JSON repositories
steelo-data-recreate --package core-data --output-dir ./data/repositories

Step 2: Run the Simulation¶

Once data preparation is complete, start the simulation:

# Run simulation with default settings
run_simulation

# Run with custom output directory
run_simulation --output-dir ./my_simulation_outputs

# Run with custom parameters and redirect log
run_simulation --start-year 2025 --end-year 2035 --output-dir ./outputs > /tmp/simulation.log 2>&1

CLI Options¶

Simulation Parameters:

--start-year: Starting year for simulation (default: 2025)
--end-year: Ending year for simulation (default: 2050)
--output-dir: Base output directory for results (default: $STEELO_HOME/output)
--log-level: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL; default: WARNING)

Data Files (usually handled automatically via caching):

--plants-json: Path to plants JSON file
--demand-excel: Path to demand Excel file
--location-csv: Path to location CSV file
--cost-of-x-csv: Path to cost of x JSON file

Caching Options:

--cache-stats: Show cache statistics and exit
--cache-list: List all cached preparations and exit
--cache-clear: Clear cache (use with --keep-recent N to keep some)
--force-refresh: Force fresh data preparation (bypass cache)
--no-cache: Disable caching for this run
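
When driving the CLI from a script, the options above can be assembled into an argument list and the output redirected to a log file. The sketch below uses only flags listed in this section; the helper names `build_command` and `run_with_log` are hypothetical, and it assumes `run_simulation` is on your `PATH`.

```python
import subprocess
from pathlib import Path


def build_command(
    start_year: int,
    end_year: int,
    output_dir: Path,
    log_level: str = "WARNING",
) -> list[str]:
    """Assemble a run_simulation invocation from the documented options."""
    return [
        "run_simulation",
        "--start-year", str(start_year),
        "--end-year", str(end_year),
        "--output-dir", str(output_dir),
        "--log-level", log_level,
    ]


def run_with_log(command: list[str], log_file: Path) -> int:
    """Run a command, sending stdout and stderr to one file (like `> f 2>&1`)."""
    with log_file.open("w") as fh:
        result = subprocess.run(
            command, stdout=fh, stderr=subprocess.STDOUT, check=False
        )
    return result.returncode


print(" ".join(build_command(2025, 2035, Path("./outputs"), "DEBUG")))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.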

Step 3: Monitor Progress¶

In a separate terminal, monitor the simulation progress:

# Watch the log file in real-time
tail -f /tmp/simulation.log

The simulation will output progress updates, including:

Current simulation year
Plant capacity changes
Technology transitions
Trade allocations
Cost calculations

Method 3: Django Web Interface¶

The web interface provides a user-friendly way to configure and run simulations with real-time progress tracking.

Quick Start¶

# Initial setup (only once)
uv run src/django/manage.py migrate

# Prepare data
uv run src/django/manage.py prepare_default_data

# Start services
uv run src/django/manage.py runserver
uv run src/django/manage.py db_worker  # in separate terminal

Detailed Steps¶

Step 1: Create the Database¶

uv run src/django/manage.py migrate

Step 2: Prepare Default Data¶

Prepare the data files needed for simulations:

# Standard preparation
uv run src/django/manage.py prepare_default_data

# Use development geo data
uv run src/django/manage.py prepare_default_data --geo-version 1.1.0-dev

# Or via environment variable
export STEELO_GEO_VERSION=1.1.0-dev
uv run src/django/manage.py prepare_default_data

This command will:

Download the master-input Excel file from S3
Download core-data and geo-data packages from S3
Extract data from the master Excel file
Copy files from core-data package
Generate derived files (like plant_groups.json)
Extract geo-data files
Create all fixture files in data/fixtures/

Options:

--name: Name for the data preparation (default: “Default Data”)
--force: Force re-preparation even if data exists
--geo-version: Specific version of geo-data to use (e.g., ‘1.1.0-dev’)
--master-excel-id: ID of a MasterExcelFile to use (if you’ve uploaded one)
--quiet: Hide detailed output (only show summary)
--no-check-files: Skip file existence checking

Note: The master Excel file is now mandatory for data preparation. The command uses a centralized data preparation service that ensures consistent file tracking across all data preparation methods.

Step 3: Start the Django Development Server¶

# Start the web server on http://localhost:8000
uv run src/django/manage.py runserver

Step 4: Start the Background Worker¶

In a separate terminal, start the task worker that handles simulation execution:

# Start the background worker for running simulations
uv run src/django/manage.py db_worker

The worker ensures the web interface remains responsive during long-running simulations.

Step 5: Create and Run a Simulation¶

Open your browser and navigate to http://localhost:8000
Click “New Simulation” to create a new model run
Configure simulation parameters:
- Set start and end years
- Choose scenarios (demand, scrap generation)
- Configure technology availability
- Set economic parameters
Click “Create Model Run”
On the model run detail page, click “Run Simulation”
Monitor progress in real-time on the web interface

Managing Data Packages¶

When updating geo-data or core-data packages (e.g., upgrading geo-data.zip to a new version), you may need to clean up old DataPreparation and DataPackage objects from the database.

Option 1: Using Django Shell¶

# Open the Django shell
uv run src/django/manage.py shell

# In the shell, remove old data packages
from steeloweb.models import DataPackage, DataPreparation

# Delete all old data preparations
DataPreparation.objects.all().delete()

# Delete all old data packages
DataPackage.objects.all().delete()

# Exit the shell
exit()

Option 2: Using the Management Command¶

A cleanup_data_packages management command is available for cleaning up old data packages and their associated files:

# Delete all data packages and preparations (including files)
uv run src/django/manage.py cleanup_data_packages

# Keep only the latest versions of each package type
uv run src/django/manage.py cleanup_data_packages --keep-latest

# Preview what would be deleted without actually deleting
uv run src/django/manage.py cleanup_data_packages --dry-run

# Delete database records only, keep files in media directory
uv run src/django/manage.py cleanup_data_packages --keep-files

The command options:

--keep-latest: Keeps the most recent version of each package type while removing older versions
--dry-run: Shows what would be deleted without making any changes
--keep-files: Removes database records but preserves the actual data files in the media directory

After cleaning up, run prepare_default_data again to download the latest versions.

Output Files¶

All three methods generate output files in the output directory (outputs/ in the paths below):

CSV files: Detailed simulation results in outputs/TM/
Plots: Visualization charts in outputs/plots/
- Cost curves
- Capacity development
- Trade flows
- Geographic distributions
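
After a run, a short script can confirm that results were written. This sketch assumes the `TM/` and `plots/` subdirectories named above; the function name `summarize_outputs` is hypothetical.

```python
from pathlib import Path


def summarize_outputs(output_dir: Path) -> dict[str, list[str]]:
    """List CSV results and plot files found under an output directory."""
    csv_dir = output_dir / "TM"
    plots_dir = output_dir / "plots"
    return {
        # Missing directories yield empty lists rather than errors.
        "csv": sorted(p.name for p in csv_dir.glob("*.csv")) if csv_dir.is_dir() else [],
        "plots": sorted(p.name for p in plots_dir.iterdir() if p.is_file())
        if plots_dir.is_dir()
        else [],
    }


summary = summarize_outputs(Path("./outputs"))
print(f"{len(summary['csv'])} CSV files, {len(summary['plots'])} plots")
```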

Troubleshooting¶

Common Issues¶

“No data preparations available” error
- Run uv run src/django/manage.py prepare_default_data first
- Check that S3 credentials are configured if using private buckets
Empty plants.json file (0 plants)
- This usually indicates cached data from before a bug fix
- Solution: Force fresh data preparation
```
steelo-cache clear
run_simulation --force-refresh
```
- The cache system now includes version tracking to prevent this
Simulation hangs or crashes
- Check available memory (simulations can be memory-intensive)
- Examine logs for specific error messages
- Ensure all required data files are present
Missing plots or visualizations
- Verify that geo-data was properly extracted
- Check that matplotlib backend is configured correctly
- Look for errors in the simulation log

Debugging Tips¶

Use the --log-level DEBUG flag with CLI commands for verbose output
Check Django logs in the terminal running runserver
Examine background worker output for task execution details
Review generated CSV files for intermediate results

Configuration¶

Environment Variables¶

Key environment variables that affect simulation behavior:

STEELO_HOME: Base directory for steelo data (default: ~/.steelo)
- Contains: preparation_cache/, output/, data_cache/
- All simulation outputs and caches are stored here
DEVELOPMENT: Set to true for development mode
MPLBACKEND: Matplotlib backend (set to Agg for headless environments)

Simulation Parameters¶

Key parameters you can configure:

Time Period: Start and end years for the simulation
Technology Constraints: Which technologies are allowed and when
Economic Factors: Carbon tax, capital costs, trade scenarios
Geographic Constraints: Land use, infrastructure availability