# Running Steel Model Simulations Locally

This guide explains how to run the steel model simulation during local development and how to integrate new data sources into the simulation pipeline.

:::{only} public

## Quick Start (CLI)

After installing the Steel Model package, run a simulation from the command line:

```bash
run_simulation --start-year 2025 --end-year 2030 --output-dir ./simulation_outputs
```

Common options:

- `--start-year` / `--end-year`: define the scenario horizon.
- `--config-file`: load a saved configuration.
- `--log-level`: control verbosity (`INFO`, `DEBUG`, etc.).

The CLI writes metrics, logs, and artefacts to the chosen output directory. Review the [Configuration](configuration.md) guide for a comprehensive list of parameters and environment variables.

## Custom Data Overview

To experiment with bespoke datasets:

1. Prepare files that conform to the schemas referenced in the configuration guide.
2. Point the CLI at your resources, for example:

   ```bash
   run_simulation \
     --plants-json ./my_data/plants.json \
     --demand-xlsx ./my_data/demand.xlsx \
     --output-dir ./custom_run
   ```

3. Inspect the generated reports (`metrics.json`, plots, logs) under your output directory.

For notebook or service integrations, see the [Command-Line Entrypoints](commandline_entrypoints.md) reference.

:::

:::{only} not public

## Prerequisites

Before running simulations, ensure you have:

- Python 3.13 installed (via `uv python install 3.13`)
- Virtual environment activated (`source .venv/bin/activate`)
- All dependencies installed (`uv sync`)

## Data Pipeline Architecture

The steel model follows a structured data flow from raw inputs to simulation execution:

### 1. Data Storage & Caching

- **S3 Storage**: Raw data packages (core-data, geo-data) are stored in S3 buckets
- **Local Cache**: Downloaded data is cached in `$STEELO_HOME/data_cache/` to avoid repeated downloads
- **Preparation Cache**: Processed data is cached in `$STEELO_HOME/preparation_cache/` based on master Excel content hash
- **Django Models**: In web mode, `DataPackage` models store the zip archives in Django's media directory (not in `$STEELO_HOME`)

### 2. Data Transformation

The system transforms raw input data through two parallel paths:

**CLI Path:**

- Raw data → Preprocessing → Files in `$STEELO_HOME/preparation_cache/prep_/data/`
- Creates JSON repositories and processed CSV/Excel files
- Symlinks are created at `project_root/data/` for backward compatibility

**Django Path:**

- Raw data → `DataPackage` models → `DataPreparation` models
- Stores processed data in Django's media directory

### 3. Configuration & Execution

- A `SimulationConfig` object is created with pointers to all required data files
- The config is passed to `SimulationRunner`, which distributes it to all modules
- No downstream module needs to know about the original data sources

### Integrating New Data Sources (e.g., Master Excel)

When adding new data sources like master Excel files, follow this pattern:

1. **Create an Adapter**: Write a transformation module in `src/steelo/adapters/` that:
   - Takes the path to your Excel file as input
   - Returns domain model instances as output
   - Example: `adapters/dataprocessing/master_excel_reader.py`

2. **Extend SimulationConfig**: Add fields for your new data to the `SimulationConfig` class

3. **Wire Through the System**:
   - Pass data via `SimulationConfig` → repositories or `bus.env`
   - Access in your module via event/command handlers

4. **Feature Flag**: Add a flag in `global_variables.py` (default `False`) to enable/disable your feature:

   ```python
   USE_MASTER_EXCEL = False  # Enable when ready
   ```

This approach ensures your changes don't break existing functionality and can be easily replaced when the system officially adopts the master input file.

## Method 1: Programmatic Execution (Python/Notebook)

The programmatic approach gives you full control over the simulation configuration and is ideal for:

- Jupyter notebook analysis
- Custom simulation scenarios
- Integration with other Python tools
- Batch processing

### Quick Example

```python
from pathlib import Path

from steelo.simulation import SimulationConfig
from steelo.simulation_runner import create_simulation_runner
from steelo.domain import Year

config = SimulationConfig.from_data_directory(
    start_year=Year(2025),
    end_year=Year(2030),
    data_dir=Path("./data"),
    output_dir=Path("./test_outputs"),
)

runner = create_simulation_runner(config)
results = runner.run()

# Access results
print(f"Final steel price: {results['price']}")
print(f"Total production: {results['production']}")
```

### Custom Paths Example

```python
from pathlib import Path

from steelo.simulation import SimulationConfig
from steelo.domain import Year

config = SimulationConfig(
    # Custom output paths
    output_dir=Path("./custom_outputs"),
    plots_dir=Path("./custom_outputs/plots"),
    # Custom input data
    plants_json_path=Path("./my_data/plants.json"),
    demand_center_xlsx=Path("./my_data/demand.xlsx"),
    cost_of_x_csv=Path("./my_data/cost_of_x.json"),
    # Time and parameters
    start_year=Year(2025),
    end_year=Year(2050),
    scrap_generation_scenario="high_recycling",
)
```

### Technology Constraints Example

```python
from steelo.simulation import SimulationConfig
from steelo.domain import Year
from steelo.simulation_types import get_default_technology_settings, TechnologySettings

# Create technology settings with specific constraints
tech_settings = get_default_technology_settings()

# Ban blast furnaces by setting allowed=False
tech_settings['BF'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)

# Allow hydrogen DRI only from 2030
tech_settings['DRIH2'] = TechnologySettings(
    allowed=True,
    from_year=2030,
    to_year=None
)

# Disable certain technologies
tech_settings['ESF'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)
tech_settings['MOE'] = TechnologySettings(
    allowed=False,
    from_year=2025,
    to_year=None
)

config = SimulationConfig(
    start_year=Year(2025),
    end_year=Year(2040),
    technology_settings=tech_settings,
)
```

For more examples, see `examples/run_simulation_example.py`.

## Caching System

The CLI implements a content-based caching system that significantly speeds up repeated simulations:

### How It Works

1. **Content Hashing**: The master Excel file is hashed using SHA256 to create a unique cache key
2. **Cache Storage**: Prepared data is stored in `$STEELO_HOME/preparation_cache/prep_/`
3. **Fast Lookups**: An index file tracks all cached preparations for instant lookups
4. **Automatic Reuse**: When running with the same master Excel, cached data is reused instantly

### Cache Management Commands

```bash
# View cache statistics
steelo-cache stats

# List all cached preparations
steelo-cache list

# Clear all cached data
steelo-cache clear

# Clear old caches but keep recent ones
run_simulation --cache-clear --keep-recent 3

# Force fresh preparation (bypass cache)
run_simulation --force-refresh

# Disable caching entirely
run_simulation --no-cache
```

### Cache Versioning

The cache system includes automatic version tracking. When the code that processes data changes, old caches are automatically invalidated. This ensures you always get correctly processed data without manual intervention.

If you encounter issues with outdated cached data:

1. The cache version is automatically bumped when processing code changes
2. Old caches are invalidated when detected
3. Use `--force-refresh` to bypass all caching if needed

### Directory Structure

```
$STEELO_HOME/
├── preparation_cache/
│   ├── index.json              # Fast lookup index
│   ├── prep_a1b2c3d4/          # Cached preparation
│   │   ├── data/               # Prepared data files
│   │   │   └── fixtures/       # JSON repositories
│   │   └── metadata.json       # Cache metadata
│   └── prep_e5f6g7h8/          # Another cached preparation
├── output/                     # Simulation outputs
│   ├── sim_20240726_143052/    # Timestamped simulation
│   └── latest -> sim_20240726...  # Symlink to latest
├── data -> preparation_cache/...  # Symlink to latest preparation
└── output_latest -> output/sim_...  # Symlink to latest output
```

### Backward Compatibility

For backward compatibility with existing scripts, symlinks are automatically created:

- `project_root/data/` → Latest cached preparation
- `project_root/output/` → Latest simulation output

If these directories already exist, they are backed up to `data_backup_` and `output_backup_`.

## Method 2: Command-Line Interface (CLI)

The CLI approach is useful for automated runs, testing, and debugging.

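
For automated runs, the CLI can also be driven from a small Python script. The following is a minimal sketch, not part of the steelo package: the helper names `build_run_simulation_command` and `run_batch`, and the `run_<start>_<end>` directory naming, are hypothetical. It uses only the documented `--start-year`, `--end-year`, `--output-dir`, and `--log-level` flags, and assumes `run_simulation` is on your `PATH` (for example, with the virtual environment activated).

```python
import subprocess
from pathlib import Path


def build_run_simulation_command(
    start_year: int,
    end_year: int,
    output_dir: Path,
    log_level: str = "WARNING",
) -> list[str]:
    """Assemble the argument list for one `run_simulation` invocation."""
    return [
        "run_simulation",
        "--start-year", str(start_year),
        "--end-year", str(end_year),
        "--output-dir", str(output_dir),
        "--log-level", log_level,
    ]


def run_batch(horizons: list[tuple[int, int]], base_dir: Path) -> None:
    """Run one simulation per (start_year, end_year) pair, one after another."""
    for start_year, end_year in horizons:
        # Hypothetical naming scheme: one sub-directory per scenario horizon.
        output_dir = base_dir / f"run_{start_year}_{end_year}"
        cmd = build_run_simulation_command(start_year, end_year, output_dir)
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(cmd, check=True)
```

Calling `run_batch([(2025, 2030), (2025, 2050)], Path("./batch_outputs"))` would launch two runs in sequence and stop at the first failure. Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.
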
### Quick Start

For most cases, you only need one command:

```bash
# Run the simulation (automatically prepares data if needed)
run_simulation
```

The `run_simulation` command will automatically:

- Download required data packages from S3 if not cached
- Prepare all necessary data files
- Use cached preparations when possible for faster startup
- Run the actual simulation

### Getting Fresh Data

If you need to force fresh data preparation (e.g., after fixing bugs or updating master Excel):

```bash
# Method 1: Force refresh during simulation
run_simulation --force-refresh

# Method 2: Clear cache and run
steelo-cache clear
run_simulation

# Method 3: Prepare data explicitly with force refresh
steelo-data-prepare --force-refresh
run_simulation
```

### Advanced Usage

#### Clearing Cache

```bash
# Clear all caches (preparation cache and data cache)
steelo-cache clear

# Clear all caches but keep recent preparation caches
steelo-cache clear --keep-recent 3
```

**Note:** The `steelo-cache clear` command clears the preparation cache, downloaded data packages cache, and the `data/` directory to ensure a completely fresh state.

#### Using Development Geo Data

```bash
# Use specific geo-data version via command line
steelo-data-prepare --geo-version 1.1.0-dev

# Or set via environment variable
export STEELO_GEO_VERSION=1.1.0-dev
steelo-data-prepare
```

#### Manual Data Management (Advanced)

**Note**: Manual data management is rarely needed. The `run_simulation` command handles all data preparation automatically.

For debugging or special cases requiring control over individual steps:

```bash
# Download specific packages
steelo-data-download --package core-data
steelo-data-download --package geo-data

# Prepare data with specific options
steelo-data-prepare --force-refresh

# Extract geo data separately
steelo-data-extract-geo

# Recreate JSON repositories
steelo-data-recreate --package core-data --output-dir ./data/repositories
```

### Step 2: Run the Simulation

Once data preparation is complete, start the simulation:

```bash
# Run simulation with default settings
run_simulation

# Run with custom output directory
run_simulation --output-dir ./my_simulation_outputs

# Run with custom parameters and redirect log
run_simulation --start-year 2025 --end-year 2035 --output-dir ./outputs > /tmp/simulation.log 2>&1
```

#### CLI Options

**Simulation Parameters:**

- `--start-year`: Starting year for simulation (default: 2025)
- `--end-year`: Ending year for simulation (default: 2050)
- `--output-dir`: Base output directory for results (default: `$STEELO_HOME/output`)
- `--log-level`: Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL; default: WARNING)

**Data Files (usually handled automatically via caching):**

- `--plants-json`: Path to plants JSON file
- `--demand-excel`: Path to demand Excel file
- `--location-csv`: Path to location CSV file
- `--cost-of-x-csv`: Path to cost of x JSON file

**Caching Options:**

- `--cache-stats`: Show cache statistics and exit
- `--cache-list`: List all cached preparations and exit
- `--cache-clear`: Clear cache (use with `--keep-recent N` to keep some)
- `--force-refresh`: Force fresh data preparation (bypass cache)
- `--no-cache`: Disable caching for this run

### Step 3: Monitor Progress

In a separate terminal, monitor the simulation progress:

```bash
# Watch the log file in real-time
tail -f /tmp/simulation.log
```

The simulation will output progress updates, including:

- Current simulation year
- Plant capacity changes
- Technology transitions
- Trade allocations
- Cost calculations

## Method 3: Django Web Interface

The web interface provides a user-friendly way to configure and run simulations with real-time progress tracking.

### Quick Start

```bash
# Initial setup (only once)
uv run src/django/manage.py migrate

# Prepare data
uv run src/django/manage.py prepare_default_data

# Start services
uv run src/django/manage.py runserver
uv run src/django/manage.py db_worker  # in separate terminal
```

### Detailed Steps

#### Step 1: Create the Database

```bash
uv run src/django/manage.py migrate
```

#### Step 2: Prepare Default Data

Prepare the data files needed for simulations:

```bash
# Standard preparation
uv run src/django/manage.py prepare_default_data

# Use development geo data
uv run src/django/manage.py prepare_default_data --geo-version 1.1.0-dev

# Or via environment variable
export STEELO_GEO_VERSION=1.1.0-dev
uv run src/django/manage.py prepare_default_data
```

This command will:

- Download the master-input Excel file from S3
- Download core-data and geo-data packages from S3
- Extract data from the master Excel file
- Copy files from core-data package
- Generate derived files (like plant_groups.json)
- Extract geo-data files
- Create all fixture files in `data/fixtures/`

**Options:**

- `--name`: Name for the data preparation (default: "Default Data")
- `--force`: Force re-preparation even if data exists
- `--geo-version`: Specific version of geo-data to use (e.g., '1.1.0-dev')
- `--master-excel-id`: ID of a MasterExcelFile to use (if you've uploaded one)
- `--quiet`: Hide detailed output (only show summary)
- `--no-check-files`: Skip file existence checking

Note: The master Excel file is now mandatory for data preparation. The command uses a centralized data preparation service that ensures consistent file tracking across all data preparation methods.

#### Step 3: Start the Django Development Server

```bash
# Start the web server on http://localhost:8000
uv run src/django/manage.py runserver
```

#### Step 4: Start the Background Worker

In a separate terminal, start the task worker that handles simulation execution:

```bash
# Start the background worker for running simulations
uv run src/django/manage.py db_worker
```

The worker ensures the web interface remains responsive during long-running simulations.

#### Step 5: Create and Run a Simulation

1. Open your browser and navigate to http://localhost:8000
2. Click "New Simulation" to create a new model run
3. Configure simulation parameters:
   - Set start and end years
   - Choose scenarios (demand, scrap generation)
   - Configure technology availability
   - Set economic parameters
4. Click "Create Model Run"
5. On the model run detail page, click "Run Simulation"
6. Monitor progress in real-time on the web interface

### Managing Data Packages

When updating geo-data or core-data packages (e.g., upgrading geo-data.zip to a new version), you may need to clean up old DataPreparation and DataPackage objects from the database.

#### Option 1: Using Django Shell

```bash
# Open the Django shell
uv run src/django/manage.py shell

# In the shell, remove old data packages
from steeloweb.models import DataPackage, DataPreparation

# Delete all old data preparations
DataPreparation.objects.all().delete()

# Delete all old data packages
DataPackage.objects.all().delete()

# Exit the shell
exit()
```

#### Option 2: Using the Management Command

A `cleanup_data_packages` management command is available for cleaning up old data packages and their associated files:

```bash
# Delete all data packages and preparations (including files)
uv run src/django/manage.py cleanup_data_packages

# Keep only the latest versions of each package type
uv run src/django/manage.py cleanup_data_packages --keep-latest

# Preview what would be deleted without actually deleting
uv run src/django/manage.py cleanup_data_packages --dry-run

# Delete database records only, keep files in media directory
uv run src/django/manage.py cleanup_data_packages --keep-files
```

The command options:

- `--keep-latest`: Keeps the most recent version of each package type while removing older versions
- `--dry-run`: Shows what would be deleted without making any changes
- `--keep-files`: Removes database records but preserves the actual data files in the media directory

After cleaning up, run `prepare_default_data` again to download the latest versions.

## Output Files

Both methods generate output files in the `outputs/` directory:

- **CSV files**: Detailed simulation results in `outputs/TM/`
- **Plots**: Visualization charts in `outputs/plots/`
  - Cost curves
  - Capacity development
  - Trade flows
  - Geographic distributions

## Troubleshooting

### Common Issues

1. **"No data preparations available" error**
   - Run `uv run src/django/manage.py prepare_default_data` first
   - Check that S3 credentials are configured if using private buckets

2. **Empty plants.json file (0 plants)**
   - This usually indicates cached data from before a bug fix
   - Solution: Force fresh data preparation

     ```bash
     steelo-cache clear
     run_simulation --force-refresh
     ```

   - The cache system now includes version tracking to prevent this

3. **Simulation hangs or crashes**
   - Check available memory (simulations can be memory-intensive)
   - Examine logs for specific error messages
   - Ensure all required data files are present

4. **Missing plots or visualizations**
   - Verify that geo-data was properly extracted
   - Check that matplotlib backend is configured correctly
   - Look for errors in the simulation log

### Debugging Tips

- Use `--log-level DEBUG` flag with CLI commands for verbose output
- Check Django logs in the terminal running `runserver`
- Examine background worker output for task execution details
- Review generated CSV files for intermediate results

## Configuration

### Environment Variables

Key environment variables that affect simulation behavior:

- `STEELO_HOME`: Base directory for steelo data (default: `~/.steelo`)
  - Contains: `preparation_cache/`, `output/`, `data_cache/`
  - All simulation outputs and caches are stored here
- `DEVELOPMENT`: Set to `true` for development mode
- `MPLBACKEND`: Matplotlib backend (set to `Agg` for headless environments)

### Simulation Parameters

Key parameters you can configure:

- **Time Period**: Start and end years for the simulation
- **Technology Constraints**: Which technologies are allowed and when
- **Economic Factors**: Carbon tax, capital costs, trade scenarios
- **Geographic Constraints**: Land use, infrastructure availability

:::