GloBI Architecture - Detailed System Design#

This diagram provides a comprehensive view of the GloBI system architecture, including all major components, data flows, and external dependencies.

%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4a9eff','primaryTextColor':'#000','primaryBorderColor':'#2563eb','lineColor':'#64748b','secondaryColor':'#fbbf24','tertiaryColor':'#34d399','noteBkgColor':'#fef3c7','noteTextColor':'#000','noteBorderColor':'#f59e0b'}}}%%
flowchart TD
    %% User Inputs
    subgraph INPUTS["System Inputs"]
        M[Manifest File<br/>GloBIExperimentSpec]
        GIS[GIS Building Data<br/>Shapefile/GeoJSON/GPKG]
        CDB[Component Database<br/>SQLite via Prisma]
        SF[Semantic Fields<br/>YAML]
        CM[Component Map<br/>YAML]
        EPW[Weather Files<br/>EPW Archive]
    end

    %% CLI Layer
    subgraph CLI["CLI Layer (main.py)"]
        CLI1[submit manifest]
        CLI2[simulate]
        CLI3[get experiment]
        CLI4[ui]
        CLI5[output_viz]
    end

    %% Configuration Layer
    subgraph CONFIG["Configuration Layer (models/configs.py)"]
        EXP[GloBIExperimentSpec]
        FC[FileConfig]
        GPC[GISPreprocessorConfig]
        HDC[HourlyDataConfig]
    end

    %% GIS Processing
    subgraph GISPROCESS["GIS Preprocessing (pipelines.preprocess_gis_file)"]
        direction TB
        GP1[Load GIS File<br/>GeoPandas]
        GP2[Validate & Reproject CRS]
        GP3[Rename Columns<br/>Handle Shapefile Limits]
        GP4[Validate Semantic Fields]
        GP5[Generate/Validate IDs]
        GP6[Extract Coordinates]
        GP7[Filter by Height/Floors]
        GP8[Filter by WWR]
        GP9[Filter by Basement/Attic]
        GP10[Validate Geometry]
        GP11[Create Rotated Rectangles<br/>gis/geometry.py]
        GP12[Filter by Area & Edge Length]
        GP13[Compute Neighbor Indices]
        GP14[Extract Neighbor Geometries]
        GP15[Inject Semantic Context]
        GP16[Assign Weather Files<br/>gis/weather.py]

        GP1 --> GP2 --> GP3 --> GP4 --> GP5
        GP5 --> GP6 --> GP7 --> GP8 --> GP9
        GP9 --> GP10 --> GP11 --> GP12 --> GP13
        GP13 --> GP14 --> GP15 --> GP16
    end

    %% Allocation
    subgraph ALLOC["Allocation Layer (allocate.py)"]
        direction TB
        A1[For Each Building Row:<br/>Create GloBIBuildingSpec]
        A2[Calculate Branching Factor<br/>Based on Payload Size]
        A3[Create BaseExperiment<br/>with simulate_globi_building]
        A4[Configure RecursionMap<br/>Distribution Strategy]
        A5[Submit to Hatchet<br/>experiment.allocate]
    end

    %% Distributed Computing
    subgraph DIST["Distributed Computing Infrastructure"]
        direction TB
        H[Hatchet Workflow<br/>Orchestrator]
        W1[Docker Worker 1]
        W2[Docker Worker 2]
        WN[Docker Worker N]
        CS[S3/Cloud Storage]

        H --> W1
        H --> W2
        H --> WN
    end

    %% Simulation
    subgraph SIM["Energy Simulation (pipelines.simulate_globi_building)"]
        direction TB
        S1[Receive GloBIBuildingSpec]
        S2[Construct Zone Definition<br/>from Semantic Fields]
        S3[Build EnergyPlus Model<br/>epinterface.sbem]
        S4[Validate Conditioned Areas]
        S5[Run EnergyPlus Simulation<br/>model.run]
        S6[Extract Results from SQL<br/>Monthly Energy & Peak]
        S7[Extract Hourly Data<br/>Optional Timeseries]
        S8[Create GloBIOutputSpec<br/>DataFrames + Metadata]

        S1 --> S2 --> S3 --> S4 --> S5
        S5 --> S6 --> S7 --> S8
    end

    %% Results Aggregation
    subgraph AGG["Results Aggregation (Scythe Framework)"]
        direction TB
        R1[Collect Outputs from Workers]
        R2[Aggregate DataFrames<br/>Results + HourlyData]
        R3[Apply Semantic Versioning]
        R4[Store in Cloud<br/>Parquet Format]
    end

    %% Output
    subgraph OUTPUT["Output Layer"]
        direction TB
        O1[Download from S3<br/>to Local Directory]
        O2[Results.parquet<br/>Monthly Energy Data]
        O3[HourlyData.parquet<br/>Timeseries Optional]
        O4[Generate Visualization<br/>D3 Dashboard]
        O5[CSV Exports]
    end

    %% External Dependencies
    subgraph EXT["External Dependencies"]
        EPL[EnergyPlus<br/>Simulation Engine]
        EPI[EPInterface<br/>IDF Generation]
        ARC[Archetypal<br/>Building Templates]
        SCY[Scythe<br/>Distributed Framework]
        PRIS[Prisma<br/>Database ORM]
    end

    %% Data Flow Connections
    M --> CLI1
    GIS --> CLI1
    CDB --> CLI1
    SF --> CLI1
    CM --> CLI1
    EPW --> CLI1

    CLI1 --> EXP
    EXP --> FC
    EXP --> GPC
    EXP --> HDC

    FC --> GISPROCESS
    GPC --> GISPROCESS
    GIS --> GP1
    SF --> GP4
    EPW --> GP16

    GP16 --> A1
    A1 --> A2
    A2 --> A3
    A3 --> A4
    A4 --> A5
    A5 --> H

    W1 --> S1
    W2 --> S1
    WN --> S1

    CDB --> S3
    S8 --> R1

    R1 --> R2
    R2 --> R3
    R3 --> R4
    R4 --> CS

    CLI3 --> O1
    CS --> O1
    O1 --> O2
    O1 --> O3
    O2 --> O4
    O2 --> O5

    CLI5 --> O4

    %% External dependency connections
    S5 -.uses.-> EPL
    S3 -.uses.-> EPI
    S3 -.uses.-> ARC
    A5 -.uses.-> SCY
    S3 -.uses.-> PRIS

    %% Styling - using medium-toned colors for better contrast
    style INPUTS fill:#60a5fa,stroke:#2563eb,stroke-width:3px,color:#000
    style CLI fill:#d1d5db,stroke:#6b7280,stroke-width:3px,color:#000
    style CONFIG fill:#fcd34d,stroke:#f59e0b,stroke-width:3px,color:#000
    style GISPROCESS fill:#fcd34d,stroke:#f59e0b,stroke-width:3px,color:#000
    style ALLOC fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
    style DIST fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
    style SIM fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
    style AGG fill:#4ade80,stroke:#16a34a,stroke-width:3px,color:#000
    style OUTPUT fill:#4ade80,stroke:#16a34a,stroke-width:3px,color:#000
    style EXT fill:#d8b4fe,stroke:#9333ea,stroke-width:3px,color:#000

Component Details#

System Inputs#

Manifest File (GloBIExperimentSpec)#

  • Experiment name and scenario identifier
  • File paths configuration
  • GIS preprocessor parameters (thresholds, defaults, CRS)
  • Hourly data extraction settings (optional)

GIS Building Data#

  • Building footprints as polygons (Shapefile/GeoJSON/GeoPackage)
  • Properties: height, number of floors, typology, age, region
  • Coordinate reference system (CRS) information

Component Database#

  • SQLite database accessed via Prisma ORM
  • Building components: walls, windows, roofs, floors
  • Material properties and thermal characteristics
  • Accessed during simulation to construct energy models

Semantic Fields & Component Map#

  • YAML files defining categorical building attributes
  • Maps building typologies to component selections
  • Examples: residential/commercial, construction era, climate zone

Weather Data#

  • EPW (EnergyPlus Weather) files or archive
  • Can be queried dynamically based on building location
  • Provides hourly climate data for simulation
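Weather assignment (the last preprocessing step in the diagram) boils down to matching each building centroid to the nearest available EPW station. A minimal stdlib sketch, assuming an illustrative station list; the real lookup lives in gis/weather.py:

```python
import math

# Hedged sketch of location-based EPW matching; the station names and
# coordinates below are illustrative, not part of the real archive.
def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

stations = {
    "boston.epw": (42.36, -71.01),
    "worcester.epw": (42.27, -71.87),
}

def nearest_epw(lat, lon):
    """Return the EPW file whose station is closest to the given point."""
    return min(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))

epw = nearest_epw(42.35, -71.05)
```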

CLI Layer#

The command-line interface provides user-facing commands:

  • submit manifest: Load experiment configuration and initiate preprocessing/allocation
  • simulate: Run single building simulation (testing/debugging)
  • get experiment: Retrieve results from cloud storage
  • ui: Launch Streamlit web interface for interactive exploration
  • output_viz: Generate D3 visualization dashboards from results

Configuration Layer#

GloBIExperimentSpec#

  • Top-level experiment configuration
  • Links to FileConfig, GISPreprocessorConfig, HourlyDataConfig
  • Supports manifest loading from YAML files

FileConfig#

  • Paths to all required input files
  • File validation and existence checks

GISPreprocessorConfig#

  • Geometric filtering thresholds (min/max area, edge length)
  • Default values (height, WWR, basement, attic)
  • CRS projection settings
  • Weather query parameters

HourlyDataConfig#

  • Variables to extract from EnergyPlus SQL output
  • Enables optional hourly timeseries capture
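The four configuration classes above compose into a single YAML manifest loaded by `submit manifest`. A hypothetical sketch; the key names are illustrative and inferred from the class descriptions, not the actual GloBIExperimentSpec schema:

```yaml
# Illustrative manifest sketch -- field names are assumptions, not the real schema.
experiment_name: boston-residential-stock
scenario: baseline
file_config:
  gis_file: data/buildings.geojson
  component_db: data/components.db
  semantic_fields: config/semantic_fields.yaml
  component_map: config/component_map.yaml
  epw_archive: data/weather/
gis_preprocessor:
  cartesian_crs: "EPSG:26919"
  min_area_m2: 50
  max_area_m2: 10000
  default_wwr: 0.3
hourly_data:
  variables:
    - Zone Mean Air Temperature
```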

GIS Preprocessing Pipeline#

The preprocessing pipeline transforms raw GIS data into simulation-ready building specifications:

  1. Load & Validate: Read GIS file into a GeoDataFrame, validate schema
  2. Reproject: Convert to a Cartesian CRS for geometric operations
  3. Column Mapping: Handle the 10-character Shapefile column name limit
  4. Semantic Validation: Ensure semantic fields exist in the GIS data
  5. ID Handling: Generate UUIDs for buildings without IDs
  6. Coordinate Extraction: Extract latitude/longitude for weather queries
  7. Property Filters: Filter by height, floors, WWR, basement, attic
  8. Geometry Processing:
     • Remove invalid geometries (non-polygons, self-intersections)
     • Convert to rotated rectangles (gis/geometry.py)
     • Filter by minimum/maximum building area
     • Filter by minimum/maximum edge length
  9. Neighbor Analysis: Identify adjacent buildings for shading calculations
  10. Semantic Context: Inject building typology, age, region metadata
  11. Weather Assignment: Match buildings to EPW files by location

Output: Clean GeoDataFrame with enriched building data and column mappings
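The property-filter steps above can be sketched with plain Python. This is a stdlib-only illustration of the thresholding logic; the field names and threshold values are assumptions, and the real pipeline operates on a GeoDataFrame via GISPreprocessorConfig:

```python
# Hedged sketch of the property filters (steps 7-8); data and thresholds
# are illustrative, not the pipeline's actual defaults.
buildings = [
    {"id": "b1", "height": 9.0, "num_floors": 3, "wwr": 0.3, "area": 120.0},
    {"id": "b2", "height": 2.0, "num_floors": 1, "wwr": 0.3, "area": 40.0},
    {"id": "b3", "height": 12.0, "num_floors": 4, "wwr": 0.9, "area": 300.0},
]

def passes_filters(b, min_height=3.0, max_wwr=0.8, min_area=50.0):
    """Keep a building only if every attribute falls inside its threshold."""
    return (
        b["height"] >= min_height
        and b["wwr"] <= max_wwr
        and b["area"] >= min_area
    )

kept = [b for b in buildings if passes_filters(b)]
```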


Allocation Layer#

Prepares building specifications for distributed execution:

  1. Spec Generation: Create a GloBIBuildingSpec for each building row
     • Extract geometry (rotated rectangle, neighbors)
     • Extract properties (height, floors, WWR, basement, attic)
     • Link semantic context and weather file
  2. Branching Factor Calculation:
     • Sample 1000 random specs
     • Measure average JSON payload size
     • Calculate: sims_per_branch = 3 MB / avg_size
     • Determine: branches_required = total_specs / sims_per_branch
  3. Job Submission:
     • Create BaseExperiment with the simulate_globi_building function
     • Configure RecursionMap for the distribution strategy
     • Submit to Hatchet with an S3 client for result storage

Output: Hatchet job reference and run metadata
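The branching-factor calculation above can be sketched in a few lines. This is a hedged illustration of the sizing arithmetic, not the allocate.py implementation; the function name and the 3 MB target are taken from the steps above:

```python
import json
import random

def plan_branching(specs, target_branch_bytes=3 * 1024 * 1024, sample_size=1000):
    """Estimate how many specs fit in one ~3 MB branch payload.

    Mirrors the branching-factor steps described above: sample up to 1000
    specs, measure average serialized size, then size the branches.
    """
    sample = random.sample(specs, min(sample_size, len(specs)))
    avg_size = sum(len(json.dumps(s).encode()) for s in sample) / len(sample)
    sims_per_branch = max(1, int(target_branch_bytes // avg_size))
    branches_required = -(-len(specs) // sims_per_branch)  # ceiling division
    return sims_per_branch, branches_required

# Illustrative payloads standing in for serialized GloBIBuildingSpec objects.
specs = [{"id": i, "height": 10.0, "wwr": 0.3} for i in range(5000)]
per_branch, branches = plan_branching(specs)
```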


Distributed Computing Infrastructure#

Hatchet Workflow Orchestrator#

  • Receives job submissions from allocation layer
  • Manages task queue and worker assignment
  • Handles retries and error recovery
  • Tracks job progress and completion status

Docker Workers#

  • Containerized execution environments
  • Pre-configured with EnergyPlus, EPInterface, dependencies
  • Scale horizontally based on workload
  • Stream results to cloud storage via Scythe framework

S3/Cloud Storage#

  • Stores simulation results as Parquet files
  • Versions experiments using semantic versioning
  • Provides durable storage for large-scale experiments
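One way to picture the versioned layout is a key scheme that embeds the experiment name and semantic version. The path structure below is purely an assumption for illustration, not the actual storage layout used by Scythe:

```python
# Hypothetical sketch of a semantic-versioned result key; the path scheme
# is an assumption, not the framework's real layout.
def result_key(experiment: str, version: str, artifact: str) -> str:
    """Build an object-store key that scopes artifacts by experiment/version."""
    return f"experiments/{experiment}/{version}/{artifact}"

key = result_key("boston-stock", "1.2.0", "Results.parquet")
```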

Energy Simulation Pipeline#

Each worker executes the following for assigned buildings:

  1. Receive Spec: Deserialize GloBIBuildingSpec from JSON
  2. Zone Definition: Construct building zones from semantic fields and component map
  3. Model Construction: Use EPInterface/Archetypal to build EnergyPlus IDF
  4. Validation: Check conditioned floor areas match geometry
  5. Simulation: Run EnergyPlus simulation via model.run()
  6. Results Extraction:
     • Query SQL output for monthly energy and peak results
     • Create MultiIndex DataFrame (Measurement, Feature levels)
     • Optionally extract hourly timeseries data
  7. Output Creation: Build GloBIOutputSpec with results and metadata

Output: GloBIOutputSpec with DataFrames and hourly data references
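The (Measurement, Feature) MultiIndex layout mentioned in the results-extraction step can be sketched with pandas. The column names, units, and values below are illustrative assumptions, not the pipeline's actual output schema:

```python
import pandas as pd

# Hedged sketch of the two-level (Measurement, Feature) column layout;
# the specific measurements and values are illustrative.
columns = pd.MultiIndex.from_tuples(
    [
        ("Energy", "Heating"),
        ("Energy", "Cooling"),
        ("Peak", "Heating"),
    ],
    names=["Measurement", "Feature"],
)
results = pd.DataFrame(
    [[120.5, 40.2, 8.1], [95.0, 55.3, 7.4]],
    index=pd.Index(["building-a", "building-b"], name="building_id"),
    columns=columns,
)

# Selecting one (Measurement, Feature) pair yields a per-building series.
heating = results[("Energy", "Heating")]
```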


Results Aggregation#

The Scythe framework handles result consolidation:

  1. Collection: Gather GloBIOutputSpec objects from all workers
  2. Aggregation: Concatenate DataFrames across buildings
  3. Versioning: Apply semantic version to experiment results
  4. Storage: Write aggregated Parquet files to S3

Output: Versioned experiment results in cloud storage
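The aggregation step is, at its core, a concatenation of per-worker frames into one experiment-wide frame. A minimal sketch with illustrative data; in the real pipeline the combined frame would then be written to Parquet:

```python
import pandas as pd

# Hedged sketch of aggregation step 2: per-worker result frames are
# concatenated row-wise into a single experiment frame (data is illustrative).
worker_a = pd.DataFrame(
    {"eui_kwh_m2": [100.0]}, index=pd.Index(["b1"], name="building_id")
)
worker_b = pd.DataFrame(
    {"eui_kwh_m2": [80.0]}, index=pd.Index(["b2"], name="building_id")
)
combined = pd.concat([worker_a, worker_b])
```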


Output Layer#

Results are delivered to users via:

  1. Download: Retrieve Parquet files from S3 to a local directory
     • Results.parquet: Monthly energy and peak data
     • HourlyData.parquet: Optional hourly timeseries
  2. Visualization: Generate interactive D3 dashboards
     • Summary statistics (mean, min, max)
     • Energy use intensity (EUI) distributions
     • Peak demand analysis
  3. CSV Export: Convert Parquet to CSV for external analysis tools
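The dashboard's summary statistics amount to simple aggregations over the results frame. A hedged sketch with illustrative EUI values; the column name is an assumption:

```python
import pandas as pd

# Illustrative sketch of the mean/min/max summary statistics shown in the
# dashboard; values and the column name are assumptions.
results = pd.DataFrame({"eui_kwh_m2": [85.0, 120.0, 60.0, 95.0]})
summary = results["eui_kwh_m2"].agg(["mean", "min", "max"])
```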


External Dependencies#

EnergyPlus#

Building energy simulation engine that performs physics-based thermal calculations

EPInterface#

Python library for generating EnergyPlus IDF (Input Data File) models programmatically

Archetypal#

Provides building archetype templates and simplified building energy modeling (SBEM)

Scythe#

Distributed computing framework for experiment allocation, result aggregation, and storage

Prisma#

Database ORM for accessing component database during model construction


Key Design Principles#

  1. Separation of Concerns: Clear boundaries between GIS processing, allocation, simulation, and results
  2. Scalability: Horizontal scaling via distributed workers and cloud storage
  3. Reproducibility: Version-controlled experiments with full provenance tracking
  4. Flexibility: Configurable preprocessing, semantic mappings, and simulation parameters
  5. Fault Tolerance: Retry logic and error handling throughout the pipeline
  6. Data Efficiency: Parquet format for compressed, columnar data storage
  7. Modularity: Independent components can be tested and deployed separately

Data Flow Summary#

User Manifest
  → CLI loads configuration
  → GIS preprocessing enriches building data
  → Allocation creates building specs
  → Hatchet distributes specs to workers
  → Workers run EnergyPlus simulations
  → Results aggregated and stored in S3
  → CLI downloads and visualizes results
  → User analyzes building stock performance

This architecture enables regional-scale building energy modeling with minimal manual intervention, supporting urban planning, policy analysis, and decarbonization strategies.