# GloBI Architecture - Detailed System Design
This diagram provides a comprehensive view of the GloBI system architecture, including all major components, data flows, and external dependencies.
```mermaid
%%{init: {'theme':'base', 'themeVariables': { 'primaryColor':'#4a9eff','primaryTextColor':'#000','primaryBorderColor':'#2563eb','lineColor':'#64748b','secondaryColor':'#fbbf24','tertiaryColor':'#34d399','noteBkgColor':'#fef3c7','noteTextColor':'#000','noteBorderColor':'#f59e0b'}}}%%
flowchart TD
%% User Inputs
subgraph INPUTS["System Inputs"]
M[Manifest File<br/>GloBIExperimentSpec]
GIS[GIS Building Data<br/>Shapefile/GeoJSON/GPKG]
CDB[Component Database<br/>SQLite via Prisma]
SF[Semantic Fields<br/>YAML]
CM[Component Map<br/>YAML]
EPW[Weather Files<br/>EPW Archive]
end
%% CLI Layer
subgraph CLI["CLI Layer (main.py)"]
CLI1[submit manifest]
CLI2[simulate]
CLI3[get experiment]
CLI4[ui]
CLI5[output_viz]
end
%% Configuration Layer
subgraph CONFIG["Configuration Layer (models/configs.py)"]
EXP[GloBIExperimentSpec]
FC[FileConfig]
GPC[GISPreprocessorConfig]
HDC[HourlyDataConfig]
end
%% GIS Processing
subgraph GISPROCESS["GIS Preprocessing (pipelines.preprocess_gis_file)"]
direction TB
GP1[Load GIS File<br/>GeoPandas]
GP2[Validate & Reproject CRS]
GP3[Rename Columns<br/>Handle Shapefile Limits]
GP4[Validate Semantic Fields]
GP5[Generate/Validate IDs]
GP6[Extract Coordinates]
GP7[Filter by Height/Floors]
GP8[Filter by WWR]
GP9[Filter by Basement/Attic]
GP10[Validate Geometry]
GP11[Create Rotated Rectangles<br/>gis/geometry.py]
GP12[Filter by Area & Edge Length]
GP13[Compute Neighbor Indices]
GP14[Extract Neighbor Geometries]
GP15[Inject Semantic Context]
GP16[Assign Weather Files<br/>gis/weather.py]
GP1 --> GP2 --> GP3 --> GP4 --> GP5
GP5 --> GP6 --> GP7 --> GP8 --> GP9
GP9 --> GP10 --> GP11 --> GP12 --> GP13
GP13 --> GP14 --> GP15 --> GP16
end
%% Allocation
subgraph ALLOC["Allocation Layer (allocate.py)"]
direction TB
A1[For Each Building Row:<br/>Create GloBIBuildingSpec]
A2[Calculate Branching Factor<br/>Based on Payload Size]
A3[Create BaseExperiment<br/>with simulate_globi_building]
A4[Configure RecursionMap<br/>Distribution Strategy]
A5[Submit to Hatchet<br/>experiment.allocate]
end
%% Distributed Computing
subgraph DIST["Distributed Computing Infrastructure"]
direction TB
H[Hatchet Workflow<br/>Orchestrator]
W1[Docker Worker 1]
W2[Docker Worker 2]
WN[Docker Worker N]
S3[S3/Cloud Storage]
H --> W1
H --> W2
H --> WN
end
%% Simulation
subgraph SIM["Energy Simulation (pipelines.simulate_globi_building)"]
direction TB
SIM1[Receive GloBIBuildingSpec]
SIM2[Construct Zone Definition<br/>from Semantic Fields]
SIM3[Build EnergyPlus Model<br/>epinterface.sbem]
SIM4[Validate Conditioned Areas]
SIM5[Run EnergyPlus Simulation<br/>model.run]
SIM6[Extract Results from SQL<br/>Monthly Energy & Peak]
SIM7[Extract Hourly Data<br/>Optional Timeseries]
SIM8[Create GloBIOutputSpec<br/>DataFrames + Metadata]
SIM1 --> SIM2 --> SIM3 --> SIM4 --> SIM5
SIM5 --> SIM6 --> SIM7 --> SIM8
end
%% Results Aggregation
subgraph AGG["Results Aggregation (Scythe Framework)"]
direction TB
R1[Collect Outputs from Workers]
R2[Aggregate DataFrames<br/>Results + HourlyData]
R3[Apply Semantic Versioning]
R4[Store in Cloud<br/>Parquet Format]
end
%% Output
subgraph OUTPUT["Output Layer"]
direction TB
O1[Download from S3<br/>to Local Directory]
O2[Results.parquet<br/>Monthly Energy Data]
O3[HourlyData.parquet<br/>Timeseries Optional]
O4[Generate Visualization<br/>D3 Dashboard]
O5[CSV Exports]
end
%% External Dependencies
subgraph EXT["External Dependencies"]
EPL[EnergyPlus<br/>Simulation Engine]
EPI[EPInterface<br/>IDF Generation]
ARC[Archetypal<br/>Building Templates]
SCY[Scythe<br/>Distributed Framework]
PRIS[Prisma<br/>Database ORM]
end
%% Data Flow Connections
M --> CLI1
GIS --> CLI1
CDB --> CLI1
SF --> CLI1
CM --> CLI1
EPW --> CLI1
CLI1 --> EXP
EXP --> FC
EXP --> GPC
EXP --> HDC
FC --> GISPROCESS
GPC --> GISPROCESS
GIS --> GP1
SF --> GP4
EPW --> GP16
GP16 --> A1
A1 --> A2
A2 --> A3
A3 --> A4
A4 --> A5
A5 --> H
W1 --> SIM1
W2 --> SIM1
WN --> SIM1
CDB --> SIM3
SIM8 --> R1
R1 --> R2
R2 --> R3
R3 --> R4
R4 --> S3
CLI3 --> O1
S3 --> O1
O1 --> O2
O1 --> O3
O2 --> O4
O2 --> O5
CLI5 --> O4
%% External dependency connections
SIM5 -.uses.-> EPL
SIM3 -.uses.-> EPI
SIM3 -.uses.-> ARC
A5 -.uses.-> SCY
SIM3 -.uses.-> PRIS
%% Styling - using medium-toned colors for better contrast
style INPUTS fill:#60a5fa,stroke:#2563eb,stroke-width:3px,color:#000
style CLI fill:#d1d5db,stroke:#6b7280,stroke-width:3px,color:#000
style CONFIG fill:#fcd34d,stroke:#f59e0b,stroke-width:3px,color:#000
style GISPROCESS fill:#fcd34d,stroke:#f59e0b,stroke-width:3px,color:#000
style ALLOC fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
style DIST fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
style SIM fill:#fca5a5,stroke:#dc2626,stroke-width:3px,color:#000
style AGG fill:#4ade80,stroke:#16a34a,stroke-width:3px,color:#000
style OUTPUT fill:#4ade80,stroke:#16a34a,stroke-width:3px,color:#000
style EXT fill:#d8b4fe,stroke:#9333ea,stroke-width:3px,color:#000
```
## Component Details

### System Inputs

#### Manifest File (GloBIExperimentSpec)
- Experiment name and scenario identifier
- File paths configuration
- GIS preprocessor parameters (thresholds, defaults, CRS)
- Hourly data extraction settings (optional)
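A manifest might look roughly like the sketch below. This is only an illustrative layout: the field names are assumptions, and the exact schema is defined by `GloBIExperimentSpec` in `models/configs.py`.

```yaml
# Hypothetical manifest layout; actual field names live in models/configs.py
experiment_name: boston-baseline
scenario: existing-stock
file_config:
  gis_file: data/buildings.geojson
  component_db: data/components.sqlite
  semantic_fields: config/semantic_fields.yaml
  component_map: config/component_map.yaml
  epw_archive: data/weather/
gis_preprocessor:
  target_crs: "EPSG:26919"   # Cartesian CRS for geometric operations
  min_area_m2: 20            # geometric filtering thresholds
  max_area_m2: 10000
  default_wwr: 0.3           # default window-to-wall ratio
hourly_data:                 # optional timeseries capture
  variables:
    - Zone Air Temperature
```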
#### GIS Building Data
- Building footprints as polygons (Shapefile/GeoJSON/GeoPackage)
- Properties: height, number of floors, typology, age, region
- Coordinate reference system (CRS) information
#### Component Database
- SQLite database accessed via Prisma ORM
- Building components: walls, windows, roofs, floors
- Material properties and thermal characteristics
- Accessed during simulation to construct energy models
#### Semantic Fields & Component Map
- YAML files defining categorical building attributes
- Maps building typologies to component selections
- Examples: residential/commercial, construction era, climate zone
#### Weather Data
- EPW (EnergyPlus Weather) files or archive
- Can be queried dynamically based on building location
- Provides hourly climate data for simulation
### CLI Layer
The command-line interface provides user-facing commands:
- `submit manifest`: Load experiment configuration and initiate preprocessing/allocation
- `simulate`: Run a single building simulation (testing/debugging)
- `get experiment`: Retrieve results from cloud storage
- `ui`: Launch the Streamlit web interface for interactive exploration
- `output_viz`: Generate D3 visualization dashboards from results
### Configuration Layer

#### GloBIExperimentSpec
- Top-level experiment configuration
- Links to FileConfig, GISPreprocessorConfig, HourlyDataConfig
- Supports manifest loading from YAML files
#### FileConfig
- Paths to all required input files
- File validation and existence checks
#### GISPreprocessorConfig
- Geometric filtering thresholds (min/max area, edge length)
- Default values (height, WWR, basement, attic)
- CRS projection settings
- Weather query parameters
#### HourlyDataConfig
- Variables to extract from EnergyPlus SQL output
- Enables optional hourly timeseries capture
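The nesting of these configuration objects can be pictured with a minimal stdlib sketch. This is not the real `models/configs.py` (which loads manifests from YAML and performs file validation); field names beyond those described above are assumptions.

```python
# Illustrative sketch of the nested configuration layer using stdlib
# dataclasses; the real models/configs.py classes may differ.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileConfig:
    gis_file: str          # path to building footprints
    component_db: str      # path to SQLite component database


@dataclass
class GISPreprocessorConfig:
    min_area: float = 20.0     # geometric filtering threshold (m^2)
    default_wwr: float = 0.3   # default window-to-wall ratio


@dataclass
class HourlyDataConfig:
    variables: List[str] = field(default_factory=list)


@dataclass
class GloBIExperimentSpec:
    name: str
    files: FileConfig
    gis: GISPreprocessorConfig = field(default_factory=GISPreprocessorConfig)
    hourly: Optional[HourlyDataConfig] = None  # hourly capture is optional

    @classmethod
    def from_manifest(cls, d: dict) -> "GloBIExperimentSpec":
        # A YAML manifest parses to a dict; build the nested specs from it.
        return cls(
            name=d["name"],
            files=FileConfig(**d["files"]),
            gis=GISPreprocessorConfig(**d.get("gis", {})),
            hourly=HourlyDataConfig(**d["hourly"]) if "hourly" in d else None,
        )
```

The one-way links mirror the diagram: the top-level spec owns the three sub-configs, so the CLI only ever needs to load a single object.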
### GIS Preprocessing Pipeline
The preprocessing pipeline transforms raw GIS data into simulation-ready building specifications:
- Load & Validate: Read GIS file into GeoDataFrame, validate schema
- Reproject: Convert to Cartesian CRS for geometric operations
- Column Mapping: Handle Shapefile 10-character column name limits
- Semantic Validation: Ensure semantic fields exist in GIS data
- ID Handling: Generate UUIDs for buildings without IDs
- Coordinate Extraction: Extract latitude/longitude for weather queries
- Property Filters: Filter by height, floors, WWR, basement, attic
- Geometry Processing:
  - Remove invalid geometries (non-polygons, self-intersections)
  - Convert to rotated rectangles (`gis/geometry.py`)
  - Filter by minimum/maximum building area
  - Filter by minimum/maximum edge length
- Neighbor Analysis: Identify adjacent buildings for shading calculations
- Semantic Context: Inject building typology, age, region metadata
- Weather Assignment: Match buildings to EPW files by location
Output: Clean GeoDataFrame with enriched building data and column mappings
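The geometry-processing steps can be sketched with shapely. This is a hypothetical helper, not the actual `gis/geometry.py` implementation, and the thresholds are illustrative defaults.

```python
# Sketch of the geometry-processing steps: validate, replace with the
# minimum rotated rectangle, then apply area and edge-length filters.
# Not the actual gis/geometry.py code; thresholds are illustrative.
import math
from typing import Optional

from shapely.geometry import Polygon


def to_filtered_rectangle(footprint: Polygon,
                          min_area: float = 20.0,
                          max_area: float = 10_000.0,
                          min_edge: float = 3.0) -> Optional[Polygon]:
    """Return the rotated-rectangle stand-in for a footprint, or None
    if the footprint fails validity, area, or edge-length checks."""
    if footprint.is_empty or not footprint.is_valid:
        return None  # invalid geometry: filtered out
    rect = footprint.minimum_rotated_rectangle
    if not (min_area <= rect.area <= max_area):
        return None  # area filter
    coords = list(rect.exterior.coords)  # closed ring: 5 points
    edges = [math.dist(coords[i], coords[i + 1]) for i in range(4)]
    if min(edges) < min_edge:
        return None  # edge-length filter
    return rect
```

Collapsing footprints to rotated rectangles keeps the downstream EnergyPlus zoning simple while preserving orientation and floor area reasonably well.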
### Allocation Layer
Prepares building specifications for distributed execution:
- Spec Generation: Create a `GloBIBuildingSpec` for each building row
  - Extract geometry (rotated rectangle, neighbors)
  - Extract properties (height, floors, WWR, basement, attic)
  - Link semantic context and weather file
- Branching Factor Calculation:
  - Sample 1000 random specs
  - Measure average JSON payload size
  - Calculate: `sims_per_branch = 3MB / avg_size`
  - Determine: `branches_required = total_specs / sims_per_branch`
- Job Submission:
  - Create a `BaseExperiment` with the `simulate_globi_building` function
  - Configure a `RecursionMap` for the distribution strategy
  - Submit to Hatchet with an S3 client for result storage
Output: Hatchet job reference and run metadata
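The payload-sizing arithmetic above can be sketched directly. Names and constants are illustrative, not the exact `allocate.py` implementation.

```python
# Sketch of the branching-factor calculation: sample specs, measure the
# average JSON payload size, and derive how many simulations fit in a
# ~3 MB branch payload. Illustrative, not the exact allocate.py code.
import json
import math
import random

MAX_BRANCH_PAYLOAD = 3 * 1024 * 1024  # ~3 MB payload budget per branch


def plan_branches(specs, sample_size=1000):
    """Return (sims_per_branch, branches_required) for a list of
    JSON-serializable spec dicts."""
    sample = random.sample(specs, min(sample_size, len(specs)))
    avg_size = sum(len(json.dumps(s).encode()) for s in sample) / len(sample)
    sims_per_branch = max(1, int(MAX_BRANCH_PAYLOAD // avg_size))
    branches_required = math.ceil(len(specs) / sims_per_branch)
    return sims_per_branch, branches_required
```

Sizing branches by payload rather than a fixed count keeps each Hatchet task under the serialization limit regardless of how detailed individual building specs are.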
### Distributed Computing Infrastructure

#### Hatchet Workflow Orchestrator
- Receives job submissions from allocation layer
- Manages task queue and worker assignment
- Handles retries and error recovery
- Tracks job progress and completion status
#### Docker Workers
- Containerized execution environments
- Pre-configured with EnergyPlus, EPInterface, dependencies
- Scale horizontally based on workload
- Stream results to cloud storage via Scythe framework
#### S3/Cloud Storage
- Stores simulation results as Parquet files
- Versions experiments using semantic versioning
- Provides durable storage for large-scale experiments
### Energy Simulation Pipeline
Each worker executes the following for assigned buildings:
- Receive Spec: Deserialize the `GloBIBuildingSpec` from JSON
- Zone Definition: Construct building zones from semantic fields and component map
- Model Construction: Use EPInterface/Archetypal to build the EnergyPlus IDF
- Validation: Check that conditioned floor areas match geometry
- Simulation: Run EnergyPlus via `model.run()`
- Results Extraction:
  - Query SQL output for monthly energy and peak results
  - Create a MultiIndex DataFrame (Measurement, Feature levels)
  - Optionally extract hourly timeseries data
- Output Creation: Build a `GloBIOutputSpec` with results and metadata
Output: `GloBIOutputSpec` with DataFrames and hourly data references
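The (Measurement, Feature) column structure can be sketched with pandas. The measurement and feature names here are assumptions for illustration, not the exact GloBI schema.

```python
# Illustrative results frame with (Measurement, Feature) column levels;
# the measurement/feature names are assumptions, not the GloBI schema.
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("Energy", "Heating"),
        ("Energy", "Cooling"),
        ("Peak", "Heating"),
    ],
    names=["Measurement", "Feature"],
)
# One row per building; monthly values would add another axis in practice.
results = pd.DataFrame(
    [[120.5, 40.2, 12.1], [98.0, 55.3, 10.4]],
    index=pd.Index(["bldg-001", "bldg-002"], name="building_id"),
    columns=columns,
)
# Selecting a top-level Measurement slices out all of its Features:
energy = results["Energy"]
```

The two-level columns let downstream tools slice by measurement type (all energy columns, all peak columns) without string parsing.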
### Results Aggregation
The Scythe framework handles result consolidation:
- Collection: Gather `GloBIOutputSpec` objects from all workers
- Aggregation: Concatenate DataFrames across buildings
- Versioning: Apply a semantic version to the experiment results
- Storage: Write aggregated Parquet files to S3
Output: Versioned experiment results in cloud storage
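The aggregation step amounts to stacking per-worker frames. A minimal pandas sketch with hypothetical data (not Scythe itself):

```python
# Sketch of the aggregation step: concatenate per-worker result frames
# into one experiment-wide frame. Hypothetical data, not Scythe itself.
import pandas as pd

worker_a = pd.DataFrame({"eui": [52.1]}, index=["bldg-001"])
worker_b = pd.DataFrame({"eui": [47.8, 61.0]}, index=["bldg-002", "bldg-003"])

combined = pd.concat([worker_a, worker_b])  # rows stack across buildings
# combined.to_parquet("Results.parquet")    # final write (needs pyarrow)
```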
### Output Layer
Results are delivered to users via:
- Download: Retrieve Parquet files from S3 to a local directory
  - `Results.parquet`: Monthly energy and peak data
  - `HourlyData.parquet`: Optional hourly timeseries
- Visualization: Generate interactive D3 dashboards
  - Summary statistics (mean, min, max)
  - Energy use intensity (EUI) distributions
  - Peak demand analysis
- CSV Export: Convert Parquet to CSV for external analysis tools
### External Dependencies

#### EnergyPlus
Building energy simulation engine that performs physics-based thermal calculations
#### EPInterface
Python library for generating EnergyPlus IDF (Input Data File) models programmatically
#### Archetypal
Provides building archetype templates and simplified building energy modeling (SBEM)
#### Scythe
Distributed computing framework for experiment allocation, result aggregation, and storage
#### Prisma
Database ORM for accessing component database during model construction
## Key Design Principles
- Separation of Concerns: Clear boundaries between GIS processing, allocation, simulation, and results
- Scalability: Horizontal scaling via distributed workers and cloud storage
- Reproducibility: Version-controlled experiments with full provenance tracking
- Flexibility: Configurable preprocessing, semantic mappings, and simulation parameters
- Fault Tolerance: Retry logic and error handling throughout the pipeline
- Data Efficiency: Parquet format for compressed, columnar data storage
- Modularity: Independent components can be tested and deployed separately
## Data Flow Summary
```text
User Manifest
  ↓
CLI loads configuration
  ↓
GIS preprocessing enriches building data
  ↓
Allocation creates building specs
  ↓
Hatchet distributes specs to workers
  ↓
Workers run EnergyPlus simulations
  ↓
Results aggregated and stored in S3
  ↓
CLI downloads and visualizes results
  ↓
User analyzes building stock performance
```
This architecture enables regional-scale building energy modeling with minimal manual intervention, supporting urban planning, policy analysis, and decarbonization strategies.