Collegium:Terra

From OODA WIKI
Latest revision as of 15:49, 10 October 2025, by AdminIsidore

Imperium is a distributed, multi-node data processing pipeline designed to automate the collection, processing, and publication of data. The system is composed of several specialized hardware nodes, each with a distinct role, orchestrated to work in concert. This document outlines the foundational architecture as of September 23, 2025.

Core Infrastructure Nodes

The Imperium pipeline is built upon five primary local and cloud-based servers, each with a unique specialization.

Horreum

Serves as the primary high-performance compute node, specializing in GPU-intensive tasks. It operates as a headless server, receiving jobs from the orchestrator node, Roma.

  • Role: Headless GPU Compute Server
  • Hardware: HP Z620 Workstation [cite: 1109]
  • Operating System: Ubuntu 24.04.3 LTS [cite: 1109]
  • CPU: 6-Core / 12-Thread Intel Xeon E5-2630 v2 @ 2.60GHz [cite: 1113]
  • GPU: NVIDIA GeForce RTX 5060 Ti with 16 GB VRAM [cite: 1165, 1167]
  • Memory: 32 GB [cite: 1110]
  • Storage: 1 TB Micron SSD, configured with a 100 GB LVM partition for the OS and ~850 GB of unallocated space for data volumes. [cite: 1131, 1170]

Roma

The central orchestrator of the pipeline. Roma is responsible for managing the workflow, scheduling tasks, and dispatching compute-intensive jobs to Horreum.

  • Role: Orchestration & CPU Processing
  • Hardware: Custom build with AMD A10-7700K APU [cite: 11]
  • Operating System: Ubuntu 22.04.5 LTS [cite: 7]
  • CPU: 4-Core AMD A10-7700K @ 3.40GHz [cite: 7, 11]
  • GPU: Integrated AMD Radeon R7 Graphics [cite: 8]
  • Memory: 8 GB (6.7 GiB usable) [cite: 23]
  • Storage: 2 TB Hitachi Ultrastar HDD [cite: 31]

Torta

A low-power, always-on node that serves as the bastion host and central file hub for the pipeline, managing both raw and processed data.

  • Role: Bastion Host & Centralized File Storage
  • Hardware: Raspberry Pi 4 Model B [cite: 767]
  • Operating System: Debian GNU/Linux 12 (bookworm) [cite: 766]
  • CPU: 4-Core ARM Cortex-A72 @ 1.80GHz [cite: 767]
  • Memory: 8 GB [cite: 767]
  • Storage: 32 GB SD card for the OS; two external HDDs (1.8 TB and 698 GB) for data storage [cite: 782, 783]
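
The two data-drive sizes on Torta match the binary units (TiB/GiB) that Linux tools such as `lsblk` report. If that is what they are, the drives are nominally 2 TB and 750 GB. The source does not say so; this is an inference. The arithmetic behind the conversion is shown below.

```python
# Convert marketed (decimal, SI) drive capacities into the binary units
# (GiB/TiB) that Linux tools such as lsblk report.
# Assumption: Torta's "1.8 TB" and "698 GB" figures are binary units.

GIB = 2**30
TIB = 2**40


def decimal_gb_to_gib(gb: float) -> float:
    """Convert a capacity in decimal GB (10**9 bytes) to GiB."""
    return gb * 10**9 / GIB


def decimal_tb_to_tib(tb: float) -> float:
    """Convert a capacity in decimal TB (10**12 bytes) to TiB."""
    return tb * 10**12 / TIB


if __name__ == "__main__":
    print(f"2 TB drive   -> {decimal_tb_to_tib(2):.2f} TiB")   # 1.82 TiB
    print(f"750 GB drive -> {decimal_gb_to_gib(750):.1f} GiB")  # 698.5 GiB
```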

Latium

The public-facing cloud node responsible for interacting with external APIs and services. It handles the initial data collection and the final data publication.

  • Role: API Scraping & Data Uploading
  • Hardware: DigitalOcean Droplet [cite: 1865]
  • Operating System: Ubuntu 22.04.5 LTS [cite: 1865]
  • CPU: 1 vCPU (DigitalOcean Regular) [cite: 1866]
  • Memory: 2 GB [cite: 1866]
  • Storage: 50 GB SSD [cite: 1889]

OodaWiki

A cloud-based server hosting the MediaWiki instance that serves as the final destination and presentation layer for the processed data.

  • Role: Final Data Presentation Layer
  • Hardware: DigitalOcean Droplet [cite: 1001]
  • Operating System: Ubuntu 22.04.5 LTS [cite: 1001]
  • CPU: 2 vCPUs (DigitalOcean Regular) [cite: 1001]
  • Memory: 4 GB [cite: 1002]
  • Storage: 80 GB SSD [cite: 1024]
  • Services: MediaWiki running on PHP 8.1 [cite: 1081], Redis [cite: 1068], MySQL [cite: 1072], Nginx [cite: 1077]
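
One way to confirm the OodaWiki stack from another node is to query the MediaWiki Action API. The `meta=siteinfo&siprop=general` query reports the MediaWiki version (`generator`), the PHP version, and the database type. The sketch below assumes the wiki exposes `api.php`; the API URL is a caller-supplied placeholder, and the User-Agent string is hypothetical.

```python
import json
import urllib.parse
import urllib.request


def siteinfo_url(api_url: str) -> str:
    """Build a siteinfo query URL for a MediaWiki api.php endpoint."""
    query = urllib.parse.urlencode({
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    })
    return f"{api_url}?{query}"


def fetch_general(api_url: str, timeout: float = 15.0) -> dict:
    """Return the 'general' block of siteinfo (needs network access)."""
    req = urllib.request.Request(
        siteinfo_url(api_url),
        headers={"User-Agent": "imperium-healthcheck/0.1"},  # hypothetical name
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["query"]["general"]


def summarize(general: dict) -> dict:
    """Pick out the fields that describe the hosting stack."""
    return {
        "mediawiki": general.get("generator"),
        "php": general.get("phpversion"),
        "db": general.get("dbtype"),
    }


def php_is(general: dict, major_minor: str = "8.1") -> bool:
    """True if the reported PHP version is in the given major.minor series."""
    return str(general.get("phpversion", "")).startswith(major_minor + ".")
```

For example, `summarize(fetch_general("https://<oodawiki-host>/w/api.php"))` would return the three stack fields. The host and path in that call are placeholders.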

Network Architecture

The Imperium network is divided into a private local network and a secure cloud-to-local tunnel, establishing the "Pomerium" boundary.

Local Network (Pomerium)

The core local servers (Roma, Horreum, and Torta) operate on a single subnet, with a static IP address reserved for each node on the router.
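
The Pomerium addressing plan can be checked with Python's standard `ipaddress` module. The sketch below confirms that each local node's reserved address lies inside the subnet and that no two nodes share an address. The subnet `192.168.1.0/24` and the per-node addresses are hypothetical placeholders, because the source does not publish the real values.

```python
import ipaddress

# Hypothetical values: the source does not list the real subnet or addresses.
POMERIUM_SUBNET = ipaddress.ip_network("192.168.1.0/24")
NODE_ADDRESSES = {
    "Roma": "192.168.1.10",
    "Horreum": "192.168.1.20",
    "Torta": "192.168.1.30",
}


def validate_inventory(subnet, nodes):
    """Return a list of problems found in the static address plan."""
    problems = []
    seen = {}
    for name, addr in nodes.items():
        ip = ipaddress.ip_address(addr)
        if ip not in subnet:
            problems.append(f"{name}: {ip} is outside {subnet}")
        elif ip in (subnet.network_address, subnet.broadcast_address):
            problems.append(f"{name}: {ip} is the network or broadcast address")
        if ip in seen:
            problems.append(f"{name}: {ip} is already assigned to {seen[ip]}")
        else:
            seen[ip] = name
    return problems


def in_pomerium(addr, subnet=POMERIUM_SUBNET):
    """True if an address belongs to the local Pomerium subnet."""
    return ipaddress.ip_address(addr) in subnet


if __name__ == "__main__":
    print(validate_inventory(POMERIUM_SUBNET, NODE_ADDRESSES) or "address plan OK")
```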

File sharing between these nodes will be handled by a Network File System (NFS) hosted on Torta.
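
An NFS server on Torta would publish each data drive through `/etc/exports`. Each line names an exported path followed by one `client(options)` entry per allowed client. The mount points and client addresses below are hypothetical. Exporting the raw drive read-only is also an assumption: Salii writes raw data locally on Torta, so Roma and Horreum should only need to write to the processed-data drive.

```python
# Hypothetical mount points and client addresses; the source names the
# drives' roles but not their paths or the nodes' IPs.
EXPORTS = {
    "/mnt/aqua_datum": "ro",   # raw data: processing nodes only read it
    "/mnt/grana_datum": "rw",  # processed data: Roma and Horreum write results
}
CLIENTS = ["192.168.1.10", "192.168.1.20"]  # Roma, Horreum (placeholders)


def render_exports(exports, clients):
    """Build /etc/exports text: one line per path, one entry per client."""
    lines = []
    for path, mode in exports.items():
        entries = " ".join(f"{c}({mode},sync,no_subtree_check)" for c in clients)
        lines.append(f"{path} {entries}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_exports(EXPORTS, CLIENTS), end="")
```

After `/etc/exports` is edited on Torta, `sudo exportfs -ra` reloads the export table without restarting the NFS server.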

Secure VPN Tunnel (Aquaeductus)

A point-to-point WireGuard VPN provides a secure, encrypted tunnel between the public cloud and the private local network.

  • Purpose: Allows `aqua_datum` (raw data) to be transferred securely from Latium to Torta.
  • Endpoint: The tunnel's public endpoint is the home network's public IP address, with a UDP port forwarded from the home router to Torta.
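
The two ends of the Aquaeductus tunnel can each be described as a minimal `wg-quick` configuration, and the sketch below renders both. Only Latium carries an `Endpoint` line, because it dials the home network's public IP. Torta listens on the forwarded UDP port instead.

All values here are placeholders:

- **Keys:** placeholders.
- **Tunnel addresses (`10.8.0.x`):** hypothetical.
- **Public IP:** a documentation address.
- **Port 51820:** WireGuard's conventional choice. The source does not state which port is forwarded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Peer:
    public_key: str
    allowed_ips: str
    endpoint: Optional[str] = None   # "host:port"; omitted when the peer dials in
    keepalive: Optional[int] = None  # seconds; common when NAT is in the path


def render_wg_quick(private_key: str, address: str, peer: Peer,
                    listen_port: Optional[int] = None) -> str:
    """Render a minimal wg-quick configuration with a single [Peer] section."""
    lines = ["[Interface]", f"PrivateKey = {private_key}", f"Address = {address}"]
    if listen_port is not None:
        lines.append(f"ListenPort = {listen_port}")
    lines += ["", "[Peer]", f"PublicKey = {peer.public_key}",
              f"AllowedIPs = {peer.allowed_ips}"]
    if peer.endpoint is not None:
        lines.append(f"Endpoint = {peer.endpoint}")
    if peer.keepalive is not None:
        lines.append(f"PersistentKeepalive = {peer.keepalive}")
    return "\n".join(lines) + "\n"


def example_configs() -> Tuple[str, str]:
    """Hypothetical Latium and Torta configs; keys, IPs and port are placeholders."""
    port = 51820  # conventional WireGuard port; not stated in the source
    latium = render_wg_quick(
        "<latium-private-key>", "10.8.0.2/24",
        Peer("<torta-public-key>", "10.8.0.1/32",
             endpoint=f"203.0.113.10:{port}",  # home public IP (placeholder)
             keepalive=25),
    )
    torta = render_wg_quick(
        "<torta-private-key>", "10.8.0.1/24",
        Peer("<latium-public-key>", "10.8.0.2/32"),
        listen_port=port,  # the UDP port the home router forwards to Torta
    )
    return latium, torta


if __name__ == "__main__":
    for conf in example_configs():
        print(conf)
```

Key pairs come from `wg genkey` and `wg pubkey`. Each rendered file would be saved as, for example, `/etc/wireguard/wg0.conf` and brought up with `wg-quick up wg0`.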

Data Pipeline Workflow

The pipeline operates in a continuous, automated loop orchestrated primarily by Roma.

  1. Data Collection (Castra): A scheduled script or containerized agent on Latium queries an external API and collects the raw data (`aqua_datum`).
  2. Secure Transport (Aquaeductus): The `Salii` system on Latium transfers the `aqua_datum` through the secure WireGuard tunnel to the first external hard drive on Torta.
  3. Processing Dispatch: A script on Roma continuously monitors the raw data drive on Torta. When new `aqua_datum` is detected, it initiates the processing phase.
  4. Compute & Processing: Roma handles standard data parsing. For tasks requiring significant parallel processing, Roma dispatches the job to Horreum. Both nodes work with data stored on Torta via NFS. Processed data (`grana_datum`) is written to the second external hard drive on Torta.
  5. Data Publication: The `Cubile` system, containing the pywikibots, runs on Latium and reaches into the secure Pomerium zone over the Aquaeductus tunnel. It reads the `grana_datum` from Torta and uses it to update the OodaWiki server.
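
The collection step (Castra) can be sketched as a function that fetches one API payload and stores it as a timestamped `aqua_datum` file. The file is written to a temporary name and then renamed into place, so nothing scanning the directory ever sees a half-written file.

Names in this sketch are hypothetical:

- **`collect_aqua_datum`, `fetch_json`, and the `outbox` directory:** illustrative names. The outbox stands in for wherever Salii picks files up.
- **The file-naming scheme:** illustrative only.

Scheduling (cron, a systemd timer, or a container) is left out.

```python
import json
import os
import tempfile
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional


def fetch_json(url: str, timeout: float = 30.0) -> dict:
    """Query an external JSON API; the URL is whatever Castra is configured with."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def collect_aqua_datum(fetch: Callable[[], dict], outbox: Path,
                       now: Optional[datetime] = None) -> Path:
    """Fetch one payload and store it as a timestamped aqua_datum file."""
    if now is None:
        now = datetime.now(timezone.utc)
    outbox.mkdir(parents=True, exist_ok=True)
    final = outbox / f"aqua_datum_{now:%Y%m%dT%H%M%SZ}.json"
    payload = fetch()
    # Write to a temporary file in the same directory, then rename it into place.
    # os.replace is atomic on POSIX within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=str(outbox), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
        os.replace(tmp, final)
    except BaseException:
        os.unlink(tmp)
        raise
    return final
```

A scheduled run would call something like `collect_aqua_datum(lambda: fetch_json("https://<api-host>/<endpoint>"), Path("/srv/castra/outbox"))`. The URL and path in that call are placeholders.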
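
The processing-dispatch step on Roma can be sketched as a polling loop over the raw-data directory, as seen through the NFS mount. Polling is used rather than inotify because inotify on an NFS client does not report files created on the server or by other clients.

Assumptions in this sketch:

- **File names:** the `aqua_datum_*.json` pattern is hypothetical. It also skips temporary files, on the assumption that writers use a write-then-rename convention.
- **The `seen` set:** it lives in memory and is lost on restart. A real deployment would persist it or move handled files aside.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def scan_new(raw_dir: Path, seen: Set[str],
             pattern: str = "aqua_datum_*.json") -> List[Path]:
    """Return unseen files matching pattern, sorted by name, and mark them seen."""
    fresh = sorted(p for p in raw_dir.glob(pattern) if p.name not in seen)
    seen.update(p.name for p in fresh)
    return fresh


def watch(raw_dir: Path, handle: Callable[[Path], None], interval: float = 10.0,
          max_polls: Optional[int] = None) -> None:
    """Poll raw_dir (forever, or max_polls times) and hand each new file to handle."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in scan_new(raw_dir, seen):
            handle(path)
        polls += 1
        time.sleep(interval)
```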
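
The compute step can be sketched as a routing decision on Roma. Because Roma and Horreum both work with data on Torta via NFS, a job can be handed off by path alone and no data has to be copied between nodes.

Assumptions in this sketch:

- **Mount paths:** the NFS exports are mounted at the same paths on both nodes.
- **Task labels:** `GPU_KINDS` is hypothetical. The source says only that tasks needing significant parallel processing go to Horreum.
- **Transport:** the runner callables stand in for whatever mechanism actually reaches Horreum.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Hypothetical task labels that warrant the GPU node.
GPU_KINDS = frozenset({"gpu_inference", "gpu_batch"})

Runner = Callable[[Path, Path], str]  # (source path, destination path) -> result


@dataclass(frozen=True)
class Job:
    source: Path  # aqua_datum file on Torta's NFS export
    kind: str     # hypothetical task label


def route(job: Job) -> str:
    """Pick the node that should run the job."""
    return "Horreum" if job.kind in GPU_KINDS else "Roma"


def output_path(job: Job, grana_root: Path) -> Path:
    """Processed output keeps the source file name under the grana_datum root."""
    return grana_root / job.source.name


def dispatch(job: Job, grana_root: Path,
             run_local: Runner, run_on_horreum: Runner) -> str:
    """Run the job on Roma or Horreum; only paths cross the node boundary."""
    dest = output_path(job, grana_root)
    runner = run_on_horreum if route(job) == "Horreum" else run_local
    return runner(job.source, dest)
```

`run_on_horreum` might wrap an SSH command or a job queue client. The source does not specify which mechanism Roma uses to reach Horreum.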
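
The publication step (Cubile) can be sketched with pywikibot's `Site`, `Page`, and `Page.save`. A `grana_datum` record is rendered as a simple wikitable and saved only when the page content would change.

Assumptions in this sketch:

- **Record shape and table layout:** both are hypothetical.
- **Page titles:** supplied by the caller.
- **Bot configuration:** `pywikibot.Site()` assumes the bot's `user-config.py` already points at OodaWiki.

```python
import json
from pathlib import Path


def render_wikitext(record: dict) -> str:
    """Render one grana_datum record as a two-column wikitable.

    Assumes keys and values contain no wiki table markup such as "|".
    """
    rows = ['{| class="wikitable"', "! Field !! Value"]
    for key in sorted(record):
        rows.append("|-")
        rows.append(f"| {key} || {record[key]}")
    rows.append("|}")
    return "\n".join(rows)


def publish(title: str, wikitext: str,
            summary: str = "Cubile: update from grana_datum") -> bool:
    """Save wikitext to a page on the configured wiki; False if nothing changed."""
    import pywikibot  # imported here so render_wikitext works without pywikibot

    site = pywikibot.Site()  # family/lang come from the bot's user-config.py
    page = pywikibot.Page(site, title)
    if page.exists() and page.text == wikitext:
        return False  # identical content; skip a no-op edit
    page.text = wikitext
    page.save(summary=summary)
    return True


def publish_file(path: Path, title: str) -> bool:
    """Load one grana_datum JSON file and publish it to the given page title."""
    record = json.loads(path.read_text(encoding="utf-8"))
    return publish(title, render_wikitext(record))
```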