Collegium:Terra

Imperium is a distributed, multi-node data processing pipeline designed to automate the collection, processing, and publication of data. The system is composed of several specialized hardware nodes, each with a distinct role, orchestrated to work in concert. This document outlines the foundational architecture as of September 19, 2025.

Core Infrastructure Nodes

The Imperium pipeline is built upon five primary local and cloud-based servers, each with a unique specialization.

Horreum

The primary high-performance compute node, specializing in GPU-intensive tasks. It operates as a headless server, receiving jobs from the orchestrator node, Roma.

  • Role: Headless GPU Compute Server
  • Hardware: HP Z620 Workstation
  • Operating System: Ubuntu 24.04.3 LTS
  • CPU: 6-Core / 12-Thread Intel Xeon E5-2630 v2 @ 2.60GHz
  • GPU: NVIDIA GeForce RTX 5060 Ti with 16 GB VRAM
  • Memory: 32 GB
  • Storage: 1 TB Micron SSD, configured with a 100 GB LVM partition for the OS and ~850 GB of unallocated space for data volumes
  • Network: Static IP 192.168.68.200 via wired Ethernet (`enp1s0`)

Roma

The central orchestrator of the pipeline. Roma is responsible for managing the workflow, scheduling tasks, and dispatching compute-intensive jobs to Horreum.

  • Role: Orchestration & CPU Processing
  • Hardware: Custom build with AMD A10-7700K APU
  • Operating System: Ubuntu 22.04.5 LTS
  • CPU: 4-Core AMD A10-7700K @ 3.40GHz
  • GPU: Integrated AMD Radeon R7 Graphics
  • Memory: 8 GB
  • Storage: 2 TB Hitachi Ultrastar HDD
  • Network: Static IP 192.168.68.201 via wired Ethernet (`enp1s0`)

Torta

A low-power, always-on node that serves as the central file hub for the pipeline, managing both raw and processed data.

  • Role: Centralized File Storage
  • Hardware: Raspberry Pi 4 Model B
  • Operating System: Debian GNU/Linux 12 (bookworm)
  • CPU: 4-Core ARM Cortex-A72 @ 1.80GHz
  • Memory: 8 GB
  • Storage: 32 GB SD card for the OS; two external HDDs (1.8 TB and 698 GB) for data storage
  • Network: Static IP 192.168.68.202 via wired Ethernet (`eth0`)

Latium

The public-facing cloud node responsible for interacting with external APIs and services. It handles the initial data collection and the final data publication.

  • Role: API Scraping & Data Uploading
  • Hardware: DigitalOcean Droplet
  • Operating System: Ubuntu 22.04.5 LTS
  • CPU: 1-Core DO-Regular CPU
  • Memory: 2 GB
  • Storage: 50 GB SSD
  • Network: Public IP 159.65.246.113 (`eth0`)

OodaWiki

A cloud-based server hosting the MediaWiki instance that serves as the final destination and presentation layer for the processed data.

  • Role: Final Data Presentation Layer
  • Hardware: DigitalOcean Droplet
  • Operating System: Ubuntu 22.04.5 LTS
  • CPU: 2-Core DO-Regular CPU
  • Memory: 4 GB
  • Storage: 80 GB SSD
  • Network: Public IP 104.248.8.20 (`eth0`)
  • Services: MediaWiki running on PHP 8.1, Redis, MySQL, Nginx

Data Pipeline Workflow

The pipeline operates in a continuous, automated loop orchestrated primarily by Roma. Illustrative sketches of the key steps appear after the list.

  1. Data Collection: A scheduled script on Latium queries an external API. The raw data is pulled and transferred via SCP/SSHFS to the first external hard drive connected to Torta.
  2. Processing Dispatch: A script on Roma continuously monitors the raw data drive on Torta. When new data is detected, it initiates the processing phase.
  3. Compute & Processing: Roma handles standard data parsing. For tasks requiring significant parallel processing, Roma dispatches the job to Horreum, which leverages its RTX 5060 Ti GPU. Both nodes work with data stored on Torta. Processed data is written to the second external hard drive on Torta.
  4. Data Publication: A script on Latium monitors the processed data drive on Torta. When new processed data is available, it is pulled to Latium and then formatted and uploaded to the OodaWiki server using Pywikibot.
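The following is a minimal sketch of step 1 as it might run on Latium. The API endpoint, destination path, and the `torta` SSH alias are placeholders rather than confirmed deployment details, and the transfer is shown with `scp` (the source also mentions SSHFS as an option).

```python
#!/usr/bin/env python3
"""Pull raw data from an external API on Latium and push it to Torta."""
import json
import subprocess
from datetime import datetime, timezone

import requests

API_URL = "https://api.example.com/v1/records"   # placeholder endpoint, not the real API
TORTA_RAW = "torta:/mnt/raw"                     # hypothetical SSH alias and raw-data path

def collect_and_transfer() -> None:
    # Query the external API and fail loudly on HTTP errors.
    response = requests.get(API_URL, timeout=30)
    response.raise_for_status()

    # Write the raw payload to a timestamped local file.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    local_path = f"/tmp/raw_{stamp}.json"
    with open(local_path, "w") as fh:
        json.dump(response.json(), fh)

    # Copy the file to Torta's raw-data drive over SSH.
    subprocess.run(["scp", local_path, TORTA_RAW], check=True)

if __name__ == "__main__":
    collect_and_transfer()
```

A script like this would typically run from cron or a systemd timer on Latium.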
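Steps 2 and 3 can be sketched as a polling loop on Roma. This assumes Torta's raw and processed drives are already mounted locally (for example via SSHFS) at the paths shown, that Horreum is reachable over SSH as `horreum`, and that `process_gpu.py` and `parse_local.py` exist as job scripts; none of these names are confirmed by the source, and the size threshold for choosing the GPU node is purely illustrative.

```python
#!/usr/bin/env python3
"""Poll Torta's raw-data mount on Roma and dispatch heavy jobs to Horreum."""
import subprocess
import time
from pathlib import Path

RAW_DIR = Path("/mnt/torta/raw")          # assumed SSHFS mount of Torta's raw drive
DONE_DIR = Path("/mnt/torta/processed")   # assumed mount of the processed-data drive
GPU_THRESHOLD = 100 * 1024 * 1024         # illustrative cutoff: larger files go to the GPU node

def dispatch(raw_file: Path) -> None:
    out_file = DONE_DIR / f"{raw_file.stem}.out"
    if raw_file.stat().st_size > GPU_THRESHOLD:
        # Heavy jobs run on Horreum's GPU over SSH (hypothetical job script).
        subprocess.run(
            ["ssh", "horreum", "python3", "process_gpu.py", str(raw_file), str(out_file)],
            check=True,
        )
    else:
        # Standard parsing runs locally on Roma (hypothetical local parser).
        subprocess.run(["python3", "parse_local.py", str(raw_file), str(out_file)], check=True)

def main() -> None:
    seen = set()
    while True:
        for raw_file in RAW_DIR.glob("*.json"):
            if raw_file not in seen:
                dispatch(raw_file)
                seen.add(raw_file)
        time.sleep(60)   # poll the raw-data mount once a minute

if __name__ == "__main__":
    main()
```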
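Finally, a sketch of the publication step on Latium using Pywikibot, which the source names as the upload tool. The processed-file path and the record's field names are placeholders; a working setup also needs a `user-config.py` on Latium pointing at the OodaWiki instance.

```python
#!/usr/bin/env python3
"""Format a processed record and publish it to OodaWiki with Pywikibot."""
import json
from pathlib import Path

import pywikibot

PROCESSED_FILE = Path("/srv/processed/latest.json")   # placeholder path on Latium

def publish() -> None:
    record = json.loads(PROCESSED_FILE.read_text())

    # Site() reads the target wiki from user-config.py on Latium.
    site = pywikibot.Site()
    page = pywikibot.Page(site, record["title"])       # assumes the record carries a page title

    # Render wiki markup from the record and save it in a single edit.
    page.text = f"== Data ==\n{record['body']}\n"
    page.save(summary="Automated update from the Imperium pipeline")

if __name__ == "__main__":
    publish()
```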