{{DISPLAYTITLE:Imperium System Architecture}}
{{italic title}}
'''Imperium''' is a distributed, multi-node data processing pipeline designed to automate the collection, processing, and publication of data. The system is composed of several specialized hardware nodes, each with a distinct role, orchestrated to work in concert. This document outlines the foundational architecture as of September 23, 2025.
== Core Infrastructure Nodes ==
The Imperium pipeline is built upon five primary local and cloud-based servers, each with a unique specialization.
=== [[Horreum]] ===
Serves as the primary high-performance compute node, specializing in GPU-intensive tasks. It operates as a headless server, receiving jobs from the orchestrator node, Roma.
* '''Role''': Headless GPU Compute Server
* '''Hardware''': HP Z620 Workstation
* '''Operating System''': Ubuntu 24.04.3 LTS
* '''CPU''': 6-Core / 12-Thread Intel Xeon E5-2630 v2 @ 2.60GHz
* '''GPU''': NVIDIA GeForce RTX 5060 Ti with 16 GB VRAM
* '''Memory''': 32 GB
* '''Storage''': 1 TB Micron SSD, configured with a 100 GB LVM partition for the OS and ~850 GB of unallocated space for data volumes.
=== [[Roma]] ===
The central orchestrator of the pipeline. Roma is responsible for managing the workflow, scheduling tasks, and dispatching compute-intensive jobs to Horreum.
* '''Role''': Orchestration & CPU Processing
* '''Hardware''': Custom build with AMD A10-7700K APU
* '''Operating System''': Ubuntu 22.04.5 LTS
* '''CPU''': 4-Core AMD A10-7700K @ 3.40GHz
* '''GPU''': Integrated AMD Radeon R7 Graphics
* '''Memory''': 8 GB (6.7 GiB usable)
* '''Storage''': 2 TB Hitachi Ultrastar HDD
=== [[Torta]] ===
A low-power, always-on node that serves as the bastion host and central file hub for the pipeline, managing both raw and processed data.
* '''Role''': Bastion Host & Centralized File Storage
* '''Hardware''': Raspberry Pi 4 Model B
* '''Operating System''': Debian GNU/Linux 12 (bookworm)
* '''CPU''': 4-Core ARM Cortex-A72 @ 1.80GHz
* '''Memory''': 8 GB
* '''Storage''': 32 GB SD card for the OS; two external HDDs (1.8 TB and 698 GB) for data storage
=== [[Latium]] ===
The public-facing cloud node responsible for interacting with external APIs and services. It handles the initial data collection and the final data publication.
* '''Role''': API Scraping & Data Uploading
* '''Hardware''': DigitalOcean Droplet
* '''Operating System''': Ubuntu 22.04.5 LTS
* '''CPU''': 1-Core DO-Regular CPU
* '''Memory''': 2 GB
* '''Storage''': 50 GB SSD
=== [[OodaWiki]] ===
A cloud-based server hosting the MediaWiki instance that serves as the final destination and presentation layer for the processed data.
* '''Role''': Final Data Presentation Layer
* '''Hardware''': DigitalOcean Droplet
* '''Operating System''': Ubuntu 22.04.5 LTS
* '''CPU''': 2-Core DO-Regular CPU
* '''Memory''': 4 GB
* '''Storage''': 80 GB SSD
* '''Services''': MediaWiki running on PHP 8.1, Redis, MySQL, Nginx
== Network Architecture ==
The Imperium network is divided into a private local network and a secure cloud-to-local tunnel, together establishing the "Pomerium" boundary.
=== Local Network (Pomerium) ===
The core local servers operate on a subnet with static IP addresses assigned by the router.
File sharing between these nodes is handled by a Network File System (NFS) share hosted on '''Torta'''.
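A minimal sketch of that NFS share, assuming hypothetical export paths (`/srv/aqua`, `/srv/grana`) and a typical home subnet, since the deployed values are not documented here:

```
# /etc/exports on Torta -- paths and subnet are illustrative assumptions
/srv/aqua   192.168.1.0/24(rw,sync,no_subtree_check)
/srv/grana  192.168.1.0/24(rw,sync,no_subtree_check)
```

Clients such as '''Roma''' and '''Horreum''' would then mount these exports (e.g. an `/etc/fstab` entry like `torta:/srv/aqua /mnt/torta/aqua nfs defaults 0 0`, with the hostname and mount point equally illustrative) so that both nodes see the same data.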
=== Secure VPN Tunnel (Aquaeductus) ===
A point-to-point WireGuard VPN provides a secure, encrypted tunnel between the public cloud and the private local network.
* '''Purpose''': Allows `aqua_datum` (raw data) to be transferred securely from '''Latium''' to '''Torta'''.
* '''Endpoint''': The tunnel's public endpoint is the home network's public IP, with the WireGuard UDP port forwarded to '''Torta'''.
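As an illustrative config fragment, Torta's side of the tunnel might look like the following; every address, key, and port below is a placeholder assumption, not a deployed value:

```ini
# /etc/wireguard/wg0.conf on Torta -- all values are placeholders
[Interface]
Address = 10.0.0.2/24
ListenPort = 51820
PrivateKey = <torta-private-key>

[Peer]
# Latium (cloud side)
PublicKey = <latium-public-key>
AllowedIPs = 10.0.0.1/32
```

Latium's matching config would carry an `Endpoint` line pointing at the home network's public IP and the forwarded UDP port.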
== Data Pipeline Workflow ==
The pipeline operates in a continuous, automated loop orchestrated primarily by '''Roma'''.
# '''Data Collection (Castra)''': A scheduled script or containerized agent on '''Latium''' queries an external API. The raw data (`aqua_datum`) is collected.
# '''Secure Transport (Aquaeductus)''': The `Salii` system on '''Latium''' transfers the `aqua_datum` through the secure WireGuard tunnel to the first external hard drive on '''Torta'''.
# '''Processing Dispatch''': A script on '''Roma''' continuously monitors the raw data drive on '''Torta'''. When new `aqua_datum` is detected, it initiates the processing phase.
# '''Compute & Processing''': '''Roma''' handles standard data parsing. For tasks requiring significant parallel processing, '''Roma''' dispatches the job to '''Horreum'''. Both nodes work with data stored on '''Torta''' via NFS. Processed data (`grana_datum`) is written to the second external hard drive on '''Torta'''.
# '''Data Publication''': The `Cubile` system, containing the pywikibots, runs on '''Latium''' inside the secure Pomerium zone. It accesses the `grana_datum` from '''Torta''' and uses it to update the '''OodaWiki''' server.
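The Processing Dispatch step can be sketched as a single polling pass. The directory arguments, the `.json` extension, and the inline `process` stand-in are all assumptions, since the actual Roma scripts are not shown here:

```python
from pathlib import Path

def find_new_aqua(aqua_dir: Path, seen: set) -> list:
    """Return raw aqua_datum files that have not been processed yet."""
    return [p for p in sorted(aqua_dir.glob("*.json")) if p.name not in seen]

def process(path: Path) -> str:
    """Stand-in for Roma's CPU parsing; in the real pipeline, jobs needing
    heavy parallel compute would be dispatched to Horreum instead."""
    return path.read_text().upper()  # placeholder transformation

def run_once(aqua_dir: Path, grana_dir: Path, seen: set) -> int:
    """One polling pass: parse each new raw file from the aqua (raw) drive
    and write the result to the grana (processed) drive."""
    handled = 0
    for raw in find_new_aqua(aqua_dir, seen):
        (grana_dir / raw.name).write_text(process(raw))
        seen.add(raw.name)
        handled += 1
    return handled
```

In the running system, `run_once` would be invoked from a loop or a cron/systemd timer on '''Roma''', with both directories being NFS mounts backed by '''Torta'''.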