Architecture¶
Understanding how Repod is structured, and why it is designed this way, helps you reason about its security guarantees, its operational boundaries, and its failure modes. This page explains the design decisions behind the three-container stack: not how to operate it, but why it works the way it does.
Overview¶
Repod is built on three principles: security-first design, separation of concerns, and minimal privilege. These are not aspirational values appended after the fact; they are constraints that shaped every architectural decision.
The separation-of-concerns principle manifests most visibly in the three-container split. The APT repository server (depot-apt) is a pure Nginx instance that serves static files. It knows nothing about users, authentication, or packages being valid. The API backend (backend-api) handles all business logic: validation, indexing, audit logging, user management. The frontend (frontend-ui) is a compiled React application delivered by its own Nginx process, with no server-side code. Each container has a single, well-defined responsibility, and none can substitute for another.
The security-first principle means that the threat model was taken seriously at design time, not retrofitted. The most visible consequence is the absence of the Docker socket. In many containerised applications, the backend mounts /var/run/docker.sock to interact with other containers, for example to trigger a reprepro command inside the APT container. This gives the backend root-equivalent access to the host. Repod removes this vector entirely: the backend invokes the add-deb.sh shell script directly from a shared volume, and GPG operations use a shared /repos/gnupg volume. The backend never talks to the Docker daemon.
A third structural decision is that the APT repository itself is intentionally dumb. It serves dists/, pool/, and a small set of static paths. It has no application logic, no dynamic content, and no credentials. An APT client hitting port 80 receives exactly the same experience it would from a Debian mirror, because that is what depot-apt is. The intelligence lives in the backend, not in the distribution layer.
Component diagram¶
```mermaid
graph TD
    Browser["Browser / curl"] -->|":3003 (HTTP)"| Frontend["frontend-ui\nNginx + React SPA"]
    Browser -->|":80 (APT)"| AptRepo["depot-apt\nNginx (static repo)"]
    Frontend -->|"REST API :8000"| Backend["backend-api\nFastAPI (Python)"]
    AptCli["apt / apt-get"] -->|":80"| AptRepo
    Backend -->|"clamscan subprocess"| ClamAV["ClamAV\n(in-container binary)"]
    Backend -->|"grype subprocess"| Grype["Grype\n(in-container binary)"]
    Backend -->|"gpg subprocess"| GnupgVol[("/repos/gnupg\nShared GPG keyring")]
    Backend -->|"SQLite"| AuthDB[("/repos/auth/users.db\nUser accounts")]
    Backend -->|"JSON / JSONL"| ReposVol[("/repos/\nPackages, manifests,\naudit logs, index")]
    AptRepo -->|"read-only"| PoolDists["pool/ + dists/\n(served over HTTP)"]
    Backend -->|"read/write"| PoolDists
    Backend -->|"add-deb.sh"| AptRepo
    GnupgVol -.->|"shared volume"| AptRepo
    style ClamAV fill:#f9f,stroke:#333
    style Grype fill:#f9f,stroke:#333
    style GnupgVol fill:#ffe,stroke:#999
    style AuthDB fill:#ffe,stroke:#999
    style ReposVol fill:#ffe,stroke:#999
```
ClamAV and Grype run as subprocess invocations inside the backend-api container; they are not separate containers. This is a deliberate trade-off: it simplifies the deployment (no inter-container networking for security tools) at the cost of sharing the backend's CPU and memory budget. The resource limits in docker-compose.yaml (1 GB RAM, 1.5 CPUs) reflect this.
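As a rough sketch of what such in-container invocations can look like (the flags, the `file:` source scheme, and the helper names here are illustrative assumptions, not Repod's actual code; clamscan itself documents exit code 0 for a clean file and 1 for a detection):

```python
import subprocess

# Hypothetical helpers showing how a backend can shell out to the scanners.

def clamscan_cmd(path: str) -> list[str]:
    # clamscan exits 0 when clean, 1 when a virus is found, 2 on error
    return ["clamscan", "--no-summary", path]

def grype_cmd(path: str) -> list[str]:
    # -o json asks grype for machine-readable findings; the file: scheme
    # targets a single file rather than an image or directory
    return ["grype", f"file:{path}", "-o", "json"]

def interpret_clamscan(returncode: int) -> str:
    return {0: "clean", 1: "infected"}.get(returncode, "error")

def virus_scan(path: str) -> str:
    result = subprocess.run(clamscan_cmd(path), capture_output=True)
    return interpret_clamscan(result.returncode)
```

Because both tools are plain binaries on the backend's PATH, a compromise of either scanner is contained by the same resource limits and volume mounts as the backend itself.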
Container breakdown¶
| Container | Image | Exposed Port | Role | Key mounts |
|---|---|---|---|---|
| `depot-apt` | Custom (Nginx) | `:80` | Serves the APT repository as static files | `/repos/dists`, `/repos/pool`, `/repos/gnupg`, `/repos/logs` |
| `backend-api` | Custom (Python 3.12 + FastAPI) | `:8000` | All business logic: upload pipeline, CVE scanning, RBAC, audit | `/repos/pool`, `/repos/manifests`, `/repos/staging`, `/repos/audit`, `/repos/gnupg`, `/repos/auth`, `/repos/security`, `/var/lib/clamav`, `/repos/grype-db` |
| `frontend-ui` | Custom (Node build + Nginx) | `:3003` | Serves the compiled React SPA | None (baked into image at build time) |
The frontend is entirely stateless at runtime. Its configuration (REACT_APP_API_URL and REACT_APP_REPO_URL) is baked in at Docker build time as environment variables. This means the frontend image is not generic; it is specific to a deployment URL. This is a standard trade-off in React single-page applications that have no server-side rendering.
Data flow: upload path¶
This sequence describes what happens from the moment a user or CI/CD pipeline sends a .deb to the API until the package is available to apt install.
1. The client sends a `POST /upload/multipart` request with the `.deb` file and a target distribution (e.g. `jammy`). A JWT or API token is required; the role must be `uploader`, `maintainer`, or `admin`.
2. The backend writes the file to `/repos/staging/incoming/`, a temporary holding area that is never served over HTTP.
3. The 7-step validation pipeline runs synchronously (see The security pipeline for detail). The pipeline reads the file from staging but never modifies it.
4. If validation fails (format error, SHA-256 mismatch, ClamAV virus, or a blocking CVE policy): the file is moved to `/repos/staging/quarantine/`, a `FAILURE` event is written to the JSONL audit log, and the API returns `422` with the full step-by-step result. The package is unreachable via APT.
5. If validation passes but a CVE triggers a review policy: the file is moved to `/repos/pool/`, a manifest is generated at `/repos/manifests/<name>_<version>_<arch>.manifest.json` with `status: pending_review`, and the package is added to the central index (`/repos/manifests/index.json`) but not promoted into the APT `dists/` tree. It is stored but not installable.
6. If validation passes cleanly: the backend invokes `/scripts/add-deb.sh`, which calls `reprepro includedeb <distribution> <path>` inside the `depot-apt` volume. Reprepro updates `dists/<distribution>/`, regenerates `Packages.gz` and `Release`, and re-signs `InRelease` using the GPG key from `/repos/gnupg`. The manifest is updated to `status: indexed`.
7. The audit log records an `UPLOAD` / `SUCCESS` event with the SHA-256 hash, the uploader's username, and the full validation step results embedded in the entry.
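The staging, quarantine, and audit-log handoffs above can be sketched roughly as follows. The directory names match the layout described on this page, but the function, its signature, and the audit event fields are illustrative assumptions, not Repod's actual implementation:

```python
import datetime
import json
import shutil
from pathlib import Path

def route_package(staged: Path, repos: Path, passed: bool,
                  user: str, sha256: str) -> Path:
    """Move a staged .deb to pool/ or quarantine/ and append an audit event."""
    dest_dir = repos / ("pool" if passed else "staging/quarantine")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / staged.name
    shutil.move(str(staged), dest)

    # One JSONL file per day, append-only (never rewritten)
    audit = repos / "audit" / f"{datetime.date.today().isoformat()}.jsonl"
    audit.parent.mkdir(parents=True, exist_ok=True)
    event = {"action": "UPLOAD",
             "result": "SUCCESS" if passed else "FAILURE",
             "user": user, "sha256": sha256}
    with audit.open("a") as fh:
        fh.write(json.dumps(event) + "\n")
    return dest
```

The key property this sketch preserves is that a rejected file ends up under `staging/`, which depot-apt never serves, so failure never leaks a package over HTTP.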
Data flow: APT client path¶
This sequence describes what apt update && apt install <package> does against a Repod repository.
1. The APT client reads `/etc/apt/sources.list.d/repod.list`, which points to `http://<host>:80 <distribution> main`.
2. `apt update` fetches `/repos/dists/<distribution>/InRelease` (the signed index manifest) and verifies it against the trusted GPG public key that was added with `apt-key add` or placed in `/etc/apt/trusted.gpg.d/`. If the signature does not verify, APT refuses to use the repository. The `depot-apt` container has no role here beyond serving the file; validation happens entirely on the client.
3. APT downloads the `Packages.gz` file referenced in `InRelease` and parses the package list. Only packages with `status: indexed` exist in this tree; packages in `pending_review` or `quarantined` are invisible.
4. `apt install <package>` resolves the dependency graph and downloads the `.deb` files from `/repos/pool/`. These files are the same binaries that passed the upload pipeline; they have not been modified.
5. APT verifies the SHA-256 of each downloaded `.deb` against the hash recorded in `Packages.gz`. This is standard APT behavior; Repod does not add anything here.
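The final integrity check above amounts to a few lines. This sketch mirrors what APT does internally; it is not Repod code, and the function name is invented for illustration:

```python
import hashlib
from pathlib import Path

def verify_sha256(deb: Path, expected_hex: str) -> bool:
    """Hash a downloaded .deb and compare it to the digest from Packages.gz."""
    digest = hashlib.sha256(deb.read_bytes()).hexdigest()
    return digest == expected_hex
```

Because `Packages.gz` is itself covered by the signed `InRelease` file, this hash comparison chains the integrity of every binary back to the repository's GPG key.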
Storage layout¶
All persistent state lives under a single directory tree that maps directly to Docker volumes.
```
/repos/
├── pool/             # All indexed .deb binaries (served by depot-apt)
│   └── main/
│       └── <initial>/
│           └── <package>/
│               └── <name>_<version>_<arch>.deb
├── dists/            # APT index tree, managed by reprepro
│   └── <distribution>/
│       └── main/
│           ├── binary-amd64/
│           │   ├── Packages
│           │   └── Packages.gz
│           └── ...
├── manifests/        # One JSON manifest per package version
│   ├── index.json    # Central catalog (all packages, all versions)
│   └── <name>_<version>_<arch>.manifest.json
├── audit/            # Append-only audit log, one file per day
│   └── YYYY-MM-DD.jsonl
├── auth/             # User accounts
│   └── users.db      # SQLite database (bcrypt-hashed passwords)
├── gnupg/            # Shared GPG keyring (backend + depot-apt)
│   └── ...           # Managed by gpg, not Repod directly
├── staging/          # Transient area, never served over HTTP
│   ├── incoming/     # Files arrive here before validation
│   └── quarantine/   # Failed or rejected packages
├── imports/          # Packages fetched by the security sync job
│   └── <source-id>/
├── security/         # CVE decisions and threat intelligence caches
│   ├── decisions/    # One JSON file per package with a security decision
│   ├── kev_cache.json   # CISA KEV cache (TTL 24h)
│   └── epss_cache.json  # EPSS scores cache (TTL 24h)
├── grype-db/         # Grype vulnerability database cache
├── clamav-db/        # ClamAV signature database (daily.cld, main.cvd)
├── logs/             # Nginx access logs from depot-apt (shared volume)
│   └── access.log
└── package-index/    # Full-text search index (if enabled)
```
The separation between pool/ (binaries) and manifests/ (metadata) is important. The manifest contains the entire validation history β every pipeline step result, all CVE findings, dependency analysis, and the full SHA-256/SHA-512 integrity record. This means the security posture of any package can be reconstructed from the manifests alone, without re-running the scanner.
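As a sketch of that claim, assuming illustrative field names (`status`, `steps`, `cves`) rather than the actual manifest schema, reconstructing a package's posture could look like:

```python
import json
from pathlib import Path

def posture(manifest_path: Path) -> dict:
    """Summarize a package's security posture from its manifest alone."""
    m = json.loads(manifest_path.read_text())
    return {
        # Only status: indexed packages appear in the APT dists/ tree
        "installable": m.get("status") == "indexed",
        # Pipeline steps that did not pass, by name
        "failed_steps": [s["name"] for s in m.get("steps", []) if not s.get("passed")],
        # CVE findings recorded at scan time
        "cve_count": len(m.get("cves", [])),
    }
```

No scanner invocation is needed: the manifest is a durable record of what the pipeline observed when the package was uploaded.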
Security boundaries¶
What depot-apt can access: The dists/ and pool/ directories (read/write, managed by reprepro), the shared GPG keyring (to sign InRelease), and the Nginx log directory. It cannot read manifests, audit logs, user accounts, staging files, or security decisions.
What backend-api can access: Everything under /repos/. It writes .deb files into pool/ and the staging areas directly, but it never manages the reprepro database in dists/ itself; index updates go through the add-deb.sh script. Critically, it cannot access the Docker daemon.
What frontend-ui can access: Nothing on the filesystem. It is a static file server. All data is fetched from the backend API at runtime by the user's browser.
Why Docker socket removal matters: If the backend mounted /var/run/docker.sock, any path-traversal exploit, deserialization bug, or dependency vulnerability in the FastAPI process could be escalated to full host root access through the Docker API. The shared-volume approach limits the blast radius: a compromised backend can modify files it has volume access to, but it cannot launch containers, exec into other containers, or modify the host.
GPG via shared volume: The original design used a Docker socket to exec gpg commands inside the depot-apt container. The current design mounts /repos/gnupg into both containers. The backend signs packages using gpg --homedir /repos/gnupg, and depot-apt uses the same keyring for reprepro. The private key never leaves the volume; there is no API call, no network hop.
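A minimal sketch of what pinning every signing call to the shared keyring can look like. The helper and the exact flags are assumptions (reprepro drives its own gpg invocations), but `--homedir`, `--batch`, and `--clearsign` are standard gpg options, and InRelease is by convention a clearsigned copy of Release:

```python
# Hypothetical helper: every gpg call pins --homedir to the shared volume,
# so backend-api and depot-apt operate on the same keys with no network hop.
GNUPG_HOME = "/repos/gnupg"

def gpg_cmd(*args: str) -> list[str]:
    # --batch keeps gpg non-interactive, suitable for a server process
    return ["gpg", "--homedir", GNUPG_HOME, "--batch", *args]

# Producing a clearsigned InRelease from a Release file (illustrative paths):
inrelease = gpg_cmd("--clearsign", "--output",
                    "/repos/dists/jammy/InRelease", "/repos/dists/jammy/Release")
```

The design consequence is that key custody reduces to filesystem permissions on one volume, which is far easier to audit than a Docker-socket exec path.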
Network¶
All three containers share a single Docker bridge network (vagrant_network). No container-to-container traffic is encrypted β this is appropriate because they run on the same host and the network is not exposed externally. The only external exposure is through the published ports.
| Port | Container | Exposed to | Purpose |
|---|---|---|---|
| `:80` | `depot-apt` | Configurable via `BIND_HOST` | APT repository (plain HTTP, signed content) |
| `:8000` | `backend-api` | Configurable via `BIND_HOST` | REST API (browser, CI/CD, curl) |
| `:3003` | `frontend-ui` | Configurable via `BIND_HOST` | Web UI |
`BIND_HOST` defaults to `0.0.0.0`, which binds to all interfaces. In production behind a reverse proxy, set `BIND_HOST=127.0.0.1` to prevent direct external access to these ports and let the proxy handle TLS termination.
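As an illustration only (not the shipped compose file), a port mapping driven by this variable could use Compose's default-value interpolation:

```yaml
services:
  backend-api:
    ports:
      # Falls back to all interfaces when BIND_HOST is unset;
      # set BIND_HOST=127.0.0.1 behind a reverse proxy.
      - "${BIND_HOST:-0.0.0.0}:8000:8000"
```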
The APT repository uses plain HTTP on purpose. APT content integrity is guaranteed by GPG signature verification, not by TLS. The InRelease file is cryptographically signed; a man-in-the-middle can observe what packages are being downloaded but cannot substitute a malicious package without the private GPG key. TLS adds confidentiality for the download list (useful in some threat models) but does not strengthen the integrity guarantee. If TLS is required for compliance, terminate it at the reverse proxy layer.
Reverse proxy placement
When placing Repod behind Nginx or Caddy, configure `TRUSTED_PROXIES` in `backend.env` to include the proxy's address range. The backend uses this list to correctly extract client IPs from `X-Forwarded-For` headers for rate limiting and audit logging. The default value covers `127.0.0.1` and the private RFC 1918 ranges.
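A hedged sketch of how a backend might apply such a trusted-proxy list: walk the `X-Forwarded-For` chain from the right, skip hops inside the trusted ranges, and treat the first untrusted address as the real client. The ranges below mirror the defaults described above; Repod's actual parsing may differ:

```python
import ipaddress

# Default trusted ranges: loopback plus the private RFC 1918 blocks
TRUSTED = [ipaddress.ip_network(n) for n in
           ("127.0.0.1/32", "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_trusted(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in TRUSTED)

def client_ip(xff_header: str, peer_addr: str) -> str:
    # The TCP peer is the last hop; prepend the X-Forwarded-For entries
    hops = [h.strip() for h in xff_header.split(",") if h.strip()] + [peer_addr]
    for addr in reversed(hops):
        if not is_trusted(addr):
            return addr          # first untrusted hop from the right
    return hops[0]               # every hop trusted: use the leftmost entry
```

Walking from the right matters because the leftmost entries of `X-Forwarded-For` are client-supplied and trivially spoofable; only hops added by trusted proxies can be believed.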