
Architecture

Understanding how Repod is structured, and why it is designed this way, helps you reason about its security guarantees, its operational boundaries, and its failure modes. This page explains the design decisions behind the three-container stack: not how to operate it, but why it works the way it does.

Overview

Repod is built on three principles: security-first design, separation of concerns, and minimal privilege. These are not aspirational values appended after the fact; they are constraints that shaped every architectural decision.

The separation-of-concerns principle manifests most visibly in the three-container split. The APT repository server (depot-apt) is a pure Nginx instance that serves static files. It knows nothing about users, authentication, or package validity. The API backend (backend-api) handles all business logic: validation, indexing, audit logging, user management. The frontend (frontend-ui) is a compiled React application delivered by its own Nginx process, with no server-side code. Each container has a single, well-defined responsibility, and none can substitute for another.

The security-first principle means that the threat model was taken seriously at design time, not retrofitted. The most visible consequence is the absence of the Docker socket. In many containerised applications, the backend mounts /var/run/docker.sock to interact with other containers (for example, to trigger a reprepro command inside the APT container). This gives the backend root-equivalent access to the host. Repod removes this vector entirely: the backend invokes the add-deb.sh shell script directly from a shared volume, and GPG operations use a shared /repos/gnupg volume. The backend never talks to the Docker daemon.
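In docker-compose terms, the no-socket design looks something like this (service names and host paths here are illustrative, not Repod's actual compose file):

```yaml
services:
  backend-api:
    # No /var/run/docker.sock mount: the backend cannot reach the Docker daemon.
    volumes:
      - ./repos/gnupg:/repos/gnupg      # shared GPG keyring
      - ./repos/staging:/repos/staging
      - ./scripts:/scripts:ro           # add-deb.sh lives here
  depot-apt:
    volumes:
      - ./repos/gnupg:/repos/gnupg      # same keyring, no network hop
      - ./repos/dists:/repos/dists
      - ./repos/pool:/repos/pool
```

Because both containers mount the same host directories, coordination happens through the filesystem rather than through the Docker API.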

A third structural decision is that the APT repository itself is intentionally dumb. It serves dists/, pool/, and a small set of static paths. It has no application logic, no dynamic content, and no credentials. An APT client hitting port 80 receives exactly the same experience it would from a Debian mirror, because that is what depot-apt is. The intelligence lives in the backend, not in the distribution layer.
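An Nginx server block for such a deliberately dumb repository can be very small; a sketch (the paths and port match the description above, but this is not depot-apt's actual config):

```nginx
# Everything the static repository needs: file serving, no scripting, no credentials.
server {
    listen 80;
    root   /repos;

    location /dists/ { }   # signed indexes (InRelease, Packages, Packages.gz)
    location /pool/  { }   # the .deb binaries themselves
}
```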

Component diagram

graph TD
    Browser["Browser / curl"] -->|":3003 (HTTP)"| Frontend["frontend-ui\nNginx + React SPA"]
    Browser -->|":80 (APT)"| AptRepo["depot-apt\nNginx - static repo"]
    Frontend -->|"REST API :8000"| Backend["backend-api\nFastAPI (Python)"]
    AptCli["apt / apt-get"] -->|":80"| AptRepo

    Backend -->|"clamscan subprocess"| ClamAV["ClamAV\n(in-container binary)"]
    Backend -->|"grype subprocess"| Grype["Grype\n(in-container binary)"]
    Backend -->|"gpg subprocess"| GnupgVol[("/repos/gnupg\nShared GPG keyring")]
    Backend -->|"SQLite"| AuthDB[("/repos/auth/users.db\nUser accounts")]
    Backend -->|"JSON / JSONL"| ReposVol[("/repos/\nPackages, manifests,\naudit logs, index")]

    AptRepo -->|"read-only"| PoolDists["pool/ + dists/\n(served over HTTP)"]
    Backend -->|"read/write"| PoolDists
    Backend -->|"add-deb.sh"| AptRepo

    GnupgVol -.->|"shared volume"| AptRepo

    style ClamAV fill:#f9f,stroke:#333
    style Grype fill:#f9f,stroke:#333
    style GnupgVol fill:#ffe,stroke:#999
    style AuthDB fill:#ffe,stroke:#999
    style ReposVol fill:#ffe,stroke:#999

ClamAV and Grype run as subprocess invocations inside the backend-api container; they are not separate containers. This is a deliberate trade-off: it simplifies the deployment (no inter-container networking for security tools) at the cost of sharing the backend's CPU and memory budget. The resource limits in docker-compose.yaml (1 GB RAM, 1.5 CPUs) reflect this.
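A sketch of what those subprocess invocations can look like (not Repod's actual code; clamscan's exit-code convention of 0 = clean, 1 = infected, 2 = error is standard ClamAV behaviour):

```python
import subprocess

def clamscan_argv(path: str) -> list[str]:
    # clamscan exit codes: 0 = clean, 1 = virus found, 2 = error.
    return ["clamscan", "--no-summary", path]

def grype_argv(path: str) -> list[str]:
    # JSON output lets the caller apply its own CVE policy to the findings.
    return ["grype", path, "-o", "json"]

def is_clean(deb_path: str) -> bool:
    # Runs the in-container binary directly; no inter-container networking.
    result = subprocess.run(clamscan_argv(deb_path), capture_output=True)
    if result.returncode == 2:
        raise RuntimeError("clamscan failed to run")
    return result.returncode == 0
```

Because the scanners share the backend's resource budget, a long clamscan run counts against the same 1 GB / 1.5 CPU ceiling as the API itself.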

Container breakdown

| Container | Image | Exposed port | Role | Key mounts |
|---|---|---|---|---|
| depot-apt | Custom (Nginx) | :80 | Serves the APT repository as static files | /repos/dists, /repos/pool, /repos/gnupg, /repos/logs |
| backend-api | Custom (Python 3.12 + FastAPI) | :8000 | All business logic: upload pipeline, CVE scanning, RBAC, audit | /repos/pool, /repos/manifests, /repos/staging, /repos/audit, /repos/gnupg, /repos/auth, /repos/security, /var/lib/clamav, /repos/grype-db |
| frontend-ui | Custom (Node build + Nginx) | :3003 | Serves the compiled React SPA | None (baked into image at build time) |

The frontend is entirely stateless at runtime. Its configuration (REACT_APP_API_URL and REACT_APP_REPO_URL) is baked in at Docker build time as environment variables. This means the frontend image is not generic; it is specific to one deployment URL. This is a standard trade-off for client-side-rendered React applications with no server-side code.
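A representative multi-stage build illustrating the baking (the real Dockerfile may differ; only the ARG names are taken from the paragraph above):

```dockerfile
# Build stage: REACT_APP_* values are fixed here, so the resulting
# image is tied to one deployment URL.
FROM node:20 AS build
ARG REACT_APP_API_URL
ARG REACT_APP_REPO_URL
ENV REACT_APP_API_URL=${REACT_APP_API_URL} \
    REACT_APP_REPO_URL=${REACT_APP_REPO_URL}
COPY . /app
WORKDIR /app
RUN npm ci && npm run build   # values are inlined into the JS bundle here

# Runtime stage: nothing but Nginx and the compiled assets.
FROM nginx:alpine
COPY --from=build /app/build /usr/share/nginx/html
```

Changing the API URL therefore means rebuilding the image, not restarting the container.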

Data flow: upload path

This sequence describes what happens from the moment a user or CI/CD pipeline sends a .deb to the API until the package is available to apt install.

  1. The client sends a POST /upload/ multipart request with the .deb file and a target distribution (e.g. jammy). A JWT or API token is required; the role must be uploader, maintainer, or admin.
  2. The backend writes the file to /repos/staging/incoming/, a temporary holding area that is never served over HTTP.
  3. The 7-step validation pipeline runs synchronously (see The security pipeline for detail). The pipeline reads the file from staging but never modifies it.
  4. If validation fails (format error, SHA-256 mismatch, ClamAV virus, or a blocking CVE policy): the file is moved to /repos/staging/quarantine/, a FAILURE event is written to the JSONL audit log, and the API returns 422 with the full step-by-step result. The package is unreachable via APT.
  5. If validation passes but a CVE triggers a review policy: the file is moved to /repos/pool/, a manifest is generated at /repos/manifests/<name>_<version>_<arch>.manifest.json with status: pending_review, and the package is added to the central index (/repos/manifests/index.json) but not promoted into the APT dists/ tree. It is stored but not installable.
  6. If validation passes cleanly: the backend invokes /scripts/add-deb.sh, which calls reprepro includedeb <distribution> <path> inside the depot-apt volume. Reprepro updates dists/<distribution>/, regenerates Packages.gz and Release, and re-signs InRelease using the GPG key from /repos/gnupg. The manifest is updated to status: indexed.
  7. The audit log records an UPLOAD / SUCCESS event with the SHA-256 hash, the uploader's username, and the full validation step results embedded in the entry.
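Steps 4 and 7 hinge on content hashing and append-only JSONL logging; a minimal sketch, assuming field names that may differ from Repod's actual audit schema:

```python
import datetime
import hashlib
import json
import pathlib

def sha256_of(path: pathlib.Path) -> str:
    # The digest recorded in both the manifest and the audit entry.
    return hashlib.sha256(path.read_bytes()).hexdigest()

def audit_event(action: str, outcome: str, user: str, digest: str) -> str:
    # One JSON object per line; the line is appended to YYYY-MM-DD.jsonl.
    entry = {
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "action": action,       # e.g. "UPLOAD"
        "outcome": outcome,     # "SUCCESS" or "FAILURE"
        "user": user,
        "sha256": digest,
    }
    return json.dumps(entry)
```

Append-only JSONL means a crashed write corrupts at most one line, and the log can be replayed or grepped without a database.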

Data flow: APT client path

This sequence describes what apt update && apt install <package> does against a Repod repository.

  1. The APT client reads /etc/apt/sources.list.d/repod.list, which contains a line of the form deb http://<host>:80 <distribution> main.
  2. apt update fetches /repos/dists/<distribution>/InRelease (the signed index manifest) and verifies it against the trusted GPG public key installed on the client, either via a keyring file in /etc/apt/trusted.gpg.d/ or a signed-by option in the sources entry (the older apt-key add still works but is deprecated). If the signature does not verify, APT refuses to use the repository. The depot-apt container has no role here beyond serving the file; validation happens entirely on the client.
  3. APT downloads the Packages.gz file referenced in InRelease and parses the package list. Only packages with status: indexed exist in this tree; packages in pending_review or quarantined are invisible.
  4. apt install <package> resolves the dependency graph and downloads the .deb files from /repos/pool/. These files are the same binaries that passed the upload pipeline β€” they have not been modified.
  5. APT verifies the SHA-256 of each downloaded .deb against the hash recorded in Packages.gz. This is standard APT behavior; Repod does not add anything here.
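Step 5's check is easy to picture; a simplified stand-in for what APT does internally when it compares a downloaded .deb against its Packages stanza:

```python
import hashlib

def parse_sha256(packages_stanza: str) -> str:
    # A Packages stanza contains a line like "SHA256: <hex digest>".
    for line in packages_stanza.splitlines():
        if line.startswith("SHA256:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no SHA256 field in stanza")

def verify(deb_bytes: bytes, packages_stanza: str) -> bool:
    # The stanza itself is trusted transitively: Packages.gz is referenced
    # (with its own hash) from the GPG-signed InRelease file.
    return hashlib.sha256(deb_bytes).hexdigest() == parse_sha256(packages_stanza)
```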

Storage layout

All persistent state lives under a single directory tree that maps directly to Docker volumes.

/repos/
├── pool/                   # All indexed .deb binaries (served by depot-apt)
│   └── main/
│       └── <initial>/
│           └── <package>/
│               └── <name>_<version>_<arch>.deb
├── dists/                  # APT index tree (managed by reprepro)
│   └── <distribution>/
│       └── main/
│           ├── binary-amd64/
│           │   ├── Packages
│           │   └── Packages.gz
│           └── ...
├── manifests/              # One JSON manifest per package version
│   ├── index.json          # Central catalog (all packages, all versions)
│   └── <name>_<version>_<arch>.manifest.json
├── audit/                  # Append-only audit log, one file per day
│   └── YYYY-MM-DD.jsonl
├── auth/                   # User accounts
│   └── users.db            # SQLite database (bcrypt-hashed passwords)
├── gnupg/                  # Shared GPG keyring (backend + depot-apt)
│   └── ...                 # Managed by gpg, not Repod directly
├── staging/                # Transient area (never served over HTTP)
│   ├── incoming/           # Files arrive here before validation
│   └── quarantine/         # Failed or rejected packages
├── imports/                # Packages fetched by the security sync job
│   └── <source-id>/
├── security/               # CVE decisions and threat intelligence caches
│   ├── decisions/          # One JSON file per package with a security decision
│   ├── kev_cache.json      # CISA KEV cache (TTL 24h)
│   └── epss_cache.json     # EPSS scores cache (TTL 24h)
├── grype-db/               # Grype vulnerability database cache
├── clamav-db/              # ClamAV signature database (daily.cld, main.cvd)
├── logs/                   # Nginx access logs from depot-apt (shared volume)
│   └── access.log
└── package-index/          # Full-text search index (if enabled)

The separation between pool/ (binaries) and manifests/ (metadata) is important. The manifest contains the entire validation history: every pipeline step result, all CVE findings, dependency analysis, and the full SHA-256/SHA-512 integrity record. This means the security posture of any package can be reconstructed from the manifests alone, without re-running the scanner.
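For example, reconstructing a package's posture needs nothing but the manifest JSON itself (the field names here are hypothetical, not Repod's actual schema):

```python
import json

# Hypothetical manifest content, shaped after the description above.
manifest = json.loads("""
{
  "package": "demo",
  "version": "1.0.0",
  "status": "indexed",
  "sha256": "abc123",
  "pipeline": [
    {"step": "clamav", "result": "pass"},
    {"step": "grype", "result": "pass", "cves": []}
  ]
}
""")

# Which pipeline steps, if any, did not pass?
failed = [s["step"] for s in manifest["pipeline"] if s["result"] != "pass"]
print(f'{manifest["package"]} {manifest["version"]}: '
      f'{"clean" if not failed else "failed at " + ", ".join(failed)}')
```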

Security boundaries

What depot-apt can access: The dists/ and pool/ directories (read/write, managed by reprepro), the shared GPG keyring (to sign InRelease), and the Nginx log directory. It cannot read manifests, audit logs, user accounts, staging files, or security decisions.

What backend-api can access: Everything under /repos/ except dists/ and pool/ (which it writes through the add-deb.sh script rather than directly managing the reprepro database). It can read and write all other subdirectories. Critically, it cannot access the Docker daemon.

What frontend-ui can access: Nothing on the filesystem. It is a static file server. All data is fetched from the backend API at runtime by the user's browser.

Why Docker socket removal matters: If the backend mounted /var/run/docker.sock, any path-traversal exploit, deserialization bug, or dependency vulnerability in the FastAPI process could be escalated to full host root access through the Docker API. The shared-volume approach limits the blast radius: a compromised backend can modify files it has volume access to, but it cannot launch containers, exec into other containers, or modify the host.

GPG via shared volume: The original design used a Docker socket to exec gpg commands inside the depot-apt container. The current design mounts /repos/gnupg into both containers. The backend signs packages using gpg --homedir /repos/gnupg, and depot-apt uses the same keyring for reprepro. The private key never leaves the volume; there is no API call, no network hop.
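In reprepro terms, signing is configured per distribution in conf/distributions; a representative stanza (the key ID is a placeholder), resolved against whatever keyring the GPG home directory points at, i.e. the shared /repos/gnupg volume:

```text
# conf/distributions (fragment)
Codename: jammy
Components: main
Architectures: amd64
SignWith: ABCDEF1234567890
```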

Network

All three containers share a single Docker bridge network (vagrant_network). No container-to-container traffic is encrypted; this is appropriate because they run on the same host and the network is not exposed externally. The only external exposure is through the published ports.

| Port | Container | Exposed to | Purpose |
|---|---|---|---|
| :80 | depot-apt | Configurable via BIND_HOST | APT repository (plain HTTP, signed content) |
| :8000 | backend-api | Configurable via BIND_HOST | REST API (browser, CI/CD, curl) |
| :3003 | frontend-ui | Configurable via BIND_HOST | Web UI |

BIND_HOST defaults to 0.0.0.0, which binds to all interfaces. In production behind a reverse proxy, set BIND_HOST=127.0.0.1 to prevent direct external access to these ports and let the proxy handle TLS termination.
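A minimal reverse-proxy sketch for the API port (the server name and TLS details are placeholders; with BIND_HOST=127.0.0.1, only this proxy can reach port 8000):

```nginx
server {
    listen 443 ssl;
    server_name repod.example.com;

    location /api/ {
        # The backend sees only loopback traffic from the proxy.
        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```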

The APT repository uses plain HTTP on purpose. APT content integrity is guaranteed by GPG signature verification, not by TLS. The InRelease file is cryptographically signed; a man-in-the-middle can observe what packages are being downloaded but cannot substitute a malicious package without the private GPG key. TLS adds confidentiality for the download list (useful in some threat models) but does not strengthen the integrity guarantee. If TLS is required for compliance, terminate it at the reverse proxy layer.

Reverse proxy placement

When placing Repod behind Nginx or Caddy, configure TRUSTED_PROXIES in backend.env to include the proxy's address range. The backend uses this list to correctly extract client IPs from X-Forwarded-For headers for rate limiting and audit logging. The default value covers 127.0.0.1 and private RFC 1918 ranges.
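A backend.env fragment matching that default might look like this (the exact variable format is an assumption; confirm it against your backend.env reference):

```ini
# Proxies whose X-Forwarded-For headers the backend will trust:
# loopback plus the RFC 1918 private ranges (the documented default).
TRUSTED_PROXIES=127.0.0.1,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16
```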