Workers and Agents

How scan engines run on distributed workers, how they are isolated in Docker containers, and how results flow back to the platform.

1. Worker Types

Managed Cloud (GA)

Platform-operated, auto-scaled GCE/GKE instances. Zero setup: workers provision on demand as scans are queued.

Self-Hosted (GA)

Operated by the customer or an MSP. Enroll via an admin-generated token. Supports air-gapped and on-premise networks.

Local Agent (Coming Soon)

Desktop app for scanning external drives and removable media locally. Only results are uploaded; raw data never leaves the machine.

2. V2 Architecture

Workers execute scan engines as isolated Docker containers on the worker host. Each engine runs in its own container with read-only access to the scan target and write access to a scoped output path.
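As a rough illustration of the isolation described above, the sketch below builds a `docker run` command line with a read-only bind mount for the target and a separate writable mount for output. The `/scan/input` path comes from the execution flow in this section; the `/scan/output` path, the host directories, and the image name are assumptions made for the example, not documented values.

```python
from pathlib import PurePosixPath


def build_engine_command(image: str, target: PurePosixPath, output: PurePosixPath) -> list[str]:
    """Build a `docker run` argv for one engine container."""
    return [
        "docker", "run", "--rm",
        # The scan target is bind-mounted read-only at /scan/input.
        "--mount", f"type=bind,source={target},target=/scan/input,readonly",
        # Scoped, writable output directory (/scan/output is an assumed path).
        "--mount", f"type=bind,source={output},target=/scan/output",
        image,
    ]


cmd = build_engine_command(
    "example.invalid/engines/clamav:1.0",  # hypothetical image name
    PurePosixPath("/var/lib/worker/scans/abc/input"),
    PurePosixPath("/var/lib/worker/scans/abc/steps/clamav/out"),
)
print(" ".join(cmd))
```

Passing the command as a list to `subprocess.run` (rather than a shell string) avoids quoting issues when paths contain spaces.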

Execution flow

Control Plane                    Worker Agent
─────────────                    ────────────
createScanV2()
  └─ writes /scansV2/{id}
     └─ steps/{engine}           workerLeaseStepV2()
        status: pending    ───→    claims step (atomic txn)
                                   pulls engine Docker image
                                   mounts target → /scan/input
                                   runs: docker run <engine>
                                 workerRenewLeaseV2()
                                   extends lease + heartbeat
                                 workerCompleteStepV2()
                           ←───    writes findings + output
        status: succeeded          uploads artifacts to GCS
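The claim, renew, and complete calls in the flow above can be sketched with an in-memory store. This is not the platform's implementation: a lock stands in for the atomic transaction, the `running` status and the 120-second lease length are assumptions, and the class and method names are hypothetical.

```python
import secrets
import threading
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 120.0  # assumed lease length; the real value is not documented here


@dataclass
class Step:
    engine: str
    status: str = "pending"
    worker_id: Optional[str] = None
    lease_token: Optional[str] = None
    expires_at: float = 0.0


class StepStore:
    """In-memory stand-in for the steps collection; a lock plays the role of the transaction."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.steps: dict[str, Step] = {}

    def lease(self, worker_id: str, now: float) -> Optional[tuple[str, str]]:
        """Claim the first pending step; return (step_id, lease_token) or None."""
        with self._lock:
            for step_id, step in self.steps.items():
                if step.status == "pending":
                    step.status = "running"  # assumed intermediate status
                    step.worker_id = worker_id
                    step.lease_token = secrets.token_hex(16)
                    step.expires_at = now + LEASE_SECONDS
                    return step_id, step.lease_token
        return None

    def _owns(self, step: Step, token: str, now: float) -> bool:
        return step.status == "running" and step.lease_token == token and now <= step.expires_at

    def renew(self, step_id: str, token: str, now: float) -> bool:
        """Extend the lease if the caller still owns it and it has not expired."""
        with self._lock:
            step = self.steps[step_id]
            if not self._owns(step, token, now):
                return False
            step.expires_at = now + LEASE_SECONDS
            return True

    def complete(self, step_id: str, token: str, now: float) -> bool:
        """Mark the step succeeded, but only for the current lease holder."""
        with self._lock:
            step = self.steps[step_id]
            if not self._owns(step, token, now):
                return False
            step.status = "succeeded"
            return True


store = StepStore()
store.steps["clamav"] = Step(engine="clamav")
first = store.lease("worker-a", now=0.0)
second = store.lease("worker-b", now=1.0)  # nothing left to claim
print(first is not None, second)           # True None
```

Because the status check and the update happen under one lock, two workers polling at the same time cannot both claim the same step.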

Data model

/scansV2/{scanId}               ← scan job document
  .orgId, .status, .engines    ← tenant scope + config
  .resolvedEngines[]           ← validated engine list
  /steps/{stepId}              ← per-engine step
    .engine, .status           ← engine ID + state
    .lease.workerId            ← claimed worker
    .lease.leaseToken          ← ownership proof
    .lease.expiresAt           ← auto-reclaim deadline
    .input.storagePath         ← GCS path to target
  /findings/{findingId}        ← engine-produced findings
    .severity, .title, .engine

/workersV2/{workerId}          ← worker registration
  .status, .capabilities      ← state (online/busy/offline/quarantined) + supported engines
  .lastHeartbeatAt             ← liveness tracking
  .tokenExpiresAt              ← auth token expiry

3. Worker Enrollment

Workers are enrolled by platform admins via the Settings → Agents page or the registerWorkerV2 Cloud Function.

Enrollment flow

1. Generate token

A platform admin calls registerWorkerV2 with a worker name and an optional capability list. The function returns a Firebase custom token with an isWorker=true claim and a 24-hour TTL.

2. Start worker agent

Run the worker binary with the custom token. Worker authenticates to Firebase and writes its status to /workersV2/{workerId}.

3. Poll and execute

Worker calls workerLeaseStepV2 to atomically claim pending steps. Pulls the engine Docker image, mounts scan input, and executes the scan.

4. Heartbeat and renew

During execution, worker calls workerRenewLeaseV2 every 60s to extend the lease and report progress. Expired leases are reclaimed automatically.

5. Complete and report

Worker calls workerCompleteStepV2 with findings, output artifacts, and duration. Scan auto-completes when all steps finish.
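Steps 3 to 5 can be sketched from the worker's side as an engine run wrapped in a background heartbeat. The helper below is a hypothetical illustration: `run_engine` and `renew_lease` are injected placeholders for the real engine execution and the workerRenewLeaseV2 call, and the behaviour on a lost lease (refusing to report results) is an assumption.

```python
import threading
import time
from typing import Callable


def run_with_heartbeat(
    run_engine: Callable[[], dict],
    renew_lease: Callable[[], bool],
    interval_s: float = 60.0,
) -> dict:
    """Run the engine while a background thread renews the lease every interval_s seconds."""
    stop = threading.Event()
    lease_lost = threading.Event()

    def heartbeat() -> None:
        # Event.wait returns False on timeout, which is the signal to renew.
        while not stop.wait(interval_s):
            if not renew_lease():
                lease_lost.set()
                return

    thread = threading.Thread(target=heartbeat, daemon=True)
    thread.start()
    try:
        result = run_engine()
    finally:
        stop.set()
        thread.join()
    if lease_lost.is_set():
        raise RuntimeError("lease lost during execution; results must not be reported")
    return result


# Demo with a short interval so it finishes quickly.
renewals: list[float] = []


def fake_engine() -> dict:
    time.sleep(0.2)
    return {"findings": []}


def fake_renew() -> bool:
    renewals.append(time.monotonic())
    return True


result = run_with_heartbeat(fake_engine, fake_renew, interval_s=0.05)
print(result, len(renewals) >= 1)
```

Keeping the heartbeat on its own thread means a long-running engine cannot starve lease renewal.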

Token structure

// Firebase Custom Token claims
{
  "isWorker": true,
  "workerId": "uuid-v4",
  "workerName": "gke-scanner-01"
}

// Token TTL: 24 hours
// Rotation: call refreshWorkerTokenV2 before expiry
// Revocation: set worker status to "quarantined"
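Since tokens must be rotated before the 24-hour TTL runs out, a worker needs some rule for when to call refreshWorkerTokenV2. A minimal sketch, assuming a one-hour safety margin (the margin and the function name are illustrative, not documented values):

```python
from datetime import datetime, timedelta, timezone

TOKEN_TTL = timedelta(hours=24)      # from the token structure above
REFRESH_MARGIN = timedelta(hours=1)  # assumed safety margin


def needs_refresh(expires_at: datetime, now: datetime,
                  margin: timedelta = REFRESH_MARGIN) -> bool:
    """True once the token is within `margin` of expiry, or already expired."""
    return now >= expires_at - margin


issued = datetime(2025, 1, 1, tzinfo=timezone.utc)
expires = issued + TOKEN_TTL
print(needs_refresh(expires, issued + timedelta(hours=12)))              # False
print(needs_refresh(expires, issued + timedelta(hours=23, minutes=30)))  # True
```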

4. Self-Hosted Worker Setup

Self-hosted workers run on your infrastructure — bare metal, VMs, or Kubernetes clusters.

Prerequisites

  • Docker Engine 24+ installed and running
  • Network access to Firebase APIs (firestore.googleapis.com, cloudfunctions.googleapis.com)
  • At least 4 GB RAM and 10 GB disk for engine images
  • Worker enrollment token from platform admin
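Some of the prerequisites above can be checked before starting the worker. The following is a hypothetical preflight script, not part of the worker agent: it queries the Docker daemon version with `docker version --format` and checks free disk space. It does not check RAM, network reachability, or the token.

```python
import re
import shutil
import subprocess
from typing import Optional

MIN_DOCKER_MAJOR = 24
MIN_FREE_DISK_BYTES = 10 * 1024**3  # 10 GB for engine images


def docker_major(version_output: str) -> Optional[int]:
    """Extract the major version from a string such as '24.0.7'."""
    match = re.search(r"(\d+)\.\d+", version_output)
    return int(match.group(1)) if match else None


def preflight(image_dir: str = "/") -> list[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker not found on PATH")
    else:
        # Asking for the server version fails if the daemon is not running.
        proc = subprocess.run(
            ["docker", "version", "--format", "{{.Server.Version}}"],
            capture_output=True, text=True,
        )
        major = docker_major(proc.stdout) if proc.returncode == 0 else None
        if major is None:
            problems.append("Docker daemon is not reachable")
        elif major < MIN_DOCKER_MAJOR:
            problems.append(f"Docker Engine {MIN_DOCKER_MAJOR}+ required, found {proc.stdout.strip()}")
    if shutil.disk_usage(image_dir).free < MIN_FREE_DISK_BYTES:
        problems.append("less than 10 GB of free disk space")
    return problems


print(docker_major("24.0.7"))  # 24
```

Calling `preflight()` on the worker host and printing the returned list gives a quick readiness summary.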

Quick start

# 1. Install worker agent
npm install -g @vulnios/worker

# 2. Start with enrollment token
vulnios-worker start \
  --token <ENROLLMENT_TOKEN> \
  --capabilities clamav,yara,trivy,grype

# 3. Worker registers and starts polling
# Status visible in Settings → Agents

Kubernetes / GKE deployment

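# Deploy as a Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vulnios-worker
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template labels.
  # The label value below is an example; any consistent label works.
  selector:
    matchLabels:
      app: vulnios-worker
  template:
    metadata:
      labels:
        app: vulnios-worker
    spec:
      containers:
      - name: worker
        image: ghcr.io/vulnios/worker:latest
        env:
        # Enrollment token, read from a Kubernetes Secret
        - name: WORKER_TOKEN
          valueFrom:
            secretKeyRef:
              name: vulnios-worker-secret
              key: token
        - name: WORKER_CAPABILITIES
          value: "clamav,yara,trivy,grype,syft"
        volumeMounts:
        # Host Docker socket, used to start engine containers on the node.
        # Note: this grants root-equivalent access to the node.
        - name: docker-sock
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock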

5. Isolation and Security

  • Each scan engine runs in an isolated Docker container with read-only target access.
  • Workers authenticate with short-lived Firebase custom tokens (24h TTL, rotatable).
  • Lease-based ownership — only the worker holding a valid lease token can update step results.
  • Scan inputs are downloaded via signed URL — workers cannot list other org paths in GCS.
  • Quarantined workers are rejected from all API calls until admin review.
  • Engine circuit breakers automatically disable engines with high failure rates.

Data minimization by default: Workers upload only scan results and selected artifacts. Uploading raw target data requires explicit opt-in and is never enabled by default.
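The quarantine and lease-ownership rules listed above amount to a server-side check before any step update is accepted. A minimal sketch of such a check follows; the function and field names are hypothetical, and the real checks performed by the platform may differ.

```python
import hmac
from dataclasses import dataclass


@dataclass
class Lease:
    worker_id: str
    lease_token: str
    expires_at: float  # epoch seconds


def may_update_step(lease: Lease, worker_id: str, worker_status: str,
                    token: str, now: float) -> bool:
    """Allow an update only from a non-quarantined worker holding the current, unexpired lease."""
    if worker_status == "quarantined":
        return False
    if worker_id != lease.worker_id or now > lease.expires_at:
        return False
    # Constant-time comparison avoids leaking token contents through timing.
    return hmac.compare_digest(token.encode(), lease.lease_token.encode())


lease = Lease(worker_id="w1", lease_token="s3cret", expires_at=1_000.0)
print(may_update_step(lease, "w1", "busy", "s3cret", now=500.0))         # True
print(may_update_step(lease, "w1", "quarantined", "s3cret", now=500.0))  # False
print(may_update_step(lease, "w2", "busy", "s3cret", now=500.0))         # False
```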

6. Engine Health and Monitoring

Circuit breakers

When an engine fails repeatedly, its circuit breaker opens and the engine is skipped. The circuit closes again after a cooldown period, once the engine recovers.
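The circuit-breaker behaviour can be illustrated with a small class. This sketch counts consecutive failures and uses example values for the threshold and cooldown; the platform's actual trigger (for instance a failure rate over a window) and its parameters are not documented here.

```python
from typing import Optional


class CircuitBreaker:
    """Opens after N consecutive failures; permits trial runs again after a cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 300.0) -> None:
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown_s = cooldown_s                # assumed value
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Closed circuits always allow; open circuits allow runs once the cooldown has elapsed."""
        if self.opened_at is None:
            return True
        return now - self.opened_at >= self.cooldown_s

    def record(self, success: bool, now: float) -> None:
        """Update state after a run: a success closes the circuit, failures may (re)open it."""
        if success:
            self.failures = 0
            self.opened_at = None
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = now


breaker = CircuitBreaker(failure_threshold=3, cooldown_s=60.0)
for t in (0.0, 1.0, 2.0):
    breaker.record(success=False, now=t)
print(breaker.allow(now=10.0))  # False: circuit is open, engine is skipped
print(breaker.allow(now=62.0))  # True: cooldown elapsed, a trial run is allowed
breaker.record(success=True, now=62.0)
print(breaker.allow(now=63.0))  # True: circuit closed again
```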

Lease expiry reclaim

If a worker crashes or loses connectivity, its lease expires and the step is automatically re-queued for another worker.
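A reclaim pass over the steps might look like the sketch below. The `lease.expiresAt` field follows the data model in this document; the `running` status, the `id` key, and the function itself are assumptions made for the example.

```python
def reclaim_expired(steps: list[dict], now: float) -> list[str]:
    """Reset running steps whose lease has expired to pending; return their IDs."""
    reclaimed = []
    for step in steps:
        lease = step.get("lease")
        if step["status"] == "running" and lease and lease["expiresAt"] <= now:
            step["status"] = "pending"  # visible to the next worker that polls
            step["lease"] = None        # the old lease token is no longer valid
            reclaimed.append(step["id"])
    return reclaimed


steps = [
    {"id": "clamav", "status": "running", "lease": {"workerId": "w1", "expiresAt": 100.0}},
    {"id": "trivy", "status": "running", "lease": {"workerId": "w2", "expiresAt": 500.0}},
    {"id": "yara", "status": "succeeded", "lease": None},
]
print(reclaim_expired(steps, now=200.0))  # ['clamav']
```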

Heartbeat monitoring

Workers send a heartbeat every 60 seconds as part of lease renewal. A worker that sends no heartbeat for 5 minutes is marked offline.
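The offline rule reduces to a timestamp comparison against `lastHeartbeatAt`. A minimal sketch, with a hypothetical function name:

```python
from datetime import datetime, timedelta, timezone

OFFLINE_AFTER = timedelta(minutes=5)  # threshold stated above


def is_offline(last_heartbeat_at: datetime, now: datetime) -> bool:
    """True when the most recent heartbeat is more than five minutes old."""
    return now - last_heartbeat_at > OFFLINE_AFTER


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_offline(now - timedelta(seconds=90), now))  # False: roughly one missed heartbeat
print(is_offline(now - timedelta(minutes=6), now))   # True
```

With a 60-second interval, the five-minute threshold tolerates several missed heartbeats before a worker is considered gone.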

Vuln DB caching

Grype, Trivy, and OSV databases are cached on the worker and refreshed every 24 hours, reducing latency and network cost.
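One simple way to implement a 24-hour refresh is to compare the cached file's modification time with the current time. The sketch below assumes a file-based cache; the file name and helper are hypothetical, and the worker's real cache layout is not documented here.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_S = 24 * 3600  # refresh interval stated above


def is_stale(db_path: Path, now: float, max_age_s: float = MAX_AGE_S) -> bool:
    """A missing database, or one last modified more than max_age_s ago, needs a refresh."""
    try:
        mtime = db_path.stat().st_mtime
    except FileNotFoundError:
        return True
    return now - mtime > max_age_s


with tempfile.TemporaryDirectory() as tmp:
    db = Path(tmp) / "grype.db"            # hypothetical cache file name
    print(is_stale(db, now=time.time()))   # True: nothing cached yet
    db.write_bytes(b"")
    print(is_stale(db, now=time.time()))   # False: just written
    two_days_ago = time.time() - 2 * 24 * 3600
    os.utime(db, (two_days_ago, two_days_ago))
    print(is_stale(db, now=time.time()))   # True: older than 24 hours
```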

Dead letter queue

Steps that exhaust all retry attempts are moved to a dead-letter queue with full diagnostic context for manual review.
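The retry-then-dead-letter behaviour can be sketched as follows. The retry limit of 3, the `dead_letter` status name, and the `attempts` and `errors` fields are assumptions for illustration only.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry limit


def record_failure(step: dict, error: str, pending: deque, dead_letter: list) -> None:
    """Re-queue a failed step, or move it to the dead-letter list once retries are exhausted."""
    step["attempts"] = step.get("attempts", 0) + 1
    step.setdefault("errors", []).append(error)  # keep diagnostic context for review
    if step["attempts"] >= MAX_ATTEMPTS:
        step["status"] = "dead_letter"
        dead_letter.append(step)
    else:
        step["status"] = "pending"
        pending.append(step)


pending: deque = deque()
dead_letter: list = []
step = {"id": "trivy", "status": "running"}
for attempt in range(MAX_ATTEMPTS):
    if pending:
        step = pending.popleft()
    record_failure(step, f"exit code 1 (attempt {attempt + 1})", pending, dead_letter)
print(step["status"], len(step["errors"]), len(pending))  # dead_letter 3 0
```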

7. Local Agent (coming soon)

Desktop Agent

The desktop agent will enable scanning removable drives, USB sticks, and external disks locally. Only scan results are uploaded — raw disk contents never leave the machine without explicit consent.