Recommended Architecture

Use an always-on lightweight control plane + an ephemeral Vast GPU worker

  • Cloudflare Worker: tiny gateway at api.example.com
  • Durable Object: state store
  • Vast API
  • Ephemeral Vast GPU Instance
  • Cloudflared Named Tunnel: sam-origin.example.com
  • FastAPI SAM Service
  • SAM 3 Runtime
  • Object Storage (R2 / S3)
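
A minimal sketch of how the control plane might decide what to do with each request, assuming the state store tracks one worker with three states. The names (WorkerState, Action, ControlState, on_request, on_alarm) and the 600-second idle timeout are illustrative assumptions, not part of the slide. Real Vast API calls are left out on purpose.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class WorkerState(Enum):
    STOPPED = "stopped"    # no GPU instance exists
    STARTING = "starting"  # instance requested, not yet healthy
    READY = "ready"        # tunnel origin is answering


class Action(Enum):
    CREATE_INSTANCE = "create_instance"    # ask the GPU provider for an instance
    WAIT = "wait"                          # instance is booting; client should retry
    PROXY = "proxy"                        # forward the request to the tunnel origin
    DESTROY_INSTANCE = "destroy_instance"  # idle too long; release the GPU
    NOTHING = "nothing"


@dataclass
class ControlState:
    """Hypothetical record kept in the state store."""
    state: WorkerState = WorkerState.STOPPED
    last_request_at: Optional[float] = None


IDLE_TIMEOUT_S = 600.0  # assumed value; tune to cost vs. cold-start tolerance


def on_request(cs: ControlState, now: float) -> Action:
    """Called by the gateway for every incoming API request."""
    cs.last_request_at = now
    if cs.state is WorkerState.STOPPED:
        cs.state = WorkerState.STARTING
        return Action.CREATE_INSTANCE
    if cs.state is WorkerState.STARTING:
        return Action.WAIT
    return Action.PROXY


def on_worker_healthy(cs: ControlState) -> None:
    """Called once a health check against the origin succeeds."""
    if cs.state is WorkerState.STARTING:
        cs.state = WorkerState.READY


def on_alarm(cs: ControlState, now: float,
             idle_timeout_s: float = IDLE_TIMEOUT_S) -> Action:
    """Called periodically; destroys the worker after the idle timeout."""
    if cs.state is not WorkerState.READY or cs.last_request_at is None:
        return Action.NOTHING
    if now - cs.last_request_at >= idle_timeout_s:
        cs.state = WorkerState.STOPPED
        return Action.DESTROY_INSTANCE
    return Action.NOTHING
```

Keeping the decision logic as pure functions over a small state record makes it easy to test without a GPU instance, and the same shape could be ported to the gateway's own runtime.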

Key Design Decisions

  • Stable public endpoint lives at Cloudflare, not on the GPU host
  • GPU worker is created on demand and destroyed after idle time
  • Named Cloudflare Tunnel provides a stable origin hostname
  • Persist media and results in object storage; keep session metadata separately
  • Design the backend adapter so you can swap between Mac / MPS and Linux / CUDA later
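
The swappable backend adapter could look roughly like the sketch below. SegmentationBackend, StubBackend, pick_device, and make_backend are hypothetical names, and the stub returns placeholder data rather than calling any real SAM API. With PyTorch, the two availability flags would presumably come from torch.cuda.is_available() and torch.backends.mps.is_available().

```python
from dataclasses import dataclass
from typing import Protocol


class SegmentationBackend(Protocol):
    """Hypothetical interface the API service would code against."""
    device: str

    def segment(self, image_bytes: bytes, prompt: str) -> dict: ...


@dataclass
class StubBackend:
    """Placeholder backend; a real one would load the model onto `device`."""
    device: str

    def segment(self, image_bytes: bytes, prompt: str) -> dict:
        # No model is run here; the shape of the result is an assumption.
        return {
            "device": self.device,
            "prompt": prompt,
            "num_bytes": len(image_bytes),
            "masks": [],
        }


def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA (Linux GPU host), then MPS (Mac), then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def make_backend(cuda_available: bool, mps_available: bool) -> SegmentationBackend:
    """Build a backend for whichever accelerator is present."""
    return StubBackend(device=pick_device(cuda_available, mps_available))
```

Because the service only depends on the interface, moving from a Mac development machine to a Linux GPU host would change the flags passed to make_backend, not the request handlers.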

Ask Codex To

  1. Draft the Dockerfile and FastAPI skeleton
  2. Define the Worker proxy contract and error states
  3. Create Terraform or scripts for repeatable setup
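
As a starting point for item 2, here is one possible shape for the gateway's error contract, written as a small lookup function. The condition names, JSON fields, and the 15-second retry hint are assumptions made for illustration; only the HTTP status codes and the Retry-After header are standard.

```python
from typing import Dict, Tuple

# (status code, headers, JSON body)
Response = Tuple[int, Dict[str, str], Dict[str, str]]


def gateway_error(condition: str, retry_after_s: int = 15) -> Response:
    """Map a hypothetical gateway condition to an HTTP error response."""
    if condition == "worker_starting":
        # GPU instance is booting; ask the client to retry shortly.
        return (
            503,
            {"Retry-After": str(retry_after_s)},
            {"error": "worker_starting", "detail": "GPU worker is starting"},
        )
    if condition == "provision_failed":
        # The GPU provider could not supply an instance.
        return (
            502,
            {},
            {"error": "provision_failed", "detail": "could not create GPU worker"},
        )
    if condition == "origin_timeout":
        # The tunnel origin did not answer in time.
        return (
            504,
            {},
            {"error": "origin_timeout", "detail": "GPU worker did not respond"},
        )
    raise ValueError(f"unknown condition: {condition}")
```

A client can then treat 503 as "retry after the hinted delay" and surface 502 or 504 as failures.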