☁️ Cloud Run · Intermediate

Deploy a FastAPI API to Cloud Run in 20 minutes

⏱ 20 minutes · ☁️ Google Cloud · ⚡ FastAPI · 🐳 Docker

Cloud Run is Google Cloud's serverless service for containers. You deploy a Docker image and Google handles everything: automatic scaling from 0 to N, HTTPS, load balancing. You only pay for requests processed.

Prerequisites

📖 Key Concepts

Cloud Run: Google Cloud's serverless service that runs Docker containers. You don't manage infrastructure — Google manages scaling, availability, and SSL certificates automatically.

Serverless: Architecture where you write code that executes in response to events, without managing servers. You only pay for actual execution time.

Container: Isolated package containing your application, its dependencies, and configuration — always the same regardless of runtime environment.

Artifact Registry: Google Cloud's managed Docker registry for storing your private Docker images securely.

Scale-to-zero: Ability to reduce infrastructure to zero when there are no requests, saving costs (you pay nothing when your API is idle).

Region: Geographic zone where your service runs (example: europe-west1 = Belgium). Closer to your users = lower latency.

1. Configure Google Cloud

The gcloud commands configure your Google Cloud environment and enable necessary services:

Terminal
# Log in to Google Cloud
gcloud auth login

# Create a new project (or use an existing one)
gcloud projects create mon-api-project --name="My FastAPI"

# Set the project as default
gcloud config set project mon-api-project

# Enable necessary APIs
gcloud services enable run.googleapis.com \
  artifactregistry.googleapis.com \
  cloudbuild.googleapis.com

# Configure the region
gcloud config set run/region europe-west1  # Belgium
gcloud auth login: Authenticates you with your Google account and generates an access token.
gcloud projects create: Creates a new isolated project (logical container for all your services).
gcloud services enable: Activates the Google Cloud APIs your pipeline will use (Cloud Run for deployment, Artifact Registry for storing Docker images, Cloud Build for building images).
gcloud config set: Remembers your preferences so you don't repeat them each time.

2. Prepare the FastAPI Application

Our API exposes a demo endpoint and supports the Cloud Run environment. The key point: the application must listen on the port Cloud Run injects via the PORT variable:

main.py
from fastapi import FastAPI
from pydantic import BaseModel
import os

app = FastAPI(
    title="My Cloud Run API",
    version="1.0.0",
    docs_url="/docs"
)

# Cloud Run automatically injects PORT
PORT = int(os.getenv("PORT", "8080"))

class PredictionRequest(BaseModel):
    text: str

class PredictionResponse(BaseModel):
    sentiment: str
    confidence: float
    text: str

@app.get("/")
def root():
    return {
        "service": "Sentiment Analysis API",
        "version": "1.0.0",
        "environment": os.getenv("ENVIRONMENT", "production")
    }

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """Simplified sentiment analysis for demonstration."""
    positive_words = {'excellent', 'amazing', 'great', 'perfect', 'good', 'awesome'}
    negative_words = {'bad', 'terrible', 'horrible', 'awful', 'disappointing'}

    words = set(request.text.lower().split())
    pos_count = len(words & positive_words)
    neg_count = len(words & negative_words)

    if pos_count > neg_count:
        return PredictionResponse(
            sentiment="positive",
            confidence=round(0.6 + pos_count * 0.1, 2),
            text=request.text
        )
    elif neg_count > pos_count:
        return PredictionResponse(
            sentiment="negative",
            confidence=round(0.6 + neg_count * 0.1, 2),
            text=request.text
        )
    return PredictionResponse(sentiment="neutral", confidence=0.55, text=request.text)

@app.get("/health")
def health():
    return {"status": "healthy"}

if __name__ == "__main__":
    # Local run: python main.py (on Cloud Run, the Dockerfile CMD starts the server)
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=PORT)
This code defines a simple FastAPI app with three endpoints. The line PORT = int(os.getenv("PORT", "8080")) matters because of Cloud Run's container contract: your server must listen on the port given in the PORT environment variable (8080 by default, but configurable per service). If the variable isn't set — when running locally — we fall back to 8080.
Reading PORT instead of hardcoding it keeps the same image portable: it runs unchanged locally, on Cloud Run, or on any platform that injects a different port, without a rebuild.
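To see the fallback behavior in isolation, here is a small standalone sketch (pure Python, no Cloud Run needed; resolve_port is an illustrative helper, not part of the app):

```python
import os

def resolve_port(default: int = 8080) -> int:
    # Cloud Run sets PORT in the container; locally it's usually absent,
    # so we fall back to the default, exactly like main.py does.
    return int(os.getenv("PORT", str(default)))

os.environ.pop("PORT", None)
print(resolve_port())          # 8080 — local fallback

os.environ["PORT"] = "9000"
print(resolve_port())          # 9000 — value injected by the platform
```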
requirements.txt
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.0
fastapi: Modern, fast framework for creating REST APIs.
uvicorn: ASGI server (standard interface for async Python applications) that runs FastAPI.
pydantic: Data validation and JSON serialization — automatically handles request and response parsing.
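Before containerizing anything, the scoring heuristic behind /predict can be exercised as plain Python (same word sets and confidence formula as main.py):

```python
# Same lexicons and scoring rule as the /predict endpoint in main.py.
POSITIVE = {'excellent', 'amazing', 'great', 'perfect', 'good', 'awesome'}
NEGATIVE = {'bad', 'terrible', 'horrible', 'awful', 'disappointing'}

def score(text: str) -> tuple[str, float]:
    # Count overlaps between the input words and each lexicon.
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return "positive", round(0.6 + pos * 0.1, 2)
    if neg > pos:
        return "negative", round(0.6 + neg * 0.1, 2)
    return "neutral", 0.55

print(score("This product is excellent"))   # ('positive', 0.7)
print(score("Awful terrible service"))      # ('negative', 0.8)
```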

3. Optimized Dockerfile for Cloud Run

Dockerfile
FROM python:3.11-slim

WORKDIR /app

ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Cloud Run injects PORT at runtime; shell-form CMD runs via sh -c,
# which performs the variable substitution (8080 as a local fallback)
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT:-8080}
FROM python:3.11-slim: Minimal base image with Python 3.11 (slim = fewer unnecessary dependencies, lighter image).
PYTHONDONTWRITEBYTECODE=1: Prevents Python from generating .pyc files (size optimization).
PYTHONUNBUFFERED=1: Immediately displays logs instead of buffering them (useful to see production errors).
COPY requirements.txt . / RUN pip install: Installs dependencies before copying code (optimizes Docker cache — code changes often, dependencies rarely).
CMD exec uvicorn ... --host 0.0.0.0: Runs the server on all network interfaces (not just localhost), essential for Cloud Run to access the API. --port $PORT: Uses the variable injected by Cloud Run.
Cloud Run automatically injects the $PORT variable into your container. Your application MUST listen on this port.

4. Create a Registry and Push the Image

A registry is a centralized repository that stores your Docker images. Cloud Run must access your image to deploy it.

Terminal
# Variables
PROJECT_ID="mon-api-project"
REGION="europe-west1"
REPO_NAME="mon-repo"
IMAGE_NAME="sentiment-api"

# Create the Artifact Registry
gcloud artifacts repositories create $REPO_NAME \
  --repository-format=docker \
  --location=$REGION \
  --description="Docker images for my APIs"

# Configure Docker to authenticate to Artifact Registry
gcloud auth configure-docker $REGION-docker.pkg.dev

# Build the image with the full Artifact Registry tag
IMAGE_TAG=$REGION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$IMAGE_NAME:v1

docker build -t $IMAGE_TAG .

# Push the image
docker push $IMAGE_TAG

# Verify the image is in the registry
gcloud artifacts docker images list $REGION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME
gcloud artifacts repositories create: Creates a new private repository for storing Docker images.
gcloud auth configure-docker: Configures your local Docker client to access the registry with your Google Cloud credentials (no manual docker login needed).
docker build -t $IMAGE_TAG .: Builds the image from the Dockerfile and tags it with the full registry path (format: region-docker.pkg.dev/project/repo/image:tag).
docker push: Sends the compiled image to Artifact Registry (becomes accessible to Cloud Run).
gcloud artifacts docker images list: Verifies the image was pushed.
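The full tag follows a fixed pattern; as a sanity check, here is a tiny Python helper (illustrative only) that assembles it from the same variables:

```python
def image_tag(region: str, project: str, repo: str, image: str, version: str) -> str:
    # Artifact Registry path: REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG
    return f"{region}-docker.pkg.dev/{project}/{repo}/{image}:{version}"

print(image_tag("europe-west1", "mon-api-project", "mon-repo", "sentiment-api", "v1"))
# europe-west1-docker.pkg.dev/mon-api-project/mon-repo/sentiment-api:v1
```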

5. Deploy to Cloud Run

This command deploys your Docker image to Cloud Run with resource and access configurations:

Terminal — Deploy
# Note: no inline comments after the backslashes — in bash, a comment
# after a line continuation breaks the command. Each flag is explained below.
gcloud run deploy sentiment-api \
  --image=$IMAGE_TAG \
  --platform=managed \
  --region=$REGION \
  --allow-unauthenticated \
  --min-instances=0 \
  --max-instances=10 \
  --memory=512Mi \
  --cpu=1 \
  --concurrency=80 \
  --set-env-vars="ENVIRONMENT=production" \
  --port=8080
gcloud run deploy sentiment-api: Creates or updates a Cloud Run service named "sentiment-api".
--image=$IMAGE_TAG: Which Docker image to use (the one we pushed).
--platform=managed: Cloud Run fully manages infrastructure (no Kubernetes).
--allow-unauthenticated: Anyone can call the API without authentication (remove if you need security).
--min-instances=0: Scale-to-zero — when nobody uses the API, no instances are active (you pay nothing).
--max-instances=10: Maximum instance limit to protect your bill.
--memory=512Mi / --cpu=1: Resources per instance (512 MB RAM, 1 vCPU — adjust as needed).
--concurrency=80: How many concurrent requests one instance can handle before creating a new one.
--port=8080: The container port Cloud Run sends requests to; Cloud Run sets the PORT environment variable to this value, which our application reads.
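A back-of-envelope check on what these flags allow at full scale (assuming requests are spread evenly across instances):

```python
# Capacity implied by the deploy flags above.
max_instances = 10   # --max-instances
concurrency = 80     # --concurrency

# Upper bound on simultaneous requests before new requests start queuing.
print(max_instances * concurrency)  # 800
```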
With scale-to-zero, if your API receives no requests (e.g., an internal test API), you pay nothing at all. When a request arrives, an instance cold-starts — typically well under a second for a lightweight image like this one.
Cloud Run returns an HTTPS URL: https://sentiment-api-xxx-ew.a.run.app
Your API is immediately accessible with a valid SSL certificate.
Terminal — Test
SERVICE_URL=$(gcloud run services describe sentiment-api \
  --region=$REGION \
  --format='value(status.url)')

# Test root endpoint
curl $SERVICE_URL

# Test prediction
curl -X POST $SERVICE_URL/predict \
  -H "Content-Type: application/json" \
  -d '{"text": "This product is excellent and very efficient"}'
# {"sentiment":"positive","confidence":0.7,"text":"This product is excellent and very efficient"}
gcloud run services describe ... --format='value(status.url)': Retrieves the public URL of the deployed service.
curl: Tests the API by making HTTP requests. The first test calls the root endpoint ("/"), the second tests prediction with JSON.
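The same call can be made from Python with the standard library. A hedged sketch — replace SERVICE_URL with the URL returned by your own deployment; the function is only defined here, not called:

```python
import json
from urllib import request

SERVICE_URL = "https://sentiment-api-xxx-ew.a.run.app"  # placeholder: use your own URL

def predict(text: str) -> dict:
    # POST JSON to /predict and decode the JSON response.
    body = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(
        f"{SERVICE_URL}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the deployed service):
# predict("This product is excellent")
```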

6. Quick Option: Deploy from Source

Instead of building and pushing manually, Cloud Run can compile your code directly via Cloud Build:

Terminal — Deploy from Source
# From the folder containing your Dockerfile
# Cloud Build builds the image and deploys automatically
gcloud run deploy sentiment-api \
  --source . \
  --region=europe-west1 \
  --allow-unauthenticated
--source .: Instead of providing a pre-built image, this tells Cloud Run: "take my source code, build the Dockerfile, push the image, and deploy". Useful for quick deployments, but less suited to CI/CD (each deployment rebuilds the image).

7. Environment Variables and Secrets

For sensitive data (API keys, tokens), use Secret Manager instead of hardcoding them:

Terminal — With Secret Manager
# Create a secret in Secret Manager
echo -n "my-secret-api-key" | gcloud secrets create API_KEY \
  --data-file=- \
  --replication-policy=automatic

# Deploy with the secret mounted as an environment variable
gcloud run deploy sentiment-api \
  --image=$IMAGE_TAG \
  --region=$REGION \
  --set-secrets="API_KEY=API_KEY:latest" \
  --set-env-vars="ENVIRONMENT=production"
gcloud secrets create API_KEY: Creates an encrypted secret in Google Cloud Secret Manager (secure storage, access auditing).
--set-secrets="API_KEY=API_KEY:latest": Injects the secret as an environment variable API_KEY in your container (encrypted until runtime). Only your code has access to its true value.

Cloud Run vs Kubernetes vs VPS: When to Choose What?

Three options for deploying an application — each with its own use cases:

Comparison of Three Approaches
# Cloud Run (serverless)
✅ HTTP stateless APIs and microservices
✅ Variable / unpredictable traffic
✅ Fast startup without infrastructure management
✅ Scale to zero = economical for small loads
✅ Billed per request (no fixed cost)
❌ WebSocket connections capped by the request timeout
❌ Limited to 60 min per request
🎯 IDEAL FOR: REST APIs, webhooks, small apps with traffic spikes

# Kubernetes (orchestration)
✅ Stateful applications (databases, caches)
✅ Complex workloads (jobs, workers, CronJobs)
✅ Full control over networking, storage
✅ Multi-cloud / on-premise
✅ Suited for mature DevOps teams
❌ High administration complexity
❌ Always-on nodes = fixed cost
🎯 IDEAL FOR: Complex applications, critical services, constant traffic

# VPS / Dedicated VM
✅ Full control (SSH, sudo, install whatever)
✅ No execution time constraints
✅ Cheaper for predictable constant traffic
❌ You manage OS, updates, security, manual scaling
❌ Always active = fixed cost even idle
🎯 IDEAL FOR: Legacy applications, need for low-level control, minimal teams
Cloud Run = serverless: You don't think about infrastructure, just your code. Google handles the rest. Perfect for APIs and microservices.
Kubernetes = orchestration: More control, more complexity. You manage deployments, replicas, networking. For teams with dedicated DevOps.
VPS = traditional: A simple virtual machine. You have SSH and can install/configure what you want. Minimal but not automated.