Cloud Run is Google Cloud's serverless service for containers. You deploy a Docker image and Google handles the rest: automatic scaling from 0 to N, HTTPS, load balancing. You pay only while requests are being processed.
Install the gcloud CLI with brew install google-cloud-sdk (macOS) or from cloud.google.com/sdk.
Cloud Run: Google Cloud's serverless service that runs Docker containers. You don't manage infrastructure: Google handles scaling, availability, and SSL certificates automatically.
Serverless: Architecture where you write code that executes in response to events, without managing servers. You only pay for actual execution time.
Container: Isolated package containing your application, its dependencies, and configuration — always the same regardless of runtime environment.
Artifact Registry: Google Cloud's managed Docker registry for storing your private Docker images securely.
Scale-to-zero: Ability to reduce infrastructure to zero when there are no requests, saving costs (you pay nothing when your API is idle).
Region: Geographic zone where your service runs (example: europe-west1 = Belgium). Closer to your users = lower latency.
The gcloud commands configure your Google Cloud environment and enable necessary services:
# Log in to Google Cloud
gcloud auth login
# Create a new project (or use an existing one)
gcloud projects create mon-api-project --name="My FastAPI"
# Set the project as default
gcloud config set project mon-api-project
# Enable necessary APIs
gcloud services enable run.googleapis.com \
artifactregistry.googleapis.com \
cloudbuild.googleapis.com
# Configure the region
gcloud config set run/region europe-west1 # Belgium
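Before going further, a couple of read-only commands (assuming the gcloud CLI is installed and authenticated) confirm the active project, default region, and that the three APIs are actually enabled:

```shell
# Show the active project and the run/region default
gcloud config list

# List enabled services and keep only the three we need
gcloud services list --enabled | grep -E 'run|artifactregistry|cloudbuild'
```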
Our API, with a demo endpoint and support for the Cloud Run environment. The important part is that the application listens on the port injected by Cloud Run via the PORT variable:
from fastapi import FastAPI
from pydantic import BaseModel
import os

app = FastAPI(
    title="My Cloud Run API",
    version="1.0.0",
    docs_url="/docs"
)

# Cloud Run automatically injects PORT
PORT = int(os.getenv("PORT", "8080"))

class PredictionRequest(BaseModel):
    text: str

class PredictionResponse(BaseModel):
    sentiment: str
    confidence: float
    text: str

@app.get("/")
def root():
    return {
        "service": "Sentiment Analysis API",
        "version": "1.0.0",
        "environment": os.getenv("ENVIRONMENT", "production")
    }

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """Simplified sentiment analysis for demonstration."""
    positive_words = {'excellent', 'amazing', 'great', 'perfect', 'good', 'awesome'}
    negative_words = {'bad', 'terrible', 'horrible', 'awful', 'disappointing'}
    words = set(request.text.lower().split())
    pos_count = len(words & positive_words)
    neg_count = len(words & negative_words)
    if pos_count > neg_count:
        return PredictionResponse(
            sentiment="positive",
            confidence=round(0.6 + pos_count * 0.1, 2),
            text=request.text
        )
    elif neg_count > pos_count:
        return PredictionResponse(
            sentiment="negative",
            confidence=round(0.6 + neg_count * 0.1, 2),
            text=request.text
        )
    return PredictionResponse(sentiment="neutral", confidence=0.55, text=request.text)

@app.get("/health")
def health():
    return {"status": "healthy"}
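To check the heuristic without starting a server, the /predict logic above can be reproduced as a plain function (same word sets, same scoring) and called directly:

```python
# Standalone reproduction of the /predict heuristic, no FastAPI needed
POSITIVE = {'excellent', 'amazing', 'great', 'perfect', 'good', 'awesome'}
NEGATIVE = {'bad', 'terrible', 'horrible', 'awful', 'disappointing'}

def classify(text: str) -> tuple:
    """Return (sentiment, confidence) using the same scoring as /predict."""
    words = set(text.lower().split())
    pos, neg = len(words & POSITIVE), len(words & NEGATIVE)
    if pos > neg:
        return ("positive", round(0.6 + pos * 0.1, 2))
    if neg > pos:
        return ("negative", round(0.6 + neg * 0.1, 2))
    return ("neutral", 0.55)

print(classify("This product is excellent"))  # ('positive', 0.7)
print(classify("bad and terrible"))           # ('negative', 0.8)
```

One positive word yields 0.6 + 0.1 = 0.7, two negative words 0.6 + 0.2 = 0.8, exactly what the endpoint returns.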
PORT = int(os.getenv("PORT", "8080")) is crucial: it reads the port injected by Cloud Run. Cloud Run sets the PORT environment variable in every container instance (8080 unless you configure --port otherwise), so your application should read PORT rather than hardcode a value; that keeps the container portable and consistent with whatever port Cloud Run routes traffic to. If the variable doesn't exist (local mode), we fall back to 8080.
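The fallback behaviour is easy to verify locally by simulating the injected variable:

```python
import os

# Simulate Cloud Run injecting PORT into the environment
os.environ["PORT"] = "9090"
print(int(os.getenv("PORT", "8080")))  # 9090: the injected value wins

# Local mode: the variable is absent, so the default applies
del os.environ["PORT"]
print(int(os.getenv("PORT", "8080")))  # 8080
```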
fastapi==0.111.0
uvicorn[standard]==0.29.0
pydantic==2.7.0
FROM python:3.11-slim
WORKDIR /app
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
# Cloud Run injects PORT at runtime; the shell form of CMD (run via sh -c)
# expands $PORT, and exec replaces the shell so uvicorn receives signals directly
CMD exec uvicorn main:app --host 0.0.0.0 --port $PORT
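Before pushing anything, the image can be exercised locally while simulating Cloud Run's PORT injection (the tag sentiment-api:local is just a local name for this test):

```shell
# Build the image locally
docker build -t sentiment-api:local .

# Run it with an injected PORT, as Cloud Run would
docker run --rm -e PORT=9000 -p 9000:9000 sentiment-api:local

# In another terminal:
# curl http://localhost:9000/health   ->   {"status":"healthy"}
```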
Cloud Run injects the $PORT variable into your container; your application MUST listen on this port. A registry is a centralized repository that stores your Docker images. Cloud Run must be able to access your image to deploy it.
# Variables
PROJECT_ID="mon-api-project"
REGION="europe-west1"
REPO_NAME="mon-repo"
IMAGE_NAME="sentiment-api"
# Create the Artifact Registry
gcloud artifacts repositories create $REPO_NAME \
--repository-format=docker \
--location=$REGION \
--description="Docker images for my APIs"
# Configure Docker to authenticate to Artifact Registry
gcloud auth configure-docker $REGION-docker.pkg.dev
# Build the image with the full Artifact Registry tag
IMAGE_TAG=$REGION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME/$IMAGE_NAME:v1
docker build -t $IMAGE_TAG .
# Push the image
docker push $IMAGE_TAG
# Verify the image is in the registry
gcloud artifacts docker images list $REGION-docker.pkg.dev/$PROJECT_ID/$REPO_NAME
With gcloud auth configure-docker done, docker push authenticates with your Google credentials (no separate docker login needed). Note the full image path convention (region-docker.pkg.dev/project/repo/image:tag). This command deploys your Docker image to Cloud Run with resource and access configurations:
gcloud run deploy sentiment-api \
  --image=$IMAGE_TAG \
  --platform=managed \
  --region=$REGION \
  --allow-unauthenticated \
  --min-instances=0 \
  --max-instances=10 \
  --memory=512Mi \
  --cpu=1 \
  --concurrency=80 \
  --set-env-vars="ENVIRONMENT=production" \
  --port=8080

# --allow-unauthenticated : public API
# --min-instances=0       : scale to zero (economical)
# --max-instances=10      : scaling limit
# --concurrency=80        : concurrent requests per instance
--port tells Cloud Run which container port to route traffic to; it is exposed to your application as $PORT (here 8080, the default). Once deployed, retrieve the service URL and test it:
SERVICE_URL=$(gcloud run services describe sentiment-api \
--region=$REGION \
--format='value(status.url)')
# Test root endpoint
curl $SERVICE_URL
# Test prediction
curl -X POST $SERVICE_URL/predict \
-H "Content-Type: application/json" \
-d '{"text": "This product is excellent and very efficient"}'
# {"sentiment":"positive","confidence":0.7,"text":"This product is excellent and very efficient"}
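The same call can be made from Python with the standard library only; SERVICE_URL below is a placeholder for the URL returned by gcloud run services describe:

```python
import json
from urllib import request

SERVICE_URL = "https://sentiment-api-xxxx-ew.a.run.app"  # placeholder: use your real URL

def predict(text: str) -> dict:
    """POST to /predict and return the parsed JSON response."""
    payload = json.dumps({"text": text}).encode("utf-8")
    req = request.Request(
        f"{SERVICE_URL}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)
```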
Instead of building and pushing manually, Cloud Run can compile your code directly via Cloud Build:
# From the folder containing your Dockerfile
# Cloud Build builds the image and deploys automatically
gcloud run deploy sentiment-api \
--source . \
--region=europe-west1 \
--allow-unauthenticated
For sensitive data (API keys, tokens), use Secret Manager instead of hardcoding them:
# Create a secret in Secret Manager
echo -n "my-secret-api-key" | gcloud secrets create API_KEY \
--data-file=- \
--replication-policy=automatic
# Deploy with the secret mounted as an environment variable
gcloud run deploy sentiment-api \
--image=$IMAGE_TAG \
--region=$REGION \
--set-secrets="API_KEY=API_KEY:latest" \
--set-env-vars="ENVIRONMENT=production"
Secret Manager injects API_KEY into your container at runtime (the value stays encrypted at rest); only your code sees the actual value.
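Inside the container the secret is just an environment variable. A minimal sketch of how application code might read it (load_api_key is an illustrative helper, not part of the API above):

```python
import os

def load_api_key() -> str:
    """Read the secret injected by --set-secrets; fail fast if it is absent."""
    key = os.environ.get("API_KEY")
    if not key:
        raise RuntimeError("API_KEY is not set; check --set-secrets on deploy")
    return key
```

Failing fast at startup beats discovering a missing secret on the first authenticated request.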
Three options for deploying an application — each with its own use cases:
# Cloud Run (serverless)
✅ HTTP stateless APIs and microservices
✅ Variable / unpredictable traffic
✅ Fast startup without infrastructure management
✅ Scale to zero = economical for small loads
✅ Billed per request (no fixed cost)
❌ No long-running WebSocket connections
❌ Limited to 60 min per request
🎯 IDEAL FOR: REST APIs, webhooks, small apps with traffic spikes
# Kubernetes (orchestration)
✅ Stateful applications (databases, caches)
✅ Complex workloads (jobs, workers, CronJobs)
✅ Full control over networking, storage
✅ Multi-cloud / on-premise
✅ Suited for mature DevOps teams
❌ High administration complexity
❌ Always-on nodes = fixed cost
🎯 IDEAL FOR: Complex applications, critical services, constant traffic
# VPS / Dedicated VM
✅ Full control (SSH, sudo, install whatever)
✅ No execution time constraints
✅ Cheaper for predictable constant traffic
❌ You manage OS, updates, security, manual scaling
❌ Always active = fixed cost even idle
🎯 IDEAL FOR: Legacy applications, need for low-level control, minimal teams