Minimal Rails Terraform Setup

You

What is a minimal terraform setup for rails?

Assistant

A minimal, production-grade Terraform setup for a Ruby on Rails application is not just a single script, but a modularized infrastructure. It must securely manage three foundational layers:
1. Networking (VPC): To isolate application components.
2. Stateful Database (RDS PostgreSQL): To persist your Rails ActiveRecord data.
3. Stateless Compute (ECS or EKS/Kubernetes): To run Puma (web) and Sidekiq (background workers) containers.

By synthesizing best practices from a DevOps perspective with real-world Rails infrastructure patterns from thoughtbot (a premier Rails agency), we can define the architecture, directory layout, and key code boundaries for a minimal setup.


1. Recommended Directory Structure

According to the Terraform Tutorial for DevOps | Guide by Hostman, keeping a clear and organized folder structure is crucial for maintainability. Instead of writing a single monolithic main.tf file, you should divide your code into root modules (for environment-specific deployments) and resource modules (for reusable infrastructure components).

A standard, production-ready directory structure looks like this:

├── environments/
│   ├── production/
│   │   ├── main.tf           # Root configuration calling VPC, RDS, and Compute
│   │   ├── provider.tf       # Cloud provider definition
│   │   ├── variables.tf      # Environment-specific values (e.g., instance sizes)
│   │   └── outputs.tf
│   └── staging/
└── modules/
    ├── network/              # Shared resources (VPC, subnets, internet gateways)
    ├── database/             # Stateful RDS Postgres instance
    └── compute/              # Stateless container runner (ECS Fargate or EKS)

Each module folder should consistently contain a main.tf (core resource definitions), variables.tf (inputs and defaults), outputs.tf (values passed to other modules), and a README.md (Terraform Tutorial for DevOps | Guide by Hostman).


2. Defining the Core Components

A. Networking Module (modules/network)

Your network isolates your Rails compute from the public web.
* Public Subnets: House the Application Load Balancer (ALB) to accept public HTTPS traffic.
* Private Subnets: House the actual Rails servers and the RDS database. Inbound traffic reaches them only through the ALB; outbound traffic (for example, pulling Docker images) leaves through a NAT Gateway.
* Best Practice: Shared resources like networks should be kept in a dedicated common directory to avoid rebuilds during app deployments (Terraform Tutorial for DevOps | Guide by Hostman).

B. Database Module (modules/database)

PostgreSQL is the most common production database for Rails (the framework's out-of-the-box default is SQLite). Databases are stateful resources and require extra protection.
* Deletion Protection: Ensure deletion_protection = true is set in your aws_db_instance block to prevent accidental loss of data (Terraform Tutorial for DevOps | Guide by Hostman).
* Encryption: Ensure that your storage and your Terraform state files are encrypted as an extra security measure (Terraform Tutorial for DevOps | Guide by Hostman).
* Outputs: This module must output the database address (endpoint) and the port to be consumed by the Rails container.

C. Compute Module (modules/compute)

Your Rails container deployment (often managed via Docker and ECS or Kubernetes EKS) runs here.
* The EKS / Flightdeck Approach: In their open-source platform engineering tool Flightdeck, thoughtbot packages years of client deployment experience into specialized Terraform modules (Learning Terraform - A hands-on, open source contribution).
* If you use EKS (Kubernetes) for your Rails compute, a minimal cluster setup requires:
  * A unique cluster name, e.g., example-production-v1 (Learning Terraform - A hands-on, open source contribution).
  * At least one node group definition (Learning Terraform - A hands-on, open source contribution).
* thoughtbot's default node group configuration establishes a baseline of t3.large instances with a minimum of 2 nodes to ensure high availability for web requests (Learning Terraform - A hands-on, open source contribution).


3. Key Tensions & Architectural Decisions

When implementing this minimal setup, you must navigate several architectural tradeoffs:

  • Stateful vs. Stateless Separation: The Terraform Tutorial for DevOps | Guide by Hostman emphasizes that critical stateful resources (databases) must be managed separately from less essential, stateless ones. In a Rails ecosystem, this means you should not put your RDS resource in the same state file or directory as your compute deployments. If a deployment fails or you need to recreate your compute cluster, your database remains safely untouched.
  • Hobbyist Costs vs. Production Resilience: There is a tension between a truly "minimal" cost setup and a "minimal" production-grade setup. A developer looking for low-cost hosting might run Rails on a single t2.micro EC2 instance. However, production-grade module frameworks like thoughtbot's Flightdeck assume a more robust baseline (e.g., dual-node t3.large EKS clusters) to support high-availability container scaling and background job queues Learning Terraform - A hands-on, open source contribution.
  • DRY Code vs. Readability: Variables help pass dependencies (like feeding VPC subnet IDs into your ECS module) Terraform Tutorial for DevOps | Guide by Hostman. However, over-parameterizing your modules can make the setup unnecessarily complex. Keep your resource modules simple and output only the values that are strictly required Terraform Tutorial for DevOps | Guide by Hostman.
You

What is the advantage of using Kamal with terraform?

Assistant

Using Kamal (the open-source, Docker-first deployment tool developed by 37signals) together with Terraform gives you a simple, low-cost deployment pipeline for Ruby on Rails applications.

Rather than viewing them as competing tools, think of them as having a strict, complementary division of labor: Terraform provisions the physical "hardware" (infrastructure), and Kamal deploys the "software" (containers).

Here are the primary advantages of combining Kamal with Terraform:


1. The Perfect Division of Labor (Provisioning vs. Deployment)

One of the most common anti-patterns in DevOps is forcing a tool to do something it wasn't designed for.
* The Problem with Terraform-only deploys: While you can use Terraform to spin up ECS services or Kubernetes pods, using it to push daily application updates, manage environment variables, or run Rails database migrations is slow and clunky. Terraform is designed to manage static, declarative infrastructure state.
* The Solution: You use Terraform to provision the foundational network, the virtual machines (e.g., AWS EC2, Hetzner, or DigitalOcean), and the database (RDS) Terraform Tutorial for DevOps | Guide by Hostman. You then hand off the dynamic, rapid-fire application deployment process to Kamal. Kamal specializes in building Docker images, pushing them to a registry, running database migrations, and executing zero-downtime rolling restarts.

2. Radical Cost and Complexity Reduction

In traditional enterprise setups (like thoughtbot’s Flightdeck), running Rails in production often requires AWS EKS (Kubernetes) or ECS Fargate Learning Terraform - A hands-on, open source contribution. These systems require complex cluster networking, node group scaling, and proprietary target groups, which drives up AWS bills and cognitive overhead.
* By pairing Terraform + Kamal, you can use Terraform to provision simple, cheap virtual machines (like basic EC2 instances or Hetzner VPSs) and an RDS database.
* Kamal then runs your app on those basic virtual machines using standard Docker containers and its built-in proxy. You get repeatable, container-based deploys without paying the "Kubernetes tax."

3. Agentless Simplicity

Like Ansible, which is celebrated for being an "agentless tool" that automates tasks directly over SSH to reduce human error (What is Ansible? A Tool to Automate Parts of Your Job), Kamal is completely agentless.
* How they integrate: When Terraform provisions your virtual machines, it outputs their public or private IP addresses Terraform Tutorial for DevOps | Guide by Hostman.
* You feed these IPs directly into Kamal’s configuration file (deploy.yml).
* Because Kamal is agentless, you do not need to use Terraform to install heavy agents, background daemons, or cluster management software on the target servers. Kamal simply SSHs into the bare VMs provisioned by Terraform, installs Docker (if missing), and boots your Rails application.

4. Zero Cloud Lock-In

If you deploy Rails using Terraform modules highly customized for AWS (such as ECS, ALB, and CloudWatch), migrating to another cloud provider is a massive engineering undertaking.
* With Kamal + Terraform: Your infrastructure code is modularized Terraform Tutorial for DevOps | Guide by Hostman. If you want to move from AWS to Hetzner or DigitalOcean, you only need to rewrite a minimal Terraform module to spin up basic VMs on the new provider.
* Because Kamal only cares about having bare Linux servers with SSH access, your application configuration (deploy.yml) remains virtually unchanged. You achieve total cloud portability.

Summary of Workflow

| Step | Tool | Responsibility |
|---|---|---|
| 1. Provision | Terraform | Creates VPC, Security Groups, RDS Postgres, and bare EC2 VMs. Outputs the VM IP addresses. |
| 2. Configure | Kamal | Reads the IPs from Terraform. SSHs into the servers, installs Docker, and boots the Kamal Proxy. |
| 3. Deploy | Kamal | Builds your Rails Docker image, pushes it to your registry, runs migrations, and boots the new containers with zero downtime. |
You

In terms of networking or the puma servers, is there anything I should keep in mind?

Assistant

When combining Terraform and Kamal to deploy a Ruby on Rails application, the handshake between your cloud network (managed by Terraform) and your web server configuration (Puma + Kamal Proxy) is critical.

Mistakes here usually manifest as deployment timeouts, database connection limits being exceeded, or Puma containers crashing due to Out Of Memory (OOM) errors.

Here is what you must keep in mind regarding Networking and Puma when designing your setup.


1. Networking & Security Considerations (Terraform)

Your Terraform configuration must establish the correct routing and open the exact ports that Kamal and your containerized Puma instances require.

A. The SSL Termination Decision

You need to decide where SSL/TLS encryption will be decrypted. This determines your Terraform network design:

  • Option 1: SSL at the Load Balancer (Recommended for AWS/Production)
    • Terraform's Job: Provision an AWS Application Load Balancer (ALB) and an AWS Certificate Manager (ACM) certificate.
    • The Network Path: Traffic goes Internet (HTTPS) -> ALB -> EC2 Instances (HTTP).
    • Security Groups: Open ports 80 and 443 to the world on the ALB. On the VMs, open port 80 only to traffic coming from the ALB's security group.
  • Option 2: SSL on the VM (Recommended for simple setups or Hetzner/DigitalOcean)
    • Kamal's Job: Kamal 2 features an automatic built-in SSL provisioner (via Let's Encrypt) inside Kamal Proxy.
    • The Network Path: Traffic goes Internet (HTTPS) -> VM (Kamal Proxy) -> Puma Container.
    • Security Groups: In your Terraform code, you must open ports 80 and 443 on the VM directly to the public internet so Kamal Proxy can receive web traffic and perform Let's Encrypt ACME challenges.

B. The SSH Port (Port 22)

Because Kamal is agentless and deploys strictly over SSH (What is Ansible? A Tool to Automate Parts of Your Job), your Terraform security groups must allow incoming traffic on Port 22 to your VMs.
* Best Practice: Do not leave port 22 wide open to the entire internet (0.0.0.0/0). Restrict port 22 access to your company's VPN IP, your office IP, or your CI/CD runner's IP (like GitHub Actions).

C. Database Isolation

Following the best practice of separating stateful and stateless resources Terraform Tutorial for DevOps | Guide by Hostman, your RDS PostgreSQL database should live in a private subnet.
* Security Group: The RDS instance's security group should only allow incoming traffic on port 5432 from the security group of your Puma VM instances. It should never be accessible from the public internet.
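The three rules above (web ports public, SSH restricted, database private) can be checked mechanically. The sketch below is a minimal illustration in Ruby; the rule format of `port` and `cidr` hashes is hypothetical and is not a Terraform or AWS schema.

```ruby
# Returns the rules that expose a non-web port to the whole internet.
# The { port:, cidr: } format is an assumption made for this sketch.
def risky_rules(rules)
  open_to_world = ["0.0.0.0/0", "::/0"]
  public_ports = [80, 443]
  rules.select do |rule|
    open_to_world.include?(rule[:cidr]) && !public_ports.include?(rule[:port])
  end
end

rules = [
  { port: 443,  cidr: "0.0.0.0/0" },       # public HTTPS: expected
  { port: 22,   cidr: "203.0.113.10/32" }, # SSH from a single address (documentation range)
  { port: 5432, cidr: "0.0.0.0/0" }        # Postgres open to the world: should be flagged
]

risky_rules(rules).each { |r| puts "port #{r[:port]} is open to #{r[:cidr]}" }
```

In a real setup the database rule would reference the app servers' security group rather than a CIDR block, which this simplified format does not model.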


2. Puma & Container Considerations (Kamal)

Inside your Docker container, Puma acts as the application gateway. How you configure Puma must align perfectly with the virtual machine sizes you provisioned via Terraform.

A. Port Binding inside Docker

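Kamal Proxy forwards traffic to one port inside your Rails container. In Kamal 2 the default is port 80 (the port the Rails 8 Dockerfile serves through Thruster); if Puma listens on 3000 directly, set proxy.app_port to 3000 in config/deploy.yml.
* The Trap: Ensure your Dockerfile or your Puma configuration (config/puma.rb) binds to 0.0.0.0 (all network interfaces) and not 127.0.0.1 (localhost). Kamal Proxy runs in its own container, so it cannot reach a Puma that listens only on the app container's loopback interface.
* Puma Config:
ruby
# config/puma.rb
# Listen on every interface so the proxy container can reach Puma.
# A single bind is enough; a separate `port` line for the same address is redundant.
bind "tcp://0.0.0.0:#{ENV.fetch("PORT", 3000)}"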

B. Sizing Puma to your Terraform VMs (The OOM Risk)

If you provision small, cheap VMs via Terraform, running Puma in clustered mode (multiple worker processes, each holding its own copy of the app in memory) can quickly exhaust RAM and crash your containers with Out Of Memory (OOM) errors.

  • Small VMs (t3.micro / t3.small - 1GB to 2GB RAM):
    • Set Puma to run in Single Mode (0 workers, threads only).
    • WEB_CONCURRENCY=0
    • RAILS_MAX_THREADS=5
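  • Production VMs (t3.large - 2 vCPUs, 8GB RAM):
    • Clustered mode becomes viable here. A common starting point is one worker per vCPU (WEB_CONCURRENCY=2) with RAILS_MAX_THREADS=5, adjusted after measuring memory use.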

C. The Database Connection Pool Math

Every Puma thread can hold one database connection while it serves a request. ActiveRecord's connection pool is per process, so the pool value in database.yml must be at least the number of threads in a single Puma worker (RAILS_MAX_THREADS); otherwise requests fail with ActiveRecord::ConnectionTimeoutError.

What grows with workers and servers is the total number of connections PostgreSQL must accept. Use this formula to size it per app server:

$$\text{Connections per Server} = (\text{Puma Workers} \times \text{Puma Threads}) + \text{Sidekiq Concurrency}$$

  • Example: 2 Puma workers running 5 threads each can open up to 10 connections from one server (each worker's own pool needs at least 5), before counting Sidekiq.
  • The Terraform Connection: PostgreSQL's max_connections on RDS (set through an aws_db_parameter_group attached to your aws_db_instance; the default is derived from instance memory) must be larger than: $$\text{Connections per Server} \times \text{Number of VM Servers} + \text{buffer for migrations/consoles}$$ If you scale your VMs from 2 to 4 using Terraform, the number of open connections roughly doubles, and your database must be configured to accept them.
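The connection arithmetic above can be sketched in a few lines of Ruby. This is an illustration only: the method names and the buffer of 10 are assumptions, and single mode (0 workers) is counted as one process.

```ruby
# Connections one app server can open: every Puma thread plus every Sidekiq thread.
def connections_per_server(puma_workers:, puma_threads:, sidekiq_concurrency: 0)
  web = [puma_workers, 1].max * puma_threads # 0 workers = single mode = one process
  web + sidekiq_concurrency
end

# Lower bound for PostgreSQL max_connections across the whole fleet.
# The default buffer of 10 (migrations, consoles) is an arbitrary assumption.
def required_max_connections(per_server:, servers:, buffer: 10)
  per_server * servers + buffer
end

per_server = connections_per_server(puma_workers: 2, puma_threads: 5, sidekiq_concurrency: 10)
puts per_server                                                   # 20
puts required_max_connections(per_server: per_server, servers: 2) # 50
puts required_max_connections(per_server: per_server, servers: 4) # 90
```

Doubling the server count from 2 to 4 raises the requirement from 50 to 90 here, which is linear growth plus the fixed buffer.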
You

How would the CI/CD work in this context? Am I running kamal locally? Can I attach it to a github action or bitbucket pipeline safely?

Assistant

Yes, you can absolutely run Kamal locally, and in fact, its "zero-dependency" local execution is one of its biggest selling points. However, for a team or a professional production environment, attaching Kamal to a CI/CD pipeline like GitHub Actions or Bitbucket Pipelines is the best practice.

Because Kamal is an agentless tool that operates entirely over SSH, much as Ansible manages server automation (What is Ansible? A Tool to Automate Parts of Your Job), integrating it into a CI/CD pipeline is straightforward, but it requires specific security precautions.

Here is how the workflow works, how to configure a GitHub Actions pipeline, and how to do it safely.


1. How the CI/CD Workflow Operates

When you merge code to your main branch, the CI/CD runner (GitHub/Bitbucket) takes over the deployment. The runner acts exactly like your local computer would:

[ Git Push to Main ]
        │
        ▼
[ CI/CD Runner (GitHub/Bitbucket) ]
        │ ── (Builds Docker Image & Pushes) ──> [ Container Registry (e.g., ECR/GHCR) ]
        │
        │ ── (SSH Commands to Pull & Deploy) ──> [ Puma VMs (Provisioned by Terraform) ]
  1. Build & Push: The runner uses Docker to build your Rails production image and pushes it to your registry (like GitHub Packages, Docker Hub, or AWS ECR).
  2. Deploy: The runner SSHs into the VMs (using the IP addresses provisioned by Terraform) and runs Docker commands there to pull the new image and switch traffic to the new container with zero downtime. Kamal is agentless, so no Kamal daemon runs on the servers.

2. A Minimal GitHub Actions Configuration

To run Kamal in GitHub Actions, you must provide the runner with Docker access, Ruby (to run the kamal gem), and your SSH private key.

Here is a minimal, working .github/workflows/deploy.yml pipeline:

name: Deploy to Production

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write # lets GITHUB_TOKEN push the image to ghcr.io
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # 1. Set up Ruby (needed to run Kamal commands)
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.3'
          bundler-cache: true

      # 2. Install Kamal
      - name: Install Kamal
        run: gem install kamal -v 2.3.0 # Match your local Kamal version

      # 3. Set up Docker Buildx (for faster builds)
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # 4. Set up SSH Agent and load the Private Key
      # This allows the runner to log into the VMs provisioned by Terraform
      - name: Set up SSH Agent
        uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}

      # 5. Log into Docker Registry
      - name: Log in to Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # 6. Deploy with Kamal
      # Environment variables are passed securely from GitHub Secrets
      - name: Deploy Application
        env:
          RAILS_MASTER_KEY: ${{ secrets.RAILS_MASTER_KEY }}
          KAMAL_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        run: kamal deploy

3. Safety and Security Considerations

Running agentless deployment tools in a shared CI/CD environment introduces specific security vectors you must lock down.

A. Keep Your SSH Private Key Secure

Your CI/CD pipeline needs an SSH private key to log into your Puma servers.
* The Rule: Never hardcode keys or commit them to git. Store the private key in GitHub/Bitbucket Repository Secrets Terraform Tutorial for DevOps | Guide by Hostman.
* The Terraform Hook: When Terraform provisions your EC2/VM instances, it should inject the public half of this key pair into the server's ~/.ssh/authorized_keys file (often using the aws_key_pair resource in Terraform).

B. Dynamic IP Management (Terraform Outputs)

How does your CI/CD runner know what the server IPs are?
* Rather than hardcoding IPs in your config/deploy.yml, you can make Kamal dynamic by reading environment variables:
yaml
# config/deploy.yml
service: my-rails-app
servers:
  web:
    - <%= ENV['PRIMARY_WEB_IP'] %>
    - <%= ENV['SECONDARY_WEB_IP'] %>

* In your CI/CD pipeline, you can use the Terraform CLI to fetch the latest IPs directly from your state file before running Kamal:
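bash
export PRIMARY_WEB_IP=$(terraform output -raw primary_web_ip)
export SECONDARY_WEB_IP=$(terraform output -raw secondary_web_ip) # assumes a matching output exists
kamal deploy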

Note: Ensure your Terraform state is stored securely in a remote backend with encryption enabled (like AWS S3) to protect these outputs Terraform Tutorial for DevOps | Guide by Hostman.
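When there are more than a couple of outputs, reading them all at once from `terraform output -json` can be tidier than one `-raw` call per value. The sketch below parses a sample of that JSON shape in Ruby; the output name `secondary_web_ip` and the helper name are assumptions for illustration.

```ruby
require "json"

# Sample of the shape printed by `terraform output -json`:
# each output is wrapped in an object carrying a "value" key.
sample_json = <<~JSON
  {
    "primary_web_ip":   { "sensitive": false, "type": "string", "value": "203.0.113.10" },
    "secondary_web_ip": { "sensitive": false, "type": "string", "value": "203.0.113.11" }
  }
JSON

# Turns Terraform outputs into upper-cased environment variable pairs.
def kamal_env_from(terraform_json)
  JSON.parse(terraform_json).to_h { |name, meta| [name.upcase, meta.fetch("value").to_s] }
end

env = kamal_env_from(sample_json)
env.each { |key, value| puts "#{key}=#{value}" }

# In a pipeline you would read the JSON from the real command and then run
# Kamal with those variables set, e.g. system(env, "kamal", "deploy").
```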

C. Network Whitelisting / The Firewalled VPC

If you have followed best-practice network design and placed your servers inside a private subnet Terraform Tutorial for DevOps | Guide by Hostman, public CI/CD runners (like GitHub's default runners) won't be able to SSH into them. You have two ways to solve this safely:

  1. Self-Hosted Runners (Most Secure): You run a lightweight GitHub/Bitbucket runner agent inside your private VPC. It can reach your Puma servers privately, meaning port 22 does not need to be exposed to the internet.
  2. OIDC & Dynamic Security Groups: Use an OpenID Connect (OIDC) provider to allow GitHub Actions to temporarily authenticate with AWS/your cloud, fetch the runner's current public IP, dynamically add that IP to the Terraform Security Group to allow port 22 access, run the deploy, and then strip the IP permission when finished.
You

Is terraform something typically used to control aws configurations? What happens when something changes manually in the website? How does terraform handle the drift?

Assistant

1. Is Terraform Typically Used to Control AWS Configurations?

Yes, absolutely. Terraform is the industry-standard tool for managing AWS configurations.

While manual configuration in the AWS Console is acceptable for "one-off environments or testing" AWS Timestream with Terraform, it is highly discouraged for production systems. For professional teams, writing declarative Terraform code is the default method for provisioning VPCs, database instances, and compute clusters because it makes infrastructure reproducible, auditable, and easily scalable.


2. What Happens When Something is Changed Manually? (Configuration Drift)

If a team member logs into the AWS Console and manually alters a setting—such as changing an EC2 instance size, deleting a security group rule, or modifying an RDS database parameter—this creates a situation known as configuration drift.

When drift occurs, your real-world infrastructure in AWS no longer matches the "desired state" written in your Terraform .tf files.


3. How Does Terraform Handle and Correct Drift?

Terraform's architecture is uniquely suited to detect and fix drift. It relies on a critical component called the State File (terraform.tfstate), which acts as a database mapping your code to real-world AWS resources Terraform Tutorial for DevOps | Guide by Hostman.

Here is the step-by-step process of how Terraform handles drift:

Step A: Detection via terraform plan

When you run terraform plan, Terraform does not just look at your local files; it performs a refresh cycle.
1. It queries the live AWS APIs to check the current, actual status of all resources listed in your state file.
2. It compares the Actual State (what is live on AWS right now) with the Desired State (what is written in your .tf code).
3. If it detects a manual change (drift), it reports the discrepancies in the plan output. For example:
* A security group has an extra rule not declared in code.
* The database instance class is db.t3.large on AWS, but the code says db.t3.medium.
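Drift detection can also run unattended, for example on a schedule in CI. `terraform plan -detailed-exitcode` exits with 0 when nothing differs, 1 on error, and 2 when the plan contains changes. The Ruby sketch below interprets those codes; the method name and symbols are illustrative choices.

```ruby
# Meaning of `terraform plan -detailed-exitcode` exit statuses.
def plan_result(exit_status)
  case exit_status
  when 0 then :no_changes
  when 2 then :changes_pending # drift, or code changes that were never applied
  else        :error
  end
end

# In a scheduled job (terraform on PATH, working directory already initialised):
#   system("terraform", "plan", "-detailed-exitcode", "-input=false", out: File::NULL)
#   result = plan_result($?.exitstatus)
[0, 1, 2].each { |status| puts "#{status} => #{plan_result(status)}" }
```

Exit code 2 alone does not distinguish a console edit from an unapplied code change, so the plan output still needs a human look before anyone runs apply.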

Step B: Reconciliation via terraform apply

If drift is detected, you have two paths to resolve it, depending on whether the manual change was a mistake or intentional:

  • Option 1: Revert the manual change (Undo drift) If the manual change was a mistake, you simply run terraform apply. Terraform will call the AWS API to overwrite the manual console change and return the resource to the specification written in your .tf files (e.g., it will shrink the database back to db.t3.medium or delete the unauthorized security group rule).
  • Option 2: Adopt the manual change (Update your code) If the manual change was an emergency fix that you want to keep permanently, you must update your Terraform code to match the manual change. Once your code matches the live environment, running terraform plan will report "No changes. Your infrastructure matches the configuration."

Important Safeguards: State Locking and Encryption

Because the state file is the source of truth for detecting drift, managing it safely is crucial:
* State Encryption: Because your state file contains sensitive metadata about your AWS configuration (and sometimes database passwords), it should always be encrypted in transit and at rest Terraform Tutorial for DevOps | Guide by Hostman.
* State Locking: If two developers (or a developer and a CI/CD pipeline) run terraform apply at the same time, they could corrupt the state file or apply conflicting changes. To prevent this, your backend configuration (such as AWS S3 with DynamoDB) should have state locking enabled to ensure only one entity can modify the infrastructure at any given moment Terraform Tutorial for DevOps | Guide by Hostman.

You

Does kamal give you the ability to store configurations?

Assistant

Yes, Kamal has a highly structured, built-in configuration system.

Instead of relying on server-side configuration panels or manual environment management, Kamal uses a declarative, code-first configuration approach. When you run kamal init, it creates two key local files in your Rails repository that govern your entire deployment Tutorial: Deploy a Rails App Using Kamal:

  1. config/deploy.yml (The blueprint: committed to Git).
  2. .kamal/secrets (The safe: never committed to Git) Deploying Web Applications with Kamal and UpCloud.

Here is how Kamal stores and resolves your configurations safely:


1. The Blueprint: config/deploy.yml

This file stores the structure of your application deployment Tutorial: Deploy a Rails App Using Kamal. It specifies:
* Service Name: The name used to isolate your Docker containers on the target machine Tutorial: Deploy a Rails App Using Kamal.
* Docker Registry: Where to push your built production images (e.g., Docker Hub, GitHub Packages) Tutorial: Deploy a Rails App Using Kamal.
* Server IPs (Roles): The target destination IPs (e.g., separating web, worker, and cron servers) Deploying Web Applications with Kamal and UpCloud.
* Volumes & Assets: Shared directories or folders you want mapped from the host to the container.

Dynamic Configurations via Ruby ERB

A major advantage of config/deploy.yml is that it is parsed with Ruby ERB (Embedded Ruby) Tutorial: Deploy a Rails App Using Kamal. This means your configuration doesn't have to be strictly hardcoded; you can pull variables dynamically:

# config/deploy.yml
service: my-rails-app
image: username/my-rails-app

# Dynamic server list loaded from local system environment variables
servers:
  web:
    - <%= ENV["PRIMARY_WEB_IP"] %>
    - <%= ENV["SECONDARY_WEB_IP"] %>
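To see what the ERB step does, the snippet below renders a template like the one above and then parses the result as YAML, using only Ruby's standard library. This is a sketch of the two stages, not Kamal's actual loader, and the IP addresses are placeholder values.

```ruby
require "erb"
require "yaml"

# Illustrative values; in CI these would come from Terraform outputs.
ENV["PRIMARY_WEB_IP"] = "203.0.113.10"
ENV["SECONDARY_WEB_IP"] = "203.0.113.11"

template = <<~YAML
  service: my-rails-app
  servers:
    web:
      - <%= ENV["PRIMARY_WEB_IP"] %>
      - <%= ENV["SECONDARY_WEB_IP"] %>
YAML

# Stage 1: ERB substitutes the environment values.
# Stage 2: the rendered text is parsed as ordinary YAML.
rendered = ERB.new(template).result
config = YAML.safe_load(rendered)

puts config["servers"]["web"].inspect
```

An unset variable renders as an empty list item rather than an error, so using `ENV.fetch("PRIMARY_WEB_IP")` inside the template is a reasonable way to fail fast.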

2. Handling Environment Variables (Clear vs. Secrets)

Kamal makes a strict architectural distinction between non-sensitive app configurations (like RAILS_ENV) and sensitive secrets (like DATABASE_URL or SECRET_KEY_BASE). It stores them under separate keys in deploy.yml:

# config/deploy.yml
env:
  # 1. Clear text variables (Safe to commit to Git)
  clear:
    RAILS_ENV: production
    PORT: "3000"
    DB_HOST: "my-rds-endpoint.aws.com" # Outputted by Terraform!

  # 2. Secret variables (Only keys are defined here; values are pulled securely)
  secret:
    - RAILS_MASTER_KEY
    - DATABASE_URL

3. The Safe: .kamal/secrets (or Local Environment)

To keep secrets out of Git, you store the actual sensitive values in a local secrets file. When you run kamal deploy, Kamal reads these variables locally, injects them securely over SSH, and boots the container on your servers Deploying Web Applications with Kamal and UpCloud.

In Kamal 2, this is managed via .kamal/secrets:

# .kamal/secrets (Keep this file in your .gitignore!)
RAILS_MASTER_KEY=94c86bca92dfdb0f7bc932...
DATABASE_URL=postgres://rails:password@my-rds-endpoint.aws.com:5432/production_db
KAMAL_REGISTRY_PASSWORD=dckr_pat_your_docker_registry_token...

When Kamal runs, it maps secret: - RAILS_MASTER_KEY from deploy.yml to the value in .kamal/secrets Deploying Web Applications with Kamal and UpCloud.

Alternatively, you can keep raw values out of the file and have .kamal/secrets read them from your current shell environment (Tutorial: Deploy a Rails App Using Kamal). The secret list in deploy.yml holds names only, so the lookup belongs in the secrets file:
# .kamal/secrets
RAILS_MASTER_KEY=$RAILS_MASTER_KEY


How this pairs with Terraform

The ultimate synergy of this system is that your Terraform outputs become Kamal's inputs.

  1. You run Terraform to spin up AWS RDS and EC2 instances.
  2. Terraform outputs the new Database Host string and VM IPs Terraform Tutorial for DevOps | Guide by Hostman.
  3. You script your deployment workflow to grab those outputs, write them to your clear environment block (or system environment variables), and then call kamal deploy Tutorial: Deploy a Rails App Using Kamal.
You

Would kamal need to know anything about the other resources or services like the database and cacheing layer? How are deployment failures handled?

Assistant

1. How Kamal Interacts with Databases and Caching Layers

Whether Kamal needs to "know" about your database (like PostgreSQL) or caching layers (like Redis) depends on where those services are hosted.

Scenario A: Managed Cloud Infrastructure (Terraform + AWS RDS/ElastiCache)

If you are following production best practices and using Terraform to provision managed stateful layers (e.g., RDS PostgreSQL and AWS ElastiCache Redis):
* Kamal’s Job: Kamal does not manage or orchestrate these services. It only needs to know how to connect to them.
* How it works: You simply output the RDS endpoint and Redis URL from Terraform and feed them into Kamal's environment block as secret environment variables (DATABASE_URL and REDIS_URL) Tutorial: Deploy a Rails App Using Kamal.
* When Kamal boots your Puma containers, they read those environment variables and connect directly to the cloud resources over the private network.

Scenario B: Server-Hosted "Accessories" (Kamal-Managed)

If you are trying to keep costs low and want to run your database and Redis directly on your virtual machines instead of paying for managed cloud services, Kamal features a built-in concept called Accessories Deploying Web Applications with Kamal and UpCloud.
* How it works: You declare your database and caching services right inside your config/deploy.yml file:
yaml
# config/deploy.yml
accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: 5432
    env:
      clear:
        POSTGRES_USER: rails
      secret:
        - POSTGRES_PASSWORD
    files:
      - db/production_init.sql:/docker-entrypoint-initdb.d/setup.sql
    directories:
      - /var/lib/postgresql/data:/var/lib/postgresql/data

* Kamal will connect to the designated host, pull the official Postgres/Redis Docker images, provision host directories for persistent storage (so data isn't lost when containers restart), and boot them alongside your app.


2. How Deployment Failures are Handled

One of Kamal’s strongest features is its robust safety net during deployments. It is designed to ensure that a broken build, a missing database migration, or a boot-time crash never takes down your website.

A. The Health Check (First Line of Defense)

Before any production traffic is routed to a newly deployed container, the container must pass a health check:
1. Kamal boots your new Puma container alongside the old one, which keeps serving traffic.
2. Kamal Proxy repeatedly requests a health endpoint in the new container (by default /up, the health route Rails has generated since 7.1, which returns 200 OK once the app has booted without exceptions) (Tutorial: Deploy a Rails App Using Kamal). The stock /up route does not query the database, so add your own check if you need that guarantee.
3. If the health check fails: Kamal stops the deployment immediately. It leaves the old, running containers active and serving traffic via Kamal Proxy.
4. It halts with an error, and you can run kamal app logs to see why Puma failed to start (e.g., missing credentials, syntax error, or failed database migration). Your users experience no downtime from the failed deploy.
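The gating logic in the steps above amounts to "poll until 200 or give up". The Ruby sketch below shows that loop with an injected probe so it runs without a server; it is not Kamal Proxy's implementation, and the attempt count and delay are arbitrary assumptions.

```ruby
# probe is any callable returning an HTTP status code. A real one could be
#   -> { Net::HTTP.get_response(URI("http://127.0.0.1:3000/up")).code.to_i }
def healthy?(probe, attempts: 5, delay: 1)
  attempts.times do |i|
    status = begin
      probe.call
    rescue StandardError
      nil # a refused connection while the app is still booting counts as "not ready"
    end
    return true if status == 200
    sleep delay if i < attempts - 1
  end
  false
end

# Simulated container that needs three attempts before /up answers 200.
responses = [nil, 503, 200].each
probe = -> { responses.next }

if healthy?(probe, attempts: 5, delay: 0)
  puts "switch traffic to new container"
else
  puts "keep old container"
end
```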

B. The Instant Rollback (Second Line of Defense)

If a bug passes your health check but breaks production logic (for example, a checkout button crashes the page), you need to revert to a working version immediately.

Because Kamal is Docker-based, rolling back is incredibly fast and reliable:
1. Identify previous builds: You can list recent containers on your server by running kamal app containers Deploying Web Applications with Kamal and UpCloud.
2. Execute the rollback: Run the rollback command Tutorial: Deploy a Rails App Using Kamal:
bash
kamal rollback <git-commit-hash-or-tag>

(kamal rollback takes the version as a required argument; pick one from the kamal app containers listing.)
3. Why it's fast: Unlike older deployment tools (like Capistrano) which had to re-download assets, re-compile code, and rebuild paths, Kamal simply stops the buggy container and starts the previous version, which is already present on your target server (Tutorial: Deploy a Rails App Using Kamal). The switch typically completes in seconds.

You

How many deployments can you roll back to in Kamal?

Assistant

By default, Kamal allows you to instantly roll back to any of the last 5 deployments Tutorial: Deploy a Rails App Using Kamal.

However, the actual limit is highly configurable and depends on how you manage your server's disk space and where your Docker images are stored.

Here is how the rollback math works and how you can control it:


1. The Local Limit: The retain_containers Setting

To prevent your servers from running out of disk space due to dozens of accumulated, heavy Docker images, Kamal automatically "prunes" (deletes) older containers and images after each successful deployment Tutorial: Deploy a Rails App Using Kamal.

This behavior is controlled by the retain_containers setting inside your config/deploy.yml file:

# config/deploy.yml
retain_containers: 5  # This is the default setting
  • How it behaves: Kamal keeps the last 5 active/inactive container images cached locally on your target virtual machines Tutorial: Deploy a Rails App Using Kamal.
  • The Benefit: Rolling back to any of these retained versions is fast (typically a matter of seconds) because the server doesn't need to download anything; it simply stops the active container and starts the cached older version Tutorial: Deploy a Rails App Using Kamal.
  • Customization: If you have large SSDs on your VMs, you can safely increase this to 10 or 15 to give yourself a wider safety margin. If your VMs are constrained (e.g., small 10GB disks), you might decrease this to 3.
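
To make the retention arithmetic concrete, here is a small Ruby model of one host. It is a sketch under assumptions: retained_versions and local_rollback? are hypothetical helpers, and the model assumes the running version counts toward the retained total, which may not match Kamal's exact bookkeeping.

```ruby
# Hypothetical model of image retention on one host; not Kamal's implementation.
def retained_versions(deployed, keep: 5)
  # Keep only the newest `keep` versions, oldest first.
  deployed.last(keep)
end

def local_rollback?(deployed, target, keep: 5)
  retained_versions(deployed, keep: keep).include?(target)
end

history = %w[v1 v2 v3 v4 v5 v6 v7] # v7 is the running version
p retained_versions(history)                # ["v3", "v4", "v5", "v6", "v7"]
p local_rollback?(history, "v4")            # true
p local_rollback?(history, "v2")            # false, already pruned
p local_rollback?(history, "v2", keep: 10)  # true with a larger retention window
```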

2. The Remote Limit: Redeploying Older Images from the Registry

What happens if you want to go back to a version from three weeks ago that was already pruned from your server's local disk?

You can still do it, but not with kamal rollback. As far as I can tell, rollback only restores versions whose containers are still present on the host, and reports the version as unavailable otherwise. As long as the historical image still exists in your remote container registry (like AWS ECR, GitHub Packages, or Docker Hub), you can redeploy it without rebuilding:

kamal deploy --skip-push --version=<any-git-tag-or-commit-hash>
  • How it behaves: Kamal skips the build and push steps, pulls the requested historical image from your remote registry down to the servers, and boots it through the normal health-checked deploy flow Deploying Web Applications with Kamal and UpCloud.
  • The Trade-off: This is not instantaneous. It will take a minute or two depending on your network speed, as the server must download the image layers first.

3. The Real Bottleneck: Database Migrations

While Kamal makes it easy to roll back your application code, it cannot automatically roll back your database schema.

If your failed deployment included a destructive database migration (like dropping a column, renaming a table, or changing a data type), simply running kamal rollback to revert the Puma containers will cause your old Rails code to crash because it won't understand the modified database structure.

The Safe Rolling Deployment Rules:

To safely utilize Kamal's rollback capabilities without database-induced downtime, you should follow the "expand and contract" migration pattern:
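1. Never make breaking database changes in a single deploy.
2. If you need to rename a column:
* Deploy 1: Add the new column (run the migration) and ship code that writes to both columns while still reading from the old one.
* Deploy 2: Backfill existing rows from the old column into the new one.
* Deploy 3: Switch reads to the new column and stop referencing the old one (for example, by listing it in ignored_columns).
* Deploy 4: Remove the old column.
3. Each step can then be rolled back to the one before it: if you must roll back Deploy 4, the Deploy 3 code never touches the dropped column, so it still runs against the new schema. If you must reverse a migration itself, run bin/rails db:migrate:down VERSION=xxxx while the release that contains that migration is still deployed, and only then run kamal rollback. Keep in mind that reversing a drop recreates the column but not its data.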
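
One way to reason about which rollbacks are safe is to compare the columns an older release references with the columns that still exist. The Ruby sketch below is a toy model, not a Rails API: the Deploy struct, rollback_safe?, and the name/full_name columns are all hypothetical.

```ruby
require "set"

# Hypothetical model: each deploy lists the columns its code references and
# the columns that exist once its migration has run.
Deploy = Struct.new(:name, :code_columns, :schema_columns)

def rollback_safe?(from:, to:)
  # Rolling back code only: the schema stays as `from` left it,
  # so the older code must find every column it references.
  to.code_columns.subset?(from.schema_columns)
end

dual_write = Deploy.new("dual write",  Set[:name, :full_name], Set[:name, :full_name])
new_only   = Deploy.new("new only",    Set[:full_name],        Set[:name, :full_name])
dropped    = Deploy.new("drop column", Set[:full_name],        Set[:full_name])

puts rollback_safe?(from: dropped, to: new_only)    # true
puts rollback_safe?(from: dropped, to: dual_write)  # false: old code still needs :name
```

The second check fails because the dual-write release still references the dropped column, which is why the column should only be removed after a release that no longer touches it.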

You

Is there a canary feature where you can push the load balancer to serve to only 5% of users before pushing it out to everyone?

Assistant

No, Kamal does not have a native, built-in "canary deployment" command (like kamal deploy --canary=5%).

Because Kamal is designed to be a lightweight, zero-dependency deployment tool, its CLI focuses strictly on rolling updates Kamal 2 - A simple, lightweight Docker deployment tool. - Medium. By default, Kamal boots your new container, waits for it to pass its health check, and then switches 100% of that host's traffic over via Kamal Proxy once it's healthy Tutorial: Deploy a Rails App Using Kamal.

However, because you are using a hybrid architecture—pairing Terraform with Kamal—you have two elegant ways to achieve canary releases or gradual rollouts.


Option 1: The Infrastructure Layer (AWS ALB + Terraform)

Since your static infrastructure is managed via Terraform, you can leverage the AWS Application Load Balancer (ALB) to handle the 5% traffic split.

                        [ AWS Application Load Balancer ]
                                 /            \
                    (95% Traffic)              (5% Traffic)
                               /                \
               [ Stable Target Group ]     [ Canary Target Group ]
               (VMs running main app)      (VMs running canary app)

How you set this up:

  1. Terraform Configuration: You configure your AWS ALB with two different Target Groups: app-stable and app-canary. In your ALB listener rules, you set a weighted routing rule directing 95% of traffic to app-stable and 5% to app-canary.
  2. Kamal Deployment: You create a secondary Kamal configuration file (e.g., config/deploy.canary.yml) that points to a specific subset of "canary" virtual machines.
  3. The Workflow:
    • To deploy a canary, you run: kamal deploy -c config/deploy.canary.yml
    • Your 5% canary servers are updated. The ALB immediately begins sending 5% of real-world traffic to them.
    • If errors spike in your monitoring tool (e.g., Sentry, Honeybadger), you roll back only the canary.
    • If things look good after an hour, you run your standard kamal deploy to update the remaining 95% of your main servers.
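
The effect of a 95/5 weighted rule can be simulated in a few lines of Ruby. This is only an illustration of weighted selection, not how the ALB is implemented: pick_target_group and the WEIGHTS constant are hypothetical, and the group names are taken from the setup above.

```ruby
# Hypothetical simulation of weighted forwarding; weights mirror the 95/5 split.
WEIGHTS = { "app-stable" => 95, "app-canary" => 5 }.freeze

def pick_target_group(rng, weights = WEIGHTS)
  roll = rng.rand(weights.values.sum) # integer in 0...100
  weights.each do |group, weight|
    return group if roll < weight
    roll -= weight
  end
end

rng = Random.new(42)
counts = Array.new(10_000) { pick_target_group(rng) }.tally
puts counts # roughly 9,500 stable and 500 canary requests
```

Note that the choice is made per request, so without stickiness the same user can land on the canary for one request and on the stable group for the next.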

Option 2: The Application Layer (Rails Feature Flags - Recommended)

In the Ruby on Rails ecosystem, executing network-level canary deploys (Option 1) can sometimes introduce complexities with session state, cookie stickiness, and WebSocket connections (ActionCable).

Because of this, most Rails teams prefer to handle "canary" features at the application layer using Feature Flags (using gems like Flipper).

How this works:

  1. You deploy 100% of your code to 100% of your servers using a standard kamal deploy.
  2. Inside your Rails code, you wrap new features in a block:
if Flipper.enabled?(:new_checkout_funnel, current_user)
  render :new_checkout
else
  render :old_checkout
end
  3. In your Flipper dashboard, you configure the feature to be enabled for only 5% of users (or specifically for internal employees first).
  4. If the new feature fails, you toggle it off instantly in the Flipper dashboard with zero redeployments or server changes.
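
The idea behind a percentage rollout is that each user is hashed into a stable bucket, so the same user always gets the same answer. The sketch below shows that idea in plain Ruby; enabled_for? is a hypothetical stand-in and SHA-256 is an assumption chosen for the example, so Flipper's own hashing scheme may differ.

```ruby
require "digest"

# Hypothetical stand-in for a percentage-of-actors gate; Flipper's own
# hashing scheme may differ.
def enabled_for?(feature, user_id, percentage)
  digest = Digest::SHA256.hexdigest("#{feature}:#{user_id}")
  bucket = digest[0, 8].to_i(16) % 100 # stable bucket in 0..99
  bucket < percentage
end

enabled = (1..10_000).count { |id| enabled_for?(:new_checkout_funnel, id, 5) }
puts "enabled for #{enabled} of 10000 users"

# The decision is sticky: asking twice about the same user gives the same answer.
puts enabled_for?(:new_checkout_funnel, 42, 5) == enabled_for?(:new_checkout_funnel, 42, 5)
```

Unlike per-request traffic splitting, this keeps each user on one side of the flag, which avoids the session and WebSocket issues mentioned above.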

Summary: What Kamal Does Provide Out-of-the-Box

While Kamal doesn't do fractional traffic splitting, it does protect your users during deployments via its native Rolling Deployments:
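* If you have 4 web servers, Kamal's default is to boot the new container on all of them in parallel; each host's proxy still switches traffic only after that host's health check passes Tutorial: Deploy a Rails App Using Kamal.
* To get a staggered rollout, set the boot option in config/deploy.yml (limit for how many hosts are updated at a time, wait for the pause between batches). With a limit of 1, Kamal deploys to Server 1, waits for the health check to pass, and then moves to Server 2.
* In that staggered setup, if Server 2 fails its health check, the deployment halts. Servers 3 and 4 are untouched, and Server 1 is still serving traffic, meaning a bad build only ever impacts a fraction of your active infrastructure before auto-stopping.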
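
The halt-on-first-failure behaviour of a one-host-at-a-time rollout can be sketched as follows. This is a simulation with hypothetical names (rolling_deploy, the web1..web4 hosts), not Kamal's code; the health check is a lambda standing in for the real probe.

```ruby
# Hypothetical simulation of a one-host-at-a-time rollout that stops on failure.
def rolling_deploy(hosts, healthy:)
  updated = []
  hosts.each do |host|
    unless healthy.call(host)
      # Halt: later hosts are never touched.
      return { updated: updated, failed: host, untouched: hosts - updated - [host] }
    end
    updated << host
  end
  { updated: updated, failed: nil, untouched: [] }
end

hosts = %w[web1 web2 web3 web4]
result = rolling_deploy(hosts, healthy: ->(host) { host != "web2" })
p result[:updated]    # ["web1"]
p result[:failed]     # "web2"
p result[:untouched]  # ["web3", "web4"]
```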

You

Can you tell me how deployment can change with something like devise?

Assistant

When you introduce an authentication library like Devise into your Ruby on Rails application, your deployment strategy changes from "just putting code on a server" to protecting user sessions, preventing bulk logouts, and ensuring seamless database updates.

Because Kamal uses containerized rolling deployments (where old and new versions of your app run simultaneously for a brief period) Tutorial: Deploy a Rails App Using Kamal, integrating Devise introduces five critical requirements to prevent breaking the user experience during a deploy.


1. The SECRET_KEY_BASE Trap (Preventing Bulk Logouts)

Devise relies heavily on Rails' SECRET_KEY_BASE to encrypt and sign your users' session cookies Tutorial: Deploy a Rails App Using Kamal.

  • The Danger: If your SECRET_KEY_BASE changes during a deployment (or if it is dynamically generated inside the container on boot), every single logged-in user will be immediately logged out on your next deployment because their session cookies can no longer be decrypted.
  • The Kamal Solution: You must explicitly define SECRET_KEY_BASE as an encrypted environment variable. Keep it persistent across deployments by defining it in your secrets file and referencing it in config/deploy.yml Tutorial: Deploy a Rails App Using Kamal:
# config/deploy.yml
env:
  secret:
    - SECRET_KEY_BASE

(With the actual secret stored securely in .kamal/secrets so it never changes unless you manually rotate it).
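
Why a changed key logs everyone out can be shown with a simplified signed cookie. This is a sketch only: Rails' real cookie jar derives separate keys from SECRET_KEY_BASE and encrypts the payload, whereas the hypothetical sign and verify helpers below use a bare HMAC, and the secret strings are made up.

```ruby
require "openssl"

# Simplified signed cookie; Rails' real cookie jar derives keys and encrypts.
def sign(payload, secret)
  data = [payload].pack("m0") # Base64 without newlines
  mac  = OpenSSL::HMAC.hexdigest("SHA256", secret, data)
  "#{data}--#{mac}"
end

def verify(cookie, secret)
  data, mac = cookie.split("--", 2)
  return nil if data.nil? || mac.nil?

  expected = OpenSSL::HMAC.hexdigest("SHA256", secret, data)
  # Constant-time comparison to avoid leaking how many characters matched.
  return nil unless OpenSSL.secure_compare(mac, expected)

  data.unpack1("m0")
end

cookie = sign("user_id=42", "secret-from-kamal-secrets")
p verify(cookie, "secret-from-kamal-secrets") # "user_id=42": same key on every container
p verify(cookie, "regenerated-on-boot")       # nil: the session can no longer be trusted
```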


2. Multi-Server Session Routing (Sticky Sessions vs. Redis)

Because Kamal runs a rolling deployment, Server 1 might be running your old container while Server 2 is running your new container Tutorial: Deploy a Rails App Using Kamal.

  • The Problem: If a user logs in on Server 1, and their next click (like going to /dashboard) gets routed to Server 2 by the AWS Load Balancer, what happens?
  • If using Cookie Store (Default): As long as both servers use the exact same SECRET_KEY_BASE, the user will stay logged in. The session cookie is sent with every request, and Server 2 can decrypt it just fine.
  • If using Database/Redis Store: If you store sessions in Redis or your DB, Server 2 must have immediate, low-latency access to the centralized store (e.g., AWS ElastiCache provisioned by Terraform) so it can look up the user's session token.

3. Database Migrations on the users Table

If you decide to enable a new Devise feature—like adding Trackable (tracking IP addresses and sign-in times) or Lockable (locking accounts after failed attempts)—you will need to generate a migration to add new columns to your users table.

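  • The Danger: Adding nullable columns is normally harmless to old Puma containers that are still serving traffic on other VMs, because the old code simply never references them. The trouble starts when a new column is NOT NULL with no default (inserts from the old code fail), or when a migration renames or removes a column the old code still uses. The reverse case matters too: the new Devise code must not receive traffic before its migration has run.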
  • The Solution:
    1. Always make new database columns nullable (null: true) or give them default values.
    2. Run kamal deploy to spin up the new code.
    3. Once the rolling deploy completes and the old containers are stopped, you can safely write a subsequent migration to add constraints (like null: false) if necessary.

4. Background Workers for Transactional Mailers

Devise sends a lot of critical emails (e.g., Password Reset, Account Confirmation, and Unlock Instructions; two-factor codes come from add-on gems rather than Devise itself). By default, Devise delivers these emails synchronously inside the request, which slows down the user's response.

To keep your app fast, you should configure Devise to send emails asynchronously via ActiveJob (using a background worker like Sidekiq). This drastically changes your Kamal setup:

  • You must configure Kamal to manage both web and worker roles in your config/deploy.yml Deploying Web Applications with Kamal and UpCloud.
  • Your deploy.yml must also securely pass your SMTP or third-party mailing API keys (like Postmark or SendGrid):
# config/deploy.yml
servers:
  web:
    - 192.168.1.10
  worker:
    hosts:
      - 192.168.1.11
    cmd: bundle exec sidekiq # Boots Sidekiq instead of Puma on this machine

env:
  secret:
    - POSTMARK_API_TOKEN # Injected into the containers to allow Devise to send emails

5. Multi-Server Rate Limiting (Preventing Brute Force Attacks)

If your application scales to multiple VMs using Kamal, you become vulnerable to distributed brute force attacks on your Devise /users/sign_in endpoint.

  • If you use a simple in-memory rate limiter like Rack::Attack on each VM, an attacker can bypass the limit by sending login attempts across your different servers.
  • The Solution: Your Terraform configuration should provision a centralized AWS ElastiCache Redis instance. Your Rails app's rate-limiting middleware must be configured to point to this single Redis instance, ensuring login limits are enforced globally across all Kamal-managed VMs.
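
The difference between per-VM and shared counters can be demonstrated with a toy limiter. This is a sketch, not Rack::Attack's API: LoginLimiter and allowed_attempts are hypothetical, a plain Hash stands in for Redis or per-VM memory, and the window never expires.

```ruby
# Hypothetical fixed-window limiter; a Hash stands in for Redis or per-VM memory.
class LoginLimiter
  def initialize(store, limit:)
    @store = store
    @limit = limit
  end

  # Returns true while the client is still under the limit for this window.
  def allow?(ip)
    @store[ip] = @store.fetch(ip, 0) + 1
    @store[ip] <= @limit
  end
end

def allowed_attempts(limiters, ip, attempts)
  # Requests are spread round-robin across servers, as a load balancer might do.
  attempts.times.count { |i| limiters[i % limiters.size].allow?(ip) }
end

per_vm = Array.new(3) { LoginLimiter.new({}, limit: 5) }
shared_store = {}
shared = Array.new(3) { LoginLimiter.new(shared_store, limit: 5) }

puts allowed_attempts(per_vm, "203.0.113.7", 30) # 15: each VM allows 5 on its own
puts allowed_attempts(shared, "203.0.113.7", 30) # 5: one global counter
```

With three VMs and isolated counters the attacker gets three times the intended budget; pointing every limiter at one store restores the intended limit.
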
You

What do you think about using open shift?

Assistant

Choosing Red Hat OpenShift would be a major departure from the architecture we have designed so far. It highlights a fundamental architectural tension:

The Tension: Operational Simplicity & Developer Velocity (Kamal) vs. Enterprise Standardization & Continuous Compliance (OpenShift).

  • On one hand, the Kamal + Terraform approach maximizes developer speed and keeps hosting costs low by avoiding the "Kubernetes tax," bypassing heavy container orchestrators in favor of direct, agentless Docker management over SSH Tutorial: Deploy a Rails App Using Kamal.
  • On the other hand, OpenShift (Red Hat's enterprise-packaged Kubernetes platform) provides a highly opinionated, secure, and automated ecosystem. It eliminates the manual integration of load balancers, registries, and ingress controllers, but demands massive operational overhead, high licensing fees, and a steep learning curve.

Key Synthesis Points & Cross-Source Connections

  • The Drift Reconciliation Boundary (Terraform State vs. OpenShift Operators): In our previous architecture, we relied on Terraform to detect "configuration drift" by comparing live AWS resources against its state file (which is plain JSON unless you store it in a backend that encrypts it at rest) Terraform Tutorial for DevOps | Guide by Hostman.
    • Under OpenShift: The responsibility for drift prevention shifts. OpenShift utilizes native Kubernetes Operators and its internal etcd database to continuously reconcile the cluster state in real-time. If a container crashes, an ingress route fails, or a secret changes, OpenShift heals itself continuously without needing an external terraform apply loop Terraform Tutorial for DevOps | Guide by Hostman. However, provisioning the underlying physical infrastructure and OpenShift clusters themselves still requires tools like Terraform or Ansible to bootstrap the virtual machines What is Ansible? A Tool to Automate Parts of Your Job.
  • The "Kubernetes Tax" Extrapolated: We previously explored how Kamal keeps resource usage low by running Puma directly on Docker containers bound to host VM ports Tutorial: Deploy a Rails App Using Kamal.
    • OpenShift is the ultimate expression of the "Kubernetes tax." The control plane and platform services (API servers, etcd, routers, metrics, and logging) require significant computing power; a standard highly available install expects three control-plane nodes plus worker nodes before your app serves a single request. For a startup or medium-sized Rails application, this is highly inefficient. However, for a massive enterprise, this "tax" pays for automated security patching, multi-tenancy, and integrated monitoring out of the box.
  • CI/CD Pipeline Friction (Source-to-Image vs. SSH Deployment): With Kamal, your CI/CD pipeline (like GitHub Actions) has to build the Docker image, push it to a registry, and SSH into your servers to trigger a rolling update Tutorial: Deploy a Rails App Using Kamal.
    • OpenShift radically simplifies this via Source-to-Image (S2I). You can point OpenShift directly at your Ruby on Rails Git repository. The platform detects the Rails app, automatically builds the container image using internal secure builders, and rolls out the deployment within the cluster boundaries. It completely bypasses the need for external registries or managing SSH keys on your runners Deploying Web Applications with Kamal and UpCloud.

Summary Perspective

Using OpenShift is usually overkill for standard Ruby on Rails deployments unless you are operating in a highly regulated enterprise environment (such as banking, healthcare, or government) that mandates strict Security Context Constraints (SCCs), private-cloud isolation, and formal Red Hat support. For fast-moving product teams, pairing Terraform with Kamal provides most of the operational utility of a container workflow at a fraction of the cost and complexity Tutorial: Deploy a Rails App Using Kamal.


Follow-up Question:
What are the primary compliance or scaling requirements of your project (e.g., SOC2 compliance, multi-region failover, or strict network isolation) that prompted you to look at an enterprise Kubernetes solution like OpenShift?
