What is a minimal terraform setup for rails?
A minimal, production-grade Terraform setup for a Ruby on Rails application is not just a single script, but a modularized infrastructure. It must securely manage three foundational layers:
1. Networking (VPC): To isolate application components.
2. Stateful Database (RDS PostgreSQL): To persist your Rails ActiveRecord data.
3. Stateless Compute (ECS or EKS/Kubernetes): To run Puma (web) and Sidekiq (background workers) containers.
By synthesizing best practices from a DevOps perspective with real-world Rails infrastructure patterns from thoughtbot (a premier Rails agency), we can define the architecture, directory layout, and key code boundaries for a minimal setup.
According to the Terraform Tutorial for DevOps | Guide by Hostman, keeping a clear and organized folder structure is crucial for maintainability. Instead of writing a single monolithic main.tf file, you should divide your code into root modules (for environment-specific deployments) and resource modules (for reusable infrastructure components).
A standard, production-ready directory structure looks like this:
```text
.
├── environments/
│   ├── production/
│   │   ├── main.tf        # Root configuration calling VPC, RDS, and Compute
│   │   ├── provider.tf    # Cloud provider definition
│   │   ├── variables.tf   # Environment-specific values (e.g., instance sizes)
│   │   └── outputs.tf
│   └── staging/
└── modules/
    ├── network/           # Shared resources (VPC, subnets, internet gateways)
    ├── database/          # Stateful RDS Postgres instance
    └── compute/           # Stateless container runner (ECS Fargate or EKS)
```
Each module folder should consistently contain a main.tf (core resource definitions), variables.tf (inputs and defaults), outputs.tf (to pass values to other modules), and a README.md Terraform Tutorial for DevOps | Guide by Hostman.
Networking layer (modules/network)
Your network isolates your Rails compute from the public web.
* Public Subnets: House the Application Load Balancer (ALB) to accept public HTTPS traffic.
* Private Subnets: House the actual Rails servers and the RDS database. Inbound traffic reaches them only through the ALB; outbound internet access (e.g., pulling Docker images) goes through a NAT Gateway.
* Best Practice: Shared resources like networks should be kept in a dedicated common directory to avoid rebuilds during app deployments Terraform Tutorial for DevOps | Guide by Hostman.
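Database layer (modules/database)
Most production Rails apps use PostgreSQL (new Rails apps default to SQLite unless generated with rails new --database=postgresql). Databases are stateful resources and require extra protection.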
* Deletion Protection: Ensure deletion_protection = true is enabled in your aws_db_instance block to prevent accidental loss of data Terraform Tutorial for DevOps | Guide by Hostman.
* Encryption: Ensure that your storage and your Terraform state files are encrypted as an extra security measure Terraform Tutorial for DevOps | Guide by Hostman.
* Outputs: This module must output the database address (endpoint) and the port to be consumed by the Rails container.
Compute layer (modules/compute)
Your Rails container deployment (often managed via Docker and ECS or Kubernetes EKS) runs here.
* The EKS / Flightdeck Approach: In their open-source platform engineering tool Flightdeck, thoughtbot packages years of client deployment experience into specialized Terraform modules Learning Terraform - A hands-on, open source contribution.
* If utilizing EKS (Kubernetes) for your Rails compute, a minimal cluster setup requires:
* A unique cluster name (e.g., example-production-v1) Learning Terraform - A hands-on, open source contribution.
* At least one node group definition Learning Terraform - A hands-on, open source contribution.
* Thoughtbot’s default node group configuration establishes a baseline of t3.large instances with a minimum node size of 2 to ensure high availability for web requests Learning Terraform - A hands-on, open source contribution.
When implementing this minimal setup, you must navigate several architectural tradeoffs:
* Cost vs. availability: The cheapest possible setup runs everything on a single t2.micro EC2 instance. However, production-grade module frameworks like thoughtbot's Flightdeck assume a more robust baseline (e.g., dual-node t3.large EKS clusters) to support high-availability container scaling and background job queues Learning Terraform - A hands-on, open source contribution.

What is the advantage of using Kamal with terraform?
Using Kamal (the open-source, Docker-first deployment tool developed by 37signals) in tandem with Terraform creates an incredibly powerful, cost-effective, and elegant DevOps pipeline for Ruby on Rails applications.
Rather than viewing them as competing tools, think of them as having a strict, complementary division of labor: Terraform provisions the physical "hardware" (infrastructure), and Kamal deploys the "software" (containers).
Here are the primary advantages of combining Kamal with Terraform:
One of the most common anti-patterns in DevOps is forcing a tool to do something it wasn't designed for.
* The Problem with Terraform-only deploys: While you can use Terraform to spin up ECS services or Kubernetes pods, using it to push daily application updates, manage environment variables, or run Rails database migrations is slow and clunky. Terraform is designed to manage static, declarative infrastructure state.
* The Solution: You use Terraform to provision the foundational network, the virtual machines (e.g., AWS EC2, Hetzner, or DigitalOcean), and the database (RDS) Terraform Tutorial for DevOps | Guide by Hostman. You then hand off the dynamic, rapid-fire application deployment process to Kamal. Kamal specializes in building Docker images, pushing them to a registry, running database migrations, and executing zero-downtime rolling restarts.
In traditional enterprise setups (like thoughtbot’s Flightdeck), running Rails in production often requires AWS EKS (Kubernetes) or ECS Fargate Learning Terraform - A hands-on, open source contribution. These systems require complex cluster networking, node group scaling, and AWS-specific load balancer target groups, which drive up AWS bills and cognitive overhead.
* By pairing Terraform + Kamal, you can use Terraform to provision simple, cheap virtual machines (like basic EC2 instances or Hetzner VPSs) and an RDS database.
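* Kamal then runs your application on those basic virtual machines as standard Docker containers behind its built-in proxy, with zero-downtime deploys. You get the deployment benefits of containers without paying the "Kubernetes tax." To spread traffic across several VMs you typically still place a load balancer (such as an ALB) in front of them.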
Like Ansible, which is celebrated for being an "agentless tool" that automates tasks directly over SSH to reduce human error What is Ansible? A Tool to Automate Parts of Your Job, Kamal is completely agentless.
* How they integrate: When Terraform provisions your virtual machines, it outputs their public or private IP addresses Terraform Tutorial for DevOps | Guide by Hostman.
* You feed these IPs directly into Kamal’s configuration file (deploy.yml).
* Because Kamal is agentless, you do not need to use Terraform to install heavy agents, background daemons, or cluster management software on the target servers. Kamal simply SSHs into the bare VMs provisioned by Terraform, installs Docker (if missing), and boots your Rails application.
If you deploy Rails using Terraform modules highly customized for AWS (such as ECS, ALB, and CloudWatch), migrating to another cloud provider is a massive engineering undertaking.
* With Kamal + Terraform: Your infrastructure code is modularized Terraform Tutorial for DevOps | Guide by Hostman. If you want to move from AWS to Hetzner or DigitalOcean, you only need to rewrite a minimal Terraform module to spin up basic VMs on the new provider.
* Because Kamal only cares about having bare Linux servers with SSH access, your application configuration (deploy.yml) remains virtually unchanged. This gives you a high degree of cloud portability.
| Step | Tool | Responsibility |
|---|---|---|
| 1. Provision | Terraform | Creates VPC, Security Groups, RDS Postgres, and bare EC2 VMs. Outputs the VM IP addresses. |
| 2. Configure | Kamal | Reads the IPs from Terraform. SSHs into the servers, installs Docker, and boots the Kamal Proxy. |
| 3. Deploy | Kamal | Builds your Rails Docker image, pushes it to your registry, runs migrations, and boots the new containers with zero-downtime. |
In terms of networking or the puma servers, is there anything I should keep in mind?
When combining Terraform and Kamal to deploy a Ruby on Rails application, the handshake between your cloud network (managed by Terraform) and your web server configuration (Puma + Kamal Proxy) is critical.
Mistakes here usually manifest as deployment timeouts, database connection limits being exceeded, or Puma containers crashing due to Out Of Memory (OOM) errors.
Here is what you must keep in mind regarding Networking and Puma when designing your setup.
Your Terraform configuration must establish the correct routing and open the exact ports that Kamal and your containerized Puma instances require.
You need to decide where TLS is terminated (where HTTPS traffic is decrypted). This determines your Terraform network design:
* Terminating TLS at an AWS Application Load Balancer (ALB): Internet (HTTPS) -> ALB -> EC2 Instances (HTTP). In Terraform, open ports 80 and 443 to the world on the ALB. On the VMs, open port 80 only to traffic coming from the ALB's security group.
* Terminating TLS at Kamal Proxy: Internet (HTTPS) -> VM (Kamal Proxy) -> Puma Container. In Terraform, open ports 80 and 443 on the VM directly to the public internet so Kamal Proxy can receive web traffic and answer Let's Encrypt ACME challenges.

Because Kamal is agentless and deploys strictly over SSH What is Ansible? A Tool to Automate Parts of Your Job, your Terraform security groups must allow incoming traffic on port 22 to your VMs.
* Best Practice: Do not leave port 22 wide open to the entire internet (0.0.0.0/0). Restrict port 22 access to your company's VPN IP, your office IP, or your CI/CD runner's IP (like GitHub Actions).
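The port 22 allowlist can be modelled in a few lines of Ruby. This is a minimal sketch of how an ingress rule with CIDR blocks treats a source address; it does not call AWS, and both CIDRs are hypothetical addresses from the RFC 5737 documentation ranges, not real networks.

```ruby
require "ipaddr"

# Hypothetical allowlist: a VPN egress range and one CI runner address
# (RFC 5737 documentation ranges, not real networks).
SSH_ALLOWED_CIDRS = %w[203.0.113.0/24 198.51.100.7/32].map { |cidr| IPAddr.new(cidr) }.freeze

# True when source_ip falls inside any allowed CIDR, mirroring how an
# ingress rule on port 22 with these CIDR blocks would admit it.
def ssh_allowed?(source_ip, allowed = SSH_ALLOWED_CIDRS)
  ip = IPAddr.new(source_ip)
  allowed.any? { |range| range.include?(ip) }
end
```

Note that GitHub's standard hosted runners use large, changing IP ranges, so allowlisting "your CI runner's IP" in practice usually means a self-hosted runner or a runner with a static egress IP.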
Following the best practice of separating stateful and stateless resources Terraform Tutorial for DevOps | Guide by Hostman, your RDS PostgreSQL database should live in a private subnet.
* Security Group: The RDS instance's security group should only allow incoming traffic on port 5432 from the security group of your Puma VM instances. It should never be accessible from the public internet.
Inside your Docker container, Puma acts as the application gateway. How you configure Puma must align perfectly with the virtual machine sizes you provisioned via Terraform.
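Kamal Proxy forwards traffic to the port your container listens on, set with the proxy app_port option in config/deploy.yml. In Kamal 2 this defaults to port 80 (matching Rails 8's generated Dockerfile, where Thruster listens on 80); if Puma listens on 3000 directly, set app_port: 3000.
* The Trap: Ensure your Dockerfile or your Puma configuration (config/puma.rb) binds to 0.0.0.0 (all network interfaces), not 127.0.0.1 (localhost). Kamal Proxy runs in its own container and reaches your app over the Docker network, so a Puma bound to 127.0.0.1 is unreachable from outside its own container.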
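* Puma Config:

```ruby
# config/puma.rb
# Listen on all interfaces so Kamal Proxy can reach Puma from its own container.
# `port` and `bind` both add a listener, so declaring the same address with each
# is redundant; use one of them.
bind "tcp://0.0.0.0:#{ENV.fetch("PORT", 3000)}"
```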
If you provision small, cheap VMs via Terraform, running Puma in clustered mode (multiple worker processes, each a separate Ruby process with its own memory) will quickly crash your server with Out Of Memory (OOM) errors.
* Small VMs (e.g., t3.micro / t3.small with 1 GB to 2 GB RAM): run Puma in single mode.
  * WEB_CONCURRENCY=0
  * RAILS_MAX_THREADS=5
* Larger VMs (e.g., t3.large with 2 vCPUs and 8 GB RAM):
  * WEB_CONCURRENCY=2 (1 worker per vCPU core)
  * RAILS_MAX_THREADS=5

Every Puma worker uses threads to handle concurrent requests. You must size your Rails database connection pool correctly to prevent ActiveRecord::ConnectionTimeoutError crashes.
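Use these formulas to configure your Rails database.yml and the RDS connection limit in Terraform. First, count the connections a single VM can open (Puma in single mode counts as one process):

$$\text{Connections per VM} = (\text{Puma processes} \times \text{Puma threads}) + \text{Sidekiq concurrency}$$

The pool value in database.yml applies to each process separately, so set it to at least RAILS_MAX_THREADS for Puma and at least the Sidekiq concurrency for the Sidekiq process. For example, a t3.large running 2 workers × 5 threads (and no Sidekiq) opens up to 10 connections.

Then, in your Terraform aws_db_instance configuration (via the DB parameter group it references), ensure the PostgreSQL max_connections parameter is larger than:

$$\text{Connections per VM} \times \text{Number of VM servers} + \text{buffer for migrations/consoles}$$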
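The sizing guidance and connection budget can be folded into a small calculator. This is a minimal sketch: the method names are hypothetical, the 2 GB threshold for single mode just encodes the t3.micro/t3.small advice, the 10-connection buffer is an arbitrary default, and a Puma process in single mode (WEB_CONCURRENCY=0) is counted as one process.

```ruby
# Hypothetical helper encoding the sizing guidance: VMs with 2 GB RAM or less
# run Puma in single mode (0 workers); larger VMs run one worker per vCPU.
def puma_settings(vcpus:, ram_gb:, threads: 5)
  workers = ram_gb <= 2 ? 0 : vcpus
  { web_concurrency: workers, rails_max_threads: threads }
end

# Maximum DB connections one VM can open. Single mode (0 workers) is still one process.
def connections_per_vm(web_concurrency:, rails_max_threads:, sidekiq_concurrency: 0)
  puma_processes = [web_concurrency, 1].max
  puma_processes * rails_max_threads + sidekiq_concurrency
end

# Value PostgreSQL max_connections must exceed across the whole fleet.
def max_connections_floor(per_vm:, vm_count:, buffer: 10)
  per_vm * vm_count + buffer
end

large = puma_settings(vcpus: 2, ram_gb: 8)
per_vm = connections_per_vm(**large, sidekiq_concurrency: 10)
puts "per VM: #{per_vm}, max_connections > #{max_connections_floor(per_vm: per_vm, vm_count: 4)}"
# prints "per VM: 20, max_connections > 90"
```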
If you scale your VMs from 2 to 4 using Terraform, the number of open connections doubles (it grows linearly with the number of VMs), so your database must be configured to handle it.

How would the CI/CD work in this context? Am I running kamal locally? Can I attach it to a github action or bitbucket pipeline safely?
Yes, you can absolutely run Kamal locally, and in fact, its "zero-dependency" local execution is one of its biggest selling points. However, for a team or a professional production environment, attaching Kamal to a CI/CD pipeline like GitHub Actions or Bitbucket Pipelines is the best practice.
Because Kamal is an agentless tool that operates entirely over SSH—highly similar to how Ansible manages server automation What is Ansible? A Tool to Automate Parts of Your Job—integrating it into a CI/CD pipeline is straightforward, but it requires specific security precautions.
Here is how the workflow works, how to configure a GitHub Actions pipeline, and how to do it safely.
When you merge code to your main branch, the CI/CD runner (GitHub/Bitbucket) takes over the deployment. The runner acts exactly like your local computer would:
```text
[ Git Push to Main ]
        │
        ▼
[ CI/CD Runner (GitHub/Bitbucket) ]
        │ ── (Builds Docker Image & Pushes) ──> [ Container Registry (e.g., ECR/GHCR) ]
        │
        │ ── (SSH Commands to Pull & Deploy) ──> [ Puma VMs (Provisioned by Terraform) ]
```
To run Kamal in GitHub Actions, you must provide the runner with Docker access, Ruby (to run the kamal gem), and your SSH private key.
Here is a minimal, working .github/workflows/deploy.yml pipeline:
```yaml
name: Deploy to Production

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest
    # GITHUB_TOKEN needs write access to packages to push images to ghcr.io
    permissions:
      contents: read
      packages: write
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      # 1. Set up Ruby (needed to run Kamal commands)
      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: '3.3'
          bundler-cache: true

      # 2. Install Kamal
      - name: Install Kamal
        run: gem install kamal -v 2.3.0 # Match your local Kamal version

      # 3. Set up Docker Buildx (for faster builds)
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      # 4. Set up SSH Agent and load the Private Key
      # This allows the runner to log into the VMs provisioned by Terraform
      - name: Set up SSH Agent
        uses: webfactory/ssh-agent@v0.9.0
        with:
          ssh-private-key: ${{ secrets.SSH_PRIVATE_KEY }}

      # 5. Log into Docker Registry
      - name: Log in to Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      # 6. Deploy with Kamal
      # Environment variables are passed securely from GitHub Secrets.
      # Kamal 2 reads secret values via .kamal/secrets, which can reference
      # them, e.g. RAILS_MASTER_KEY=$RAILS_MASTER_KEY
      - name: Deploy Application
        env:
          RAILS_MASTER_KEY: ${{ secrets.RAILS_MASTER_KEY }}
          KAMAL_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
        run: kamal deploy
```
Running agentless deployment tools in a shared CI/CD environment introduces specific security vectors you must lock down.
Your CI/CD pipeline needs an SSH private key to log into your Puma servers.
* The Rule: Never hardcode keys or commit them to git. Store the private key in GitHub/Bitbucket Repository Secrets Terraform Tutorial for DevOps | Guide by Hostman.
* The Terraform Hook: When Terraform provisions your EC2/VM instances, it should inject the public half of this key pair into the server's ~/.ssh/authorized_keys file (often using the aws_key_pair resource in Terraform).
How does your CI/CD runner know what the server IPs are?
* Rather than hardcoding IPs in your config/deploy.yml, you can make Kamal dynamic by reading environment variables:
```yaml
# config/deploy.yml
service: my-rails-app
servers:
  web:
    - <%= ENV['PRIMARY_WEB_IP'] %>
    - <%= ENV['SECONDARY_WEB_IP'] %>
```
* In your CI/CD pipeline, you can use the Terraform CLI to fetch the latest IPs directly from your state file before running Kamal:
```bash
export PRIMARY_WEB_IP=$(terraform output -raw primary_web_ip)
export SECONDARY_WEB_IP=$(terraform output -raw secondary_web_ip)
kamal deploy
```
Note: Ensure your Terraform state is stored securely in a remote backend with encryption enabled (like AWS S3) to protect these outputs Terraform Tutorial for DevOps | Guide by Hostman.
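If you have more than a couple of outputs, exporting them one by one gets tedious. A sketch, assuming your root module defines scalar outputs such as primary_web_ip (hypothetical names), that converts the JSON printed by terraform output -json into environment variables for Kamal:

```ruby
require "json"

# Converts `terraform output -json` into env var names,
# e.g. primary_web_ip -> PRIMARY_WEB_IP.
def terraform_outputs_to_env(json)
  JSON.parse(json).each_with_object({}) do |(name, output), env|
    value = output.fetch("value")
    # Only scalar outputs map cleanly onto a single environment variable.
    next unless value.is_a?(String) || value.is_a?(Numeric)
    env[name.upcase] = value.to_s
  end
end

# Usage (run from the Terraform root module directory):
#   system(terraform_outputs_to_env(`terraform output -json`), "kamal", "deploy")
```

Be aware that terraform output -json prints sensitive outputs in plain text, so avoid logging the resulting hash in CI.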
If you have followed best-practice network design and placed your servers inside a private subnet Terraform Tutorial for DevOps | Guide by Hostman, public CI/CD runners (like GitHub's default runners) won't be able to SSH into them. You have two ways to solve this safely:
Is terraform something typically used to control aws configurations? What happens when something changes manually in the website? How does terraform handle the drift?
Yes, absolutely. Terraform is the industry-standard tool for managing AWS configurations.
While manual configuration in the AWS Console is acceptable for "one-off environments or testing" AWS Timestream with Terraform, it is highly discouraged for production systems. For professional teams, writing declarative Terraform code is the default method for provisioning VPCs, database instances, and compute clusters because it makes infrastructure reproducible, auditable, and easily scalable.
If a team member logs into the AWS Console and manually alters a setting—such as changing an EC2 instance size, deleting a security group rule, or modifying an RDS database parameter—this creates a situation known as configuration drift.
When drift occurs, your real-world infrastructure in AWS no longer matches the "desired state" written in your Terraform .tf files.
Terraform's architecture is uniquely suited to detect and fix drift. It relies on a critical component called the State File (terraform.tfstate), which acts as a database mapping your code to real-world AWS resources Terraform Tutorial for DevOps | Guide by Hostman.
Here is the step-by-step process of how Terraform handles drift:
Detecting drift with terraform plan
When you run terraform plan, Terraform does not just look at your local files; it performs a refresh cycle first.
1. It queries the live AWS APIs to check the current, actual status of all resources listed in your state file.
2. It compares the Actual State (what is live on AWS right now) with the Desired State (what is written in your .tf code).
3. If it detects a manual change (drift), it prints out a list of discrepancies. For example:
* “AWS Security Group has an extra rule not declared in code.”
* “Database instance size is t3.large on AWS, but code says t3.medium.”
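To make the comparison step concrete, here is a toy Ruby model of drift detection: it compares desired attributes (standing in for your .tf code) with actual attributes (standing in for what the AWS API returns) and lists the differences. The resource address and attribute values are illustrative; Terraform's real refresh and diff logic lives in its core and providers and is far more involved.

```ruby
# Toy drift detector: desired and actual map resource addresses to attribute hashes.
def detect_drift(desired, actual)
  desired.flat_map do |address, want|
    have = actual.fetch(address, nil)
    next ["#{address}: missing in the cloud (would be created)"] if have.nil?

    want.filter_map do |attr, value|
      next if have[attr] == value
      "#{address}.#{attr}: actual #{have[attr].inspect}, code says #{value.inspect}"
    end
  end
end
```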
Resolving drift with terraform apply
If drift is detected, you have two paths to resolve it, depending on whether the manual change was a mistake or intentional:
* Revert the change (it was a mistake): Run terraform apply. Terraform will call the AWS API to overwrite the manual console change and return the resource to the exact specifications in your .tf files (e.g., it will shrink the database back to t3.medium or delete the unauthorized security group rule).
* Keep the change (it was intentional): Update your .tf files to match the new real-world values. The next terraform plan will show "No changes. Your infrastructure matches the configuration."

Protecting the state file
Because the state file is the source of truth for detecting drift, managing it safely is crucial:
* State Encryption: Because your state file contains sensitive metadata about your AWS configuration (and sometimes database passwords), it should always be encrypted in transit and at rest Terraform Tutorial for DevOps | Guide by Hostman.
* State Locking: If two developers (or a developer and a CI/CD pipeline) run terraform apply at the same time, they could corrupt the state file or apply conflicting changes. To prevent this, your backend configuration (such as AWS S3 with DynamoDB) should have state locking enabled to ensure only one entity can modify the infrastructure at any given moment Terraform Tutorial for DevOps | Guide by Hostman.
Does kamal give you the ability to store configurations?
Yes, Kamal has a highly structured, built-in configuration system.
Instead of relying on server-side configuration panels or manual environment management, Kamal uses a declarative, code-first configuration approach. When you run kamal init, it creates two key local files in your Rails repository that govern your entire deployment Tutorial: Deploy a Rails App Using Kamal:
* config/deploy.yml (the blueprint: committed to Git).
* .kamal/secrets (the safe: never committed to Git) Deploying Web Applications with Kamal and UpCloud.

Here is how Kamal stores and resolves your configurations safely:
config/deploy.yml
This file stores the structure of your application deployment Tutorial: Deploy a Rails App Using Kamal. It specifies:
* Service Name: The name used to isolate your Docker containers on the target machine Tutorial: Deploy a Rails App Using Kamal.
* Docker Registry: Where to push your built production images (e.g., Docker Hub, GitHub Packages) Tutorial: Deploy a Rails App Using Kamal.
* Server IPs (Roles): The target destination IPs (e.g., separating web, worker, and cron servers) Deploying Web Applications with Kamal and UpCloud.
* Volumes & Assets: Shared directories or folders you want mapped from the host to the container.
A major advantage of config/deploy.yml is that it is parsed with Ruby ERB (Embedded Ruby) Tutorial: Deploy a Rails App Using Kamal. This means your configuration doesn't have to be strictly hardcoded; you can pull variables dynamically:
```yaml
# config/deploy.yml
service: my-rails-app
image: username/my-rails-app

# Dynamic server list loaded from local system environment variables
servers:
  web:
    - <%= ENV["PRIMARY_WEB_IP"] %>
    - <%= ENV["SECONDARY_WEB_IP"] %>
```
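Because deploy.yml is ERB first and YAML second, you can reproduce the rendering step with Ruby's standard library to check what Kamal will see. This sketch assumes a plain ERB-then-YAML pipeline, which approximates but is not guaranteed to match Kamal's exact loader. It also exposes a pitfall: an unset variable silently renders as an empty (nil) server entry.

```ruby
require "erb"
require "yaml"

# Trimmed-down deploy.yml template using the same env var names as above.
TEMPLATE = <<~YAML
  service: my-rails-app
  image: username/my-rails-app
  servers:
    web:
      - <%= ENV["PRIMARY_WEB_IP"] %>
      - <%= ENV["SECONDARY_WEB_IP"] %>
YAML

# Render ERB first, then parse the resulting text as YAML.
def render_deploy_config(template)
  YAML.safe_load(ERB.new(template).result)
end
```

Writing ENV.fetch("PRIMARY_WEB_IP") in the template instead of ENV[...] makes a missing variable fail loudly at render time rather than producing a blank host.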
Kamal makes a strict architectural distinction between non-sensitive app configurations (like RAILS_ENV) and sensitive secrets (like DATABASE_URL or SECRET_KEY_BASE). It stores them under separate keys in deploy.yml:
```yaml
# config/deploy.yml
env:
  # 1. Clear text variables (Safe to commit to Git)
  clear:
    RAILS_ENV: production
    PORT: "3000"
    DB_HOST: "my-rds-endpoint.aws.com" # Outputted by Terraform!
  # 2. Secret variables (Only keys are defined here; values are pulled securely)
  secret:
    - RAILS_MASTER_KEY
    - DATABASE_URL
```
.kamal/secrets (or local environment)
To keep secrets out of Git, you store the actual sensitive values in a local secrets file. When you run kamal deploy, Kamal reads these variables locally, injects them securely over SSH, and boots the container on your servers Deploying Web Applications with Kamal and UpCloud.
In Kamal 2, this is managed via .kamal/secrets:
```
# .kamal/secrets (Keep this file in your .gitignore!)
RAILS_MASTER_KEY=94c86bca92dfdb0f7bc932...
DATABASE_URL=postgres://rails:password@my-rds-endpoint.aws.com:5432/production_db
KAMAL_REGISTRY_PASSWORD=dckr_pat_your_docker_registry_token...
```
When Kamal runs, it maps secret: - RAILS_MASTER_KEY from deploy.yml to the value in .kamal/secrets Deploying Web Applications with Kamal and UpCloud.
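The mapping from key names under secret in deploy.yml to values in the secrets file can be illustrated with a small, hypothetical resolver. It handles only plain KEY=VALUE lines and comments; Kamal's own parser also expands variable and command substitution, which this sketch deliberately ignores.

```ruby
# Hypothetical, simplified parser for KEY=VALUE lines (comments and blanks skipped).
def parse_secrets(text)
  text.each_line.with_object({}) do |line, secrets|
    line = line.strip
    next if line.empty? || line.start_with?("#")
    key, value = line.split("=", 2) # keep any "=" inside the value
    secrets[key] = value
  end
end

# Returns only the keys listed under env.secret, failing loudly if any is missing.
def resolve_secret_env(secret_keys, secrets)
  missing = secret_keys - secrets.keys
  raise KeyError, "missing secrets: #{missing.join(", ")}" unless missing.empty?
  secrets.slice(*secret_keys)
end
```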
Alternatively, you can keep raw values out of the file and pull them straight from your current system environment shell Tutorial: Deploy a Rails App Using Kamal. In Kamal 2, .kamal/secrets can reference environment variables, so the file only holds references:

```
# .kamal/secrets
RAILS_MASTER_KEY=$RAILS_MASTER_KEY
```
The ultimate synergy of this system is that your Terraform outputs become Kamal's inputs.
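In practice, you run terraform apply, copy its outputs into config/deploy.yml (VM IPs under servers, the RDS endpoint in the clear environment block, or pass both as system environment variables), and then call kamal deploy Tutorial: Deploy a Rails App Using Kamal.

Would kamal need to know anything about the other resources or services like the database and caching layer? How are deployment failures handled?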
Whether Kamal needs to "know" about your database (like PostgreSQL) or caching layers (like Redis) depends on where those services are hosted.
If you are following production best practices and using Terraform to provision managed stateful layers (e.g., RDS PostgreSQL and AWS ElastiCache Redis):
* Kamal’s Job: Kamal does not manage or orchestrate these services. It only needs to know how to connect to them.
* How it works: You simply output the RDS endpoint and Redis URL from Terraform and feed them into Kamal's environment block as secret environment variables (DATABASE_URL and REDIS_URL) Tutorial: Deploy a Rails App Using Kamal.
* When Kamal boots your Puma containers, they read those environment variables and connect directly to the cloud resources over the private network.
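Building DATABASE_URL from the database module's outputs (address and port) is mostly string assembly, with one trap: a generated password containing characters like @, : or / breaks the URL unless it is percent-encoded. A minimal sketch with a hypothetical helper and a made-up RDS hostname:

```ruby
require "erb"
require "uri"

# Hypothetical helper: percent-encode credentials so reserved characters
# cannot be mistaken for URL delimiters.
def database_url(address:, port:, username:, password:, database:)
  user = ERB::Util.url_encode(username)
  pass = ERB::Util.url_encode(password)
  "postgres://#{user}:#{pass}@#{address}:#{port}/#{database}"
end

url = database_url(address: "mydb.abc123.us-east-1.rds.amazonaws.com", port: 5432,
                   username: "rails", password: "p@ss:w/rd", database: "production_db")
puts url
# prints "postgres://rails:p%40ss%3Aw%2Frd@mydb.abc123.us-east-1.rds.amazonaws.com:5432/production_db"
```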
If you are trying to keep costs low and want to run your database and Redis directly on your virtual machines instead of paying for managed cloud services, Kamal features a built-in concept called Accessories Deploying Web Applications with Kamal and UpCloud.
* How it works: You declare your database and caching services right inside your config/deploy.yml file:
```yaml
# config/deploy.yml
accessories:
  db:
    image: postgres:16
    host: 192.168.1.10
    port: 5432
    env:
      clear:
        POSTGRES_USER: rails
      secret:
        - POSTGRES_PASSWORD
    files:
      - db/production_init.sql:/docker-entrypoint-initdb.d/setup.sql
    directories:
      - /var/lib/postgresql/data:/var/lib/postgresql/data
```
* Kamal will connect to the designated host, pull the official Postgres (or Redis) Docker image, create host directories for persistent storage (so data isn't lost when containers restart), and boot the accessory when you run kamal setup or kamal accessory boot db. Regular kamal deploy runs leave running accessories untouched.
One of Kamal’s strongest features is its robust safety net during deployments. It is designed to ensure that a broken build, a missing database migration, or a boot-time crash never takes down your website.
Before Kamal routes any production traffic to a newly deployed container, it runs a rigorous health check:
1. Kamal boots your new Puma container next to the old one, without sending it any traffic yet.
2. Kamal Proxy repeatedly requests a health-check endpoint inside the new container (by default /up; Rails 7.1+ generates this route, which returns 200 OK when the app boots without exceptions but does not check the database unless you customize it) Tutorial: Deploy a Rails App Using Kamal.
3. If the health check fails: Kamal stops the deployment immediately. It leaves the old, running containers active and serving traffic via Kamal Proxy.
4. It halts with an error, preserving the crashed container's logs so you can run kamal app logs to see exactly why Puma failed to start (e.g., missing credentials, syntax error, or failed database migration). Your users experience zero downtime.
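The health-check gate boils down to a retry loop: probe the new container and only switch traffic once it answers 200. This Ruby sketch models that logic; it is not Kamal Proxy's implementation, and the attempt count, interval, and method names are assumptions. The probe is passed as a block so the loop can be exercised without a running container.

```ruby
require "net/http"
require "uri"

# Returns the HTTP status of url, or nil while the container is not accepting connections.
def http_status(url)
  Net::HTTP.get_response(URI(url)).code.to_i
rescue SystemCallError, IOError, Net::OpenTimeout, Net::ReadTimeout
  nil
end

# Calls the probe until it reports 200 or the attempts run out.
# Returns true when traffic may be switched to the new container.
def healthy?(max_attempts: 7, interval: 1, sleeper: ->(seconds) { sleep(seconds) })
  max_attempts.times do |attempt|
    return true if yield == 200
    sleeper.call(interval) if attempt < max_attempts - 1
  end
  false
end

# Usage against a hypothetical container published on localhost:3000:
#   healthy? { http_status("http://localhost:3000/up") }
```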
If a bug passes your health check but breaks production logic (for example, a checkout button crashes the page), you need to revert to a working version immediately.
Because Kamal is Docker-based, rolling back is incredibly fast and reliable:
1. Identify previous builds: You can list recent containers on your server by running kamal app containers Deploying Web Applications with Kamal and UpCloud.
2. Execute the rollback: Run the rollback command Tutorial: Deploy a Rails App Using Kamal:
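```bash
kamal rollback <version>
```
(The version is the image tag, which Kamal sets to the Git commit SHA by default. kamal rollback can only switch to versions whose containers are still retained on the server, so pick one from the kamal app containers output.)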
3. Why it's fast: Unlike older deployment tools (like Capistrano), which had to re-download assets, recompile code, and rebuild paths, Kamal simply stops the buggy container and starts the previous version, whose Docker image is already cached on your target server Tutorial: Deploy a Rails App Using Kamal. A rollback typically completes within seconds.
How many deployments can you roll back to in Kamal?
By default, Kamal allows you to instantly roll back to any of the last 5 deployments Tutorial: Deploy a Rails App Using Kamal.
However, the actual limit is highly configurable and depends on how you manage your server's disk space and where your Docker images are stored.
Here is how the rollback math works and how you can control it:
Container retention (retain_containers)
To prevent your servers from running out of disk space due to dozens of accumulated, heavy Docker images, Kamal automatically prunes (deletes) older containers and images after each successful deployment Tutorial: Deploy a Rails App Using Kamal.
This behavior is controlled by the retain_containers setting in your config/deploy.yml file:

```yaml
# config/deploy.yml
retain_containers: 5 # This is the default setting
```
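Retention is easiest to reason about as a plan over release timestamps: keep the newest N versions (plus the one currently running) and prune the rest. A hypothetical model, not Kamal's code, with invented versions and timestamps:

```ruby
# Hypothetical model of retention, not Kamal's implementation.
Release = Struct.new(:version, :created_at)

# Keep the newest `keep` releases plus the running one; everything else is pruned.
def prune_plan(releases, keep:, running:)
  newest_first = releases.sort_by(&:created_at).reverse.map(&:version)
  kept = newest_first.first(keep) | [running]
  { kept: kept, pruned: newest_first - kept }
end

releases = (1..8).map { |i| Release.new("v#{i}", Time.at(i * 3600)) }
p prune_plan(releases, keep: 5, running: "v8")
```

Only versions in :kept remain available for a local rollback; anything in :pruned has to come back from the registry.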
If your servers have plenty of disk space, you might increase this to 10 or 15 to give yourself a wider safety margin. If your VMs are constrained (e.g., small 10 GB disks), you might decrease it to 3.

What happens if you want to roll back to a version from three weeks ago that was already pruned from your server's local disk?
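You can still recover it, as long as the historical image still exists in your remote container registry (like AWS ECR, GitHub Packages, or Docker Hub). Because kamal rollback only switches between containers still retained on the server, redeploy that specific version from the registry instead, for example (verify the flags with kamal help deploy for your Kamal version):

```bash
kamal deploy --skip-push --version=<any-git-tag-or-commit-hash>
```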
While Kamal makes it easy to roll back your application code, it cannot automatically roll back your database schema.
If your failed deployment included a destructive database migration (like dropping a column, renaming a table, or changing a data type), simply running kamal rollback to revert the Puma containers will cause your old Rails code to crash because it won't understand the modified database structure.
To safely utilize Kamal's rollback capabilities without database-induced downtime, you should follow the "expand and contract" migration pattern:
1. Never make breaking database changes in a single deploy.
2. If you need to rename a column:
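  * Deploy 1: Add the new column, write code that writes to both columns, and run the migration.
  * Deploy 2: Backfill the old data into the new column and switch all reads and writes to the new column, so the code no longer references the old one.
  * Deploy 3: Remove the old column.
3. Because Deploy 2's code no longer touches the old column, rolling back Deploy 3 leaves you with a schema that Deploy 2 can still use. If you must undo a migration immediately, run bin/rails db:migrate:down VERSION=xxxx with the new code (for example via kamal app exec) before running kamal rollback, since the old image does not contain that migration file.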
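The rollback rule behind expand and contract can be checked mechanically: rolling back deploy n keeps deploy n's schema but runs deploy n-1's code, so every column the older code touches must still exist. This sketch models a hypothetical users.name to users.full_name rename as sets of column names, assuming Deploy 2's code has stopped referencing the old column.

```ruby
# After deploy n's migration, can deploy n-1's code still run against the schema?
def rollback_safe?(deploy, schema:, uses:)
  previous = deploy - 1
  return true unless uses.key?(previous)
  missing = uses.fetch(previous) - schema.fetch(deploy)
  missing.empty?
end

# Hypothetical rename of users.name -> users.full_name. Deploy 0 is the starting point.
EXPAND_CONTRACT = {
  schema: { 0 => %w[id name], 1 => %w[id name full_name], 2 => %w[id name full_name], 3 => %w[id full_name] },
  uses:   { 0 => %w[id name], 1 => %w[id name full_name], 2 => %w[id full_name],      3 => %w[id full_name] }
}.freeze

# The same rename squeezed into a single deploy.
SINGLE_DEPLOY_RENAME = {
  schema: { 0 => %w[id name], 1 => %w[id full_name] },
  uses:   { 0 => %w[id name], 1 => %w[id full_name] }
}.freeze

(1..3).each { |n| puts "expand/contract deploy #{n}: #{rollback_safe?(n, **EXPAND_CONTRACT)}" }
puts "single-deploy rename: #{rollback_safe?(1, **SINGLE_DEPLOY_RENAME)}"
```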
Is there a canary feature where you can push the load balancer to serve to only 5% of users before pushing it out to everyone?
No, Kamal does not have a native, built-in "canary deployment" command (like kamal deploy --canary=5%).
Because Kamal is designed to be a lightweight, zero-dependency deployment tool, its CLI focuses strictly on rolling updates Kamal 2 - A simple, lightweight Docker deployment tool. - Medium. By default, Kamal boots your new container, health-checks it while the old container keeps serving, and then switches 100% of that server's traffic over via Kamal Proxy once it's healthy Tutorial: Deploy a Rails App Using Kamal.
However, because you are using a hybrid architecture—pairing Terraform with Kamal—you have two elegant ways to achieve canary releases or gradual rollouts.
Option 1: Weighted target groups on the ALB. Since your static infrastructure is managed via Terraform, you can leverage the AWS Application Load Balancer (ALB) to handle the 5% traffic split.
```text
            [ AWS Application Load Balancer ]
               /                       \
      (95% Traffic)                 (5% Traffic)
             /                           \
[ Stable Target Group ]         [ Canary Target Group ]
(VMs running main app)          (VMs running canary app)
```
1. In Terraform, create two target groups: app-stable and app-canary. In your ALB listener rules, set a weighted forward action directing 95% of traffic to app-stable and 5% to app-canary.
2. Create a separate Kamal config file (e.g., config/deploy.canary.yml) that points to a specific subset of "canary" virtual machines, and deploy to them first:

```bash
kamal deploy -c config/deploy.canary.yml
```

3. Once the canary looks healthy, run kamal deploy to update the remaining 95% of your main servers.

Option 2: Feature flags. In the Ruby on Rails ecosystem, executing network-level canary deploys (Option 1) can sometimes introduce complexities with session state, cookie stickiness, and WebSocket connections (ActionCable).
Because of this, many Rails teams prefer to handle "canary" features at the application layer using Feature Flags (using gems like Flipper).
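Ship the new code to every server with a normal kamal deploy, keep the new path behind a flag, and enable the flag for a small percentage of users:

```ruby
# Roll out to 5% of users first (e.g., from a Rails console):
#   Flipper.enable_percentage_of_actors(:new_checkout_funnel, 5)
if Flipper.enabled?(:new_checkout_funnel, current_user)
  render :new_checkout
else
  render :old_checkout
end
```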
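Percentage rollouts stay sticky because the decision is a deterministic function of the flag and the user, not a coin flip per request. This sketch shows the idea with Ruby's Zlib.crc32; it is similar in spirit to Flipper's percentage-of-actors gate but is not its actual implementation, and enabled_for? is a hypothetical name.

```ruby
require "zlib"

# Hash flag + actor id into a stable bucket from 0 to 99 and enable
# the flag when the bucket falls below the rollout percentage.
def enabled_for?(flag, actor_id, percentage)
  bucket = Zlib.crc32("#{flag}:#{actor_id}") % 100
  bucket < percentage
end
```

Because a user's bucket never changes, everyone in the first 5% stays enabled when you widen the rollout to 50%, and the answer is the same on every server, whichever one the load balancer picks.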
While Kamal doesn't do fractional traffic splitting, it can limit the blast radius of a bad build with batched rolling deployments:
* By default, Kamal boots the new version on all servers in parallel. Setting a boot limit in config/deploy.yml (the boot section's limit and wait options, e.g. limit: 1) makes it update servers in batches instead.
* With limit: 1 and 4 web servers, it deploys to Server 1, waits for the health check to pass Tutorial: Deploy a Rails App Using Kamal, and then moves to Server 2.
* If Server 2 fails its health check, the deployment halts. Servers 3 and 4 are untouched and keep serving the old version, so a bad build only reaches a fraction of your infrastructure before stopping.
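Batched rolling deploys reduce to a loop over host groups that stops at the first failing batch. This is a hypothetical model of that behavior (compare Kamal's boot limit setting), not Kamal's code; the per-host deploy-and-health-check step is passed in as a block.

```ruby
# Hypothetical model of batched rolling deploys.
# The block deploys one host and returns true if its health check passed.
def rolling_deploy(hosts, limit:)
  attempted = []
  hosts.each_slice(limit) do |batch|
    failed = batch.reject { |host| yield(host) }
    attempted.concat(batch)
    next if failed.empty?
    return { status: :halted, attempted: attempted, failed: failed, untouched: hosts - attempted }
  end
  { status: :complete, attempted: attempted, failed: [], untouched: [] }
end

# Usage: web-2 fails, so web-3 and web-4 are never touched.
#   rolling_deploy(%w[web-1 web-2 web-3 web-4], limit: 1) { |host| host != "web-2" }
```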
Can you tell me how deployment can change with something like devise?
When you introduce an authentication library like Devise into your Ruby on Rails application, your deployment strategy changes from "just putting code on a server" to protecting user sessions, preventing bulk logouts, and ensuring seamless database updates.
Because Kamal uses containerized rolling deployments (where old and new versions of your app run simultaneously for a brief period) Tutorial: Deploy a Rails App Using Kamal, integrating Devise introduces five critical requirements to prevent breaking the user experience during a deploy.
The SECRET_KEY_BASE trap (preventing bulk logouts)
Devise relies heavily on Rails' SECRET_KEY_BASE to encrypt and sign your users' session cookies Tutorial: Deploy a Rails App Using Kamal.
If SECRET_KEY_BASE changes between deployments (for example, because it is generated dynamically inside the container on boot), every logged-in user is logged out on your next deployment, because their session cookies can no longer be decrypted.
The fix is to pass SECRET_KEY_BASE as a secret environment variable. Keep it persistent across deployments by defining it in your secrets file and referencing it in config/deploy.yml Tutorial: Deploy a Rails App Using Kamal:
# config/deploy.yml
env:
  secret:
    - SECRET_KEY_BASE
(The actual secret is supplied through .kamal/secrets, so it never changes unless you rotate it manually.)
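The failure mode is easy to reproduce with Ruby's standard library. The sketch below is a simplification. Real Rails session cookies are encrypted with keys derived from SECRET_KEY_BASE via ActiveSupport::KeyGenerator. Here, `sign` and `verify` are hypothetical helpers that derive a key with PBKDF2 and only HMAC-sign the payload. The point carries over: a cookie produced under one key cannot be verified under another, while any number of servers sharing the same key accept it.

```ruby
require "openssl"
require "securerandom"

# Derive a fixed-length key from the app secret (PBKDF2, as a stand-in for
# ActiveSupport::KeyGenerator). The salt string is illustrative.
def derive_key(secret_key_base)
  OpenSSL::PKCS5.pbkdf2_hmac(secret_key_base, "signed cookie", 1_000, 32, OpenSSL::Digest.new("SHA256"))
end

# Hypothetical helper: produces "payload--hex_hmac", loosely modeled on signed cookies.
def sign(payload, secret_key_base)
  mac = OpenSSL::HMAC.hexdigest("SHA256", derive_key(secret_key_base), payload)
  "#{payload}--#{mac}"
end

# Returns the payload if the signature matches, otherwise nil (user "logged out").
def verify(cookie, secret_key_base)
  payload, _sep, mac = cookie.rpartition("--")
  expected = OpenSSL::HMAC.hexdigest("SHA256", derive_key(secret_key_base), payload)
  OpenSSL.secure_compare(expected, mac) ? payload : nil # constant-time comparison
end

stable_key = SecureRandom.hex(64) # the value persisted via .kamal/secrets
cookie = sign("user_id=42", stable_key)

same_key_result = verify(cookie, stable_key)              # any server sharing the key
rotated_key_result = verify(cookie, SecureRandom.hex(64)) # key regenerated on boot
p same_key_result    # => "user_id=42"
p rotated_key_result # => nil
```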
Because Kamal runs a rolling deployment, Server 1 might be running your old container while Server 2 is running your new container Tutorial: Deploy a Rails App Using Kamal.
If a user signs in on Server 1 and their next request (e.g., to /dashboard) gets routed to Server 2 by the AWS Load Balancer, what happens? As long as both containers share the same SECRET_KEY_BASE, the user stays logged in: the session cookie is sent with every request, and Server 2 can decrypt it just fine.
Database Migrations on the users Table
If you decide to enable a new Devise feature, such as Trackable (tracking IP addresses and sign-in times) or Lockable (locking accounts after failed attempts), you will need to generate a migration to add new columns to your users table.
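During a rolling deploy, old containers keep writing to the users table without knowing about these columns, so the migration must not break them:
* Make sure the new columns are nullable (null: true) or give them default values.
* Run kamal deploy to spin up the new code.
* Once every server runs the new code, add stricter constraints (null: false) in a follow-up migration if necessary.
Background Jobs for Devise Emails
Devise sends a lot of critical emails (e.g., password resets, account confirmations, two-factor authentication codes). By default, Devise delivers these emails synchronously (deliver_now) inside the web request, which slows down the user's response.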
To keep your app fast, configure Devise to send emails asynchronously via ActiveJob, backed by a background worker like Sidekiq. This changes your Kamal setup in two ways:
* You need separate web and worker roles in your config/deploy.yml Deploying Web Applications with Kamal and UpCloud.
* Your deploy.yml must also securely pass your SMTP or third-party mailing API keys (like Postmark or SendGrid):
# config/deploy.yml
servers:
  web:
    - 192.168.1.10
  worker:
    hosts:
      - 192.168.1.11
    cmd: bundle exec sidekiq # Boots Sidekiq instead of Puma on this machine
env:
  secret:
    - POSTMARK_API_TOKEN # Injected into the containers to allow Devise to send emails
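The split between the two roles can be modeled with Ruby's built-in Queue and a thread. This is a hypothetical sketch, not ActiveJob or Sidekiq code. `deliver_email` stands in for a slow SMTP or API call, the thread plays the worker role, and pushing onto the queue plays the part of deliver_later in the web request.

```ruby
# Hypothetical stand-in for a slow SMTP or mailing-API call (e.g. Postmark).
def deliver_email(to)
  sleep 0.2
  "password reset sent to #{to}"
end

jobs = Queue.new      # in production: Redis, shared by the web and worker roles
delivered = Queue.new

# Worker role: in production a separate Sidekiq container; here a thread.
worker = Thread.new do
  while (to = jobs.pop) != :shutdown
    delivered << deliver_email(to)
  end
end

# Web role: the request only enqueues the job, so it returns immediately.
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
jobs << "alice@example.com"
request_seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

jobs << :shutdown
worker.join
result = delivered.pop
puts format("request took %.4fs; worker: %s", request_seconds, result)
```

The request time stays far below the 0.2 s the delivery takes, because the web side never waits for the mail provider.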
If your application scales to multiple VMs using Kamal, you become vulnerable to distributed brute force attacks on your Devise /users/sign_in endpoint.
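A small simulation shows why per-server counters are not enough. `Throttle` is a hypothetical fixed-window counter, loosely modeled on a Rack::Attack throttle rule of 5 attempts per IP, with the time window left out for brevity. A plain Hash stands in for each VM's local cache, and a single shared Hash stands in for a shared store such as Redis.

```ruby
# Hypothetical login throttle: counts attempts per IP in the given store.
class Throttle
  def initialize(store, limit:)
    @store = store
    @limit = limit
  end

  # Counts the attempt and returns true while the IP is under the limit.
  def allow?(ip)
    @store[ip] += 1
    @store[ip] <= @limit
  end
end

limit = 5
attempts = 20
server_count = 4
attacker_ip = "203.0.113.7"

# Each VM counts attempts in its own memory.
local_counters = Array.new(server_count) { Throttle.new(Hash.new(0), limit: limit) }
allowed_with_local = (0...attempts).count { |i| local_counters[i % server_count].allow?(attacker_ip) }

# All VMs share one counter store.
shared_store = Hash.new(0)
shared_counters = Array.new(server_count) { Throttle.new(shared_store, limit: limit) }
allowed_with_shared = (0...attempts).count { |i| shared_counters[i % server_count].allow?(attacker_ip) }

puts "per-VM counters: #{allowed_with_local} of #{attempts} attempts allowed"
puts "shared counter:  #{allowed_with_shared} of #{attempts} attempts allowed"
```

An attacker spreading 20 attempts round-robin across 4 servers gets all 20 through with per-VM counters, but only 5 with a shared counter.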
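If each VM keeps its Rack::Attack counters in a local, in-memory cache store, an attacker can bypass the limit by spreading login attempts across your different servers. Pointing Rack::Attack at a shared cache store, such as the Redis instance your Sidekiq workers already use, keeps a single counter for every server.
What do you think about using OpenShift?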
Choosing Red Hat OpenShift represents a major shift from the architecture we have designed so far. It brings a fundamental architectural tension into sharp focus:
The Tension: Operational Simplicity & Developer Velocity (Kamal) vs. Enterprise Standardization & Continuous Compliance (OpenShift).
- On one hand, the Kamal + Terraform approach maximizes developer speed and keeps hosting costs low by avoiding the "Kubernetes tax," bypassing heavy container orchestrators in favor of direct, agentless Docker management over SSH Tutorial: Deploy a Rails App Using Kamal.
- On the other hand, OpenShift (Red Hat's enterprise-packaged Kubernetes platform) provides a highly opinionated, secure, and automated ecosystem. It eliminates the manual integration of load balancers, registries, and ingress controllers, but brings significant operational overhead, high licensing fees, and a steep learning curve.
Unlike Terraform, which only reconciles infrastructure when you run terraform apply, OpenShift relies on Kubernetes controllers backed by its etcd database to continuously reconcile the cluster state. If a container crashes or a resource drifts from its declared spec, OpenShift restores it on its own, without needing an external terraform apply loop Terraform Tutorial for DevOps | Guide by Hostman. However, provisioning the underlying infrastructure and the OpenShift clusters themselves still requires tools like Terraform or Ansible to bootstrap the virtual machines What is Ansible? A Tool to Automate Parts of Your Job.
Using OpenShift is usually overkill for standard Ruby on Rails deployments unless you operate in a highly regulated enterprise environment (such as banking, healthcare, or government) that mandates strict Security Context Constraints (SCCs), private-cloud isolation, and formal Red Hat support. For fast-moving product teams, pairing Terraform with Kamal provides most of the operational benefits of a container workflow at a fraction of the cost and complexity Tutorial: Deploy a Rails App Using Kamal.
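The difference between one-shot and continuous reconciliation can be made concrete with a toy model. This is not Kubernetes or OpenShift code: the numbers are replica counts, and "reconcile only at tick 0" stands in for any tool that corrects drift only when someone runs it, such as a single terraform apply.

```ruby
# Toy model of desired-state reconciliation; not Kubernetes or OpenShift code.
DESIRED = { "web" => 3, "worker" => 1 }.freeze

# One reconciliation pass: reset any drifted replica count to the desired value.
def reconcile!(actual)
  DESIRED.each { |name, replicas| actual[name] = replicas unless actual[name] == replicas }
end

# Runs a few "ticks"; at tick 1 one web container crashes.
# continuous: true  -> reconcile every tick (controller loop)
# continuous: false -> reconcile only at tick 0 (a single apply)
def simulate(ticks:, continuous:)
  actual = DESIRED.dup
  ticks.times.map do |tick|
    actual["web"] -= 1 if tick == 1
    reconcile!(actual) if continuous || tick.zero?
    actual["web"]
  end
end

one_shot = simulate(ticks: 4, continuous: false)
controller = simulate(ticks: 4, continuous: true)
p one_shot   # => [3, 2, 2, 2]
p controller # => [3, 3, 3, 3]
```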
Follow-up Question:
What are the primary compliance or scaling requirements of your project (e.g., SOC2 compliance, multi-region failover, or strict network isolation) that prompted you to look at an enterprise Kubernetes solution like OpenShift?