A field guide to the /flux-migrate skill

Zero-Downtime Database Migrations with AI

Most AI tools write migrations that assume the database is offline. /flux-migrate produces reversible steps safe under live traffic, with rollback plans for every step.

Flux · Data · 11 min read · April 24, 2026

There is a particular shape of bug that costs a startup a Saturday morning, and sometimes a Saturday afternoon and a postmortem and a stack of credibility with the team that has to sit through it. The shape is: somebody ran a migration on the production database, the migration locked a table, the lock cascaded into the application, the application returned 500s for the duration of the lock, and the duration of the lock turned out to be much longer than the migration plan promised. The damage from a migration that went wrong is rarely the schema change itself. It is the fact that the schema change was treated as a quick task at the end of a feature, planned in the local environment where the table had a hundred rows, and dispatched to production where the same table had fifty million.

Production databases under live traffic are a different problem from local databases at rest. The right answer to a schema change has to account for concurrent reads, concurrent writes, the lock semantics of the specific engine, the size of the affected table, the index strategy, and the rollback path. Mainstream AI coding tools do not account for any of this. Ask Cursor or ChatGPT to add a NOT NULL column to a fifty-million-row table and you get back the one-line statement. The statement is valid SQL. On many engines and versions it will also hold an exclusive lock on the table for several minutes and take production down. The /flux-migrate skill exists to write the version of that migration that does not take production down: the multi-step, reversible plan that adds the column nullable, backfills in batches, validates the constraint, and only then promotes it to NOT NULL, with a rollback path at every step.
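As a rough sketch of that sequence, assuming Postgres 12 or later, the pattern might look like the following. The table `orders`, the column `region`, the constraint name, and the placeholder backfill value are all hypothetical names chosen for illustration; this is not output from /flux-migrate.

```sql
-- Hypothetical table, column, and constraint names; assumes Postgres 12+.

-- 1. Add the column as nullable: a brief, catalog-only change.
ALTER TABLE orders ADD COLUMN region TEXT;

-- 2. Backfill in small batches (repeat until 0 rows are updated).
--    'unknown' is a placeholder; a real backfill derives the value.
UPDATE orders
SET region = 'unknown'
WHERE id IN (
  SELECT id FROM orders
  WHERE region IS NULL
  ORDER BY id LIMIT 10000
);

-- 3. Add the constraint without checking existing rows.
--    New and updated rows are checked from this point on.
ALTER TABLE orders
  ADD CONSTRAINT orders_region_not_null
  CHECK (region IS NOT NULL) NOT VALID;

-- 4. Validate it: scans the table, but does not block reads or writes.
ALTER TABLE orders VALIDATE CONSTRAINT orders_region_not_null;

-- 5. Promote to NOT NULL. With a validated CHECK in place,
--    Postgres 12+ can skip the full-table scan here.
ALTER TABLE orders ALTER COLUMN region SET NOT NULL;
ALTER TABLE orders DROP CONSTRAINT orders_region_not_null;
```

The application has to start writing a non-null `region` before step 3, because the NOT VALID constraint is enforced for new writes as soon as it exists.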

Why generalist AI gets migrations wrong

Generalist coding assistants treat a migration as a single statement. That mental model is fine for the local development database where rows are scarce, traffic is zero, and a five-second lock is invisible. It is the wrong model for production. The mainstream tools also lack the context that determines safety: they do not know how big the table is, they do not know the engine version (which determines whether the migration can be done concurrently), they do not know whether the application reads from a read replica that will lag during the migration, and they do not know whether the column being added is referenced by a hot path that will start failing the moment the column is missing on an old replica. Without that context, the model writes the cleanest single statement it can. The single statement is also the dangerous one.

Cursor and Copilot are even further from the right answer because they operate at the line level. The line they suggest is the syntactically valid one. Whether the line will run safely in production is not a property the editor can see. Some teams paper over this with linters and migration review tooling, and that helps for the obvious cases (no DROP COLUMN on a hot path, no synchronous index creation on a giant table). The non-obvious cases, the ones that cost the Saturday morning, slip through linters because the rule is not syntactic. It is operational: it depends on table size, traffic shape, replication topology, and the coordination between application deploys and migration steps. That is the kind of judgment a senior data engineer applies, and it is exactly what /flux-migrate is built to encode.

What zero-downtime migration actually requires

A zero-downtime migration is a sequence of small, reversible steps. Each step has to be safe for the application running both before and after the step (because the deploy is not instantaneous). Each step has to be safe under the lock semantics of the database engine. Each step has to have a rollback path that does not require its own migration. And the sequence has to be coordinated with the application code: a column rename, for example, is not one operation, it is six (add the new column, dual-write, backfill, validate, switch reads, then stop writing the old column and drop it). Compress those six into one and you get a few seconds of dropped writes during the deploy window.

The other thing zero-downtime migrations require is patience. A correct migration on a large table is rarely a single deploy. It is a sequence spread across days, sometimes weeks, with batched backfills that run in the background and validation steps that confirm the data is consistent before the cutover. Most teams skip the patience because the discipline is annoying, and most teams pay for that skip when an incident postmortem exposes the corner case the rushed migration missed. The /flux-migrate skill is opinionated about this: it produces the multi-step plan even when the requester would prefer a one-liner, and it lists explicitly which steps must be in separate deploys.

How /flux-migrate works

Step one: read the current schema and table sizes

Before writing any SQL, /flux-migrate reads the current schema, the size of the relevant tables, the existing indexes, and any foreign key relationships that the migration will touch. The size matters because the right migration for a thousand-row table is different from the right migration for a fifty-million-row table; the index list matters because adding an index concurrently is the only safe option on busy tables, and the foreign keys matter because they determine the order of the steps. This read step is the difference between a migration that runs on the assumption of small data and a migration that survives contact with the production volume.
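For a sense of what such a read can involve on Postgres, the catalog queries below are one possible sketch. They assume a table named `users` in the `public` schema and are not necessarily the queries the skill itself runs.

```sql
-- Assumes Postgres and a table public.users; illustrative only.

-- Estimated row count and on-disk size (table plus indexes and TOAST).
-- reltuples is a planner estimate, refreshed by ANALYZE and autovacuum.
SELECT c.reltuples::bigint AS estimated_rows,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
FROM pg_class c
WHERE c.oid = 'public.users'::regclass;

-- Existing indexes on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename = 'users';

-- Foreign keys pointing from or to the table.
SELECT conname,
       conrelid::regclass  AS referencing_table,
       confrelid::regclass AS referenced_table
FROM pg_constraint
WHERE contype = 'f'
  AND (conrelid = 'public.users'::regclass
       OR confrelid = 'public.users'::regclass);
```

The estimate is usually enough to choose between a one-step and a batched plan; an exact `COUNT(*)` on a large table is itself an expensive query.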

Step two: produce a step-by-step plan

The output of /flux-migrate is not a single SQL file. It is a plan with numbered steps, each with the SQL to run, the lock behavior expected, the rollback SQL if the step fails or has to be reverted, and the checkpoint that confirms the step is complete before the next begins. For a column rename, the plan looks like: 1) add the new column nullable, 2) deploy the application code that dual-writes to both columns, 3) backfill the new column from the old in batches of N rows, 4) validate the new column matches the old for all rows, 5) deploy the application code that reads from the new column, 6) stop writing the old column, then drop it. Each numbered step is its own deploy or its own background task; the migration is complete when the final step lands.

Step three: rollback for every step

Every step in the plan has an explicit rollback path. "Add column nullable" rolls back to "drop column." "Deploy dual-write" rolls back to "deploy without dual-write." "Backfill in batches" rolls back to "stop the backfill job"; the partially filled column is harmless while nothing reads it, and it is dropped only if the migration is abandoned. The rollback paths are not afterthoughts; they are part of the plan, surfaced before any step runs, so the operator knows what each rollback costs and what state the system ends up in if the migration is aborted halfway through. This is the discipline that separates a plan that survives an unexpected alert from a plan that turns the alert into a postmortem.

The most common migration failure is not the SQL. It is the application deploy that ships before the migration completes, or after the migration has reverted, leaving the code referencing columns that do not exist. The /flux-migrate plan lists which deploys are safe in which order, so the operator can sequence them correctly.

Step four: the engine-specific behavior

Different databases have different lock semantics, and the right migration depends on the engine. Postgres allows concurrent index creation (CREATE INDEX CONCURRENTLY), which avoids the long lock, but introduces its own caveats (the index can be left in an invalid state if the operation fails). MySQL has online DDL with rules that differ by operation, storage engine, and version. SQLite has very limited support for altering tables in place. The /flux-migrate plan picks the engine-appropriate strategy and surfaces the caveats: "this CREATE INDEX CONCURRENTLY can leave the index INVALID if interrupted; the rollback is to DROP INDEX and retry," or "this MySQL ALTER TABLE will copy the table; for a fifty-million-row table this is an hours-long operation." The engine-specific knowledge is encoded in the skill so the operator does not have to remember it.
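The Postgres caveat quoted above can be made concrete with a short sketch. The index name `idx_users_display_name` is a hypothetical chosen for illustration, and the statements assume a `users` table with a `display_name` column.

```sql
-- Assumes Postgres; the index name is hypothetical.

-- Build the index without blocking writes. This cannot run inside
-- a transaction block, so run it as a standalone statement.
CREATE INDEX CONCURRENTLY idx_users_display_name
  ON users (display_name);

-- If the build was interrupted, the index may exist but be invalid.
-- List any invalid indexes on the table.
SELECT c.relname AS index_name
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'users'::regclass
  AND NOT i.indisvalid;

-- Rollback (or cleanup before a retry): drop it, again without
-- blocking writes on the table.
DROP INDEX CONCURRENTLY IF EXISTS idx_users_display_name;
```

An invalid index is not used for queries but still adds overhead to every write, which is why cleaning it up before retrying matters.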

Tonone's /flux-migrate skill writes zero-downtime database migrations as multi-step reversible plans with rollback at every step, engine-specific lock awareness, and explicit deploy sequencing.

When to use /flux-migrate, and when not to

/flux-migrate is the right call any time the database has live traffic and the change cannot be made in a maintenance window. That covers most production systems and almost all customer-facing services. It is also the right call when the migration is small but touches a hot path: on a small table with high write volume, a one-line migration can stall traffic just as effectively as on a large table with low write volume, because the issue is the lock contention, not the row count. The migration waits for its exclusive lock behind in-flight transactions, and every later query queues behind the migration.
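One common way to limit that kind of contention on Postgres is to bound how long the DDL may wait for its lock. The sketch below uses a hypothetical table `feature_flags` and column `rollout_note`, and the three-second timeout is an arbitrary example value.

```sql
-- Assumes Postgres; table and column names are hypothetical.

-- Before running DDL, look for open transactions that could hold
-- the lock queue up.
SELECT pid,
       state,
       now() - xact_start AS xact_age,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
  AND pid <> pg_backend_pid()
ORDER BY xact_start;

-- Give up quickly instead of queueing behind a long transaction
-- (and making every later query queue behind this statement).
BEGIN;
SET LOCAL lock_timeout = '3s';
ALTER TABLE feature_flags ADD COLUMN rollout_note TEXT;
COMMIT;
```

If the ALTER times out, the transaction fails without having blocked anyone for more than the timeout, and the step can be retried at a quieter moment.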

Skip the skill when the database is not live (greenfield projects, local development, ETL warehouses with off-hours load windows) or when downtime is genuinely acceptable (small internal tools where a planned ten-minute window is part of the deploy schedule). For those cases, /flux-schema produces a simpler initial schema design without the rollback overhead. The discipline of zero-downtime is valuable, but it is not free; reach for it when production traffic is the constraint.

| Capability | Tonone | Generalist chatbot | Cursor / Copilot |
| --- | --- | --- | --- |
| Reads table size before writing migration | Yes, structured read of schema, sizes, indexes | No, works from prompt only | No, line-level only |
| Multi-step reversible plan | Yes, numbered steps with rollback per step | Single SQL statement | Single SQL statement |
| Engine-specific lock awareness | Yes, picks Postgres CONCURRENTLY, MySQL online DDL, etc. | Generic SQL, no lock context | Generic SQL, no lock context |
| Deploy sequencing surfaced | Yes, lists which steps are separate deploys | No deploy guidance | No deploy guidance |
| Backfill batching for large tables | Yes, batched with progress checkpoints | Single UPDATE, locks the table | Not applicable |

A worked example: renaming a column on a large table

Consider the brief: rename users.full_name to users.display_name. The table has eighty million rows and is read on every authenticated request. Run /flux-migrate against the schema and the output looks like the following plan.

sql
-- Step 1: add the new column (deploy A)
ALTER TABLE users ADD COLUMN display_name TEXT;
-- Lock: ACCESS EXCLUSIVE, held for milliseconds once acquired.
-- It can queue behind a long transaction, so use a lock_timeout.
-- Rollback: ALTER TABLE users DROP COLUMN display_name;

-- Step 2: deploy application code that dual-writes
-- (no SQL; deploy A+1)
-- Rollback: deploy back to A

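-- Step 3: backfill in batches of 10,000
-- Run repeatedly, one transaction per batch, until 0 rows updated.
UPDATE users
SET display_name = full_name
WHERE display_name IS NULL
AND id IN (
  SELECT id FROM users
  WHERE display_name IS NULL
    AND full_name IS NOT NULL  -- NULL names would otherwise match forever
  ORDER BY id LIMIT 10000
);
-- Lock: row-level only. Safe under traffic.
-- Rollback: stop the job. Filled rows are harmless while the app
-- still reads full_name. To abandon, revert to A and drop the column.
-- Avoid one unbatched UPDATE ... SET display_name = NULL at this size.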

-- Step 4: validate parity
SELECT COUNT(*) FROM users
WHERE display_name IS DISTINCT FROM full_name;
-- Expected: 0. If non-zero, investigate before step 5.

-- Step 5: deploy application code that reads display_name
-- (no SQL; deploy A+2, which keeps dual-writing)
-- Rollback: deploy back to A+1

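-- Step 6: stop writing full_name (deploy A+3), then drop it
-- Run the DROP only after A+3 is fully rolled out.
ALTER TABLE users DROP COLUMN full_name;
-- Lock: ACCESS EXCLUSIVE, held for milliseconds once acquired.
-- Rollback: none in place. Undoing it means re-adding the column
-- and repopulating it from display_name or a backup.
-- This step is irreversible without restoring data.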

Six steps over four deploys (A through A+3), with checkpoints between each. The migration is complete when step 6 lands; if anything goes wrong before step 6, the rollback is a deploy revert. After step 6 the rename is permanent and irreversible without a data restore. The plan surfaces that fact explicitly so the operator knows the point of no return.
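The "run repeatedly until 0 rows updated" instruction in step 3 needs a driver. One possible sketch, assuming Postgres 11 or later, is a stored procedure that commits after every batch. The procedure name, the batch-size parameter, and the short pause are assumptions for illustration, not part of the plan above; many teams run the same loop from a script or job runner instead.

```sql
-- Assumes Postgres 11+ (transaction control inside procedures).
-- Procedure name and parameter are hypothetical.
CREATE PROCEDURE backfill_display_name(batch_size integer DEFAULT 10000)
LANGUAGE plpgsql
AS $$
DECLARE
  rows_updated integer;
BEGIN
  LOOP
    UPDATE users
    SET display_name = full_name
    WHERE display_name IS NULL
      AND id IN (
        SELECT id FROM users
        WHERE display_name IS NULL
          AND full_name IS NOT NULL
        ORDER BY id
        LIMIT batch_size
      );

    GET DIAGNOSTICS rows_updated = ROW_COUNT;

    -- Commit each batch so row locks are released promptly.
    COMMIT;

    EXIT WHEN rows_updated = 0;

    -- Brief pause to leave headroom for live traffic and replicas.
    PERFORM pg_sleep(0.1);
  END LOOP;
END;
$$;

-- Must be called outside an explicit transaction block,
-- otherwise the COMMIT inside the loop raises an error.
CALL backfill_display_name();
```

Committing per batch keeps each transaction short, which is what makes the backfill safe to stop at any point: whatever has been committed stays, and the next run picks up the remaining rows.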

/flux-migrate is most useful in combination with the schema design and health skills. /flux-schema produces the initial schema for a new system; /flux-migrate evolves it under live traffic. /flux-health checks for the kind of data quality issues that a migration can expose, and /flux-recon maps the database before a migration is planned in an unfamiliar codebase.

Install

/flux-migrate ships with the Flux agent in the Tonone for Claude Code package. Install Tonone, invoke /flux-migrate from any Claude Code session inside the repository that contains your migrations, and the skill produces a step-by-step plan against the live schema.

1. Add to marketplace

$ claude plugin marketplace add tonone-ai/tonone

2. Install Flux

$ claude plugin install flux@tonone-ai

The discipline of zero-downtime migration is what separates a Saturday postmortem from a routine deploy. The skill is built to make that discipline cheap enough to apply on every schema change, including the small ones that turn out not to be small under production load.

Frequently asked questions

What does /flux-migrate do?
It writes zero-downtime database migrations as multi-step reversible plans. Each step has rollback SQL, engine-specific lock awareness, and explicit deploy sequencing so the migration is safe under live traffic.
How is /flux-migrate different from a generalist AI writing SQL?
A generalist writes a single SQL statement. /flux-migrate reads the current schema and table sizes, then produces a numbered plan with rollback per step and explicit deploy ordering. The plan accounts for lock semantics that depend on the database engine and table size.
When should I use /flux-migrate?
Use it whenever a schema change runs against a live production database, or when a small change touches a hot path with high write contention. Skip it for greenfield databases or genuine maintenance windows where downtime is acceptable.
Does /flux-migrate run the migration?
No. The skill produces the plan and the SQL. The migration is run by your existing migration tooling (Alembic, Knex, Prisma, Flyway, etc.) or by hand if that is your workflow.
What databases does /flux-migrate support?
The skill is engine-aware: it picks Postgres CONCURRENTLY for index creation, MySQL online DDL when available, and surfaces SQLite limitations explicitly. The plan adapts to the engine in your repository.
How do I install /flux-migrate?
Install Tonone for Claude Code via the get-started guide at tonone.ai/get-started. /flux-migrate ships with the Flux agent and is invoked as a slash command in any Claude Code session. Tonone is free and MIT-licensed.
Is /flux-migrate free?
Yes. The skill is part of Tonone, which is MIT-licensed. The only cost is Claude Code token usage during the work.
What is the rollback story for a /flux-migrate plan?
Every step in the plan has an explicit rollback SQL or deploy revert. The plan surfaces irreversible steps (typically the final drop of an old column) so the operator knows the point of no return.
