Self-hosted Prefect deployments require database maintenance to ensure optimal performance and manage disk usage. This guide provides monitoring queries and maintenance strategies for PostgreSQL databases.
This guide is for advanced users managing production deployments. Whenever possible, test maintenance operations in a non-production environment first. The exact numbers in this guide will vary based on your workload and installation.

Quick reference

Daily tasks:
  • Check disk space and table sizes
  • Monitor bloat levels (> 50% requires action)
  • Run retention policies for old flow runs
Weekly tasks:
  • Review autovacuum performance
  • Check index usage and bloat
  • Analyze high-traffic tables
Red flags requiring immediate action:
  • Disk usage > 80%
  • Table bloat > 100%
  • Connection count approaching limit
  • Autovacuum hasn’t run in 24+ hours

Database growth monitoring

Prefect stores entities like events, flow runs, task runs, and logs that accumulate over time. Monitor your database regularly to understand growth patterns specific to your usage.

Check table sizes

-- Total database size
SELECT pg_size_pretty(pg_database_size('prefect')) AS database_size;

-- Table sizes with row counts
SELECT 
    schemaname,
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS total_size,
    to_char(n_live_tup, 'FM999,999,999') AS row_count
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||relname) DESC
LIMIT 20;

Monitor disk space

Track overall disk usage to prevent outages:
-- Check database disk usage
SELECT 
    current_setting('data_directory') AS data_directory,
    pg_size_pretty(pg_database_size('prefect')) AS database_size,
    pg_size_pretty(pg_total_relation_size('public.events')) AS events_table_size,
    pg_size_pretty(pg_total_relation_size('public.log')) AS log_table_size;

-- Checking free disk space requires shell access (PostgreSQL has no built-in view for it)
-- Run from shell: df -h /path/to/postgresql/data
Common large tables in Prefect databases (the query after this list reports their current sizes):
  • events - Automatically generated for all state changes (often the largest table)
  • log - Flow and task run logs
  • flow_run and task_run - Execution records
  • flow_run_state and task_run_state - State history
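A narrower check on just these tables (a sketch assuming the default public schema) makes day-to-day comparisons easier:
-- Sizes of the tables that typically dominate a Prefect database
SELECT 
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(schemaname||'.'||relname)) AS total_size
FROM pg_stat_user_tables
WHERE schemaname = 'public'
    AND relname IN ('events', 'log', 'flow_run', 'task_run', 'flow_run_state', 'task_run_state')
ORDER BY pg_total_relation_size(schemaname||'.'||relname) DESC;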

Monitor table bloat

PostgreSQL tables can accumulate “dead tuples” from updates and deletes. Monitor bloat percentage to identify tables needing maintenance:
SELECT
    schemaname,
    relname AS tablename,
    n_live_tup AS live_tuples,
    n_dead_tup AS dead_tuples,
    CASE WHEN n_live_tup > 0 
        THEN round(100.0 * n_dead_tup / n_live_tup, 2)
        ELSE 0
    END AS bloat_percent,
    last_vacuum,
    last_autovacuum
FROM pg_stat_user_tables
WHERE schemaname = 'public' 
    AND n_dead_tup > 1000
ORDER BY bloat_percent DESC;

Monitor index bloat

Indexes can also bloat and slow queries. Large indexes that are rarely scanned are good candidates for rebuilding:
-- Check index sizes and usage
SELECT 
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC
LIMIT 20;
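If a large index keeps growing or is rarely scanned, rebuilding it reclaims space. On PostgreSQL 12 and newer, REINDEX CONCURRENTLY rebuilds indexes without blocking reads or writes; the index name below is only a placeholder, so substitute one reported by the query above:
-- Rebuild a single index online (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY ix_flow_run__example;  -- placeholder; use a real index name from the query above

-- Or rebuild every index on one table
REINDEX TABLE CONCURRENTLY flow_run;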

PostgreSQL VACUUM

VACUUM reclaims storage occupied by dead tuples. While PostgreSQL runs autovacuum automatically, you may need manual intervention for heavily updated tables.

Manual VACUUM

For tables with high bloat percentages:
-- Standard VACUUM (doesn't block reads or writes)
VACUUM ANALYZE flow_run;
VACUUM ANALYZE task_run;
VACUUM ANALYZE log;

-- VACUUM FULL (rebuilds table, requires exclusive lock)
-- WARNING: This COMPLETELY LOCKS the table - no reads or writes!
-- Can take HOURS on large tables. Only use as last resort.
VACUUM FULL flow_run;

-- Better alternative: pg_repack (if installed)
-- Rebuilds tables online without blocking
-- pg_repack -t flow_run -d prefect

Monitor autovacuum

Check if autovacuum is keeping up with your workload:
-- Show autovacuum settings
SHOW autovacuum;
SHOW autovacuum_vacuum_scale_factor;
SHOW autovacuum_vacuum_threshold;

-- Check when tables were last vacuumed
SELECT 
    schemaname,
    relname AS tablename,
    last_vacuum,
    last_autovacuum,
    vacuum_count,
    autovacuum_count
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY last_autovacuum NULLS FIRST;

Tune autovacuum for Prefect workloads

Depending on your workload, Prefect's write patterns may require more aggressive autovacuum settings than the defaults:
-- For high-volume events table (INSERT/DELETE heavy)
ALTER TABLE events SET (
    autovacuum_vacuum_scale_factor = 0.05,  -- Default is 0.2
    autovacuum_vacuum_threshold = 1000,
    autovacuum_analyze_scale_factor = 0.02  -- Keep stats current
);

-- For state tables (INSERT-heavy)
ALTER TABLE flow_run_state SET (
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_analyze_scale_factor = 0.05
);

-- For frequently updated tables
ALTER TABLE flow_run SET (
    autovacuum_vacuum_scale_factor = 0.1,
    autovacuum_vacuum_threshold = 500
);

When to take action

Bloat thresholds (the query after these lists shows one way to apply them):
  • < 20% bloat: Normal; autovacuum should handle it
  • 20-50% bloat: Monitor closely and consider a manual VACUUM
  • > 50% bloat: Manual VACUUM recommended
  • > 100% bloat: Significant performance impact; urgent action needed
Warning signs:
  • Autovacuum hasn’t run in > 24 hours on active tables
  • Query performance degrading over time
  • Disk space usage growing faster than data volume
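To apply the bloat thresholds above to live statistics, a query along these lines (a sketch reusing the same pg_stat_user_tables counters as the bloat query earlier) buckets tables by action level:
-- Bucket tables by the bloat action levels listed above
SELECT
    relname AS tablename,
    n_live_tup,
    n_dead_tup,
    round(100.0 * n_dead_tup / NULLIF(n_live_tup, 0), 2) AS bloat_percent,
    CASE
        WHEN n_dead_tup > n_live_tup THEN 'urgent'
        WHEN n_dead_tup > 0.5 * n_live_tup THEN 'manual VACUUM recommended'
        WHEN n_dead_tup > 0.2 * n_live_tup THEN 'monitor closely'
        ELSE 'normal'
    END AS action
FROM pg_stat_user_tables
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC;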

Data retention

Implement data retention policies to manage database growth. The following example shows a Prefect flow that safely deletes old flow runs using the Prefect API:
Using the Prefect API ensures proper cleanup of all related data, including logs and artifacts. The API handles cascade deletions and triggers necessary background tasks.
import asyncio
from datetime import datetime, timedelta, timezone
from prefect import flow, task, get_run_logger
from prefect.client.orchestration import get_client
from prefect.client.schemas.filters import (
    FlowRunFilter,
    FlowRunFilterState,
    FlowRunFilterStateType,
    FlowRunFilterStartTime,
)
from prefect.client.schemas.objects import StateType

@task
async def delete_old_flow_runs(
    days_to_keep: int = 30,
    batch_size: int = 100
):
    """Delete completed flow runs older than specified days."""
    logger = get_run_logger()
    
    async with get_client() as client:
        cutoff = datetime.now(timezone.utc) - timedelta(days=days_to_keep)
        
        # Create filter for old completed flow runs
        # Note: Using start_time because created time filtering is not available
        flow_run_filter = FlowRunFilter(
            start_time=FlowRunFilterStartTime(before_=cutoff),
            state=FlowRunFilterState(
                type=FlowRunFilterStateType(
                    any_=[StateType.COMPLETED, StateType.FAILED, StateType.CANCELLED]
                )
            )
        )
        
        # Get flow runs to delete
        flow_runs = await client.read_flow_runs(
            flow_run_filter=flow_run_filter,
            limit=batch_size
        )
        
        deleted_total = 0
        
        while flow_runs:
            batch_deleted = 0
            failed_deletes = []
            
            # Delete each flow run through the API
            for flow_run in flow_runs:
                try:
                    await client.delete_flow_run(flow_run.id)
                    deleted_total += 1
                    batch_deleted += 1
                except Exception as e:
                    logger.warning(f"Failed to delete flow run {flow_run.id}: {e}")
                    failed_deletes.append(flow_run.id)
                    
                # Rate limiting - adjust based on your API capacity
                if batch_deleted % 10 == 0:
                    await asyncio.sleep(0.5)
                    
            logger.info(f"Deleted {batch_deleted}/{len(flow_runs)} flow runs (total: {deleted_total})")
            if failed_deletes:
                logger.warning(f"Failed to delete {len(failed_deletes)} flow runs")

            # Stop early if nothing in this batch could be deleted, to avoid looping forever
            if batch_deleted == 0:
                logger.error("No flow runs were deleted in this batch; stopping early.")
                break
            
            # Get next batch
            flow_runs = await client.read_flow_runs(
                flow_run_filter=flow_run_filter,
                limit=batch_size
            )
            
            # Delay between batches to avoid overwhelming the API
            await asyncio.sleep(1.0)
        
        logger.info(f"Retention complete. Total deleted: {deleted_total}")

@flow(name="database-retention")
async def retention_flow():
    """Run database retention tasks."""
    await delete_old_flow_runs(
        days_to_keep=30,
        batch_size=100
    )

Direct SQL approach

In some cases, you may need to use direct SQL for performance reasons or when the API is unavailable. Be aware that direct deletion bypasses application-level cascade logic:
# Direct SQL only deletes what's defined by database foreign keys
# Logs and artifacts may be orphaned without proper cleanup
import asyncpg

# asyncpg connections are not async context managers; connect and close explicitly
conn = await asyncpg.connect(connection_url)
try:
    await conn.execute("""
        DELETE FROM flow_run 
        WHERE created < $1 
        AND state_type IN ('COMPLETED', 'FAILED', 'CANCELLED')
    """, cutoff)  # cutoff: a timezone-aware datetime, as in the flow above
finally:
    await conn.close()

Important considerations

  1. Filtering limitation: The current API filters by start_time (when the flow run began execution), not created time (when the flow run was created in the database). This means flow runs that were created but never started won’t be deleted.
  2. Test first: Run with SELECT instead of DELETE to preview what will be removed (see the query after this list)
  3. Start conservative: Begin with longer retention periods and adjust based on needs
  4. Monitor performance: Large deletes can impact database performance
  5. Backup: Always back up before major cleanup operations
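To preview a retention pass (point 2 above) without deleting anything, count the rows that match the same criteria as the direct SQL example:
-- Preview how many flow runs a 30-day retention policy would remove
SELECT count(*)
FROM flow_run
WHERE created < now() - interval '30 days'
    AND state_type IN ('COMPLETED', 'FAILED', 'CANCELLED');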

Event retention

Events are automatically generated for all state changes in Prefect and can quickly become the largest table in your database. Prefect includes built-in event retention that automatically removes old events.

Configure event retention

The default retention period is 7 days. For high-volume deployments, consider reducing this:
# Set retention to 2 days (as environment variable)
export PREFECT_EVENTS_RETENTION_PERIOD="2d"

# Or in your prefect configuration
prefect config set PREFECT_EVENTS_RETENTION_PERIOD="2d"

Check event table size

Monitor your event table growth:
-- Event table size and row count
SELECT 
    pg_size_pretty(pg_total_relation_size('public.events')) AS total_size,
    to_char(count(*), 'FM999,999,999') AS row_count,
    min(occurred) AS oldest_event,
    max(occurred) AS newest_event
FROM events;
Events are used for automations and triggers. Ensure your retention period keeps events long enough for your automation needs.

Connection monitoring

Monitor connection usage to prevent exhaustion:
SELECT 
    count(*) AS total_connections,
    count(*) FILTER (WHERE state = 'active') AS active,
    count(*) FILTER (WHERE state = 'idle') AS idle,
    (SELECT setting::int FROM pg_settings WHERE name = 'max_connections') AS max_connections
FROM pg_stat_activity;

Automating database maintenance

Schedule maintenance tasks

Schedule the retention flow to run automatically; see how to create deployments for creating scheduled deployments. For example, you could run the retention flow daily at 2 AM to clean up old flow runs (a sketch follows the cadence list below). A suggested maintenance cadence:
  • Hourly: Monitor disk space and connection count
  • Daily: Run retention policies, check bloat levels
  • Weekly: Analyze tables, review autovacuum performance
  • Monthly: REINDEX heavily used indexes, full database backup
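A minimal way to schedule the flow, assuming a recent Prefect version where Flow.serve accepts a cron argument:
# Sketch: serve the retention flow defined above on a daily 2 AM schedule
if __name__ == "__main__":
    retention_flow.serve(
        name="daily-database-retention",
        cron="0 2 * * *",
    )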

Troubleshooting common issues

“VACUUM is taking forever”

  • Check for long-running transactions blocking VACUUM (a way to end a blocking session follows this list):
    SELECT pid, age(clock_timestamp(), query_start), usename, query 
    FROM pg_stat_activity 
    WHERE state <> 'idle' AND query NOT ILIKE '%vacuum%' 
    ORDER BY age DESC;
    
  • Consider using pg_repack instead of VACUUM FULL
  • Run during low-traffic periods
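If the query above surfaces a session that is clearly safe to end, pg_terminate_backend can unblock VACUUM; use it carefully, since it cancels whatever that backend was doing:
-- Terminate a blocking backend (take the pid from the query above)
SELECT pg_terminate_backend(12345);  -- replace 12345 with the offending pid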

“Database is growing despite retention policies”

  • Verify event retention is configured: prefect config view | grep EVENTS_RETENTION
  • Check if autovacuum is running on the events table
  • Ensure retention flow is actually executing (check flow run history)

“Queries are getting slower over time”

  • Update table statistics: ANALYZE;
  • Check for missing indexes using pg_stat_user_tables
  • Review query plans with EXPLAIN ANALYZE

“Connection limit reached”

  • Implement connection pooling immediately
  • Check for connection leaks: connections sitting in ‘idle’ state for hours (see the query after this list)
  • Reduce Prefect worker/agent connection counts
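A quick way to spot likely leaks, using the same pg_stat_activity view as the connection query above:
-- Connections idle for more than an hour (possible leaks)
SELECT 
    pid,
    usename,
    application_name,
    now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state = 'idle'
    AND now() - state_change > interval '1 hour'
ORDER BY idle_for DESC;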

Further reading