
CampaignBrain Database Architecture

Last Updated: December 4, 2025


Overview

The CampaignBrain system uses two database engines, each matched to its workload:

| Component | Database | Type | Purpose |
|-----------|----------|------|---------|
| Tenant Manager (cbtenant) | SQLite | OLTP | Manage tenants, deployments, config |
| Campaign App (cbapp) | DuckDB | OLAP | Store contacts, events, analytics |

Tenant Manager Database (SQLite)

File: db/manager.db
Engine: SQLite 3
Size: ~500KB (metadata only)

Why SQLite?

  • Simple deployment management data
  • Low write volume (CRUD operations)
  • Strong ACID guarantees
  • No complex queries
  • Single file, easy backups

Tables (10)

tenants              -- Tenant metadata (name, domain, ports)
tenant_configs       -- App configuration per tenant
tenant_themes        -- Branding/colors
users                -- Tenant manager admins
deployments          -- Deployment history
ai_analyses          -- Theme analysis results
theme_suggestions    -- Claude theme suggestions
logs                 -- Audit trail
port_allocations     -- Port assignments
import_jobs          -- List loader job tracking

Key Models

  • Tenant: Name, slug, domain, ports, status
  • TenantConfig: URLs, Ollama settings, custom env
  • ImportJob: CSV upload tracking, mappings, results

Campaign App Database (DuckDB)

File: db/pocket.db
Engine: DuckDB 0.9+
Size: Variable (4.9GB for 83K contacts)

Why DuckDB?

  • Analytical queries on voter/contact data
  • Vector embeddings for semantic search
  • Fast bulk imports (58K records/sec)
  • Efficient aggregations for reporting
  • Native CSV support - direct SQL queries on CSVs (see the sketch after this list)
  • Columnar storage - optimized for analytics
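
The native CSV support means a file can be queried in place, with no import step. A minimal sketch (voters.csv is a hypothetical example file):

import duckdb

con = duckdb.connect()  # in-memory connection is fine for ad-hoc queries
# Query the CSV directly; DuckDB infers column names and types.
rows = con.execute(
    "SELECT county, count(*) AS n FROM read_csv_auto('voters.csv') GROUP BY county"
).fetchall()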

Tables (10+)

person                    -- Contact/voter data
person_custom_field       -- Dynamic custom fields
person_tag                -- Contact categorization
tag                       -- Tag definitions (hierarchy)
custom_field              -- Custom field definitions
event                     -- Canvass events, meetings
event_registration        -- Event attendance
communication             -- Emails, texts, calls
whip_status               -- Legislator tracking
user                      -- App users (per tenant)

Person Table Schema (23 columns)

Core Fields:

id, first_name, last_name,
address1, address2, city, state, zip,
county, congressional_district,
home_phone, cell_phone, email

Campaign Fields:

notes, precinct, precinct_number,
precinct_status, precinct_role, whip_status

Metadata:

created_at, updated_at, created_by, updated_by

Performance Issues (CRITICAL)

Missing Indexes on person table:

-- These DO NOT exist yet!
CREATE INDEX idx_person_email ON person(email);
CREATE INDEX idx_person_cell_phone ON person(cell_phone);
CREATE INDEX idx_person_home_phone ON person(home_phone);

Impact: Duplicate checking does full table scans, as sketched below:

  • Small imports (<1K): work fine
  • Large imports (83K): time out after hours
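
For context, each incoming record triggers per-field lookups roughly like the following (a hypothetical sketch of the query shape, not the actual cbapp code); without the indexes above, every lookup scans the full person table:

import duckdb

con = duckdb.connect("db/pocket.db")
record = {"email": "a@example.com", "cell_phone": "5551230000", "home_phone": "5559870000"}  # hypothetical
# One lookup per field; with no index, each is a full table scan.
for field in ("email", "cell_phone", "home_phone"):
    dup = con.execute(f"SELECT id FROM person WHERE {field} = ?", [record[field]]).fetchone()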

Tracked in: cbapp#2


Data Flow: List Loader

Option A: Via API (Current - Slow)

User uploads CSV
Tenant Manager analyzes/maps fields
Splits into 1K record batches
For each batch:
    HTTP POST to cbapp /api/persons/batch
    cbapp checks duplicates (3 queries × 1K = 3K table scans!)
    DuckDB INSERT (slow due to row-by-row)

Performance: ~30 seconds per 1K batch = 40 minutes for 83K
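
A minimal sketch of the batching loop on the Tenant Manager side, assuming the requests library (the {"persons": [...]} payload shape is an assumption, not the actual API contract):

import requests

BATCH_SIZE = 1000

def post_batches(records, base_url):
    # Split records into 1K batches and POST each to cbapp's batch endpoint.
    for i in range(0, len(records), BATCH_SIZE):
        batch = records[i:i + BATCH_SIZE]
        resp = requests.post(f"{base_url}/api/persons/batch", json={"persons": batch})
        resp.raise_for_status()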

Option B: Direct DuckDB (Fast)

User uploads CSV
Tenant Manager copies CSV to tenant server
Direct DuckDB bulk import:
    CREATE TEMP TABLE FROM read_csv_auto()
    INSERT INTO person SELECT ... FROM temp

Performance: 1.4 seconds for 83K records
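
A minimal sketch of the direct import path ('voters.csv' and the column mapping are hypothetical examples):

import duckdb

con = duckdb.connect("db/pocket.db")
# Stage the CSV (names/types inferred), then bulk-insert into person.
con.execute("CREATE TEMP TABLE staging AS SELECT * FROM read_csv_auto('voters.csv')")
con.execute(
    "INSERT INTO person (first_name, last_name, email) "
    "SELECT first_name, last_name, email FROM staging"
)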

Recommendation

For large voter files (>10K), use Option B (direct DuckDB). For small lists (<1K) via UI, use Option A (API).


i360 Voter Data (Separate DuckDB)

File: /opt/campaignbrain/shared/data/i360.db (symlinked)
Engine: DuckDB
Size: ~2GB (shared across tenants)
Tables: voters (500K+ records)

Sharing Strategy

  • Single i360 database file
  • Symlinked into each tenant: db/i360.db -> /opt/.../i360.db
  • Read-only access
  • Filtered by district in queries (see the sketch below)
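
A minimal sketch of the access pattern (the congressional_district column name and the "MI-20" value are assumptions for illustration):

import duckdb

# Read-only connection through the per-tenant symlink.
con = duckdb.connect("db/i360.db", read_only=True)
rows = con.execute(
    "SELECT * FROM voters WHERE congressional_district = ?", ["MI-20"]
).fetchall()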

See: docs/I360_SHARED_DATABASE.md


Database Comparison

| Feature | SQLite (cbtenant) | DuckDB (cbapp) |
|---------|-------------------|----------------|
| Use Case | OLTP | OLAP |
| Write Pattern | Few writes | Bulk loads |
| Read Pattern | Simple lookups | Complex analytics |
| Storage Layout | Row-oriented | Columnar |
| Concurrent Writes | Limited | Better |
| Analytics | Slow | Fast |
| Vector Search | No | Yes (via extension) |
| CSV Import | Manual | Native read_csv_auto() |

Migration Scripts

Tenant Manager (SQLite)

# Create schema
python db/create_schema.py

# Adds all tables automatically

Campaign App (DuckDB)

# Create base schema
python scripts/create_schema.py

# Add missing person columns (run on existing tenants)
python scripts/add_person_columns.py

# Add indexes (TODO - cbapp#2)
python scripts/add_person_indexes.py
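
The index script is still a TODO; a minimal sketch of what scripts/add_person_indexes.py could contain (the contents are an assumption, based on the cbapp#2 indexes listed earlier):

import duckdb

INDEXES = [
    "CREATE INDEX IF NOT EXISTS idx_person_email ON person(email)",
    "CREATE INDEX IF NOT EXISTS idx_person_cell_phone ON person(cell_phone)",
    "CREATE INDEX IF NOT EXISTS idx_person_home_phone ON person(home_phone)",
]

con = duckdb.connect("db/pocket.db")
for stmt in INDEXES:
    con.execute(stmt)  # IF NOT EXISTS makes the script safe to re-run
con.close()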

Backup Strategy

Tenant Manager

# SQLite - simple file copy (safe only when no writes are in flight)
cp db/manager.db db/manager.db.backup

# Safer for a live database: SQLite's online backup command
sqlite3 db/manager.db ".backup db/manager.db.backup"

Campaign App

# DuckDB - export to parquet for safety
duckdb pocket.db -c "COPY (SELECT * FROM person) TO 'person_backup.parquet'"

# Or use DuckDB checkpoint
duckdb pocket.db -c "CHECKPOINT"

Performance Tuning

DuckDB Settings

conn.execute("SET memory_limit='4GB'")
conn.execute("SET threads TO 4")
conn.execute("SET preserve_insertion_order=false")  # Faster bulk inserts
-- Lookup indexes
CREATE INDEX idx_person_email ON person(email);
CREATE INDEX idx_person_cell_phone ON person(cell_phone);
CREATE INDEX idx_person_home_phone ON person(home_phone);

-- Search indexes
CREATE INDEX idx_person_name ON person(last_name, first_name);
CREATE INDEX idx_person_zip ON person(zip);
CREATE INDEX idx_person_county ON person(county);

-- Query indexes
CREATE INDEX idx_person_tags ON person_tag(person_id);
CREATE INDEX idx_tag_persons ON person_tag(tag_id);

Known Issues

| Issue | Component | Impact | Tracking |
|-------|-----------|--------|----------|
| No indexes on person | cbapp | Slow duplicate checks | cbapp#2 |
| API batch timeout | cbtenant | Large imports fail | cbtenant#15 |
| In-memory jobs | cbtenant | History lost on restart | ✅ Fixed in ce1a57e |

Import Performance Benchmarks

Via API (With Duplicate Checking)

| Records | Time | Speed | Result |
|---------|------|-------|--------|
| 100 | 5s | 20/s | ✅ Success |
| 1,000 | 30s | 33/s | ✅ Success |
| 10,000 | 2min+ | timeout | ❌ Fail |
| 83,067 | hours | timeout | ❌ Fail |

Direct DuckDB (No Duplicate Checking)

| Records | Time | Speed | Result |
|---------|------|-------|--------|
| 83,067 | 1.4s | 58,164/s | ✅ Success |

Conclusion: For large voter files, use direct DuckDB import.


Database Files by Component

campaignbrain/
├── db/
│   └── manager.db              # SQLite - tenant metadata

mi20-clevenger/
├── db/
│   ├── pocket.db               # DuckDB - contacts (4.9GB)
│   └── i360.db -> /opt/.../    # DuckDB - voter data (symlink)

ky04/
├── db/
│   ├── pocket.db               # DuckDB - contacts
│   └── i360.db -> /opt/.../    # DuckDB - voter data (symlink)

Future Considerations

Option: PostgreSQL for cbapp

If transactional workloads (concurrent writes, row-level updates) outgrow DuckDB:

  • Keep DuckDB for bulk analytics/embeddings
  • Add PostgreSQL for transactional data (person, events)
  • Sync between them

Option: Separate Embeddings DB

pocket.db      # Contacts/events (DuckDB)
embeddings.db  # Vector search only (DuckDB)
i360.db        # Voter data (DuckDB, read-only)
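
If the split happens, DuckDB's ATTACH (available in 0.9+) would let a single connection query across all three files. A sketch, assuming the layout above:

import duckdb

con = duckdb.connect("db/pocket.db")
# Attach the other files under aliases; queries can then join across them.
con.execute("ATTACH 'db/embeddings.db' AS emb")
con.execute("ATTACH 'db/i360.db' AS i360 (READ_ONLY)")
count = con.execute("SELECT count(*) FROM i360.voters").fetchone()[0]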

See Also:

  • docs/I360_SHARED_DATABASE.md - Voter data sharing
  • docs/LIST_LOADER_DESIGN.md - Import architecture
  • cbapp#2 - Index issue