CampaignBrain Database Architecture¶
Last Updated: December 4, 2025
Overview¶
The CampaignBrain system uses two different database systems for different purposes:
| Component | Database | Type | Purpose |
|---|---|---|---|
| Tenant Manager (cbtenant) | SQLite | OLTP | Manage tenants, deployments, config |
| Campaign App (cbapp) | DuckDB | OLAP | Store contacts, events, analytics |
Tenant Manager Database (SQLite)¶
File: db/manager.db
Engine: SQLite 3
Size: ~500KB (metadata only)
Why SQLite?¶
- Simple deployment management data
- Low write volume (CRUD operations)
- Strong ACID guarantees
- No complex queries
- Single file, easy backups
Tables (10)¶
tenants -- Tenant metadata (name, domain, ports)
tenant_configs -- App configuration per tenant
tenant_themes -- Branding/colors
users -- Tenant manager admins
deployments -- Deployment history
ai_analyses -- Theme analysis results
theme_suggestions -- Claude theme suggestions
logs -- Audit trail
port_allocations -- Port assignments
import_jobs -- List loader job tracking
Key Models¶
- Tenant: Name, slug, domain, ports, status
- TenantConfig: URLs, Ollama settings, custom env
- ImportJob: CSV upload tracking, mappings, results
Campaign App Database (DuckDB)¶
File: db/pocket.db
Engine: DuckDB 0.9+
Size: Variable (4.9GB for 83K contacts)
Why DuckDB?¶
- Analytical queries on voter/contact data
- Vector embeddings for semantic search
- Fast bulk imports (58K records/sec)
- Efficient aggregations for reporting
- Native CSV support - direct SQL queries on CSVs
- Columnar storage - optimized for analytics
Tables (10+)¶
person -- Contact/voter data
person_custom_field -- Dynamic custom fields
person_tag -- Contact categorization
tag -- Tag definitions (hierarchy)
custom_field -- Custom field definitions
event -- Canvass events, meetings
event_registration -- Event attendance
communication -- Emails, texts, calls
whip_status -- Legislator tracking
user -- App users (per tenant)
Person Table Schema (23 columns)¶
Core Fields:
id, first_name, last_name,
address1, address2, city, state, zip,
county, congressional_district,
home_phone, cell_phone, email
Campaign Fields:
Metadata:
Performance Issues (CRITICAL)¶
Missing Indexes on person table:
-- These DO NOT exist yet!
CREATE INDEX idx_person_email ON person(email);
CREATE INDEX idx_person_cell_phone ON person(cell_phone);
CREATE INDEX idx_person_home_phone ON person(home_phone);
Impact: Duplicate checking does full table scans - Small imports (<1K): Works fine - Large imports (83K): Timeout after hours
Tracked in: cbapp#2
Data Flow: List Loader¶
Option A: Via API (Current - Slow)¶
User uploads CSV
↓
Tenant Manager analyzes/maps fields
↓
Splits into 1K record batches
↓
For each batch:
HTTP POST to cbapp /api/persons/batch
↓
cbapp checks duplicates (3 queries × 1K = 3K table scans!)
↓
DuckDB INSERT (slow due to row-by-row)
Performance: ~30 seconds per 1K batch = 40 minutes for 83K
Option B: Direct DuckDB (Fast)¶
User uploads CSV
↓
Tenant Manager copies CSV to tenant server
↓
Direct DuckDB bulk import:
CREATE TEMP TABLE FROM read_csv_auto()
INSERT INTO person SELECT ... FROM temp
Performance: 1.4 seconds for 83K records
Recommendation¶
For large voter files (>10K), use Option B (direct DuckDB). For small lists (<1K) via UI, use Option A (API).
i360 Voter Data (Separate DuckDB)¶
File: /opt/campaignbrain/shared/data/i360.db (symlinked)
Engine: DuckDB
Size: ~2GB (shared across tenants)
Tables: voters (500K+ records)
Sharing Strategy¶
- Single i360 database file
- Symlinked into each tenant:
db/i360.db -> /opt/.../i360.db - Read-only access
- Filtered by district in queries
See: docs/I360_SHARED_DATABASE.md
Database Comparison¶
| Feature | SQLite (cbtenant) | DuckDB (cbapp) |
|---|---|---|
| Use Case | OLTP | OLAP |
| Write Pattern | Few writes | Bulk loads |
| Read Pattern | Simple lookups | Complex analytics |
| Query Type | Row-based | Column-based |
| Concurrent Writes | Limited | Better |
| Analytics | Slow | Fast |
| Vector Search | No | Yes (via extension) |
| CSV Import | Manual | Native read_csv_auto() |
Migration Scripts¶
Tenant Manager (SQLite)¶
Campaign App (DuckDB)¶
# Create base schema
python scripts/create_schema.py
# Add missing person columns (run on existing tenants)
python scripts/add_person_columns.py
# Add indexes (TODO - cbapp#2)
python scripts/add_person_indexes.py
Backup Strategy¶
Tenant Manager¶
Campaign App¶
# DuckDB - export to parquet for safety
duckdb pocket.db -c "COPY (SELECT * FROM person) TO 'person_backup.parquet'"
# Or use DuckDB checkpoint
duckdb pocket.db -c "CHECKPOINT"
Performance Tuning¶
DuckDB Settings¶
conn.execute("SET memory_limit='4GB'")
conn.execute("SET threads TO 4")
conn.execute("SET preserve_insertion_order=false") # Faster bulk inserts
Recommended Indexes (cbapp)¶
-- Lookup indexes
CREATE INDEX idx_person_email ON person(email);
CREATE INDEX idx_person_cell_phone ON person(cell_phone);
CREATE INDEX idx_person_home_phone ON person(home_phone);
-- Search indexes
CREATE INDEX idx_person_name ON person(last_name, first_name);
CREATE INDEX idx_person_zip ON person(zip);
CREATE INDEX idx_person_county ON person(county);
-- Query indexes
CREATE INDEX idx_person_tags ON person_tag(person_id);
CREATE INDEX idx_tag_persons ON person_tag(tag_id);
Known Issues¶
| Issue | Component | Impact | Tracking |
|---|---|---|---|
| No indexes on person | cbapp | Slow duplicate checks | cbapp#2 |
| API batch timeout | cbtenant | Large imports fail | cbtenant#15 |
| In-memory jobs | cbtenant | History lost on restart | ✅ Fixed in ce1a57e |
Import Performance Benchmarks¶
Via API (With Duplicate Checking)¶
| Records | Time | Speed | Result |
|---|---|---|---|
| 100 | 5s | 20/s | ✅ Success |
| 1,000 | 30s | 33/s | ✅ Success |
| 10,000 | 2min+ | timeout | ❌ Fail |
| 83,067 | hours | timeout | ❌ Fail |
Direct DuckDB (No Duplicate Checking)¶
| Records | Time | Speed | Result |
|---|---|---|---|
| 83,067 | 1.4s | 58,164/s | ✅ Success |
Conclusion: For large voter files, use direct DuckDB import.
Database Files by Component¶
campaignbrain/
├── db/
│ └── manager.db # SQLite - tenant metadata
mi20-clevenger/
├── db/
│ ├── pocket.db # DuckDB - contacts (4.9GB)
│ └── i360.db -> /opt/.../ # DuckDB - voter data (symlink)
ky04/
├── db/
│ ├── pocket.db # DuckDB - contacts
│ └── i360.db -> /opt/.../ # DuckDB - voter data (symlink)
Future Considerations¶
Option: PostgreSQL for cbapp¶
If analytical queries become too complex for DuckDB: - Keep DuckDB for bulk analytics/embeddings - Add PostgreSQL for transactional data (person, events) - Sync between them
Option: Separate Embeddings DB¶
pocket.db # Contacts/events (DuckDB)
embeddings.db # Vector search only (DuckDB)
i360.db # Voter data (DuckDB, read-only)
See Also:
- docs/I360_SHARED_DATABASE.md - Voter data sharing
- docs/LIST_LOADER_DESIGN.md - Import architecture
- cbapp#2 - Index issue