Data Refresh
There are cases where the data in the consumer database needs to be refreshed, i.e. rebuilt by re-reading the Kafka topics. This guide covers the process and the key considerations for performing a data refresh.
When to Perform a Data Refresh
Common scenarios requiring a data refresh:
- Data corruption - Issues with the consumer database
- Schema changes - Significant upstream schema modifications
- Testing - Setting up a fresh test environment
- Compliance - Data retention policy requirements
- Upgrades - Major consumer version upgrades
Data Refresh Process
Step 1: Stop the Consumer
Ensure the consumer is stopped before making changes.
# Docker
docker compose down
# Kubernetes
kubectl scale deployment solifi-consumer --replicas=0
# Standalone
# Stop the Java process
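Before touching the database, confirm that no instance is still attached to the consumer group. One way to check, assuming the Kafka CLI tools and a broker address are at hand (both values below are illustrative):
# An inactive group is reported as having no active members
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
  --describe --group mycompany-consumer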
Step 2: Clean the Database
You have two options:
Option A: Delete the entire database (Full refresh)
-- Drop and recreate the database
DROP DATABASE solifi_data;
CREATE DATABASE solifi_data;
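If you might need the old data again, take a backup before dropping (the checklist below calls this out). The DATEADD/GETDATE syntax used later in this guide suggests SQL Server; a minimal sketch for that engine, with an illustrative file path:
-- Full backup before the drop (path is illustrative; use your engine's equivalent elsewhere)
BACKUP DATABASE solifi_data
TO DISK = N'/var/backups/solifi_data_refresh.bak'
WITH INIT;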
Option B: Delete specific tables (Partial refresh)
-- Delete data tables
DROP TABLE IF EXISTS table_name;
DROP TABLE IF EXISTS table_name_audit;
-- Or truncate (faster, and keeps the table structure)
TRUNCATE TABLE table_name;
TRUNCATE TABLE table_name_audit;
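With many topic tables, writing each DROP by hand is tedious. A sketch that generates the statements from the catalog so you can review them before running (SQL Server syntax, and assuming the audit tables follow the _audit suffix convention used in this guide):
-- Generate DROP statements for main tables only (excludes *_audit); review, then execute
SELECT 'DROP TABLE IF EXISTS ' + QUOTENAME(TABLE_NAME) + ';'
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
  AND TABLE_NAME NOT LIKE '%\_audit' ESCAPE '\';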
Step 3: Change the Consumer Group ID
Update the group-id in application.yml to force the consumer to read from the beginning:
spring:
  kafka:
    consumer:
      group-id: mycompany-consumer-refresh-20241221  # New unique ID
Kafka tracks which messages have been read per consumer group. Using a new group ID forces the consumer to start reading from the earliest available offset.
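This relies on the consumer falling back to the earliest offset when the new group has no committed offsets. In a Spring Boot application.yml that is the auto-offset-reset property; worth confirming it is set, since the snippet below is an assumption about this consumer's configuration rather than a documented default:
spring:
  kafka:
    consumer:
      group-id: mycompany-consumer-refresh-20241221
      auto-offset-reset: earliest  # start from the beginning when the group has no offsets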
Step 4: Restart the Consumer
# Docker
docker compose up -d
# Kubernetes
kubectl scale deployment solifi-consumer --replicas=1
# Standalone
java -jar solifi-consumer-<version>.jar
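Once it is up, tail the logs to confirm the consumer joined the new group and started reading from the earliest offset (commands match the deployment styles above):
# Docker
docker compose logs -f
# Kubernetes
kubectl logs deployment/solifi-consumer -f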
Preserving Audit History
If you want to preserve audit history during a data refresh, do not delete the _audit tables.
Solifi compresses data after 7 days, so a refresh can lose the audit history of older records. To keep historical audit records:
- Keep the existing _audit tables intact
- Only drop/truncate the main data tables
- Change the group-id
- Restart the consumer
The consumer will append new audit events to the existing audit tables. You might see some duplicate records, but the value in the lp_db_insert_user column will differ, reflecting the new group-id.
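To see the split between the old and new writers after the refresh, count audit rows per inserting user; a quick sketch against one of the audit tables named below, assuming the lp_db_insert_user column is present on every audit table:
-- Audit rows per inserting user (old vs. new group-id)
SELECT lp_db_insert_user, COUNT(*) as row_count
FROM cs_master_nf_audit
GROUP BY lp_db_insert_user;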
Example: Preserving Audit Tables
-- Only drop main tables, keep audit tables
DROP TABLE IF EXISTS cs_master_nf;
DROP TABLE IF EXISTS ls_master_nf;
-- DO NOT drop: cs_master_nf_audit, ls_master_nf_audit
Using Initial Load Mode for Large Refreshes
For large data volumes (>5 million messages), consider using Initial Load Mode instead of standard streaming:
- Clean the database completely
- Configure initial-load mode
- Run the dry run
- Execute the initial load
- Switch back to streaming mode
This is significantly faster than a standard data refresh.
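The exact switches live in the initial load guide; purely to illustrate the shape such a configuration might take, a hypothetical sketch (the property names below are placeholders, not the consumer's real keys):
# Hypothetical sketch only -- see the initial load guide for the real property names
solifi:
  initial-load:
    enabled: true
    dry-run: true  # validate counts first, then rerun with dry-run: false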
Post-Refresh Verification
After the refresh completes:
1. Check Table Counts
SELECT
    'cs_master_nf' as table_name,
    COUNT(*) as row_count
FROM cs_master_nf
UNION ALL
SELECT
    'ls_master_nf' as table_name,
    COUNT(*) as row_count
FROM ls_master_nf;
2. Verify Consumer Health
curl http://localhost:8080/actuator/health
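A healthy instance returns Spring Boot's standard actuator payload:
{"status":"UP"}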
3. Check Consumer Lag
Monitor that consumer lag is decreasing and eventually reaches zero.
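One way to watch it, again assuming the Kafka CLI tools are available (broker address is illustrative); the LAG column, the gap between the log end offset and the committed offset per partition, should trend toward zero:
kafka-consumer-groups.sh --bootstrap-server broker:9092 \
  --describe --group mycompany-consumer-refresh-20241221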
4. Review Error Log
SELECT COUNT(*) FROM error_log
WHERE created_at > DATEADD(hour, -1, GETDATE());
Partial Topic Refresh
To refresh only specific topics without affecting others:
- Stop the consumer
- Drop only the tables for the target topics
- Update application.yml to consume only those topics temporarily (see the sketch after this list)
- Use a new group-id specific to this refresh
- Run the consumer until caught up
- Restore original configuration
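How topics are selected depends on this consumer's configuration reference; the snippet below is a hypothetical shape only, with the topics key as a placeholder:
spring:
  kafka:
    consumer:
      group-id: mycompany-consumer-topicrefresh-20241221  # new ID just for this refresh
solifi:
  topics:  # hypothetical key -- check the consumer's configuration reference
    - cs_master_nf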
Refresh Checklist
- Stop all consumer instances
- Backup database (if needed)
- Document current group-id
- Drop/truncate required tables
- Decide on audit table handling
- Update group-id in configuration
- Start consumer
- Monitor progress
- Verify data completeness
- Check error log
Next Steps
- Learn about initial load mode for faster refreshes
- Configure health monitoring
- Review the sizing guide