Back to Blog
Data Engineering November 15, 2024 8 min read

PostgreSQL as a Feature Store: Design Patterns for ML Pipelines

How to use PostgreSQL effectively as a feature store — materialized views for aggregations, partitioning for time series, and indexing strategies for ML queries.

Feature Store Schema

CREATE TABLE features (
    entity_id   BIGINT,
    feature_name VARCHAR(100),
    feature_value FLOAT,
    computed_at TIMESTAMPTZ,
    PRIMARY KEY (entity_id, feature_name, computed_at)
) PARTITION BY RANGE (computed_at);

Materialized Views for Aggregations

CREATE MATERIALIZED VIEW user_stats AS
SELECT 
    user_id,
    COUNT(*) as txn_count,
    AVG(amount) as avg_amount,
    STDDEV(amount) as std_amount,
    MAX(txn_at) as last_txn_at
FROM transactions
GROUP BY user_id;

CREATE UNIQUE INDEX ON user_stats(user_id);
REFRESH MATERIALIZED VIEW CONCURRENTLY user_stats;

Refresh Strategy

Schedule via pg_cron every 15 minutes for near-real-time features.

PostgreSQLFeature StoreML PipelineSQLData Engineering
O

Ossama Elhakki

AI Engineer & ML Systems Builder — Morocco