feature storerealtimemachine learningdata engineering

Building Real-Time Feature Stores for Machine Learning

May 2, 2026

Building Real-Time Feature Stores for Machine Learning

Your fraud detection model is brilliant. It catches 94% of fraudulent transactions in offline evaluation. Then you deploy it and the performance tanks — not because the model is wrong, but because the features it's getting in production are stale, inconsistent, or just plain different from what it trained on. That's the training-serving skew problem, and it's one of the most painful issues in production ML.

Feature stores exist to solve this. And if you're building real-time ML applications — fraud detection, recommendation engines, dynamic pricing, personalization — you need to understand how to build one that actually performs under pressure.

What a Feature Store Actually Does

A feature store is infrastructure that sits between your raw data and your ML models. It has two jobs:

Offline serving — providing historical feature data for model training

Online serving — providing low-latency feature lookups for real-time inference

The critical insight is that both jobs need to serve the *same* features. That consistency is the whole point.

A typical feature store has three layers:

Feature computation — transforms raw data into features (can be batch or streaming)

Feature storage — a dual-store setup with a slow store for history and a fast store for real-time

Feature serving — an API layer that retrieves features for training or inference

The dual-store pattern is worth understanding deeply. You use something like Apache Cassandra or Redis for online lookups (sub-millisecond reads), and Apache Parquet on S3 or BigQuery for offline training data. The trick is keeping them in sync.

The Architecture in Practice

Here's a simplified architecture for a real-time feature store:

Raw Events → Kafka → Feature Pipeline (Flink/Spark Streaming)
                              ↓
                    ┌─────────────────┐
                    │  Feature Store  │
                    ├────────┬────────┤
                    │ Online │Offline │
                    │ Redis  │  S3    │
                    └────────┴────────┘
                              ↓
                    Feature Serving API
                              ↓
                    Model Inference Service

Events flow in through Kafka. A streaming pipeline (Flink is popular here) computes features and writes to both stores simultaneously. This dual-write is how you avoid skew — the same computation produces both the training data and the serving data.

Building the Feature Pipeline

Let's look at a concrete example. Say you're computing user-level features for a recommendation system — things like "number of purchases in the last 24 hours" or "average session duration this week."

Here's a simplified Flink job that computes a rolling feature:

from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.window import TumblingEventTimeWindows
from pyflink.common.time import Time
env = StreamExecutionEnvironment.get_execution_environment()
Read from Kafka
events = env.add_source(kafka_consumer)
Compute purchase count per user in 1-hour windows
purchase_features = (
    events
    .filter(lambda e: e["event_type"] == "purchase")
    .key_by(lambda e: e["user_id"])
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .aggregate(PurchaseCountAggregator())
)
Write to both online and offline stores
purchase_features.add_sink(RedisSink(host="redis-cluster", port=6379))
purchase_features.add_sink(S3Sink(bucket="feature-store", prefix="purchases/"))env.execute("purchase_feature_pipeline")

The PurchaseCountAggregator would implement AggregateFunction and maintain a running count. The key thing here is that both sinks receive the exact same computed values — no divergence.

Serving Features at Low Latency

The online serving layer needs to be fast. When a model needs features for inference, you typically have a budget of 50-100ms for the entire request, and feature retrieval should consume at most 10-20ms of that.

Here's a simple feature server using FastAPI and Redis:

import redis
import json
from fastapi import FastAPI, HTTPException
from typing import List
app = FastAPI()
redis_client = redis.Redis(host="redis-cluster", port=6379, decode_responses=True)@app.get("/features/{entity_id}")
async def get_features(entity_id: str, feature_names: List[str]):
    pipeline = redis_client.pipeline()
    
    for feature_name in feature_names:
        key = f"feature:{feature_name}:{entity_id}"
        pipeline.get(key)
    
    results = pipeline.execute()
    
    features = {}
    for feature_name, value in zip(feature_names, results):
        if value is None:
            # Return default values for missing features
            features[feature_name] = get_default_value(feature_name)
        else:
            features[feature_name] = json.loads(value)
    
    return {"entity_id": entity_id, "features": features}

Notice the Redis pipeline — this batches all your GET commands into a single round trip. If you're fetching 20 features, you don't want 20 separate network calls. This alone can cut your latency by 10x.

Handling the Hard Parts

Feature freshness is a constant battle. Some features need to be updated every second (current cart value), others every hour (weekly engagement score), others daily (demographic segments). Your pipeline needs to handle different update frequencies gracefully. Tag each feature with a TTL in Redis and a refresh schedule in your orchestration layer.

Backfilling is painful but necessary. When you add a new feature, you need historical values to retrain your model. Design your offline store schema upfront to make point-in-time correct queries possible. This means storing feature values with timestamps and being able to reconstruct "what did this feature look like at time T?"

-- Point-in-time correct feature lookup
SELECT 
    entity_id,
    feature_name,
    feature_value
FROM feature_store.feature_history
WHERE entity_id = '12345'
  AND feature_name = 'purchase_count_24h'
  AND computed_at <= '2024-01-15 14:30:00'
ORDER BY computed_at DESC
LIMIT 1;

Default values deserve more thought than they usually get. When a new user hits your system, most of their features won't exist yet. Returning null to your model is almost always wrong — it'll behave unpredictably. Define sensible defaults for every feature (population median, zero, empty list) and handle missing data explicitly in your serving layer.

Don't Reinvent Everything

Before building from scratch, check out the open-source ecosystem. Feast is the most mature open-source feature store and handles a lot of the infrastructure plumbing. Hopsworks offers a managed option. If you're on AWS, SageMaker Feature Store is worth evaluating. These aren't perfect, but they solve the boring problems so you can focus on your actual feature logic.

That said, there are good reasons to build custom — you need tighter latency guarantees, you have unusual data types, or you're operating at a scale where managed solutions get expensive fast.

Practical Tips Before You Ship

Monitor feature drift, not just model drift. If a feature's distribution shifts, your model performance will degrade even if the model itself is fine.

Version your features. When you change how a feature is computed, you need to be able to serve both the old and new version during transition.

Log the features you served for every prediction. This is non-negotiable for debugging and retraining.

Test your pipeline end-to-end with production-like data before you go live. Latency surprises in production are never fun.

Next Steps

If you're starting fresh: set up a local Redis instance and build a simple feature serving API for one use case. Get that working end-to-end before adding complexity. Then introduce Kafka and a streaming pipeline once you understand what your feature freshness requirements actually are.

If you're inheriting an existing system with training-serving skew: audit your feature computation code first. Nine times out of ten, the offline training pipeline and the online serving code are separate implementations that have drifted apart. Consolidating them onto a single computation engine is usually the highest-leverage fix.

Feature stores aren't glamorous, but they're the difference between a model that works in notebooks and one that works in production. Get the infrastructure right and everything else gets easier.