Sai Koushik

What Happens to an EKF During a Sensor Blackout?

The Kalman filter has been on my mind for the last few weeks, and I wanted to investigate this question: if an Extended Kalman Filter loses measurements for a fixed window and has to run on dynamics alone, how fast does its position uncertainty grow, and what does recovery look like when the sensor comes back?

I built the filter from scratch in JAX, validated it on CPU, and then ran the blackout experiment across 10 million independent trajectories on an H100. The full experiment code is here: github.com/koushik-sss/kalman-expts.


1. State estimation in 90 seconds

A vehicle has a state: its position, velocity, heading, turn rate. You never observe that state directly. A radar gives you range and bearing, both corrupted by noise. A GPS gives you position, also noisy. The vehicle's own motion model tells you roughly where it should be next, but that prediction drifts because the model is imperfect and the world is noisy.

State estimation is the problem of maintaining the best possible guess about the true state, given everything you have observed so far. The key idea is simple: instead of tracking a single number, we are tracking a distribution. The estimate matters, but the uncertainty matters just as much.

For the special case where everything is linear and all noise is Gaussian, there is an exact optimal solution. That solution is the Kalman filter.

2. The Kalman filter idea

The Kalman filter represents its belief about the state using two objects: a mean vector, the current best estimate of the state, and a covariance matrix, which encodes how uncertain that estimate is.

At every timestep, the filter does two things.

Predict. Push the mean forward through the dynamics model. Push the covariance forward too, inflating it by the process noise Q to account for model imperfection. After this step, you have a prior: a prediction of where the system should be, plus an honest statement of how uncertain that prediction is.

Update. A measurement arrives. Compare it to what the prediction expected (the innovation). Weight the correction by the Kalman gain K, which balances how much you trust the prediction versus how much you trust the sensor. The posterior is always somewhere between the prior and the measurement, pulled toward whichever source has lower uncertainty.

The Kalman gain is the central mechanism. When the prediction is uncertain and the sensor is precise, K is close to 1: trust the sensor. When the prediction is tight and the sensor is noisy, K is close to 0: trust the prediction. This tradeoff is not a heuristic. For linear Gaussian systems, it is the mathematically optimal weighting.
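The scalar case makes this concrete. Here is a minimal one-dimensional sketch of the update (my own illustration, not code from the repo): the gain is the ratio of prior variance to total variance, and the posterior lands between prior and measurement accordingly.

```python
def kalman_update_1d(prior_mean, prior_var, z, meas_var):
    """One scalar Kalman update: posterior is a variance-weighted blend."""
    K = prior_var / (prior_var + meas_var)         # gain in [0, 1]
    post_mean = prior_mean + K * (z - prior_mean)  # pulled toward the measurement
    post_var = (1 - K) * prior_var                 # never larger than the prior
    return post_mean, post_var, K

# Uncertain prior, precise sensor: gain near 1, posterior hugs the measurement.
m, v, K = kalman_update_1d(prior_mean=0.0, prior_var=100.0, z=10.0, meas_var=1.0)
```

With the numbers reversed (tight prior, noisy sensor), the gain drops toward 0 and the posterior stays near the prediction.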

How prediction and measurement compete

Drag the sliders. The posterior always lands between prior and measurement, weighted by their relative certainty. This is the core mechanism inside every Kalman update.

In the full CTRV filter, this same tradeoff plays out across five dimensions. The additional complications are Jacobians that linearize the nonlinear dynamics, covariance matrices that propagate uncertainty through those linearizations, and angle residuals that must be wrapped to [−π, π] to avoid nonsense corrections.

The widget above is a one-dimensional version. The real filter operates in five dimensions with nonlinear dynamics, but the core tradeoff is identical: the posterior is a variance-weighted compromise between prediction and measurement.

3. Going nonlinear: the Extended Kalman Filter

The standard Kalman filter requires linear dynamics and linear measurement functions. Real systems are not linear. A car turning at constant rate follows a circular arc, not a straight line. A radar reports range and bearing, which are nonlinear functions of Cartesian position.

The Extended Kalman Filter handles this by linearizing. At each timestep, it evaluates the Jacobian of the nonlinear function at the current state estimate and uses that local linear approximation for the covariance propagation.

This is a first-order Taylor approximation. If the system is strongly nonlinear or the uncertainty gets large, the approximation gets worse and the filter can become overconfident. But in the regime most tracking systems actually live in, the EKF is cheap, effective, and easy to run at scale.

The two Jacobians have different jobs: the process Jacobian F propagates the covariance through the dynamics in the predict step, while the measurement Jacobian H maps state uncertainty into measurement space to form the innovation covariance and the gain.

4. The CTRV motion model

The state vector for this experiment is the standard Constant Turn Rate and Velocity (CTRV) state:

x = \begin{bmatrix} p_x & p_y & v & \theta & \omega \end{bmatrix}^T
| Symbol | Meaning |
| --- | --- |
| p_x, p_y | Cartesian position |
| v | speed magnitude |
| θ | heading angle |
| ω | turn rate |

The process model propagates this state forward by Δt. When the turn rate ω is nonzero, the vehicle traces a circular arc:

\begin{aligned} p_{x, k+1} &= p_{x, k} + \frac{v_k}{\omega_k} \left( \sin(\theta_k + \omega_k \Delta t) - \sin(\theta_k) \right) \\ p_{y, k+1} &= p_{y, k} + \frac{v_k}{\omega_k} \left( -\cos(\theta_k + \omega_k \Delta t) + \cos(\theta_k) \right) \\ v_{k+1} &= v_k \\ \theta_{k+1} &= \theta_k + \omega_k \Delta t \\ \omega_{k+1} &= \omega_k \end{aligned}

When ω ≈ 0, this reduces to straight-line motion. The implementation uses jax.lax.cond to branch between two code paths at a threshold of |ω| < 10⁻⁴: the standard CTRV update when turning, and a second-order Taylor expansion when nearly straight. JAX traces both sides of a cond, so you cannot let a v/ω division stay in the traced computation graph for both branches; the safe branch must avoid the singularity entirely.
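A minimal sketch of that branching predict step (function and constant names are mine, not necessarily the repo's): the turning branch uses the exact arc formulas above, and the near-straight branch uses the second-order Taylor expansion, which contains no division by ω at all.

```python
import jax
import jax.numpy as jnp

OMEGA_EPS = 1e-4  # threshold from the post; below this, use the safe branch

def ctrv_predict(x, dt):
    """Propagate a CTRV state [px, py, v, theta, omega] forward by dt.

    lax.cond traces BOTH branches, so the near-straight branch must not
    contain the v/omega division anywhere.
    """
    px, py, v, th, om = x

    def turning(_):
        th2 = th + om * dt
        return jnp.array([
            px + v / om * (jnp.sin(th2) - jnp.sin(th)),
            py + v / om * (-jnp.cos(th2) + jnp.cos(th)),
            v, th2, om,
        ])

    def near_straight(_):
        # second-order Taylor expansion in omega: no singularity
        return jnp.array([
            px + v * dt * jnp.cos(th) - 0.5 * v * dt**2 * om * jnp.sin(th),
            py + v * dt * jnp.sin(th) + 0.5 * v * dt**2 * om * jnp.cos(th),
            v, th + om * dt, om,
        ])

    return jax.lax.cond(jnp.abs(om) > OMEGA_EPS, turning, near_straight, None)
```

At ω = 0 this returns exact straight-line motion; at moderate ω the turning branch traces the arc.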

How a CTRV vehicle moves

Adjust speed, heading, and turn rate to see the predicted trajectory. With nonzero turn rate the vehicle traces a circular arc. At zero turn rate it goes straight.


Set ω to zero and the vehicle goes straight. Increase ω and it curves. The CTRV process model encodes exactly this behavior — and the EKF propagates a covariance matrix through these nonlinear dynamics at every timestep using the analytical Jacobian F.

The process Jacobian

The EKF needs the process Jacobian F_k = ∂f/∂x, evaluated at the current estimate, to propagate covariance through the dynamics. For the CTRV model with ω ≠ 0, writing θ′ = θ_k + ω_k Δt for the predicted heading:

F_k = \begin{bmatrix} 1 & 0 & \frac{\sin\theta' - \sin\theta}{\omega} & \frac{v}{\omega}(\cos\theta' - \cos\theta) & F_{1,5} \\ 0 & 1 & \frac{-\cos\theta' + \cos\theta}{\omega} & \frac{v}{\omega}(\sin\theta' - \sin\theta) & F_{2,5} \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}

where the ω-column entries are:

F_{1,5} = v\left(\frac{\Delta t \cos\theta'}{\omega} - \frac{\sin\theta' - \sin\theta}{\omega^2}\right), \quad F_{2,5} = v\left(\frac{\Delta t \sin\theta'}{\omega} - \frac{-\cos\theta' + \cos\theta}{\omega^2}\right)

These are the partial derivatives of the position update with respect to turn rate. They contain 1/ω and 1/ω² terms, which means the Jacobian has the same singularity as the process model itself.

For the ω ≈ 0 branch, the implementation uses a second-order Taylor expansion of the position update, giving a Jacobian that replaces the v/ω terms with expressions in vΔt and ½vΔt²ω:

F_{1,3}^{\text{small}} = \Delta t \cos\theta - \tfrac{1}{2}\Delta t^2 \omega \sin\theta, \quad F_{2,3}^{\text{small}} = \Delta t \sin\theta + \tfrac{1}{2}\Delta t^2 \omega \cos\theta

Both branches were verified against jax.jacfwd across thousands of random state vectors and matched within the chosen test tolerances. The implementation uses jax.lax.cond so XLA compiles both branches into the same graph. The filter never calls jacfwd at runtime.
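The verification idea is easy to reproduce. Here is a self-contained sketch of the turning-branch dynamics, its closed-form Jacobian as written in the matrix above, and a spot-check against jax.jacfwd (function names are mine; the repo's verification sweeps thousands of random states):

```python
import jax
import jax.numpy as jnp

def ctrv_f(x, dt):
    """Turning-branch CTRV update (assumes |omega| well away from zero)."""
    px, py, v, th, om = x
    th2 = th + om * dt
    return jnp.array([
        px + v / om * (jnp.sin(th2) - jnp.sin(th)),
        py + v / om * (-jnp.cos(th2) + jnp.cos(th)),
        v, th2, om,
    ])

def process_jacobian(x, dt):
    """Closed-form F_k for the turning branch, matching the matrix above."""
    px, py, v, th, om = x
    th2 = th + om * dt
    s, c, s2, c2 = jnp.sin(th), jnp.cos(th), jnp.sin(th2), jnp.cos(th2)
    F15 = v * (dt * c2 / om - (s2 - s) / om**2)
    F25 = v * (dt * s2 / om - (-c2 + c) / om**2)
    return jnp.array([
        [1.0, 0.0, (s2 - s) / om, v / om * (c2 - c), F15],
        [0.0, 1.0, (-c2 + c) / om, v / om * (s2 - s), F25],
        [0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, dt],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ])

# One spot-check of the cross-validation the post describes.
x = jnp.array([1.0, 2.0, 3.0, 0.5, 0.3])
F_auto = jax.jacfwd(ctrv_f)(x, 0.1)
F_analytic = process_jacobian(x, 0.1)
```

If any partial derivative in process_jacobian had a typo, the comparison against F_auto would flag it immediately.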

The measurement model is a radar returning range and bearing:

h(x) = \begin{bmatrix} r \\ \phi \end{bmatrix} = \begin{bmatrix} \sqrt{p_x^2 + p_y^2} \\ \operatorname{atan2}(p_y, p_x) \end{bmatrix}

The measurement Jacobian for this radar model is compact:

H(x) = \begin{bmatrix} \frac{p_x}{r} & \frac{p_y}{r} & 0 & 0 & 0 \\ -\frac{p_y}{r^2} & \frac{p_x}{r^2} & 0 & 0 & 0 \end{bmatrix}

with r = √(p_x² + p_y²). This Jacobian is singular at the origin (r = 0), so the implementation uses an epsilon floor on range.
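As a sketch (names and the epsilon value are my assumptions, not the repo's exact choices), both the measurement function and its Jacobian share the same floored range:

```python
import jax.numpy as jnp

RANGE_EPS = 1e-6  # floor keeps the Jacobian finite near the origin

def h_radar(x):
    """Radar measurement: range and bearing from a 5-dim CTRV state."""
    px, py = x[0], x[1]
    r = jnp.maximum(jnp.sqrt(px**2 + py**2), RANGE_EPS)
    return jnp.array([r, jnp.arctan2(py, px)])

def measurement_jacobian(x):
    """Closed-form 2x5 Jacobian of h_radar, matching the matrix above."""
    px, py = x[0], x[1]
    r = jnp.maximum(jnp.sqrt(px**2 + py**2), RANGE_EPS)
    return jnp.array([
        [px / r, py / r, 0.0, 0.0, 0.0],
        [-py / r**2, px / r**2, 0.0, 0.0, 0.0],
    ])
```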

5. The full EKF equations

Predict step. Push the state through the nonlinear dynamics and inflate the covariance:

\hat{x}_k^- = f(\hat{x}_{k-1}^+, \Delta t)

P_k^- = F_k \, P_{k-1}^+ \, F_k^T + Q

where F_k is the process Jacobian evaluated at the current state, and Q is the process noise covariance. The process noise term inflates the predicted uncertainty, even though individual entries of the covariance matrix do not all have to grow monotonically.

Update step. Compare the actual measurement to the predicted one:

\nu_k = \begin{bmatrix} r_k - \hat{r}_k \\ \operatorname{wrap}(\phi_k - \hat{\phi}_k) \end{bmatrix}

The bearing residual must be wrapped to [−π, π]. A measured bearing of −179° and a predicted bearing of +179° are nearly the same direction, but the naive subtraction gives −358°, producing a nonsense correction. This is a classic EKF failure mode.
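The standard branch-free fix is modular arithmetic. A minimal sketch (the function name is mine), replaying exactly the −179° vs +179° example:

```python
import jax.numpy as jnp

def wrap_angle(a):
    """Map any angle to [-pi, pi) using modular arithmetic, no branching."""
    return (a + jnp.pi) % (2.0 * jnp.pi) - jnp.pi

# Measured -179 deg, predicted +179 deg: raw residual is -358 deg,
# but the wrapped residual is +2 deg, a tiny correction in the right direction.
raw = jnp.deg2rad(-179.0) - jnp.deg2rad(179.0)
wrapped = wrap_angle(raw)
```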

Compute the innovation covariance and Kalman gain:

S_k = H_k \, P_k^- \, H_k^T + R

K_k = P_k^- \, H_k^T \, S_k^{-1}

Apply the correction:

\hat{x}_k^+ = \hat{x}_k^- + K_k \, \nu_k

Update the covariance using the Joseph form:

P_k^+ = (I - K_k H_k) \, P_k^- \, (I - K_k H_k)^T + K_k \, R \, K_k^T

The shorter textbook version P_k^+ = P_k^- - K_k H_k P_k^- is algebraically equivalent but numerically fragile. Across millions of trajectories and repeated matrix operations, it accumulates roundoff that breaks the positive-definiteness of P. The Joseph form prevents that.
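To see the two forms side by side, here is a minimal sketch (function names are mine): with an optimal gain both give the same answer, but the Joseph form is a symmetric sandwich plus a symmetric term, so it stays symmetric PSD under roundoff, while the textbook form has no such structural guarantee.

```python
import jax.numpy as jnp

def joseph_update(P, K, H, R):
    """Joseph-form covariance update: symmetric PSD by construction."""
    I_KH = jnp.eye(P.shape[0]) - K @ H
    return I_KH @ P @ I_KH.T + K @ R @ K.T

def textbook_update(P, K, H):
    """Algebraically equivalent for the optimal gain, but fragile numerically."""
    return P - K @ H @ P
```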

6. Implementation choices that kept it stable

Cholesky solve instead of matrix inversion

The Kalman gain requires solving a system involving the innovation covariance S_k. The naive approach is to compute the inverse of S_k explicitly: build the full inverse matrix, then multiply. This is the wrong operation for two reasons.

First, it is slow. Inverting a matrix takes more floating-point operations than solving the equivalent linear system. For a 2×2 measurement space the difference is small, but the habit matters: explicit inversion is almost never the right numerical choice.

Second, it is less accurate. Every extra multiplication accumulates roundoff error. When you invert S_k and then multiply by another matrix, you get two layers of approximation instead of one.

The better path: S_k is symmetric positive definite by construction (it is a covariance plus measurement noise, both of which are positive semidefinite). That means it has a Cholesky factorization, a decomposition S_k = L_k L_kᵀ where L_k is lower triangular. Solving triangular systems is fast (just forward and back substitution) and numerically tight. The implementation calls cho_factor / cho_solve from JAX's linear algebra routines, never inv.
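The call pattern looks roughly like this (a sketch; the function name is mine): solve S X = H P for X, then transpose to get K = P Hᵀ S⁻¹, without ever materializing the inverse.

```python
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

def kalman_gain(P, H, R):
    """K = P H^T S^{-1} via a Cholesky solve instead of inv(S).

    S is SPD by construction, so cho_factor is safe, and the two
    triangular solves inside cho_solve are cheap and well conditioned.
    """
    S = H @ P @ H.T + R
    return cho_solve(cho_factor(S), H @ P).T
```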

Defensive symmetry enforcement

A covariance matrix is, by definition, symmetric. But floating-point arithmetic does not preserve symmetry exactly. After dozens of predict-update cycles, each involving matrix multiplications and additions, tiny asymmetries accumulate. A matrix that is supposed to satisfy P = Pᵀ might have entries that differ by 10⁻⁷ between the upper and lower triangles.

Why does that matter? Because the Cholesky factorization requires its input to be symmetric positive definite. If the matrix has drifted even slightly away from symmetry, the factorization can fail or produce garbage. At 10 million trajectories, even a one-in-a-million failure rate means thousands of crashes.

The fix is mechanical: after every covariance update, replace P with its symmetrized version, ½(P + Pᵀ), before anything downstream touches it.

These guards are boring, but they are the difference between a run that finishes and a run that dies from numerical instability halfway through.
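A minimal sketch of such a guard (the function name and the optional diagonal jitter are my additions, not necessarily what the repo does):

```python
import jax.numpy as jnp

def symmetrize(P, jitter=0.0):
    """Average P with its transpose; optionally add a small diagonal jitter
    to keep the matrix safely positive definite for the next Cholesky call."""
    return 0.5 * (P + P.T) + jitter * jnp.eye(P.shape[0])
```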

Explicit analytic Jacobians in the hot path

The runtime filter calls explicit Jacobian functions (process_jacobian and measurement_jacobian) with the closed-form partial derivatives shown in the previous sections. The alternative would be calling jax.jacfwd inside the traced filter loop. Both produce the same numerical answer. The benefit of explicit functions is not mathematical accuracy. It is runtime shape and compiler cost.

Three things get simpler when the Jacobian is an explicit function rather than an autodiff call: the traced graph stays small, because no forward-mode tangent machinery is woven through the filter loop; XLA compilation is faster and more predictable, because the compiler sees plain matrix arithmetic with fixed shapes; and the runtime code is easier to read and profile, because each Jacobian is just a function that fills in a matrix.

What explicit Jacobians do not buy: better filtering theory, fundamentally different numerical results, or automatic correctness. The math is identical either way. That is exactly why the offline jax.jacfwd verification still matters. The explicit functions could have a typo in any partial derivative, and autodiff would catch it. jax.jacfwd only appears in the verification path (cli.py and test_models.py), where it compares the analytic Jacobians against autodiff across thousands of random state vectors. In that verification pass, the analytic and autodiff Jacobians matched within the configured tolerances.

The trade is: autodiff for verification, explicit Jacobians for production execution. Same math, less compiler baggage in the actual large run.

Angle wrapping everywhere it matters

The heading state θ\theta is wrapped after prediction. The bearing residual is wrapped during update. Both are mandatory. Without wrapping, the filter accumulates unbounded angle values and the linearization assumptions degrade.

7. The blackout experiment

The experiment is simple in concept: run the filter normally for 40 timesteps, then cut off all measurements from timestep 40 through 69 (30 timesteps of total sensor blackout), then resume normal operation from timestep 70 through 119.

During the blackout, the filter still runs. It predicts at every step. It just never corrects. The predict step keeps inflating the covariance by Q, and without a measurement to pull it back, the uncertainty grows without bound.
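The mechanics are easiest to see in a one-dimensional linear analogue (a toy sketch of my own, not the CTRV filter): skip the update inside the blackout window and the variance ramps up linearly in Q, then collapses within a few updates once measurements return.

```python
# Toy scalar filter: variance-only view of the blackout experiment.
Q, R = 0.1, 1.0          # process and measurement noise (arbitrary toy values)
var = 1.0
history = []
for t in range(120):
    var = var + Q                 # predict: uncertainty inflates by Q
    if not (40 <= t <= 69):       # blackout window: no update, no contraction
        K = var / (var + R)
        var = (1 - K) * var       # update: uncertainty contracts toward steady state
    history.append(var)
```

By t=39 the variance has settled at its steady state; by t=69 it has grown by roughly 30·Q; by t=119 it is back at baseline, the same three-phase shape as the 10M-trajectory chart.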

The scalar I track is:

\det(P_{xy})

where P_xy is the 2×2 positional covariance block extracted from the full 5×5 covariance matrix. The square root of this determinant is proportional to the area of the uncertainty ellipse in the position plane. It is not literally a physical bound (EKF covariance is a first-order local approximation) but it is a clean scalar proxy for how much the filter thinks position uncertainty has spread.
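Extracting that scalar is a one-liner (a sketch; the name is mine):

```python
import jax.numpy as jnp

def det_pxy(P):
    """Scalar uncertainty proxy: determinant of the 2x2 position block
    of the full 5x5 covariance matrix."""
    Pxy = P[:2, :2]
    return Pxy[0, 0] * Pxy[1, 1] - Pxy[0, 1] * Pxy[1, 0]
```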

Before looking at the quantitative chart, scrub through this spatial view. The vehicle moves along a CTRV arc. The shaded circle is the filter's covariance, its honest statement of how uncertain it is about position. The crosshairs are incoming radar measurements. Watch what happens at t=40 when the measurements vanish.

What the filter sees — and stops seeing

The shaded region is the filter uncertainty about position. The crosshairs are incoming measurements. Scrub into the blackout and watch the ellipse swallow the canvas.


That is the spatial story. Now here is the same data plotted quantitatively across all 10 million trajectories:

Blackout growth across 10 million trajectories

Scrub the timeline or press play to watch positional uncertainty explode when the sensor goes dark — then collapse when it returns.


The transition at t=40 is abrupt. Once the updates disappear, the filter is just integrating its own model error. The uncertainty does not drift upward politely; it opens up fast. At t=70 the measurements come back, the Kalman gain jumps, and the covariance starts collapsing toward baseline again over the next few updates.

8. Scaling to 10 million on an H100

The execution architecture had one validated single-filter path and one scalable batch path. I did not write separate logic for small and large runs. I wrote one filter, validated it, and lifted it into the batch dimension.

The H100 experiment was executed in 40 chunks of 250,000 trajectories each. Each chunk after the first compiled run took about 0.14 seconds. The scale path returned compact per-timestep summaries (mean, percentiles of the determinant trace) rather than full covariance histories. Storing debug arrays for 10 million runs would have been the wrong memory model.
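The scaling pattern, stripped to its skeleton, is vmap over trajectories plus jit over the chunk, reducing on device and keeping only scalars per chunk. A sketch under my own naming (run_one is a stand-in for the real 120-step filter run):

```python
import jax
import jax.numpy as jnp

def run_one(seed):
    """Stand-in for one full filter run; returns a compact per-run scalar."""
    key = jax.random.PRNGKey(seed)
    return jnp.sum(jax.random.normal(key, (120,)) ** 2)

# One compiled kernel processes an entire chunk of trajectories.
run_batch = jax.jit(jax.vmap(run_one))

def run_chunked(n_total, chunk):
    """Stream chunks through the device, keeping only summary statistics.

    Storing full per-trajectory histories at 10M runs would be the wrong
    memory model; reduce on device and ship back scalars instead.
    """
    means = []
    for start in range(0, n_total, chunk):
        out = run_batch(jnp.arange(start, start + chunk))
        means.append(float(jnp.mean(out)))
    return means
```

The first chunk pays the compile cost; every later chunk reuses the cached kernel, which is why the steady-state chunk time drops to a fraction of the first.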

| Parameter | Value |
| --- | --- |
| Total trajectories | 10,000,000 |
| Chunk size | 250,000 |
| Timesteps per trajectory | 120 |
| Blackout window | t=40 through t=69 |
| GPU | NVIDIA H100 80GB |
| First chunk (compile + run) | ~3.1 s |
| Steady-state chunk | ~0.14 s |

9. The result

| Timestep | Phase | Mean det(P_xy) |
| --- | --- | --- |
| 39 | last healthy timestep | 0.02045 |
| 69 | last blackout timestep | 77.53671 |
| 119 | end of run after recovery | 0.08425 |

The blackout produced a roughly 3800× increase in the positional uncertainty determinant. The recovery brought it back down to about 4× the pre-blackout baseline, not snapping to zero but converging over several update cycles.

The three-phase behavior is exactly what the theory predicts: in steady state, the covariance settles where process-noise inflation and measurement corrections balance; during the blackout, with no corrections, the predict step compounds and det(P_xy) grows rapidly; on recovery, the Kalman gain jumps and the covariance collapses back toward baseline within a few updates.

10. What this does and does not prove

What this experiment gives me: a from-scratch EKF whose Jacobians are verified against autodiff, a quantitative picture of how fast the filter's self-reported positional uncertainty grows during a 30-step blackout and how it collapses on recovery, and a pipeline whose summary statistics are stable and repeatable at 10 million trajectories.

What it does not buy me: a statement about actual estimation error against ground truth, since det(P_xy) is the filter's own belief and a first-order approximation rather than a guaranteed bound; evidence about regimes where large uncertainty breaks the linearization; or anything about real sensors, whose dropouts and noise are messier than this simulated radar.

Those are separate questions. They matter, but they are not this post.

11. Convergence and repeatability

I put the convergence check at the bottom on purpose. It is not the interesting part of the story, but it is the part that keeps the story honest.

Two checks mattered:

1M versus 10M. Are the summary statistics still moving materially when I 10× the sample count? The percentile bands were already essentially converged by 1M. The mean curve shifted by at most 6.96% at the peak, but the qualitative story did not change.

10M (seed=0) versus 10M (seed=1). Does the result repeat independently? The maximum relative difference on the mean curve was 0.134%. The percentile bands were tighter still: under 0.03% maximum relative difference. That is enough to treat the 10M summary as stable for this specific experiment.

Pushing to 100M would mostly buy a bigger headline and a more annoying quantile-reduction problem. If I keep pushing this project, the right next step is approximate or streaming quantiles, not brute-forcing a larger exact reduction.

Is 10 million enough?

Top: the three mean traces overlaid. Bottom: relative difference against the primary 10M run. By 10M the curves have converged—the repeat run barely deviates.


Code for the filter, experiments, and plotting pipeline: koushik-sss/kalman-expts.
