Sai Koushik

What Happens to an EKF During a Sensor Blackout?

The Kalman filter has been on my mind for the last few weeks, and I wanted to investigate this question: if an Extended Kalman Filter loses measurements for a fixed window and has to run on dynamics alone, how fast does its position uncertainty grow, and what does recovery look like when the sensor comes back?

I built the filter from scratch in JAX, validated it on CPU, and then ran the blackout experiment across 10 million independent trajectories on an H100. The full experiment code is here: github.com/koushik-sss/kalman-expts.


1. State estimation in 90 seconds

A vehicle has a state: its position, velocity, heading, turn rate. You never observe that state directly. A radar gives you range and bearing, both corrupted by noise. A GPS gives you position, also noisy. The vehicle's own motion model tells you roughly where it should be next, but that prediction drifts because the model is imperfect and the world is noisy.

State estimation is the problem of maintaining the best possible guess about the true state, given everything you have observed so far. The key idea is simple: instead of tracking a single number, we are tracking a distribution. The estimate matters, but the uncertainty matters just as much.

For the special case where everything is linear and all noise is Gaussian, there is an exact optimal solution. That solution is the Kalman filter.

2. The Kalman filter idea

The Kalman filter represents its belief about the state using two objects: a mean vector, the current best estimate of the state, and a covariance matrix, which encodes how uncertain that estimate is.

At every timestep, the filter does two things.

Predict. Push the mean forward through the dynamics model. Push the covariance forward too, inflating it by the process noise Q to account for model imperfection. After this step, you have a prior: a prediction of where the system should be, plus an honest statement of how uncertain that prediction is.

Update. A measurement arrives. Compare it to what the prediction expected (the innovation). Weight the correction by the Kalman gain K, which balances how much you trust the prediction versus how much you trust the sensor. The posterior is always somewhere between the prior and the measurement, pulled toward whichever source has lower uncertainty.

The Kalman gain is the central mechanism. When the prediction is uncertain and the sensor is precise, K is close to 1: trust the sensor. When the prediction is tight and the sensor is noisy, K is close to 0: trust the prediction. This tradeoff is not a heuristic. For linear Gaussian systems, it is the mathematically optimal weighting.
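The scalar case makes this concrete. Here is a minimal one-dimensional sketch of the update (my own illustration, not code from the repo): the gain is the ratio of prior variance to total variance, and the posterior lands between prior and measurement accordingly.

```python
def kalman_update_1d(prior_mean, prior_var, z, meas_var):
    """One scalar Kalman update: posterior is a variance-weighted blend."""
    K = prior_var / (prior_var + meas_var)         # gain in [0, 1]
    post_mean = prior_mean + K * (z - prior_mean)  # pulled toward the measurement
    post_var = (1 - K) * prior_var                 # never larger than the prior
    return post_mean, post_var, K

# Uncertain prior, precise sensor: gain near 1, posterior hugs the measurement.
m, v, K = kalman_update_1d(prior_mean=0.0, prior_var=100.0, z=10.0, meas_var=1.0)
```

With the numbers reversed (tight prior, noisy sensor), the gain drops toward 0 and the posterior stays near the prediction.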

How prediction and measurement compete

Drag the sliders. The posterior always lands between prior and measurement, weighted by their relative certainty. This is the core mechanism inside every Kalman update.

In the full CTRV filter, this same tradeoff plays out across five dimensions. The additional complications are Jacobians that linearize the nonlinear dynamics, covariance matrices that propagate uncertainty through those linearizations, and angle residuals that must be wrapped to [−π, π] to avoid nonsense corrections.

The widget above is a one-dimensional version. The real filter operates in five dimensions with nonlinear dynamics, but the core tradeoff is identical: the posterior is a variance-weighted compromise between prediction and measurement.

3. Going nonlinear: the Extended Kalman Filter

The standard Kalman filter requires linear dynamics and linear measurement functions. Real systems are not linear. A car turning at constant rate follows a circular arc, not a straight line. A radar reports range and bearing, which are nonlinear functions of Cartesian position.

The Extended Kalman Filter handles this by linearizing. At each timestep, it evaluates the Jacobian of the nonlinear function at the current state estimate and uses that local linear approximation for the covariance propagation.

This is a first-order Taylor approximation. If the system is strongly nonlinear or the uncertainty gets large, the approximation gets worse and the filter can become overconfident. But in the regime most tracking systems actually live in, the EKF is cheap, effective, and easy to run at scale.

The two Jacobians have different jobs: the process Jacobian F propagates the covariance through the dynamics in the predict step, while the measurement Jacobian H maps state uncertainty into measurement space to form the innovation covariance and the gain.

4. The CTRV motion model

The state vector for this experiment is the standard Constant Turn Rate and Velocity (CTRV) state:

x = \begin{bmatrix} p_x & p_y & v & \theta & \omega \end{bmatrix}^T
| Symbol | Meaning |
| --- | --- |
| p_x, p_y | Cartesian position |
| v | speed magnitude |
| θ | heading angle |
| ω | turn rate |

The process model propagates this state forward by Δt. When the turn rate ω is nonzero, the vehicle traces a circular arc:

\begin{aligned} p_{x, k+1} &= p_{x, k} + \frac{v_k}{\omega_k} \left( \sin(\theta_k + \omega_k \Delta t) - \sin(\theta_k) \right) \\ p_{y, k+1} &= p_{y, k} + \frac{v_k}{\omega_k} \left( -\cos(\theta_k + \omega_k \Delta t) + \cos(\theta_k) \right) \\ v_{k+1} &= v_k \\ \theta_{k+1} &= \theta_k + \omega_k \Delta t \\ \omega_{k+1} &= \omega_k \end{aligned}

When ω ≈ 0, this reduces to straight-line motion. The implementation uses jax.lax.cond to branch between two code paths at a threshold of |ω| < 10⁻⁴: the standard CTRV update when turning, and a second-order Taylor expansion when nearly straight. JAX traces both sides of a cond, so you cannot let a v/ω division stay in the traced computation graph for both branches; the safe branch must avoid the singularity entirely.
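A minimal sketch of that branching predict step (function and constant names are mine, not necessarily the repo's): the turning branch uses the exact arc formulas above, and the near-straight branch uses the second-order Taylor expansion, which contains no division by ω at all.

```python
import jax
import jax.numpy as jnp

OMEGA_EPS = 1e-4  # threshold from the post; below this, use the safe branch

def ctrv_predict(x, dt):
    """Propagate a CTRV state [px, py, v, theta, omega] forward by dt.

    lax.cond traces BOTH branches, so the near-straight branch must not
    contain the v/omega division anywhere.
    """
    px, py, v, th, om = x

    def turning(_):
        th2 = th + om * dt
        return jnp.array([
            px + v / om * (jnp.sin(th2) - jnp.sin(th)),
            py + v / om * (-jnp.cos(th2) + jnp.cos(th)),
            v, th2, om,
        ])

    def near_straight(_):
        # second-order Taylor expansion in omega: no singularity
        return jnp.array([
            px + v * dt * jnp.cos(th) - 0.5 * v * dt**2 * om * jnp.sin(th),
            py + v * dt * jnp.sin(th) + 0.5 * v * dt**2 * om * jnp.cos(th),
            v, th + om * dt, om,
        ])

    return jax.lax.cond(jnp.abs(om) > OMEGA_EPS, turning, near_straight, None)
```

At ω = 0 this returns exact straight-line motion; at moderate ω the turning branch traces the arc.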

How a CTRV vehicle moves

Adjust speed, heading, and turn rate to see the predicted trajectory. With nonzero turn rate the vehicle traces a circular arc. At zero turn rate it goes straight.


Set ω to zero and the vehicle goes straight. Increase ω and it curves. The CTRV process model encodes exactly this behavior — and the EKF propagates a covariance matrix through these nonlinear dynamics at every timestep using the analytical Jacobian F.

The process Jacobian

The EKF needs the process Jacobian F_k = ∂f/∂x, evaluated at the current estimate, to propagate covariance through the dynamics. For the CTRV model with ω ≠ 0, writing θ′ = θ_k + ω_k Δt for the predicted heading:

F_k = \begin{bmatrix} 1 & 0 & \frac{\sin\theta' - \sin\theta}{\omega} & \frac{v}{\omega}(\cos\theta' - \cos\theta) & F_{1,5} \\ 0 & 1 & \frac{-\cos\theta' + \cos\theta}{\omega} & \frac{v}{\omega}(\sin\theta' - \sin\theta) & F_{2,5} \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & \Delta t \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}

where the ω-column entries are:

F_{1,5} = v\left(\frac{\Delta t \cos\theta'}{\omega} - \frac{\sin\theta' - \sin\theta}{\omega^2}\right), \quad F_{2,5} = v\left(\frac{\Delta t \sin\theta'}{\omega} - \frac{-\cos\theta' + \cos\theta}{\omega^2}\right)

These are the partial derivatives of the position update with respect to turn rate. They contain 1/ω and 1/ω² terms, which means the Jacobian has the same singularity as the process model itself.

For the ω ≈ 0 branch, the implementation uses a second-order Taylor expansion of the position update, giving a Jacobian that replaces the v/ω terms with expressions in vΔt and ½vΔt²ω:

F_{1,3}^{\text{small}} = \Delta t \cos\theta - \tfrac{1}{2}\Delta t^2 \omega \sin\theta, \quad F_{2,3}^{\text{small}} = \Delta t \sin\theta + \tfrac{1}{2}\Delta t^2 \omega \cos\theta

Both branches were verified against jax.jacfwd across thousands of random state vectors and matched within the chosen test tolerances. The implementation uses jax.lax.cond so XLA compiles both branches into the same graph. The filter never calls jacfwd at runtime.
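The verification idea is easy to reproduce. Here is a self-contained sketch of the turning-branch dynamics, its closed-form Jacobian as written in the matrix above, and a spot-check against jax.jacfwd (function names are mine; the repo's verification sweeps thousands of random states):

```python
import jax
import jax.numpy as jnp

def ctrv_f(x, dt):
    """Turning-branch CTRV update (assumes |omega| well away from zero)."""
    px, py, v, th, om = x
    th2 = th + om * dt
    return jnp.array([
        px + v / om * (jnp.sin(th2) - jnp.sin(th)),
        py + v / om * (-jnp.cos(th2) + jnp.cos(th)),
        v, th2, om,
    ])

def process_jacobian(x, dt):
    """Closed-form F_k for the turning branch, matching the matrix above."""
    px, py, v, th, om = x
    th2 = th + om * dt
    s, c, s2, c2 = jnp.sin(th), jnp.cos(th), jnp.sin(th2), jnp.cos(th2)
    F15 = v * (dt * c2 / om - (s2 - s) / om**2)
    F25 = v * (dt * s2 / om - (-c2 + c) / om**2)
    return jnp.array([
        [1.0, 0.0, (s2 - s) / om, v / om * (c2 - c), F15],
        [0.0, 1.0, (-c2 + c) / om, v / om * (s2 - s), F25],
        [0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 1.0, dt],
        [0.0, 0.0, 0.0, 0.0, 1.0],
    ])

# One spot-check of the cross-validation the post describes.
x = jnp.array([1.0, 2.0, 3.0, 0.5, 0.3])
F_auto = jax.jacfwd(ctrv_f)(x, 0.1)
F_analytic = process_jacobian(x, 0.1)
```

If any partial derivative in process_jacobian had a typo, the comparison against F_auto would flag it immediately.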

The measurement model is a radar returning range and bearing:

h(x) = \begin{bmatrix} r \\ \phi \end{bmatrix} = \begin{bmatrix} \sqrt{p_x^2 + p_y^2} \\ \operatorname{atan2}(p_y, p_x) \end{bmatrix}

The measurement Jacobian for this radar model is compact:

H(x) = \begin{bmatrix} \frac{p_x}{r} & \frac{p_y}{r} & 0 & 0 & 0 \\ -\frac{p_y}{r^2} & \frac{p_x}{r^2} & 0 & 0 & 0 \end{bmatrix}

with r = √(p_x² + p_y²). This Jacobian is singular at the origin (r = 0), so the implementation uses an epsilon floor on range.
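As a sketch (names and the epsilon value are my assumptions, not the repo's exact choices), both the measurement function and its Jacobian share the same floored range:

```python
import jax.numpy as jnp

RANGE_EPS = 1e-6  # floor keeps the Jacobian finite near the origin

def h_radar(x):
    """Radar measurement: range and bearing from a 5-dim CTRV state."""
    px, py = x[0], x[1]
    r = jnp.maximum(jnp.sqrt(px**2 + py**2), RANGE_EPS)
    return jnp.array([r, jnp.arctan2(py, px)])

def measurement_jacobian(x):
    """Closed-form 2x5 Jacobian of h_radar, matching the matrix above."""
    px, py = x[0], x[1]
    r = jnp.maximum(jnp.sqrt(px**2 + py**2), RANGE_EPS)
    return jnp.array([
        [px / r, py / r, 0.0, 0.0, 0.0],
        [-py / r**2, px / r**2, 0.0, 0.0, 0.0],
    ])
```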

5. The full EKF equations

Predict step. Push the state through the nonlinear dynamics and inflate the covariance:

\hat{x}_k^- = f(\hat{x}_{k-1}^+, \Delta t)

P_k^- = F_k \, P_{k-1}^+ \, F_k^T + Q

where F_k is the process Jacobian evaluated at the current state, and Q is the process noise covariance. The process noise term inflates the predicted uncertainty, even though individual entries of the covariance matrix do not all have to grow monotonically.

Update step. Compare the actual measurement to the predicted one:

\nu_k = \begin{bmatrix} r_k - \hat{r}_k \\ \operatorname{wrap}(\phi_k - \hat{\phi}_k) \end{bmatrix}

The bearing residual must be wrapped to [−π, π]. A measured bearing of −179° and a predicted bearing of +179° are nearly the same direction, but the naive subtraction gives −358°, producing a nonsense correction. This is a classic EKF failure mode.
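The standard branch-free fix is modular arithmetic. A minimal sketch (the function name is mine), replaying exactly the −179° vs +179° example:

```python
import jax.numpy as jnp

def wrap_angle(a):
    """Map any angle to [-pi, pi) using modular arithmetic, no branching."""
    return (a + jnp.pi) % (2.0 * jnp.pi) - jnp.pi

# Measured -179 deg, predicted +179 deg: raw residual is -358 deg,
# but the wrapped residual is +2 deg, a tiny correction in the right direction.
raw = jnp.deg2rad(-179.0) - jnp.deg2rad(179.0)
wrapped = wrap_angle(raw)
```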

Compute the innovation covariance and Kalman gain:

S_k = H_k \, P_k^- \, H_k^T + R

K_k = P_k^- \, H_k^T \, S_k^{-1}

Apply the correction:

\hat{x}_k^+ = \hat{x}_k^- + K_k \, \nu_k

Update the covariance using the Joseph form:

P_k^+ = (I - K_k H_k) \, P_k^- \, (I - K_k H_k)^T + K_k \, R \, K_k^T

The shorter textbook version P_k^+ = P_k^- - K_k H_k P_k^- is algebraically equivalent but numerically fragile. Across millions of trajectories and repeated matrix operations, it accumulates roundoff that breaks the positive-definiteness of P. The Joseph form prevents that.
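To see the two forms side by side, here is a minimal sketch (function names are mine): with an optimal gain both give the same answer, but the Joseph form is a symmetric sandwich plus a symmetric term, so it stays symmetric PSD under roundoff, while the textbook form has no such structural guarantee.

```python
import jax.numpy as jnp

def joseph_update(P, K, H, R):
    """Joseph-form covariance update: symmetric PSD by construction."""
    I_KH = jnp.eye(P.shape[0]) - K @ H
    return I_KH @ P @ I_KH.T + K @ R @ K.T

def textbook_update(P, K, H):
    """Algebraically equivalent for the optimal gain, but fragile numerically."""
    return P - K @ H @ P
```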

6. Implementation choices that kept it stable

Cholesky solve instead of matrix inversion

The Kalman gain requires solving a system involving the innovation covariance S_k. The naive approach is to compute the inverse of S_k explicitly: build the full inverse matrix, then multiply. This is the wrong operation for two reasons.

First, it is slow. Inverting a matrix takes more floating-point operations than solving the equivalent linear system. For a 2×2 measurement space the difference is small, but the habit matters: explicit inversion is almost never the right numerical choice.

Second, it is less accurate. Every extra multiplication accumulates roundoff error. When you invert S_k and then multiply by another matrix, you get two layers of approximation instead of one.

The better path: S_k is symmetric positive definite by construction (it is a covariance plus measurement noise, both of which are positive semidefinite). That means it has a Cholesky factorization, a decomposition S_k = L_k L_kᵀ where L_k is lower triangular. Solving triangular systems is fast (just forward and back substitution) and numerically tight. The implementation calls cho_factor / cho_solve from JAX's linear algebra routines, never inv.
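The call pattern looks roughly like this (a sketch; the function name is mine): solve S X = H P for X, then transpose to get K = P Hᵀ S⁻¹, without ever materializing the inverse.

```python
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

def kalman_gain(P, H, R):
    """K = P H^T S^{-1} via a Cholesky solve instead of inv(S).

    S is SPD by construction, so cho_factor is safe, and the two
    triangular solves inside cho_solve are cheap and well conditioned.
    """
    S = H @ P @ H.T + R
    return cho_solve(cho_factor(S), H @ P).T
```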

Defensive symmetry enforcement

A covariance matrix is, by definition, symmetric. But floating-point arithmetic does not preserve symmetry exactly. After dozens of predict-update cycles, each involving matrix multiplications and additions, tiny asymmetries accumulate. A matrix that is supposed to satisfy P = Pᵀ might have entries that differ by 10⁻⁷ between the upper and lower triangles.

Why does that matter? Because the Cholesky factorization requires its input to be symmetric positive definite. If the matrix has drifted even slightly away from symmetry, the factorization can fail or produce garbage. At 10 million trajectories, even a one-in-a-million failure rate means thousands of crashes.

The fix is mechanical: after every covariance update, replace P with its symmetrized version, ½(P + Pᵀ), before anything downstream touches it.

These guards are boring, but they are the difference between a run that finishes and a run that dies from numerical instability halfway through.
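A minimal sketch of such a guard (the function name and the optional diagonal jitter are my additions, not necessarily what the repo does):

```python
import jax.numpy as jnp

def symmetrize(P, jitter=0.0):
    """Average P with its transpose; optionally add a small diagonal jitter
    to keep the matrix safely positive definite for the next Cholesky call."""
    return 0.5 * (P + P.T) + jitter * jnp.eye(P.shape[0])
```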

Explicit analytic Jacobians in the hot path

The runtime filter calls explicit Jacobian functions (process_jacobian and measurement_jacobian) with the closed-form partial derivatives shown in the previous sections. The alternative would be calling jax.jacfwd inside the traced filter loop. Both produce the same numerical answer. The benefit of explicit functions is not mathematical accuracy. It is runtime shape and compiler cost.

Three things get simpler when the Jacobian is an explicit function rather than an autodiff call: the traced graph stays small, because no forward-mode tangent machinery is woven through the filter loop; XLA compilation is faster and more predictable, because the compiler sees plain matrix arithmetic with fixed shapes; and the runtime code is easier to read and profile, because each Jacobian is just a function that fills in a matrix.

What explicit Jacobians do not buy: better filtering theory, fundamentally different numerical results, or automatic correctness. The math is identical either way. That is exactly why the offline jax.jacfwd verification still matters. The explicit functions could have a typo in any partial derivative, and autodiff would catch it. jax.jacfwd only appears in the verification path (cli.py and test_models.py), where it compares the analytic Jacobians against autodiff across thousands of random state vectors. In that verification pass, the analytic and autodiff Jacobians matched within the configured tolerances.

The trade is: autodiff for verification, explicit Jacobians for production execution. Same math, less compiler baggage in the actual large run.

Angle wrapping everywhere it matters

The heading state θ\theta is wrapped after prediction. The bearing residual is wrapped during update. Both are mandatory. Without wrapping, the filter accumulates unbounded angle values and the linearization assumptions degrade.

7. The blackout experiment

The experiment is simple in concept: run the filter normally for 40 timesteps, then cut off all measurements from timestep 40 through 69 (30 timesteps of total sensor blackout), then resume normal operation from timestep 70 through 119.

During the blackout, the filter still runs. It predicts at every step. It just never corrects. The predict step keeps inflating the covariance by Q, and without a measurement to pull it back, the uncertainty grows without bound.
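The mechanics are easiest to see in a one-dimensional linear analogue (a toy sketch of my own, not the CTRV filter): skip the update inside the blackout window and the variance ramps up linearly in Q, then collapses within a few updates once measurements return.

```python
# Toy scalar filter: variance-only view of the blackout experiment.
Q, R = 0.1, 1.0          # process and measurement noise (arbitrary toy values)
var = 1.0
history = []
for t in range(120):
    var = var + Q                 # predict: uncertainty inflates by Q
    if not (40 <= t <= 69):       # blackout window: no update, no contraction
        K = var / (var + R)
        var = (1 - K) * var       # update: uncertainty contracts toward steady state
    history.append(var)
```

By t=39 the variance has settled at its steady state; by t=69 it has grown by roughly 30·Q; by t=119 it is back at baseline, the same three-phase shape as the 10M-trajectory chart.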

The scalar I track is:

\det(P_{xy})

where P_xy is the 2×2 positional covariance block extracted from the full 5×5 covariance matrix. The square root of this determinant is proportional to the area of the uncertainty ellipse in the position plane. It is not literally a physical bound (EKF covariance is a first-order local approximation) but it is a clean scalar proxy for how much the filter thinks position uncertainty has spread.
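Extracting that scalar is a one-liner (a sketch; the name is mine):

```python
import jax.numpy as jnp

def det_pxy(P):
    """Scalar uncertainty proxy: determinant of the 2x2 position block
    of the full 5x5 covariance matrix."""
    Pxy = P[:2, :2]
    return Pxy[0, 0] * Pxy[1, 1] - Pxy[0, 1] * Pxy[1, 0]
```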

Before looking at the quantitative chart, scrub through this spatial view. The vehicle moves along a CTRV arc. The shaded circle is the filter's covariance, its honest statement of how uncertain it is about position. The crosshairs are incoming radar measurements. Watch what happens at t=40 when the measurements vanish.

What the filter sees — and stops seeing

The shaded region is the filter uncertainty about position. The crosshairs are incoming measurements. Scrub into the blackout and watch the ellipse swallow the canvas.


That is the spatial story. Now here is the same data plotted quantitatively across all 10 million trajectories:

Blackout growth across 10 million trajectories

Scrub the timeline or press play to watch positional uncertainty explode when the sensor goes dark — then collapse when it returns.


The transition at t=40 is abrupt. Once the updates disappear, the filter is just integrating its own model error. The uncertainty does not drift upward politely; it opens up fast. At t=70 the measurements come back, the Kalman gain jumps, and the covariance starts collapsing toward baseline again over the next few updates.

8. Scaling to 10 million on an H100

The execution architecture had one validated single-filter path and one scalable batch path. I did not write separate logic for small and large runs. I wrote one filter, validated it, and lifted it into the batch dimension.

The H100 experiment was executed in 40 chunks of 250,000 trajectories each. Each chunk after the first compiled run took about 0.14 seconds. The scale path returned compact per-timestep summaries (mean, percentiles of the determinant trace) rather than full covariance histories. Storing debug arrays for 10 million runs would have been the wrong memory model.
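The scaling pattern, stripped to its skeleton, is vmap over trajectories plus jit over the chunk, reducing on device and keeping only scalars per chunk. A sketch under my own naming (run_one is a stand-in for the real 120-step filter run):

```python
import jax
import jax.numpy as jnp

def run_one(seed):
    """Stand-in for one full filter run; returns a compact per-run scalar."""
    key = jax.random.PRNGKey(seed)
    return jnp.sum(jax.random.normal(key, (120,)) ** 2)

# One compiled kernel processes an entire chunk of trajectories.
run_batch = jax.jit(jax.vmap(run_one))

def run_chunked(n_total, chunk):
    """Stream chunks through the device, keeping only summary statistics.

    Storing full per-trajectory histories at 10M runs would be the wrong
    memory model; reduce on device and ship back scalars instead.
    """
    means = []
    for start in range(0, n_total, chunk):
        out = run_batch(jnp.arange(start, start + chunk))
        means.append(float(jnp.mean(out)))
    return means
```

The first chunk pays the compile cost; every later chunk reuses the cached kernel, which is why the steady-state chunk time drops to a fraction of the first.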

| Parameter | Value |
| --- | --- |
| Total trajectories | 10,000,000 |
| Chunk size | 250,000 |
| Timesteps per trajectory | 120 |
| Blackout window | t=40 through t=69 |
| GPU | NVIDIA H100 80GB |
| First chunk (compile + run) | ~3.1 s |
| Steady-state chunk | ~0.14 s |

9. The result

| Timestep | Phase | Mean det(P_xy) |
| --- | --- | --- |
| 39 | last healthy timestep | 0.02045 |
| 69 | last blackout timestep | 77.53671 |
| 119 | end of run after recovery | 0.08425 |

The blackout produced a roughly 3800× increase in the positional uncertainty determinant. The recovery brought it back down to about 4× the pre-blackout baseline, not snapping to zero but converging over several update cycles.

The three-phase behavior is exactly what the theory predicts: in steady state, the covariance settles where process-noise inflation and measurement corrections balance; during the blackout, with no corrections, the predict step compounds and det(P_xy) grows rapidly; on recovery, the Kalman gain jumps and the covariance collapses back toward baseline within a few updates.

10. What this does and does not prove

What this experiment gives me: a from-scratch EKF whose Jacobians are verified against autodiff, a quantitative picture of how fast the filter's self-reported positional uncertainty grows during a 30-step blackout and how it collapses on recovery, and a pipeline whose summary statistics are stable and repeatable at 10 million trajectories.

What it does not buy me: a statement about actual estimation error against ground truth, since det(P_xy) is the filter's own belief and a first-order approximation rather than a guaranteed bound; evidence about regimes where large uncertainty breaks the linearization; or anything about real sensors, whose dropouts and noise are messier than this simulated radar.

Those are separate questions. They matter, but they are not this post.

11. Convergence and repeatability

I put the convergence check at the bottom on purpose. It is not the interesting part of the story, but it is the part that keeps the story honest.

Two checks mattered:

1M versus 10M. Are the summary statistics still moving materially when I 10× the sample count? The percentile bands were already essentially converged by 1M. The mean curve shifted by at most 6.96% at the peak, but the qualitative story did not change.

10M (seed=0) versus 10M (seed=1). Does the result repeat independently? The maximum relative difference on the mean curve was 0.134%. The percentile bands were tighter still: under 0.03% maximum relative difference. That is enough to treat the 10M summary as stable for this specific experiment.

Pushing to 100M would mostly buy a bigger headline and a more annoying quantile-reduction problem. If I keep pushing this project, the right next step is approximate or streaming quantiles, not brute-forcing a larger exact reduction.

Is 10 million enough?

Top: the three mean traces overlaid. Bottom: relative difference against the primary 10M run. By 10M the curves have converged—the repeat run barely deviates.


Code for the filter, experiments, and plotting pipeline: koushik-sss/kalman-expts.
