Performance Budgets & Real User Monitoring

Setting Performance Budgets

A performance budget is a set of limits — on file sizes, request counts, or timing metrics — that your team agrees not to exceed. Budgets are not aspirational targets; they are constraints you enforce, like a financial budget you consciously spend. As Addy Osmani puts it, view them as a currency to spend and trade on user experience. Every new feature, third-party script, or image carousel has a cost, and the budget forces teams to weigh that cost against user experience before shipping.

For 2026, practical starting budgets should reflect the realities of the global device and network landscape documented in Alex Russell’s Performance Inequality Gap report. For a mobile-first web app targeting the P75 connection (9 Mbps down, 100 ms RTT), a reasonable baseline is:

- Total JavaScript under 200 KB gzipped on initial load. That expands to roughly 700 KB–1 MB uncompressed, enough to take well over a second to parse and execute on a mid-range phone.
- Critical-path CSS under 50 KB.
- Total page weight under 1.5 MB, including images.
- LCP under 2.5 s at the 75th percentile on the target device class.

Addy Osmani’s earlier guidance of ~170 KB of gzipped JS remains a solid, aggressive target, particularly for the critical path on mobile. A 2026 web development guide from Pagepro recommends ≤400 KB of gzipped JS as an upper bound for interactive pages, a more permissive ceiling that acknowledges the size of modern framework bundles but still requires discipline.

Budgets come in three flavors. Quantity-based budgets limit raw values like JS size, image weight, or HTTP request count. Milestone-based budgets set thresholds on timing metrics like LCP, INP, TBT, or FCP. Rule-based budgets use composite scores from tools like Lighthouse. The most effective approach combines all three — a JS size budget catches regressions in CI, a timing budget catches real-world experience degradation in production, and a Lighthouse score budget provides a quick health check.
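Lighthouse’s budget.json format can encode quantity-based and milestone-based budgets in one file. The sketch below mirrors the baseline figures suggested earlier; the path pattern and the third-party request count are assumptions to adapt (resourceSizes budgets are in kilobytes, timings in milliseconds):

```json
[
  {
    "path": "/*",
    "resourceSizes": [
      { "resourceType": "script", "budget": 200 },
      { "resourceType": "stylesheet", "budget": 50 },
      { "resourceType": "total", "budget": 1500 }
    ],
    "resourceCounts": [
      { "resourceType": "third-party", "budget": 10 }
    ],
    "timings": [
      { "metric": "largest-contentful-paint", "budget": 2500 },
      { "metric": "total-blocking-time", "budget": 200 }
    ]
  }
]
```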

Resources:

Enforcing Budgets in CI/CD

A budget that isn’t enforced is a suggestion. The most impactful budgets are integrated into CI/CD pipelines, where they can block pull requests that introduce regressions before they reach production. Several tools make this practical:

Size-limit (by Andrey Sitnik) goes beyond simple bundle-size checks: it also measures JavaScript execution time in headless Chrome, giving you a budget that reflects both transfer cost and CPU cost. It integrates as a GitHub Action that comments size changes on PRs.

Bundlesize and its successor Bundlewatch set per-file compressed-size thresholds and report pass/fail status directly on GitHub pull requests; Bootstrap, Tinder, and Trivago all use this pattern.

Lighthouse CI (the official Google project) runs full Lighthouse audits against ephemeral or staging URLs in CI, supports both performance budgets (budget.json) and assertion-based checks (for example, fail if LCP exceeds 2.5 s), and provides a server for historical comparison between builds.

Webpack’s built-in performance.hints configuration warns or errors when entry points exceed a size threshold, providing instant developer feedback without any external service.
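As a sketch, a minimal .size-limit.json enforcing the 200 KB initial-load budget might look like this (the dist paths and the vendor split are placeholders for your actual build output):

```json
[
  {
    "path": "dist/app.*.js",
    "limit": "200 KB"
  },
  {
    "path": "dist/vendor.*.js",
    "limit": "150 KB"
  }
]
```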

For metric-based budgets (timing, Core Web Vitals), the most robust approach is to combine synthetic CI checks (Lighthouse CI on every PR) with RUM-based alerting in production (SpeedCurve, DebugBear, or Sentry alerts when P75 metrics regress beyond a threshold). MDN’s guidance recommends setting two levels: a warning threshold that triggers investigation without blocking deployment, and an error threshold that blocks the merge entirely.
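Lighthouse CI’s assertion syntax maps directly onto that two-level scheme. A sketch of a .lighthouserc.js follows; the URL, run count, and threshold values are assumptions to adapt:

```javascript
// .lighthouserc.js — warn-level assertions flag a regression for
// investigation; error-level assertions fail the CI job outright.
module.exports = {
  ci: {
    collect: {
      url: ['http://localhost:3000/'], // placeholder staging URL
      numberOfRuns: 3,                 // multiple runs reduce variance
    },
    assert: {
      assertions: {
        'categories:performance': ['warn', { minScore: 0.9 }],
        'largest-contentful-paint': ['warn', { maxNumericValue: 2000 }],
        'total-blocking-time': ['error', { maxNumericValue: 300 }],
      },
    },
  },
};
```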

Resources:

Real User Monitoring (RUM) vs. Synthetic Testing

Synthetic testing (Lighthouse, WebPageTest) gives you controlled, repeatable results from lab environments — great for catching regressions, debugging, and competitive benchmarking. But lab data doesn’t tell you what real users experience. CI machines have gigabit networks; your users may be on a congested 4G connection in Jakarta. RUM captures performance data from actual user sessions across every device, network, geography, and browser, providing the field data that CrUX aggregates and Google uses for search ranking decisions.

The ideal setup uses both. Synthetic tests in CI catch regressions before they ship. RUM in production confirms whether those lab results hold up in the real world. A critical lesson from DebugBear’s 100 site reviews: don’t chase Lighthouse scores when your CrUX field data tells a different story. Lab TBT and field INP can diverge dramatically because real users interact with your page differently than a Lighthouse bot, and third-party scripts that run in production (analytics, ads, consent prompts) are often absent in lab tests.

Implementing basic RUM is straightforward with Google’s open-source web-vitals JavaScript library. A minimal setup collects LCP, INP, and CLS from every page view and sends them to your analytics backend via navigator.sendBeacon(). The library’s v4+ release includes LoAF attribution data for INP, giving you script-level detail on what caused slow interactions. For teams that want out-of-the-box dashboards without building their own, commercial options span a wide range.
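A minimal reporter in that style is sketched below. The '/analytics' endpoint is a placeholder, and the web-vitals wiring is shown in comments since it only runs in a browser bundle; the payload fields match the Metric object the library passes to callbacks:

```javascript
// Shape the beacon payload from a web-vitals Metric object.
function toPayload(metric) {
  return JSON.stringify({
    name: metric.name,     // 'LCP' | 'INP' | 'CLS'
    value: metric.value,   // milliseconds for LCP/INP, unitless for CLS
    rating: metric.rating, // 'good' | 'needs-improvement' | 'poor'
    id: metric.id,         // unique per page load, for deduplication
  });
}

function report(metric) {
  const body = toPayload(metric);
  // sendBeacon survives page unload; keepalive fetch is the fallback
  if (!(navigator.sendBeacon && navigator.sendBeacon('/analytics', body))) {
    fetch('/analytics', { method: 'POST', body, keepalive: true });
  }
}

// In the browser entry point, wire it up with:
//   import { onLCP, onINP, onCLS } from 'web-vitals';
//   onLCP(report); onINP(report); onCLS(report);
```

Each callback fires at most once per page load (when the metric value is final), so the backend receives one beacon per metric per view.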

Resources:

The RUM Tool Landscape in 2026

The RUM ecosystem has matured significantly. Here is a categorized overview of the leading options:

Developer-first / Error + Performance: Sentry has evolved from a pure error tracker into a comprehensive observability platform. Its RUM captures Core Web Vitals, traces slow page loads to backend spans, and provides session replays that show exactly what the user was doing during a performance issue. The Seer AI debugging agent can perform root cause analysis. Its webvitals.com tool, launched in late 2025, provides a quick public-facing CWV analysis. Free tiers are generous, though Web Vitals monitoring is gated to Business/Enterprise plans.

Performance-specialist tools: SpeedCurve combines RUM with synthetic monitoring and is purpose-built for performance engineers who need to correlate budgets with field data. DebugBear provides deep Core Web Vitals debugging with INP breakdowns by script URL and domain, LoAF integration, and competitive benchmarking. Both are excellent for teams whose primary focus is web performance rather than general-purpose observability.

Platform-integrated: Vercel Analytics provides built-in RUM for sites deployed on Vercel, with automatic Core Web Vitals tracking and a clean dashboard. Cloudflare Web Analytics offers free, privacy-first RUM (no cookies, no personal data) for any site on Cloudflare. Both are dead simple to enable but offer less diagnostic depth than dedicated tools.

Enterprise observability: Datadog, New Relic, and Dynatrace offer full-stack RUM that correlates frontend metrics with backend traces, logs, and infrastructure. These are ideal when you need to trace a slow LCP all the way to a database query or CDN cache miss, but they come with enterprise pricing and steeper learning curves. New Relic was named a Leader in the 2025 Gartner Magic Quadrant for Digital Experience Monitoring.

Open-source / self-hosted: SigNoz (OpenTelemetry-native), PostHog (combines RUM with A/B testing and feature flags), and BasicRUM (preparing a full open-source release in Q1 2026, announced at PerfPlanet 2025) offer self-hosted options for teams with data residency requirements or cost constraints.

Resources:

Continuous Monitoring & Alerting

Set up continuous monitoring that combines passive (RUM) and active (synthetic) approaches. For synthetic, schedule regular Lighthouse or WebPageTest runs against your key pages and track metrics over time — SpeedCurve, Calibre, and DebugBear all provide this with historical charting and regression alerts. For RUM, configure alerts on P75 metric degradation: if your production LCP crosses 2.5s at the 75th percentile, or INP crosses 200ms, you want to know immediately — not in a monthly performance review.
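The alerting check itself is small. A sketch in plain JavaScript, using a nearest-rank P75 and the thresholds above (metric names and sample values are illustrative):

```javascript
// Nearest-rank 75th percentile of a set of RUM samples.
function p75(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.max(0, Math.ceil(0.75 * sorted.length) - 1)];
}

// Return the metrics whose P75 has crossed its budget.
function breachedBudgets(samplesByMetric, budgets) {
  return Object.keys(budgets).filter(
    (metric) => p75(samplesByMetric[metric] || []) > budgets[metric]
  );
}

// Example: LCP samples in ms, INP samples in ms.
// P75 LCP is 2700 ms (over 2500), P75 INP is 150 ms (within 200),
// so only 'LCP' is flagged.
const alerts = breachedBudgets(
  { LCP: [1800, 2100, 2700, 3100], INP: [90, 120, 150, 180] },
  { LCP: 2500, INP: 200 }
);
```

In production this check would run on a rolling window of beacons and page an engineer (or post to chat) when the returned list is non-empty.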

The 2025 PerfPlanet article on performance reporting emphasizes that a great report tells a story, not just numbers. Structure your dashboards to answer three questions: What happened? (which metric regressed), Why did it happen? (correlated with a deployment, feature flag, or third-party change), and What should we do? (actionable recommendations). T-Mobile’s approach of building Looker Studio dashboards accessible to all staff, combined with a performance wiki, is a model worth emulating.

One emerging concern highlighted by the 2025 RUMCG review: AI bot traffic is increasingly hitting websites. These automated visits are not real users, and if your RUM isn’t filtering them out, they may skew your metrics — particularly if the bots interact with elements differently than humans do. Ensure your RUM setup can distinguish bot traffic.
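A first-line defense is a user-agent screen applied before the beacon is sent. This is a naive sketch: the pattern list is illustrative rather than exhaustive, and since UA strings are trivially spoofed, real deployments should pair it with behavioral signals or a dedicated bot-detection service:

```javascript
// Matches common crawler markers plus headless browsers and AI bots.
// The list will need ongoing maintenance as new agents appear.
const BOT_PATTERN = /bot|crawl|spider|headless|lighthouse|slurp|gptbot|claudebot/i;

function isLikelyBot(userAgent) {
  return BOT_PATTERN.test(userAgent || '');
}
```

Samples where isLikelyBot(navigator.userAgent) returns true would be dropped before they reach the analytics backend, keeping automated visits out of your P75 calculations.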

Resources: