WebGPU and on-device AI: The browser is becoming a serious runtime
· 7 min read · Filippo Spinella · WebGPU, AI, Frontend, Web Performance
For years the browser was the friendly face of the application and the cloud was where the hard work happened. The user types, clicks, uploads a file; the frontend sends everything to the server; the server calls a model; the answer comes back.
This pattern remains very useful, but it is not free. Every call brings latency, cost, network dependency, and privacy questions. If the user is writing a sentence and wants a suggestion, half a second is noticeable. If you're classifying thousands of small inputs, pennies become real money. If the text is sensitive, sending it off the device is not a neutral choice.
That's why WebGPU and on-device AI are getting so much attention. Not because we'll run every model in the browser tomorrow, but because some intelligent features can move closer to the user.
Not everything has to become local
The naive version of the argument is "cloud versus device". The useful version is hybrid.
Some tasks look great on the device: short summaries, language detection, light rewrites, simple classifications, image filters, small vision models, creative experiences with immediate feedback.
Other tasks remain better in the cloud: complex reasoning, frontier models, server-side data, centralized audit, uniform quality, workflows where you have to carefully control each step.
A healthy architecture decides at runtime which side handles each request.
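One way to picture that decision is a small routing function. This is a minimal sketch, not a prescribed design: the names `Task`, `RoutingContext` and `chooseRuntime` are hypothetical, and the list of local-friendly tasks simply mirrors the examples above.

```typescript
// Hypothetical names throughout: none of these are part of a browser API.
type Task = 'summarize' | 'detect-language' | 'rewrite' | 'classify' | 'reason';
type Runtime = 'local' | 'cloud';

interface RoutingContext {
  localModelReady: boolean; // a local model is available right now
  online: boolean;          // the network is reachable
  needsServerData: boolean; // the task depends on data that lives server-side
}

// Tasks that tend to work well on the device.
const LOCAL_FRIENDLY: ReadonlySet<Task> = new Set<Task>([
  'summarize',
  'detect-language',
  'rewrite',
  'classify',
]);

function chooseRuntime(task: Task, ctx: RoutingContext): Runtime | null {
  const localOk =
    ctx.localModelReady && LOCAL_FRIENDLY.has(task) && !ctx.needsServerData;
  if (localOk) return 'local';
  if (ctx.online) return 'cloud';
  return null; // neither side can serve the request; the UI should say so
}
```

Returning `null` instead of throwing keeps the "nothing available" case explicit, so the interface can explain it rather than fail silently.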
The browser doesn't have to win against the cloud. It must save the cloud from doing work that doesn't need to be done there.
Why WebGPU matters
WebGPU is a modern API for using the GPU from the browser. It's not just for nicer 3D graphics. It also matters because it exposes primitives suited to general-purpose computation: parallel workloads, compute shaders, and pipelines closer to what GPUs do well.
For AI, scientific visualization, 3D editors, video filters and creative tools, the difference is noticeable. WebGL has done a lot for the web, but WebGPU was designed with a model better suited to the present.
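To make "parallel workloads" concrete, here is a small sketch of a compute kernel and the arithmetic around it. The shader doubles every element of a buffer; `doubleShader` and `workgroupCount` are names chosen for this example, and the device setup is deliberately left out.

```typescript
// A WGSL compute kernel that doubles every element of a storage buffer.
const doubleShader = /* wgsl */ `
@group(0) @binding(0) var<storage, read_write> data: array<f32>;

@compute @workgroup_size(64)
fn main(@builtin(global_invocation_id) id: vec3<u32>) {
  // Guard: the last workgroup may extend past the end of the array.
  if (id.x < arrayLength(&data)) {
    data[id.x] = data[id.x] * 2.0;
  }
}
`;

const WORKGROUP_SIZE = 64; // must match @workgroup_size in the shader

// Number of workgroups to dispatch so every element gets one invocation.
function workgroupCount(elementCount: number, size = WORKGROUP_SIZE): number {
  return Math.ceil(elementCount / size);
}
```

With a real device, the result of `workgroupCount` is what you would pass to `dispatchWorkgroups` on a compute pass. Each invocation handles one element, which is the shape of work GPUs are good at.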
The first thing to write, however, is not a shader. It's plain feature detection:
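// Resolves to a GPU device, or null when WebGPU is not usable here.
export async function requestWebGpuDevice() {
  // Browsers without WebGPU do not expose navigator.gpu at all.
  if (!('gpu' in navigator)) {
    return null;
  }

  // The adapter can be null even when the API itself exists.
  const adapter = await navigator.gpu.requestAdapter({
    powerPreference: 'high-performance',
  });
  if (!adapter) {
    return null;
  }

  return adapter.requestDevice();
}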
This function makes one important point: WebGPU is not guaranteed on every device. It is a capability to be verified. Some browsers don't fully support it, some GPUs have limitations, some enterprise environments disable features, some users are on modest hardware.
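Having an adapter is only half the answer; what it can do matters too. The sketch below reads two real adapter limits (`maxBufferSize`, `maxStorageBufferBindingSize`) and the optional `shader-f16` feature to pick a model size. The tiers and thresholds are placeholders I made up for illustration, and `AdapterLike` is a structural stand-in so the example does not depend on WebGPU typings.

```typescript
// Structural subset of what a GPUAdapter exposes.
interface AdapterLike {
  limits: { maxBufferSize: number; maxStorageBufferBindingSize: number };
  features: { has(name: string): boolean };
}

type ModelTier = 'none' | 'small' | 'medium';

const MiB = 1024 * 1024;

// Thresholds are illustrative only; real values depend on the model.
function pickModelTier(adapter: AdapterLike | null): ModelTier {
  if (!adapter) return 'none';

  // The largest single storage binding bounds how big one weight buffer can be.
  const largestBinding = Math.min(
    adapter.limits.maxBufferSize,
    adapter.limits.maxStorageBufferBindingSize,
  );
  if (largestBinding < 128 * MiB) return 'none';

  // 'shader-f16' is an optional WebGPU feature; it is used here as a gate
  // for the larger tier, on the assumption that it ships half-precision weights.
  if (largestBinding >= 1024 * MiB && adapter.features.has('shader-f16')) {
    return 'medium';
  }
  return 'small';
}
```

Keep in mind that a device only gets limits above the defaults if you ask for them through `requiredLimits` when calling `requestDevice`.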
Built-in AI: when the browser brings the model
Chrome is pushing built-in APIs for tasks like local prompts, summarization, writing, rewriting, translation, language detection, and proofreading. The idea is very interesting: the browser manages model, availability and updates; the app uses an API closer to the platform.
If it works well, it changes a lot:
- fewer server calls for simple tasks;
- data that may remain on the device;
- lower latency;
- offline or semi-offline experiences;
- more natural UX for writing and translation.
But it should be treated as progressive enhancement. Some APIs are stable, others are in origin trial or preview, and others still depend on browser version, language and device.
type LocalCapability = 'available' | 'downloadable' | 'unsupported';

export async function getLocalSummarizerCapability(): Promise<LocalCapability> {
  // Read through globalThis because the Summarizer global may not exist.
  const SummarizerApi = (globalThis as any).Summarizer;
  if (!SummarizerApi?.availability) {
    return 'unsupported';
  }

  const availability = await SummarizerApi.availability();
  if (availability === 'available') return 'available';
  if (availability === 'downloadable') return 'downloadable';
  // Any other value is treated as unsupported.
  return 'unsupported';
}
The specific code will change, but the pattern remains: you check availability, explain any downloads, offer fallbacks, and measure quality.
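As a sketch of that pattern, the function below turns a capability into a plan for the current request. Everything here is hypothetical (`SummarizerPlan`, `planSummarizer`, the `userAcceptedDownload` flag); the point is only that each capability maps to an explicit behaviour, including who explains the download.

```typescript
type LocalCapability = 'available' | 'downloadable' | 'unsupported';

interface SummarizerPlan {
  thisRequest: 'local' | 'cloud'; // where the current request runs
  startDownload: boolean;         // begin fetching the model in the background
  promptForDownload: boolean;     // explain the download and ask first
}

function planSummarizer(
  capability: LocalCapability,
  userAcceptedDownload: boolean,
): SummarizerPlan {
  switch (capability) {
    case 'available':
      return { thisRequest: 'local', startDownload: false, promptForDownload: false };
    case 'downloadable':
      // Serve this request from the cloud; download only with consent.
      return {
        thisRequest: 'cloud',
        startDownload: userAcceptedDownload,
        promptForDownload: !userAcceptedDownload,
      };
    case 'unsupported':
      return { thisRequest: 'cloud', startDownload: false, promptForDownload: false };
  }
}
```

In this version a downloadable model never blocks the request in front of the user: the cloud answers now, and the local path takes over once the model is ready.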
Fallback is not a sad plan B
Cloud fallback is not a defeat. It is part of the product.
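interface AiRequest {
  task: 'summarize' | 'rewrite' | 'classify';
  input: string;
}

interface AiResult {
  output: string;
  runtime: 'local' | 'cloud';
}

// tryLocalAi (not shown) is assumed to resolve to the output text,
// or to null when no local model can handle the request.
export async function runAiTask(request: AiRequest): Promise<AiResult> {
  const local = await tryLocalAi(request);
  if (local) {
    return { output: local, runtime: 'local' };
  }

  const res = await fetch('/api/ai', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });
  // fetch only rejects on network failure, so check the HTTP status too.
  if (!res.ok) {
    throw new Error(`Cloud AI request failed with status ${res.status}`);
  }
  const cloud = await res.json();
  return { output: cloud.output, runtime: 'cloud' };
}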
This architecture lets you improve progressively. Users with local support get better latency and privacy. Users without it can still use the feature. You can measure the percentage of local requests, latency, errors, memory, perceived quality and cost.
Without metrics, on-device AI becomes an aesthetic choice. With metrics, it becomes a product lever.
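A minimal sketch of what "with metrics" could mean in practice: record one sample per request, then compare the two runtimes. The `AiSample` shape and the `statsFor` helper are assumptions for this example, not an existing library.

```typescript
interface AiSample {
  runtime: 'local' | 'cloud';
  latencyMs: number;
  ok: boolean;
}

interface RuntimeStats {
  count: number;
  share: number;                  // fraction of all requests served by this runtime
  errorRate: number;              // fraction of this runtime's requests that failed
  medianLatencyMs: number | null; // over successful requests only
}

function median(values: number[]): number | null {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function statsFor(samples: AiSample[], runtime: 'local' | 'cloud'): RuntimeStats {
  const mine = samples.filter((s) => s.runtime === runtime);
  const failures = mine.filter((s) => !s.ok).length;
  return {
    count: mine.length,
    share: samples.length === 0 ? 0 : mine.length / samples.length,
    errorRate: mine.length === 0 ? 0 : failures / mine.length,
    medianLatencyMs: median(mine.filter((s) => s.ok).map((s) => s.latencyMs)),
  };
}
```

The median is used instead of the mean so that one slow cold start does not dominate the picture.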
The UX of the model matters
If the browser needs to download a model, the user notices. Don't hide it behind a vague spinner. Better to be clear: "We're preparing the model so this feature runs faster and works even offline."
A good experience:
- shows the preparation status;
- does not block the entire page;
- lets the user continue with cloud fallback;
- avoids battery and memory surprises;
- caches the model whenever possible;
- explains the benefit in one concrete sentence.
The worst thing is a "smart" feature that appears broken because it is downloading something silently.
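One way to avoid the silent download is to model the preparation as explicit states and derive the message from them. This is a sketch under assumptions: `ModelStatus` and `statusMessage` are hypothetical names, and `loaded` is assumed to be a fraction between 0 and 1, however your runtime reports progress.

```typescript
type ModelStatus =
  | { kind: 'idle' }
  | { kind: 'downloading'; loaded: number } // fraction in [0, 1]
  | { kind: 'ready' }
  | { kind: 'failed' };

function statusMessage(status: ModelStatus): string {
  switch (status.kind) {
    case 'idle':
      return '';
    case 'downloading': {
      // Clamp defensively so a bad progress value never shows as 170%.
      const pct = Math.round(Math.min(1, Math.max(0, status.loaded)) * 100);
      return `Preparing the on-device model (${pct}%). You can keep working; results come from the cloud for now.`;
    }
    case 'ready':
      return 'On-device model ready: faster results, even offline.';
    case 'failed':
      return 'Could not prepare the on-device model. Using the cloud instead.';
  }
}
```

Because the message is a pure function of the state, it can sit in a small banner without blocking the rest of the page.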
Privacy: better, not automatically secure
Processing data on the device can be a great advantage. A draft email, internal document, or personal note doesn't have to leave your browser to receive a suggestion.
However, local does not automatically mean safe. You still need to think about:
- XSS;
- accidental logs;
- data saved in storage;
- prompt injection from untrusted content;
- permissions granted to the model;
- outputs used in automatic actions.
If a local model can read a web page and then fill out a form, that page can try to manipulate it. If it can call tools, confirmation is needed. If it produces structured output, it must be validated. The fact that it runs on the device reduces some privacy risks, but does not eliminate the security model.
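Here is a minimal sketch of those two rules, validation and confirmation, for a made-up scenario: the model is asked to reply with JSON describing a form field to fill. The action schema, the allowed fields and the function names are all assumptions for illustration.

```typescript
// Hypothetical schema: {"action": "fill_field", "field": "subject", "value": "Hello"}
const ALLOWED_FIELDS = new Set(['subject', 'body']);
const MAX_VALUE_LENGTH = 2000;

interface FillFieldAction {
  action: 'fill_field';
  field: string;
  value: string;
}

// Returns a validated action, or null if the output is malformed or not allowed.
function parseModelAction(raw: string): FillFieldAction | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== 'object' || parsed === null) return null;

  const candidate = parsed as Record<string, unknown>;
  if (candidate.action !== 'fill_field') return null;
  if (typeof candidate.field !== 'string' || !ALLOWED_FIELDS.has(candidate.field)) return null;
  if (typeof candidate.value !== 'string' || candidate.value.length > MAX_VALUE_LENGTH) return null;

  // Rebuild the object so unexpected extra keys are dropped.
  return { action: 'fill_field', field: candidate.field, value: candidate.value };
}

// The action only runs after an explicit confirmation from the user.
function applyIfConfirmed(
  action: FillFieldAction,
  confirm: (a: FillFieldAction) => boolean,
  apply: (a: FillFieldAction) => void,
): boolean {
  if (!confirm(action)) return false;
  apply(action);
  return true;
}
```

The allowlist is the important part: a page that manipulates the model can at most produce another allowed action, which the user still has to confirm.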
Where it gets really interesting
Text is just the beginning. WebGPU makes credible on the web the kind of experiences that until recently seemed to require a native app:
- complex 3D editors;
- Gaussian splatting in the browser;
- real-time video filters;
- lightweight CAD;
- scientific visualizations;
- creative tools with instant preview;
- vision inference near the UI;
- more ambitious browser games.
Here frontend, graphics and machine learning start to mix. It's a somewhat awkward area, but also a fertile one: the browser is once again a serious application platform, not just the place where we put forms and dashboards.
Checklist before production
Before putting an on-device feature in front of users, I would check:
- Target browsers and devices.
- Cloud fallback or graceful degradation.
- Download time and model cache.
- Memory and battery on average hardware.
- Quality compared to the cloud version.
- Privacy policy and user messages.
- Testing with hostile inputs.
- Separate metrics for local and cloud runtime.
- Plan to update or disable the model.
It is a concrete list because the problem is concrete. A slow, fragile, or opaque AI feature doesn't become better just because it runs in the browser.
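The last item on the list, being able to disable the model, can be as small as a flag check before the local path runs. This is a sketch with invented names (`LocalAiFlags`, `shouldUseLocalModel`); how the flags reach the client is left open.

```typescript
interface LocalAiFlags {
  enabled: boolean;               // global kill switch for the local path
  blockedModelVersions: string[]; // versions known to misbehave
}

function shouldUseLocalModel(
  flags: LocalAiFlags | null,
  modelVersion: string,
): boolean {
  // If the flags could not be loaded, fail closed and use the cloud path.
  if (!flags || !flags.enabled) return false;
  return !flags.blockedModelVersions.includes(modelVersion);
}
```

Failing closed is a design choice, not the only option. If offline use matters, caching the last known flags would let the local path keep working without a network.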
The right compromise
I don't believe that the future is "everything on the device". And I don't think the cloud will remain the only reasonable place for inference, either. The most likely future is a mix: local when it improves latency, privacy, or cost; cloud when quality, updated data and centralized control are needed.
WebGPU, WebNN and built-in AI APIs do not make the browser omnipotent. They make it more mature. And for those building web products, this is huge news.