
You don't need WebGPU for that. It's a standard vertex shader -> fragment shader pass with the blending mode set to additive.
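The core of that setup in WebGL2 looks something like this (a minimal sketch; `gl` and `pointCount` are assumed to exist, and the shaders/attributes are set up elsewhere):

    // Accumulate overdraw: every point drawn adds its color to the pixel.
    gl.enable(gl.BLEND);
    gl.blendEquation(gl.FUNC_ADD);
    gl.blendFunc(gl.ONE, gl.ONE);             // dst = dst + src
    gl.drawArrays(gl.POINTS, 0, pointCount);  // one vertex per data point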


Drawing lots of single pixels with alpha blending is probably one of the least efficient ways to use the rasterizer though. A good compute shader implementation would be substantially faster.
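Roughly like this (a WGSL sketch embedded in TypeScript; the buffer layout, names, and fixed 1024x1024 size are my assumptions, and points are assumed to already be in non-negative pixel coordinates):

    const binWGSL = /* wgsl */ `
      @group(0) @binding(0) var<storage, read> points : array<vec2f>;
      // One atomic counter per screen pixel.
      @group(0) @binding(1) var<storage, read_write> counts : array<atomic<u32>>;

      const WIDTH : u32 = 1024u;
      const HEIGHT : u32 = 1024u;

      @compute @workgroup_size(256)
      fn main(@builtin(global_invocation_id) id : vec3u) {
        if (id.x >= arrayLength(&points)) { return; }
        let p = vec2u(points[id.x]);
        if (p.x < WIDTH && p.y < HEIGHT) {
          atomicAdd(&counts[p.y * WIDTH + p.x], 1u);
        }
      }`;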


At 1M points it hardly makes a difference. Besides, a 1 point -> 1 pixel mapping is good enough for a demo, but in practice it will produce nasty aliasing artifacts, because real datasets aren't aligned with pixel coordinates. So you have to draw each point as at least a 2x2 square with precise shading, and we are back to the rasterizer pipeline. Edit: what actually needs to be computed is the integral of the point dataset over each square pixel, and that depends on the shape of each point, even if it's smaller than a pixel.
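For the simplest case, a point whose footprint is a 1x1-pixel box, that integral reduces to a bilinear "tent" weight you can evaluate in the fragment shader of a 2x2 splat (a sketch; `v_center`, the point position in pixel coordinates, is a name I made up):

    // Fragment shader for a gl_PointSize = 2.0 point sprite. Each covered
    // pixel gets the exact overlap area between the pixel and the point's
    // 1x1 box footprint, accumulated via additive blending.
    const splatFrag = `#version 300 es
      precision highp float;
      in vec2 v_center;   // point center in pixel coordinates
      out vec4 outColor;
      void main() {
        vec2 d = abs(gl_FragCoord.xy - v_center);
        float w = max(0.0, 1.0 - d.x) * max(0.0, 1.0 - d.y);
        outColor = vec4(w, 0.0, 0.0, 1.0);
      }`;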


Aren't we at petaflops now with GPUs? 1M or even 1G points should be no issue if it renders to a framebuffer and doesn't go through mountains of JS framework rubbish followed by mountains of GTK/Qt/.NET rubbish.


Not true. Fill rate and memory speed are still huge bottlenecks. The issue is not “rubbish” but memory speed. It is almost always memory speed: cache, RAM, disk, etc.

There is this misconception that if one uses JS or C# to tell a GPU what to do, it is somehow slower than Rust. It only is if you're crunching the data yourself; moving memory to the GPU and telling the GPU to crunch it is virtually identical either way.


PCIe 6.0 x16 delivers ~128 GB/s, so a billion points can be loaded onto the GPU in tens of milliseconds. The GPU's own memory is much faster.
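Back-of-envelope (assuming 8 bytes per point, i.e. two float32 coordinates):

    const points = 1e9;               // 1G points
    const bytesPerPoint = 8;          // x and y as float32
    const pcieBps = 128e9;            // ~128 GB/s, one direction
    const seconds = points * bytesPerPoint / pcieBps;  // 0.0625 s, i.e. ~63 ms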


Most consumers don't have that, and at 60 fps you are already maxing it out (and then some), assuming the OS is doing nothing else. Bandwidth, even on GPUs, is still the bottleneck.

Even then, when you write to a framebuffer directly on the GPU, if the locations of the points are not contiguous you are thrashing the cache. Rendering points very fast is still very much about reducing the dataset to get around all the layers of memory walls.


No difference for human visuals, or no difference for discrete data, or no difference for "continuous" f32 data?


That works if "more overdraw = more intensity" is all you care about, and it may very well be good enough for many kinds of charts. But with heat-map plots one usually wants a proper mapping from some intensity domain to a color map, plus a legend with a color gradient that tells you which color represents which value. That requires binning, counting per bin, and determining the min and max values.


Emm... no: you just do one render pass into a temp framebuffer with a single red channel, then another fragment shader maps it to an RGB palette.
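In WebGL2 terms, roughly (a sketch, not a complete program; rendering to an R32F texture needs EXT_color_buffer_float, and blending into it needs EXT_float_blend):

    // Pass 1: accumulate point weights into a single-channel float texture.
    const accumTex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, accumTex);
    gl.texStorage2D(gl.TEXTURE_2D, 1, gl.R32F, width, height);
    const fbo = gl.createFramebuffer();
    gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
    gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0,
                            gl.TEXTURE_2D, accumTex, 0);
    gl.enable(gl.BLEND);
    gl.blendFunc(gl.ONE, gl.ONE);
    gl.drawArrays(gl.POINTS, 0, pointCount);

    // Pass 2: fullscreen quad to the screen; the fragment shader maps the
    // accumulated value through a 1D palette texture, e.g.:
    //   float t = clamp(texture(u_accum, v_uv).r / u_max, 0.0, 1.0);
    //   outColor = texture(u_palette, vec2(t, 0.5));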


Wait, does additive blending let you draw to temp framebuffers with high precision and without clamping? Even so, you'd still need to know the maximum value in the temp framebuffer.


That's what EXT_float_blend does. It's true, though, that you can't find the global min/max in WebGL2. This could be done, theoretically, with mipmaps, if only mipmap generation supported a max function instead of averaging.


Couldn't you do that manually with a simple downscaling filter? I'd be very shocked if fragment shaders did not have a min or max function.

Repeatedly shrinking by a factor of two means log2(max(width, height)) passes; each pass touches a quarter of the pixels of the previous one, so in total that's 4/3 times the pixels of the original image. Should be low enough overhead, right?
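One reduction pass would look something like this (a sketch; assumes each pass renders into a framebuffer half the size of the previous level, and that width/height are powers of two so no edge texels are missed):

    // Each output texel takes the max of the 2x2 block it covers in the
    // previous, larger level.
    const reduceFrag = `#version 300 es
      precision highp float;
      uniform sampler2D u_prev;  // previous (larger) level
      out vec4 outColor;
      void main() {
        ivec2 p = ivec2(gl_FragCoord.xy) * 2;
        float m = max(
          max(texelFetch(u_prev, p, 0).r,
              texelFetch(u_prev, p + ivec2(1, 0), 0).r),
          max(texelFetch(u_prev, p + ivec2(0, 1), 0).r,
              texelFetch(u_prev, p + ivec2(1, 1), 0).r));
        outColor = vec4(m);
      }`;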


Sure, that will work, but it's log2 passes + temp framebuffers. As for overhead, I'm afraid it will eat a couple of fps if you run it on every frame. In practice, though, I'm not sure that finding the exact maximum is that valuable for rendering: a good guess based on the dataset type will do. For example, if you need to render N points that tend to congregate in the center, sqrt(N) works very well as a heuristic for the maximum.



