Sunday, July 13, 2014

WebGL and Performance 3 - First Improvements

As suggested by the last post, this first improvement is trivial but useful.  I think from here on out I will reverse the order of the posts and talk first about the hypothesis for improvement, the changes, and wrap up with the test results.

Hypothesis for this update

In the last post we observed that although all circles share the vertex buffer, program and handles (rather than each creating its own), the rendering code in each circle was still re-setting many uniform and GL state values that only need to be set once per frame (assuming we render all circles in a batch--that is, with nothing else intervening).

We hypothesized that removing the redundant GL API calls would improve performance, so we planned to move them to a Circle.prepare() method that we would call just once at the start of each frame.
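
One way to check this hypothesis before looking at frame rates is to count how many GL calls a single frame makes. The following is a minimal sketch and not part of the demo code: countCalls() and totalCalls() are hypothetical helper names, and the sketch assumes a runtime that supports Proxy.

```javascript
// Wrap an object so that every method call made through the wrapper is
// tallied by name. In the browser `target` would be the WebGL context;
// any object with methods works, which makes the helper easy to try out.
function countCalls(target) {
  var counts = {};
  var proxy = new Proxy(target, {
    get: function (obj, name) {
      var value = obj[name];
      if (typeof value !== "function") {
        return value; // Constants such as gl.BLEND pass straight through.
      }
      return function () {
        counts[name] = (counts[name] || 0) + 1;
        return value.apply(obj, arguments); // Call on the real object.
      };
    }
  });
  return { gl: proxy, counts: counts };
}

// Sum the per-method tallies into a single number of calls.
function totalCalls(counts) {
  return Object.keys(counts).reduce(function (sum, name) {
    return sum + counts[name];
  }, 0);
}
```

In the browser one could wrap the real context once (var counted = countCalls(gl)), pass counted.gl to the rendering code for a single frame, and then inspect counted.counts to see which calls scale with the number of circles.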

Code changes

The Circle class (in Circle.js) now has a new static method, Circle.prepare(), that contains many of the calls removed from the render() method.  Also, the width and height parameters have moved from render() to prepare().

function Circle(gl, x, y, mass, r, g, b) {
 this.pos = new Vec(x,y); // Position, in pixel coords with (0,0) at center.
 this.vel = new Vec(0,0); // Velocity (in pixels per second).

 this.mass = mass;   // Arbitrary measure; affects circle size.
 this.r = r;
 this.g = g;
 this.b = b;
 this.selected = false;  // Not used at this time.

 // Only create vertex buffer, load program and collect handles once.
 // Store them in static state.
 //
 if( !Circle.program ) {

  Circle.geoBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);

  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
   -1, -1,
    1, -1,
   -1,  1,
   -1,  1,
    1, -1,
    1,  1]), gl.STATIC_DRAW);

  Circle.program = loadShaderProgram("circle", "circle");

  gl.useProgram(Circle.program);

  // Vertex attribute handle.
  Circle.geoHandle = gl.getAttribLocation(Circle.program, "a_position");
  // Resolution handle. Resolution is a vec2 in pixels.
  Circle.resHandle = gl.getUniformLocation(Circle.program, "u_resolution");
  // Position handle. Position is the center of the circle in pixel coords.
  Circle.posHandle = gl.getUniformLocation(Circle.program, "u_pos");
  // World Size handle. World Size is a vec2 containing the pixel width/height of the context.
  Circle.worldSizeHandle = gl.getUniformLocation(Circle.program, "u_worldSize");
  // Color is the [RGB] color of the circle.
  Circle.colorHandle = gl.getUniformLocation(Circle.program, "u_color");
 }
}

Circle.prepare = function(gl, width, height) {
 gl.useProgram(Circle.program);

 gl.enable(gl.BLEND);
 gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);

 gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);
 gl.vertexAttribPointer(Circle.geoHandle, 2, gl.FLOAT, false, 0, 0);
 gl.enableVertexAttribArray(Circle.geoHandle);
 gl.uniform2f(Circle.worldSizeHandle, width, height);
}


Circle.prototype.render = function(gl) {
 // Pipeline state setup.
 //
 gl.uniform2f(Circle.resHandle, this.mass * 300, this.mass * 300);
 gl.uniform2f(Circle.posHandle, this.pos.x, this.pos.y);
 gl.uniform3f(Circle.colorHandle, this.r, this.g, this.b);

 // Draw.
 //
 gl.drawArrays(gl.TRIANGLES, 0, 6);

}


The script in default.html has one change: the addition of a call to Circle.prepare() just prior to rendering the circles.

 ...
 // Clear the screen.
 // Note: we're not using a depth or stencil buffer, so we only clear the color pixels.
 //
 gl.clear(gl.COLOR_BUFFER_BIT);

 Circle.prepare(gl, canvas.width, canvas.height);

 // Render each circle.
 //
 for (var i = circles.length - 1; i >= 0; i--) {
   circles[i].render(gl);
 }
 ...

Results

Performance results are consistent with expectations, as discussed below.

Test set 1: requestAnimationFrame()

                                            Difference
# Circles  Physics  Firefox  Opera  Chrome  Firefox  Opera  Chrome
50         No       60       23     25      0        0      0
50         Yes      60       26     25      0        +3     +2
500        No       60       23     25      0        0      0
500        Yes      60       27     25      +10      +4     +7

Again, Firefox amazed me, being the only browser to perform well in every test.  With our small change, we picked up those last 10 frames on the hardest test.

However, overall, these are very small improvements, and we would expect that to be the case when using requestAnimationFrame(), as it is intended to let the browser choose the best refresh rate.  Opera and Chrome's numbers are so low given the work level that I strongly suspect their frame scheduling logic does not scale well to UHD resolutions.
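
For reference, frame rates like those above can be measured with a small counter driven by the timestamp that requestAnimationFrame() passes to its callback. This is only a sketch of one way to do it: the post does not show how the demo measures FPS, and createFpsCounter() is a hypothetical name.

```javascript
// Returns a counter whose tick() should be called once per rendered frame
// with the current time in milliseconds. It reports the average frame rate
// over the most recently completed window of at least `windowMs`.
function createFpsCounter(windowMs) {
  var windowStart = null; // Time at which the current window began.
  var frames = 0;         // Frames seen since windowStart.
  var fps = 0;            // Last completed measurement.

  return {
    tick: function (nowMs) {
      if (windowStart === null) {
        windowStart = nowMs; // First call only starts the clock.
        return fps;
      }
      frames++;
      var elapsed = nowMs - windowStart;
      if (elapsed >= windowMs) {
        fps = frames * 1000 / elapsed;
        frames = 0;
        windowStart = nowMs;
      }
      return fps;
    }
  };
}
```

Inside a requestAnimationFrame() callback, the timestamp argument can be passed straight to tick(). Under the postMessage() approach, where no timestamp is supplied, performance.now() would be a reasonable substitute.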

Test set 2: postMessage()

                                            Difference
# Circles  Physics  Firefox  Opera  Chrome  Firefox  Opera  Chrome
50         No       1480     5000   4500    +350     +3150  +2600
50         Yes      1480     2840   2900    +455     +1340  +1875
500        No       600      490    500     +427     +280   +300
500        Yes      68       42     42      +15      0      +2

These results are something else entirely; we more than doubled some of our frame rates.

This seems ridiculous, but considering how simple our rendering is, adding a per-circle state change to the GL pipeline can easily cripple performance, and what we did here was remove that.

Of course even when we are rendering at 5000 FPS (Opera), the browser still gets to decide how often to show our masterpiece to the user, and that will be somewhere between 1 and 60 FPS.  Nonetheless, these numbers show us how quickly our frames can be rendered on the GPU using this technique.
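
For readers who have not seen it, the postMessage() technique can be sketched roughly as follows. This is an assumption about the general shape of such a loop, not the demo's actual code: startMessageLoop(), the message tag and the returned stop function are all made up for illustration. The target argument is expected to behave like window (addEventListener, removeEventListener and postMessage).

```javascript
// Drive `renderFrame` as fast as the event loop allows by posting a
// message to ourselves after every frame. Returns a function that stops
// the loop.
function startMessageLoop(target, renderFrame) {
  var TOKEN = "circles-next-frame"; // Hypothetical tag for our own messages.
  var running = true;

  function onMessage(event) {
    // Ignore messages from other sources or with other payloads.
    if (!running || event.source !== target || event.data !== TOKEN) {
      return;
    }
    renderFrame();
    if (running) {
      target.postMessage(TOKEN, "*"); // Schedule the next frame.
    }
  }

  target.addEventListener("message", onMessage);
  target.postMessage(TOKEN, "*"); // Kick off the first frame.

  return function stop() {
    running = false;
    target.removeEventListener("message", onMessage);
  };
}
```

In the browser this would be started with startMessageLoop(window, drawScene), where drawScene stands in for whatever function clears the screen and renders the circles. Because nothing throttles the loop, it measures how fast frames can be submitted rather than how often they are displayed.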

Conclusion, and what's next

For those of you who haven't noticed, what we are building is ironically close to a particle system: a collection of physically interacting objects that we draw on the screen. Particle systems are common in games and other programs, and most often run entirely on the GPU--that means both the rendering and the physics are computed on the GPU.

However, I want to leave our CPU involved, based upon the assumption that at some point our circles will be user-interactive and/or dynamically updated from outside our simulation.

Given that, the next change I would like to make is to move our physics to a separate thread (via a Web Worker), freeing up the main thread somewhat in the process.  I expect this to have little to no impact on FPS using the postMessage() technique, but a moderate impact on the requestAnimationFrame() technique (at least for Opera and Chrome).

Stay tuned!

I almost forgot: the current demo is here:  http://experiments.uhdcoder.com/circles2
