Wednesday, July 9, 2014

WebGL and Performance 3 - First measurements

A funny thing happened to me on the way to this post.

I began making performance measurements, and things were going as planned, when all of a sudden I noticed a drop in frame rate, first in Chrome, later in Opera, but not at all in Firefox.

I suspected a driver problem, so I rebooted, but that didn't help.

I use three 4K monitors, and I use Windows Scaling to make some screen elements bigger. I had turned that off, though, because I know the three browsers handle it differently and I didn't want it to interfere.

So, after much trial and error, I discovered that the two misbehaving browsers were reporting artificially high values for window.innerWidth and window.innerHeight. Here are the numbers I found:

Firefox: 3840 x 2159  (this is correct, save for one missing vertical pixel)
Opera: 4266 x 2400
Chrome: 5765 x 3243

This explained the performance loss; they were rendering to buffers that were much larger than my screen!  I suspected the two browsers were trying to emulate how Apple elegantly up-scales drawing to improve visual quality, but before jumping to conclusions I decided to turn Windows Scaling back on:

Firefox: 2560 x 1439 (familiar, if you put back that missing vertical pixel, but wrong)
Opera: 4266 x 2400
Chrome: 3843 x 2162

Hmmm. I know that Opera ignores Windows Scaling, so this was perplexing; and why was Chrome still rendering to a larger buffer?

I googled at length, and after finding nothing relevant, I concluded the most likely cause was a bug in Windows Scaling, or in a combination of that with the browsers. Perhaps changing up and down too many times corrupted some setting.

So, I uninstalled and re-installed all three.

Nirvana: they now all report the proper numbers (well, Firefox is still missing one pixel in the vertical direction, but we'll just call that the Mozilla Tax).
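To put the inflated sizes in perspective, here is a small sketch that compares how many pixels each misbehaving browser was asking the GPU to fill against a native 3840 x 2160 panel. The helper name overdrawRatio is my own, not part of the demo; the sizes are the ones reported above with Windows Scaling off.

```javascript
// Ratio of the reported drawing-buffer pixel count to the panel's native pixel count.
function overdrawRatio(reported, native) {
  return (reported.width * reported.height) / (native.width * native.height);
}

// Native 4K resolution (Firefox's 2159 plus the missing pixel).
var native4k = { width: 3840, height: 2160 };

// Sizes reported by the two misbehaving browsers, Windows Scaling off.
var reports = {
  Opera:  { width: 4266, height: 2400 },
  Chrome: { width: 5765, height: 3243 }
};

for (var name in reports) {
  console.log(name + ": " + overdrawRatio(reports[name], native4k).toFixed(2) + "x the native pixel count");
}
```

This works out to roughly 1.23x for Opera and 2.25x for Chrome, so Chrome was filling more than twice as many pixels per frame as the screen can show.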

And now for some numbers

Before we get to the facts, let me perform a bit of due diligence on my setup, as your mileage may vary:

Item             Description
CPU              Core i7 980X, 6-core, hyperthreaded
GPU              2x AMD R9 280X, not in CrossFire (one driving one monitor, one driving two)
RAM              24 GB triple-channel 12800 DDR3
OS               Windows 8.1 Professional
Display          3x Samsung U28D590D 4K
Firefox version  30.0
Opera version    22.0
Chrome version   35.0.1916.153 m

Test procedure

For each test, I:
  1. Ran a fresh copy of the browser on an empty monitor
  2. Opened the developer console (to see the emitted FPS) and placed it on another monitor
  3. Navigated to the demo
  4. Switched to Full Screen (via the Full Screen button at top-left)
  5. Monitored the emitted FPS numbers and averaged the last five after waiting for the numbers to stabilize
For each browser, I ran two sets of tests: one using requestAnimationFrame() and the other using postMessage().  For each test set, I tested with 50 and then 500 circles, and in both cases, with and without physics enabled.  Here are my results; numbers are in frames per second.
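Step 5 of the procedure boils down to a trailing average. A minimal sketch, assuming the logged FPS values have been copied out of the console into an array; averageOfLast and the sample readings are illustrative, not measured data.

```javascript
// Average the last `count` FPS readings, ignoring the earlier values
// logged while the frame rate was still settling.
function averageOfLast(readings, count) {
  if (count <= 0) return NaN;
  var tail = readings.slice(-count);
  if (tail.length === 0) return NaN;
  var sum = 0;
  for (var i = 0; i < tail.length; i++) sum += tail[i];
  return sum / tail.length;
}

// Hypothetical readings: the first few are still ramping up.
var logged = [41, 55, 58, 60, 59, 60, 61, 60];
console.log(averageOfLast(logged, 5)); // 60
```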

Test set 1: requestAnimationFrame()

# Circles  Physics  Firefox  Opera  Chrome
50         No       60       23     25
50         Yes      60       23     23
500        No       60       23     25
500        Yes      50       23     18

It seems that only Firefox works well at 4K with requestAnimationFrame(). The other two browsers perform better at lower resolutions, but I leave those tests to you; if you follow the procedure above, please post your results in the comments!

Test set 2: postMessage()

# Circles  Physics  Firefox  Opera  Chrome
50         No       1130     1850   1900
50         Yes      1025     1500   1025
500        No       173      210    200
500        Yes      53       42     40

These numbers intrigue me, because performance drops far faster than it should for the increase in geometry count. 500 quads is a tiny number compared to what the GPU can do: my game engine gets more than 500 FPS while rendering millions of quads per frame.

Clearly, the browser is imposing some overhead I am not familiar with, and that means this exploration is going to pay off! I also know the code is pretty poorly written at the moment, but we'll fix that.
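Converting FPS into milliseconds per frame makes these numbers easier to reason about, because frame time, unlike FPS, grows linearly with the work done. Here is a small sketch using the Firefox postMessage() figures from the table; frameMs, marginalMsPerCircle and pairVisits are my own helper names.

```javascript
// Convert frames per second into milliseconds per frame.
function frameMs(fps) {
  return 1000 / fps;
}

// Extra milliseconds of frame time per additional circle between two runs.
function marginalMsPerCircle(fpsSmall, nSmall, fpsLarge, nLarge) {
  return (frameMs(fpsLarge) - frameMs(fpsSmall)) / (nLarge - nSmall);
}

// Firefox, postMessage(), physics off: 50 circles at 1130 FPS, 500 at 173 FPS.
var perCircle = marginalMsPerCircle(1130, 50, 173, 500);
console.log((perCircle * 1000).toFixed(1) + " microseconds per extra circle");

// The repulsion loop in Physics.run() visits every ordered pair (i, j) with i != j.
function pairVisits(n) {
  return n * (n - 1);
}
console.log(pairVisits(50), pairVisits(500)); // 2450 249500
```

With physics off, each extra circle costs roughly 11 microseconds of frame time, which seems like a lot for two triangles and is consistent with per-draw overhead outside the GPU (a hypothesis these numbers suggest rather than prove). With physics on, part of the drop is plainly CPU work: ten times the circles means about a hundred times the pair computations.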

Let's walk through the code, but please stay on the path

I wanted to start with something very simple, something naive even, and that is just what I did.  Then I realized it was just a bit too naive.

I set up the canvas, stretched it to fit the window and created a render loop. I created a Circle class that would make its own quad and load its own GL assets, then created and rendered a bunch of them. I created a Physics class that would manage the physics.

This was great until the first time I tried it from a hosted location, at which point it took too long to load with a large number of circles.

Here's why: each Circle created its own vertex buffer and loaded and linked its own shaders. I knew this was bad, but I wanted to start simple. However, loading and compiling the shaders once per circle took too long, so I have already made the first obvious update. Since all Circles use a simple quad, they can share just one; they also all use the same shaders, so they can share the program as well. Finally, because of those two shared items, they can also share the handles to attributes and uniforms.

So, in this version of the code, the first constructed Circle creates the vertex buffer, loads and links the program and collects handles to the attributes and uniforms, then stores these all in static state so that all circles can reuse them.
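Here is the same share-on-first-construction pattern stripped down so it runs outside a browser. Sprite and the counting fake context are hypothetical names for illustration only; the fake context implements nothing but createBuffer().

```javascript
// Stand-in for a WebGL context that only counts createBuffer() calls.
function makeCountingGL() {
  return {
    buffersCreated: 0,
    createBuffer: function () {
      this.buffersCreated++;
      return { id: this.buffersCreated };
    }
  };
}

// Hypothetical class: the first instance creates the shared asset and stores
// it on the constructor (static state); later instances just reuse it.
function Sprite(gl) {
  if (!Sprite.sharedBuffer) {
    Sprite.sharedBuffer = gl.createBuffer();
  }
  this.buffer = Sprite.sharedBuffer;
}

var fakeGL = makeCountingGL();
var sprites = [];
for (var i = 0; i < 500; i++) sprites.push(new Sprite(fakeGL));
console.log(fakeGL.buffersCreated); // 1
```

One caveat with static state: if the browser loses the WebGL context (after a driver reset, for example), the shared buffer and program become invalid and would have to be recreated.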

And now, code.


Here is the circles.html file with everything but the main script, which we'll cover next. Just a canvas and a few imported scripts.
<!DOCTYPE html>
<html>
<head>
 <title>Untitled Page</title>
</head>


<body style="margin: 0; padding: 0; overflow: hidden;">

<button id="idFullScreen" onclick="launchFullscreen(canvas);" style="z-index:10; position: absolute;">Full-screen</button>

<canvas id="canvas" style="z-index: 1; position: absolute; left:0; top:0; width:100%; height:100%; margin: 0; padding: 0; background-color: black; ">
</canvas>

</body>

<script src="javascripts/ShaderUtils.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Circle.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Vec.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Physics.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/FPSCounter.js" language="javascript" type="text/javascript"></script>

<!-- Demo script goes here -->

</html>

Next, the script.
<script language="javascript" type="text/javascript">

var canvas = null;  // Canvas object.
var gl = null;   // WebGL context.

canvas = document.getElementById("canvas");
gl =  canvas.getContext("webgl", {antialias: false, depth: false, premultipliedAlpha: true }) || 
  canvas.getContext("experimental-webgl", { antialias: false, depth: false, premultipliedAlpha: true });

// Make sure we resize the canvas and adjust the GL Viewport when the window resizes.
//
window.addEventListener('resize', resizeCanvas, false);

// Set canvas and Viewport sizes initially.
//
resizeCanvas();

// Set the background color.
//
gl.clearColor(0.3, 0.4, 0.5, 1.0);

// Create random circles.
// Note that the positioning we are using is in GL coordinates using pixels;
//  the center of the screen is at (0, 0), and the circles are positioned using pixels.
//
var circles = new Array(50);
for (var i = circles.length - 1; i >= 0; i--) {
 circles[i] = new Circle(gl, rndX(), rndY(), Math.random(), rndDarkColorComponent(), rndDarkColorComponent(), rndDarkColorComponent());
};

// Create a Physics object that will animate our circles.
//
var physics = new Physics(circles, 0, 0);

// Attach our render function as an event listener.
// Only used when we are using postMessage().
//
window.addEventListener('message', render, false);

var fps = new FPSCounter();
var mod = 0;

// Begin rendering.
//
render();

function render() {
 try {
  // We're going to emit an FPS number every 200 frames.
  //
  fps.tick();
  if( (++mod) % 200 == 0 )
   console.log("FPS: " + fps.getValue());

  // Move our circles.
  //
  physics.run();

  // Clear the screen.
  // Note: we're not using a depth or stencil buffer, so we only clear the color pixels.
  //
  gl.clear(gl.COLOR_BUFFER_BIT);

  // Render each circle.
  //
  for (var i = circles.length - 1; i >= 0; i--) {
   circles[i].render(gl, canvas.width, canvas.height);
  };

  // Request the next render using either postMessage() (fast) or
  // requestAnimationFrame() (which uses the browser's chosen buffer flip rate).
  //
  //window.postMessage('', '*');
  requestAnimationFrame(render);

 } catch (e) {
  console.log(e);
 }
}

// Pick a random horizontal pixel location where the screen center is at (0).
//
function rndX() {
 return (Math.random() * 2.0 - 1.0) * window.innerWidth/2;
}

// Pick a random vertical pixel location where the screen center is at (0).
//
function rndY() {
 return (Math.random() * 2.0 - 1.0) * window.innerHeight/2;
}

// Pick a random color component in [0, 0.5).
//
function rndDarkColorComponent() {
 return Math.random() * 0.5;
}

// Resize the canvas to fill the window and reset the GL Viewport to match.
//
function resizeCanvas() {

 canvas.width = window.innerWidth;
 canvas.height = window.innerHeight;

 gl.viewport(0, 0, window.innerWidth, window.innerHeight);

 console.log("SIZE:", window.innerWidth, window.innerHeight);
}

// Ask the browser to show our canvas full-screen.
//
function launchFullscreen(element) {
  if(element.requestFullscreen) {
    element.requestFullscreen();
  } else if(element.mozRequestFullScreen) {
    element.mozRequestFullScreen();
  } else if(element.webkitRequestFullscreen) {
    element.webkitRequestFullscreen();
  } else if(element.msRequestFullscreen) {
    element.msRequestFullscreen();
  }
}

</script>

The comments largely describe the code, but a few notes are worth sharing.

The if( (++mod) % 200 == 0 ) test in render() is there to stop us from getting spammed with FPS measurements. When using the slow render method, I suggest a modulus anywhere from 60 to 200; when using the fast render method, 2000 is good. You may also want to change the size of the averaging buffer in FPSCounter.js.
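FPSCounter.js is one of the helpers I'm not walking through, so for readers who want to experiment with the averaging buffer, here is a hypothetical counter with the same tick()/getValue() interface the demo calls. The ring buffer and the optional clock parameter are my assumptions; the real file may work differently.

```javascript
// Hypothetical FPSCounter: keeps the last `size` frame durations in a ring
// buffer and reports the average frame rate over them.
function FPSCounter(size, now) {
  this.size = size || 60;                 // Averaging buffer size, in frames.
  this.now = now || function () { return performance.now(); };
  this.deltas = new Array(this.size);
  this.count = 0;                         // Number of valid samples.
  this.index = 0;                         // Next slot to overwrite.
  this.total = 0;                         // Sum of valid samples, in ms.
  this.last = null;                       // Timestamp of the previous tick.
}

FPSCounter.prototype.tick = function () {
  var t = this.now();
  if (this.last !== null) {
    var dt = t - this.last;
    if (this.count === this.size) {
      this.total -= this.deltas[this.index];  // Drop the oldest sample.
    } else {
      this.count++;
    }
    this.deltas[this.index] = dt;
    this.total += dt;
    this.index = (this.index + 1) % this.size;
  }
  this.last = t;
};

// Average frames per second over the buffered samples (0 until two ticks).
FPSCounter.prototype.getValue = function () {
  return this.count === 0 ? 0 : 1000 * this.count / this.total;
};
```

A larger buffer smooths out jitter but reacts more slowly to changes; with the fast render method, a buffer of a few hundred frames still covers well under a second.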

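To switch from slow (requestAnimationFrame()) rendering to fast (postMessage()), comment out the requestAnimationFrame(render); line at the end of render() and un-comment the window.postMessage('', '*'); line just above it. Do not leave both un-commented: every frame would then schedule two more, the render loops would multiply, and you will damage the space-time continuum.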
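Rather than commenting lines in and out, one option is to pick the scheduling strategy once. makeScheduler below is a hypothetical helper, not part of the demo; it takes the window object as a parameter so it can be tried with a stub.

```javascript
// Return a function that schedules the next frame with exactly one mechanism.
function makeScheduler(useFastLoop, win) {
  if (useFastLoop) {
    // Fast path: the page's 'message' listener calls render() when this arrives,
    // so the callback argument is not needed here.
    return function () { win.postMessage('', '*'); };
  }
  // Slow path: let the browser pick the frame rate.
  return function (callback) { win.requestAnimationFrame(callback); };
}

// In the page: var scheduleNext = makeScheduler(false, window);
// and at the end of render(): scheduleNext(render);
```

Because the 'message' listener stays attached either way, a stray message from another source could also trigger a render; a small wrapper listener that checks event.source === window before calling render() would filter those out.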

Notice that resizeCanvas() still logs the window's inner size; I left that in so I will notice if my browser(s) get it wrong again.

ShaderUtils.js, Vec.js, FPSCounter.js

I am not going to cover these, as they are rather trivial.
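For anyone rebuilding the demo without the original helpers, here is a hypothetical Vec with the methods Physics.js calls (copy, initVec, add, sub, mul, len, normalize). Mutating methods return this so that chains like ob.vel.normalize().mul(100) work; the zero-length guard in normalize() is my own choice, and the real Vec.js may differ.

```javascript
// Hypothetical 2D vector matching the calls made in Physics.js.
function Vec(x, y) {
  this.x = x;
  this.y = y;
}

Vec.prototype.copy = function () { return new Vec(this.x, this.y); };
Vec.prototype.initVec = function (v) { this.x = v.x; this.y = v.y; return this; };
Vec.prototype.add = function (v) { this.x += v.x; this.y += v.y; return this; };
Vec.prototype.sub = function (v) { this.x -= v.x; this.y -= v.y; return this; };
Vec.prototype.mul = function (s) { this.x *= s; this.y *= s; return this; };
Vec.prototype.len = function () { return Math.sqrt(this.x * this.x + this.y * this.y); };

// Scale to unit length in place; a zero vector is left unchanged instead of becoming NaN.
Vec.prototype.normalize = function () {
  var l = this.len();
  if (l > 0) {
    this.x /= l;
    this.y /= l;
  }
  return this;
};
```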

Circle.js

Each instance of this class represents one visible circle. The first instance created loads and prepares all shared assets and stores them in static state.

function Circle(gl, x, y, mass, r, g, b) {
 this.pos = new Vec(x,y); // Position, in pixel coords with (0,0) at center.
 this.vel = new Vec(0,0); // Velocity (in pixels per second).

 this.mass = mass;   // Arbitrary measure; affects circle size.
 this.r = r;
 this.g = g;
 this.b = b;
 this.selected = false;  // Not used at this time.

 // Only create vertex buffer, load program and collect handles once.
 // Store them in static state.
 //
 if( !Circle.program ) {

  Circle.geoBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);

  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
   -1, -1,
    1, -1,
   -1,  1,
   -1,  1,
    1, -1,
    1,  1]), gl.STATIC_DRAW);

  Circle.program = laodShaderProgram("circle", "circle");

  gl.useProgram(Circle.program);

  // Vertex attribute handle.
  Circle.geoHandle = gl.getAttribLocation(Circle.program, "a_position");
  // Resolution handle. Resolution is a vec2 in pixels.
  Circle.resHandle = gl.getUniformLocation(Circle.program, "u_resolution");
  // Position handle. Position is the center of the circle in pixel coords.
  Circle.posHandle = gl.getUniformLocation(Circle.program, "u_pos");
  // World Size handle. World Size is a vec2 containing the pixel width/height of the context.
  Circle.worldSizeHandle = gl.getUniformLocation(Circle.program, "u_worldSize");
  // Color is the [RGB] color of the circle.
  Circle.colorHandle = gl.getUniformLocation(Circle.program, "u_color");
 }
}


Circle.prototype.render = function(gl, width, height) {
 // Pipeline state setup.
 //
 gl.useProgram(Circle.program);

 gl.enable(gl.BLEND);
 gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);

 gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);
 gl.vertexAttribPointer(Circle.geoHandle, 2, gl.FLOAT, false, 0, 0);
 gl.enableVertexAttribArray(Circle.geoHandle);

 gl.uniform2f(Circle.resHandle, this.mass * 300, this.mass * 300);
 gl.uniform2f(Circle.posHandle, this.pos.x, this.pos.y);
 gl.uniform2f(Circle.worldSizeHandle, width, height);
 gl.uniform3f(Circle.colorHandle, this.r, this.g, this.b);

 // Draw.
 //
 gl.drawArrays(gl.TRIANGLES, 0, 6);

}

The if( !Circle.program ) block in the constructor performs the one-time asset load and setup. However, although I made that code execute only once, I left the other inefficiencies inside render().

In most OpenGL programs, it is best to batch geometry by the shaders or other state it uses, so that a few GL calls can set up a large amount of drawing. This experiment has only one type of rendering, so all of it can be batched this way. In fact, the pipeline state setup at the top of render() (useProgram() through enableVertexAttribArray()) could be called just once at the beginning of the program and never again; it sets up the GL context state, and that state persists. However, that is bad practice: the moment you add something else, perhaps a logo or some other graphic that changes the state, the circles would no longer render correctly.

We will use a more robust approach: create a static Circle.prepare() method that sets up the state for rendering circles, then call it once per frame. It will contain the six pipeline-setup calls at the top of render() (useProgram() through enableVertexAttribArray()), plus the uniform2f() call that sets u_worldSize, since the world size is the same for every circle. That is seven GL calls per circle that we can skip, which should improve performance.
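Here is one way the planned split could look. It is a sketch, not the code from the next installment: the stripped-down Circle constructor and the call-counting fake context exist only so it runs on its own. In the demo you would keep the real constructor from Circle.js (which sets up Circle.program and the handles) and add only the two methods.

```javascript
// Minimal stand-in for Circle.js so this sketch runs outside a browser.
function Circle(x, y, mass, r, g, b) {
  this.pos = { x: x, y: y };
  this.mass = mass;
  this.r = r;
  this.g = g;
  this.b = b;
}

// Once per frame: state that is identical for every circle.
Circle.prepare = function (gl, width, height) {
  gl.useProgram(Circle.program);
  gl.enable(gl.BLEND);
  gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);
  gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);
  gl.vertexAttribPointer(Circle.geoHandle, 2, gl.FLOAT, false, 0, 0);
  gl.enableVertexAttribArray(Circle.geoHandle);
  gl.uniform2f(Circle.worldSizeHandle, width, height);
};

// Once per circle: only the uniforms that differ, then the draw call.
Circle.prototype.render = function (gl) {
  gl.uniform2f(Circle.resHandle, this.mass * 300, this.mass * 300);
  gl.uniform2f(Circle.posHandle, this.pos.x, this.pos.y);
  gl.uniform3f(Circle.colorHandle, this.r, this.g, this.b);
  gl.drawArrays(gl.TRIANGLES, 0, 6);
};

// Fake context that just counts calls, so the savings can be checked.
function makeCountingGL() {
  var gl = { calls: 0, BLEND: 0x0BE2, SRC_ALPHA: 0x0302, ONE_MINUS_SRC_ALPHA: 0x0303,
             ARRAY_BUFFER: 0x8892, FLOAT: 0x1406, TRIANGLES: 0x0004 };
  ["useProgram", "enable", "blendFunc", "bindBuffer", "vertexAttribPointer",
   "enableVertexAttribArray", "uniform2f", "uniform3f", "drawArrays"].forEach(function (name) {
    gl[name] = function () { gl.calls++; };
  });
  return gl;
}

var countingGL = makeCountingGL();
var sample = [];
for (var i = 0; i < 50; i++) sample.push(new Circle(0, 0, 0.5, 0.2, 0.3, 0.4));

Circle.prepare(countingGL, 3840, 2160);
for (var j = 0; j < sample.length; j++) sample[j].render(countingGL);
console.log(countingGL.calls); // 7 + 4 * 50 = 207 (the current render() makes 11 * 50 = 550)
```

In the demo's render loop this would mean calling Circle.prepare(gl, canvas.width, canvas.height) once before the loop and circles[i].render(gl) inside it.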

You might expect the GL driver to recognize redundant calls (such as calling gl.enable(gl.BLEND) over and over), and some may, but in general the lower-level and higher-performance an API is, the fewer safety checks it does for you, since all those extra if statements would add up and slow down everybody's programs.
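If you do want redundant calls filtered, the check can live on the JavaScript side instead. StateCache below is a hypothetical wrapper, not something this series uses; it only works if all code changes capabilities through it, because a direct gl.enable() call would make its cached state stale.

```javascript
// Remember enable/disable state and skip calls that would change nothing.
function StateCache(gl) {
  this.gl = gl;
  this.enabled = {};   // capability -> boolean, as last set through this wrapper
}

StateCache.prototype.enable = function (cap) {
  if (this.enabled[cap] !== true) {
    this.gl.enable(cap);
    this.enabled[cap] = true;
  }
};

StateCache.prototype.disable = function (cap) {
  if (this.enabled[cap] !== false) {
    this.gl.disable(cap);
    this.enabled[cap] = false;
  }
};

// Usage with a fake context that counts the calls that actually get through.
var issued = 0;
var fakeContext = {
  BLEND: 0x0BE2,
  enable: function () { issued++; },
  disable: function () { issued++; }
};
var state = new StateCache(fakeContext);
for (var k = 0; k < 500; k++) state.enable(fakeContext.BLEND);
console.log(issued); // 1
```

This just moves the if statement into our own code, so it pays off only when the skipped GL call costs more than the lookup; grouping state changes, as Circle.prepare() will, avoids the question entirely.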

Physics.js


// Construct Physics with array of physical objects, and an origin,
// to which all objects will be attracted linearly.
//
function Physics(obs, oX, oY) {
 this.obs = obs;
 this.center = new Vec(oX, oY);
 this.lastrun = performance.now();
}

Physics.prototype.run = function() {
 var sk = 1;   // Spring constant
 var rk = 100000; // Repulsive constant

 var now = performance.now();

 // Compute the delta time for this update.
 // NOTE: we are fixing the value to 16ms for stability and simplicity. More on this later.
 //var dt = (now - this.lastrun) / 1000;
 var dt = 0.016;

 // Walk through all objects and update just their velocity from forces.
 //
 for(var i=0 ; i<this.obs.length ; ++i) {
  var ob = this.obs[i];

  // Spring contribution to velocity (eventually).
  //
  var sv = ob.pos.copy();
  sv.sub(this.center);
  
  var len = sv.len();
  var f = -sk * len;
  sv.normalize();
  var a = f / ob.mass;
  sv.mul(a * dt);

  // Repulsive contribution to velocity.
  //
  var tv = new Vec(0,0);
  var rv = new Vec(0,0);

  // Compute additive repulsion from all other circles.
  //
  for(var j=0 ; j<this.obs.length ; ++j) {
   if( i != j ) { // Ignore self.
    rv.initVec(ob.pos);
    rv.sub(this.obs[j].pos);
    len = rv.len();
    rv.normalize();
    a = (rk * this.obs[j].mass) / (len*len);
    rv.mul(a);
    tv.add(rv);
   }
  }

  // Add the velocities.
  //
  ob.vel.add(sv);
  ob.vel.add(tv);

  // Damping force.
  //
  ob.vel.mul(0.5 + ob.mass/2);

  // Manual damper. This avoids explosive simulations on systems with
  // unexpected specs/performance.
  //
  if( ob.vel.len() > 100 ) {
   ob.vel.normalize().mul(100);
  }
 }

 // Now that all velocities are computed, update positions.
 //
 for(var i=0 ; i<this.obs.length ; ++i) {
  this.obs[i].pos.add(this.obs[i].vel);
 }

 // Remember when we did this iteration.
 //
 this.lastrun = now;
}


There isn't much more to say about this; if you're new to such simulations, play with the constants at the top of run().
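For anyone tuning those constants, here is the per-frame update Physics.run() performs for circle i, transcribed from the code above: p is position, v velocity, m mass, c the attraction center, k_s is sk = 1, k_r is rk = 100000, and Δt is the fixed 0.016.

```latex
\begin{aligned}
\hat{\mathbf{u}}_i &= \frac{\mathbf{p}_i - \mathbf{c}}{\lVert \mathbf{p}_i - \mathbf{c} \rVert},
\qquad
\Delta\mathbf{v}^{s}_i = -\frac{k_s \lVert \mathbf{p}_i - \mathbf{c} \rVert}{m_i}\,\Delta t\,\hat{\mathbf{u}}_i \\
\hat{\mathbf{r}}_{ij} &= \frac{\mathbf{p}_i - \mathbf{p}_j}{\lVert \mathbf{p}_i - \mathbf{p}_j \rVert},
\qquad
\Delta\mathbf{v}^{r}_i = \sum_{j \neq i} \frac{k_r\, m_j}{\lVert \mathbf{p}_i - \mathbf{p}_j \rVert^{2}}\,\hat{\mathbf{r}}_{ij} \\
\mathbf{v}_i &\leftarrow \left(\mathbf{v}_i + \Delta\mathbf{v}^{s}_i + \Delta\mathbf{v}^{r}_i\right)\left(\tfrac{1}{2} + \tfrac{m_i}{2}\right),
\qquad
\text{then } \mathbf{v}_i \leftarrow 100\,\frac{\mathbf{v}_i}{\lVert \mathbf{v}_i \rVert} \text{ if } \lVert \mathbf{v}_i \rVert > 100 \\
\mathbf{p}_i &\leftarrow \mathbf{p}_i + \mathbf{v}_i \quad \text{(for every } i \text{, after all velocities are updated)}
\end{aligned}
```

Note that only the spring term is scaled by Δt; the repulsion term and the position update are not. Since mass comes from Math.random(), the damping factor stays in [0.5, 1), while a mass near zero makes the spring term very large, which the velocity clamp helps contain.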

Conclusion, and what's next

We have a basic demo working, and we have reliable performance numbers for our baseline.

Our Circle's render() method is inefficient because it makes a lot of redundant GL calls; we will remove those by adding a static Circle.prepare() method to make these calls just once per frame.

1 comment:

  1. I look forward to seeing how far you go!
