Sunday, July 13, 2014

WebGL and Performance 3 - First Improvements

As suggested by the last post, this first improvement is trivial but useful.  I think from here on out I will reverse the order of the posts and talk first about the hypothesis for improvement, the changes, and wrap up with the test results.

Hypothesis for this update

In the last post we observed that although we shared the vertex buffer, program and handles with all circles (rather than create a new one for each), we left the rendering code in each circle re-setting many uniform and GL state values that only need to be set once per frame (assuming we render all circles in a batch--that is, with nothing else intervening).

We hypothesized that removing the redundant GL API calls would improve performance, so we planned to move them to a Circle.prepare() method that we would call just once at the start of each frame.

Code changes

The Circle.js class now has a new static method Circle.prepare() that contains many calls removed from the render() method.  Also, the width and height parameters are removed from render() and used in prepare().

function Circle(gl, x, y, mass, r, g, b) {
 this.pos = new Vec(x,y); // Position, in pixel coords with (0,0) at center.
 this.vel = new Vec(0,0); // Velocity (in pixels per second).

 this.mass = mass;   // Arbitrary measure; affects circle size.
 this.r = r;
 this.g = g;
 this.b = b;
 this.selected = false;  // Not used at this time.

 // Only create vertex buffer, load program and collect handles once.
 // Store them in static state.
 //
 if( !Circle.program ) {

  Circle.geoBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);

  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
   -1, -1,
    1, -1,
   -1,  1,
   -1,  1,
    1, -1,
    1,  1]), gl.STATIC_DRAW);

  Circle.program = laodShaderProgram("circle", "circle");

  gl.useProgram(Circle.program);

  // Vertex attribute handle.
  Circle.geoHandle = gl.getAttribLocation(Circle.program, "a_position");
  // Resolution handle. Resolution is a vec2 in pixels.
  Circle.resHandle = gl.getUniformLocation(Circle.program, "u_resolution");
  // Position handle. Position is the center of the circle in pixel coords.
  Circle.posHandle = gl.getUniformLocation(Circle.program, "u_pos");
  // World Size handle. World Size is a vec2 containing the pixel width/height of the context.
  Circle.worldSizeHandle = gl.getUniformLocation(Circle.program, "u_worldSize");
  // Color is the [RGB] color of the circle.
  Circle.colorHandle = gl.getUniformLocation(Circle.program, "u_color");
 }
}

Circle.prepare = function(gl, width, height) {
 gl.useProgram(Circle.program);

 gl.enable(gl.BLEND);
 gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);

 gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);
 gl.vertexAttribPointer(Circle.geoHandle, 2, gl.FLOAT, false, 0, 0);
 gl.enableVertexAttribArray(Circle.geoHandle);
 gl.uniform2f(Circle.worldSizeHandle, width, height);
}


Circle.prototype.render = function(gl) {
 // Pipeline state setup.
 //
 gl.uniform2f(Circle.resHandle, this.mass * 300, this.mass * 300);
 gl.uniform2f(Circle.posHandle, this.pos.x, this.pos.y);
 gl.uniform3f(Circle.colorHandle, this.r, this.g, this.b);

 // Draw.
 //
 gl.drawArrays(gl.TRIANGLES, 0, 6);

}


The script in default.html has one change: the addition of a call to Circle.prepare() just prior to rendering the circles.

 ...
 // Clear the screen.
 // Note: we're not using a back or stencil buffer, so we only clear the color pixels.
 //
 gl.clear(gl.COLOR_BUFFER_BIT);

 Circle.prepare(gl, canvas.width, canvas.height);

 // Render each circle.
 //
 for (var i = circles.length - 1; i >= 0; i--) {
   circles[i].render(gl);
 };
 ...

Results

Performance results are consistent with expectations, as discussed below.

Test set 1: requestAnimationFrame()

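# Circles | Physics | Firefox | Opera | Chrome | Difference: Firefox | Difference: Opera | Difference: Chrome
--- | --- | --- | --- | --- | --- | --- | ---
50 | No | 60 | 23 | 25 | 0 | 0 | 0
50 | Yes | 60 | 26 | 25 | 0 | +3 | +2
500 | No | 60 | 23 | 25 | 0 | 0 | 0
500 | Yes | 60 | 27 | 25 | +10 | +4 | +7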

Again, Firefox amazed me, being the only browser to perform well under all tests. With our small change, it picked up those last 10 frames on the hardest test.

However, overall, these are very small improvements, and we would expect that to be the case when using requestAnimationFrame(), as it is intended to let the browser choose the best refresh rate.  Opera and Chrome's numbers are so low given the work level that I strongly suspect their frame scheduling logic does not scale well to UHD resolutions.

Test set 2: postMessage()

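# Circles | Physics | Firefox | Opera | Chrome | Difference: Firefox | Difference: Opera | Difference: Chrome
--- | --- | --- | --- | --- | --- | --- | ---
50 | No | 1480 | 5000 | 4500 | +350 | +3150 | +2600
50 | Yes | 1480 | 2840 | 2900 | +455 | +1340 | +1875
500 | No | 600 | 490 | 500 | +427 | +280 | +300
500 | Yes | 68 | 42 | 42 | +15 | 0 | +2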

These results are something else entirely; we more than doubled some of our frame rates.

This seems ridiculous, but considering how simple our rendering is, adding a per-circle state change to the GL pipeline can easily cripple performance, and what we did here was remove that.
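To put a rough number on it, here is a back-of-the-envelope sketch that counts GL calls per frame. The function name is made up for this illustration. The figures come from counting calls in the two listings: the old render() made 11 GL calls per circle, while the new render() makes 4 per circle, with the 7 setup calls moved into Circle.prepare().

```javascript
// Rough GL call count per frame: calls made once per frame plus calls made per circle.
// (glCallsPerFrame is a hypothetical helper, not part of the demo.)
function glCallsPerFrame(circleCount, callsPerCircle, callsPerFrame) {
  return callsPerFrame + circleCount * callsPerCircle;
}

// Before: render() made 11 GL calls per circle (7 setup + 3 uniforms + 1 draw).
var before = glCallsPerFrame(500, 11, 0);

// After: the 7 setup calls happen once per frame; render() makes 4 calls per circle.
var after = glCallsPerFrame(500, 4, 7);

console.log(before, after); // 5500 2007
```

Call counts are only a proxy for cost, since not every GL call is equally expensive, but the cut is in the same ballpark as the frame-rate gains above.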

Of course even when we are rendering at 5000 FPS (Opera), the browser still gets to decide how often to show our masterpiece to the user, and that will be somewhere between 1 and 60 FPS.  Nonetheless, these numbers show us how quickly our frames can be rendered on the GPU using this technique.

Conclusion, and what's next

For those of you who haven't noticed, what we are building is ironically close to a particle system: a collection of physically interacting objects that we draw on the screen. Particle systems are common in games and other programs, and most often run entirely on the GPU--that means both the rendering and the physics are computed on the GPU.

However, I want to leave our CPU involved, based upon the assumption that at some point our circles will be user-interactive and/or dynamically updated from outside our simulation.

Given that, the next change I would like to make is to move our physics to a separate thread (via a WebWorker), freeing up the main thread somewhat in the process.  I expect this to have little to no impact on FPS using the postMessage() technique, but a moderate impact on the requestAnimationFrame() technique (at least for Opera and Chrome).

Stay tuned!

I almost forgot: the current demo is here:  http://experiments.uhdcoder.com/circles2

Wednesday, July 9, 2014

WebGL and Performance 3 - First measurements

A funny thing happened to me on the way to this post.

I began making performance measurements, and things were going as planned, when all of a sudden I noticed a loss of frame-rate, first in Chrome, later in Opera, but not at all in Firefox.

I suspected a driver problem, so I rebooted, but that didn't help.

I use three 4K monitors, and I use Windows Scaling to make some screen elements bigger. But I turned that off because I know the three browsers handle it differently and I didn't want it to interfere.

So, after much trial and error, what I discovered was that the two misbehaving browsers were reporting an artificially high number for window.innerWidth and window.innerHeight. Here were the numbers I found:

Firefox: 3840 x 2159  (this is correct, save for one missing vertical pixel)
Opera: 4266 x 2400
Chrome: 5765 x 3243

This explained the performance loss; they were rendering to buffers that were much larger than my screen!  I suspected the two browsers were trying to emulate how Apple elegantly up-scales drawing to improve visual quality, but before jumping to conclusions I decided to turn Windows Scaling back on:

Firefox: 2560 x 1439 (familiar, if you put back that missing vertical pixel, but wrong)
Opera: 4266 x 2400
Chrome: 3843 x 2162

Hmmm.  I know that Opera ignores Windows Scaling, so this was perplexing; also, why was Chrome still rendering to a larger buffer?

I googled at length, and after finding nothing relevant, I concluded the most likely cause was a bug in Windows Scaling, or in a combination of that with the browsers. Perhaps changing up and down too many times corrupted some setting.

So, I uninstalled and re-installed all three.

Nirvana: they now all report the proper numbers (well, Firefox is still missing one pixel in the vertical direction, but we'll just call that the Mozilla Tax.)
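To see how much extra work those inflated sizes imply, here is a small sketch that compares a reported size against the size you expect for your display. Both function names are invented for this example, and the expected size is something you supply by hand (3840 x 2160 for one of my panels).

```javascript
// Ratio of pixels the browser says it has to pixels we expect it to have.
// (Hypothetical helper; a value near 1 means the reported size is sane.)
function pixelOverdrawRatio(reportedWidth, reportedHeight, expectedWidth, expectedHeight) {
  return (reportedWidth * reportedHeight) / (expectedWidth * expectedHeight);
}

// Returns true when the reported size is within `tolerance` of the expected
// pixel count, and logs a warning otherwise.
function checkReportedSize(reportedWidth, reportedHeight, expectedWidth, expectedHeight, tolerance) {
  var ratio = pixelOverdrawRatio(reportedWidth, reportedHeight, expectedWidth, expectedHeight);
  var ok = ratio <= 1 + tolerance;
  if (!ok) {
    console.warn("Reported size is " + ratio.toFixed(2) + "x the expected pixel count");
  }
  return ok;
}

// In a page you would pass window.innerWidth and window.innerHeight, e.g.:
//   checkReportedSize(window.innerWidth, window.innerHeight, 3840, 2160, 0.01);
```

With the first set of numbers above, Chrome's 5765 x 3243 works out to roughly 2.25 times the pixels of a 3840 x 2160 screen, and Opera's 4266 x 2400 to roughly 1.23 times.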

And now for some numbers

Before we get to the facts, let me perform a bit of due diligence on my setup, as your mileage may vary:

Item | Description
--- | ---
CPU | Core i7 980x, 6-core, hyperthreaded
GPU | 2x AMD R9 280x, not in crossfire (one driving one monitor, one driving two)
RAM | 24 GB triple channel 12800 DDR3
OS | Windows 8.1 Professional
Display | 3x Samsung U28D590D 4k
Firefox version | 30.0
Opera version | 22.0
Chrome version | 35.0.1916.153 m

Test procedure

For each test, I:
  1. Ran a fresh copy of the browser on an empty monitor
  2. Opened the developer console (to see the emitted FPS) and placed it on another monitor
  3. Navigated to the demo
  4. Switched to Full Screen (via the Full Screen button at top-left)
  5. Monitored the emitted FPS numbers and averaged the last five after waiting for the numbers to stabilize
For each browser, I ran two sets of tests: one using requestAnimationFrame() and the other using postMessage().  For each test set, I tested with 50 and then 500 circles, and in both cases, with and without physics enabled.  Here are my results; numbers are in frames per second.

Test set 1: requestAnimationFrame()

# Circles | Physics | Firefox | Opera | Chrome
--- | --- | --- | --- | ---
50 | No | 60 | 23 | 25
50 | Yes | 60 | 23 | 23
500 | No | 60 | 23 | 25
500 | Yes | 50 | 23 | 18

It seems that only Firefox works well at 4k with requestAnimationFrame().  The other two browsers perform better at lower resolutions, but I leave those tests to you; if you follow the procedure I did above, please post your results in comments!

Test set 2: postMessage()

# Circles | Physics | Firefox | Opera | Chrome
--- | --- | --- | --- | ---
50 | No | 1130 | 1850 | 1900
50 | Yes | 1025 | 1500 | 1025
500 | No | 173 | 210 | 200
500 | Yes | 53 | 42 | 40

These numbers are intriguing to me, as the performance drops far faster than it should for the increase in geometry count. 500 quads is a tiny number compared to what the GPU can do; I can get > 500 FPS on my game engine rendering millions of quads per frame.

Clearly, the browser is imposing some overhead I am not familiar with--and that means this exploration is going to pay off!  Well, and I know the code is pretty poorly written at the moment, but we'll fix that.

Let's walk through the code, but please stay on the path

I wanted to start with something very simple, something naive even, and that is just what I did.  Then I realized it was just a bit too naive.

I set up the canvas, stretched it to fit the window and created a render loop. I created a Circle class that would make its own quad and load its own GL assets, then created and rendered a bunch of them. I created a Physics class that would manage the physics.

This was great until the first time I tried it from a hosted location, at which point it took too long to load with a large number of circles.

Here's why: each Circle creates its own vertex buffer, loads and links its own shaders. I knew this was bad, but I wanted to start simple. However, it turned out that loading and compiling the shaders was taking too long so I already made the first obvious update.  Since all Circles use a simple quad, they can share just one; also, they all use the same shaders, so they can share that as well. Finally, because of those two shared items, they can all share handles to attributes and uniforms as well.

So, in this version of the code, the first constructed Circle creates the vertex buffer, loads and links the program and collects handles to the attributes and uniforms, then stores these all in static state so that all circles can reuse them.

And now, code.


Here is the circles.html file with everything but the main script, which we'll cover next. Just a canvas and a few imported scripts.
<!DOCTYPE html>
<html>
<head>
 <title>Untitled Page</title>
</head>


<body style="margin: 0; padding: 0; overflow: hidden;">

<button id="idFullScreen" onclick="launchFullscreen(canvas);" style="z-index:10; position: absolute;">Full-screen</button>

<canvas id="canvas" style="z-index: 1; position: absolute; left:0; top:0; width:100%; height:100%; margin: 0; padding: 0; background-color: black; ">
</canvas>

</body>

<script src="javascripts/ShaderUtils.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Circle.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Vec.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/Physics.js" language="javascript" type="text/javascript"></script>
<script src="javascripts/FPSCounter.js" language="javascript" type="text/javascript"></script>

<!-- Demo script goes here -->

</html>

Next, the script.
<script language="javascript" type="text/javascript">

var canvas = null;  // Canvas object.
var gl = null;   // WebGL context.

canvas = document.getElementById("canvas");
gl =  canvas.getContext("webgl", {antialias: false, depth: false, premultipliedAlpha: true }) || 
  canvas.getContext("experimental-webgl", { antialias: false, depth: false, premultipliedAlpha: true });

// Make sure we resize the canvas and adjust the GL Viewport when the window resizes.
//
window.addEventListener('resize', resizeCanvas, false);

// Set canvas and Viewport sizes initially.
//
resizeCanvas();

// Set the background color.
//
gl.clearColor(0.3, 0.4, 0.5, 1.0);

// Create random circles.
// Note that the positioning we are using is in GL coordinates using pixels;
//  the center of the screen is at (0, 0), and the circles are positioned using pixels.
//
var circles = new Array(50);
for (var i = circles.length - 1; i >= 0; i--) {
 circles[i] = new Circle(gl, rndX(), rndY(), Math.random(), rndDarkColorComponent(), rndDarkColorComponent(), rndDarkColorComponent());
};

// Create a Physics object that will animate our circles.
//
var physics = new Physics(circles, 0, 0);

// Attach our render function as an event listener.
// Only used when we are using postMessage().
//
window.addEventListener('message', render, false);

var fps = new FPSCounter();
var mod = 0;

// Begin rendering.
//
render();

function render() {
 try {
  // We're going to emit an FPS number every 200 frames.
  //
  fps.tick();
  if( (++mod) % 200 == 0 )
   console.log("FPS: " + fps.getValue());

  // Move our circles.
  //
  physics.run();

  // Clear the screen.
  // Note: we're not using a back or stencil buffer, so we only clear the color pixels.
  //
  gl.clear(gl.COLOR_BUFFER_BIT);

  // Render each circle.
  //
  for (var i = circles.length - 1; i >= 0; i--) {
   circles[i].render(gl, canvas.width, canvas.height);
  };

  // Request the next render using postMessage() (fast) or 
  // requestAnimationFrame (which uses the browser's chosen buffer flip rate).
  //
  //window.postMessage('', '*');
  requestAnimationFrame(render);

 } catch (e) {
  console.log(e);
 }
}

// Pick a random horizontal pixel location where the screen center is at (0).
//
function rndX() {
 return (Math.random() * 2.0 - 1.0) * window.innerWidth/2;
}

// Pick a random vertical pixel location where the screen center is at (0).
//
function rndY() {
 return (Math.random() * 2.0 - 1.0) * window.innerHeight/2;
}

// Pick a random color component in [0, 0.5).
//
function rndDarkColorComponent() {
 return Math.random() * 0.5;
}

// Resize the canvas to fill the window and reset the GL Viewport to match.
//
function resizeCanvas() {

 canvas.width = window.innerWidth;
 canvas.height = window.innerHeight;

 gl.viewport(0, 0, window.innerWidth, window.innerHeight);

 console.log("SIZE:", window.innerWidth, window.innerHeight);
}

// Ask the browser to show our canvas full-screen.
//
function launchFullscreen(element) {
  if(element.requestFullscreen) {
    element.requestFullscreen();
  } else if(element.mozRequestFullScreen) {
    element.mozRequestFullScreen();
  } else if(element.webkitRequestFullscreen) {
    element.webkitRequestFullscreen();
  } else if(element.msRequestFullscreen) {
    element.msRequestFullscreen();
  }
}

</script>

The comments largely describe the code, but a few notes are worth sharing.

Line 52 of the script (the (++mod) % 200 check in render()) is there to stop us from getting spammed with FPS measurements.  When using the slow render method I suggest a value anywhere from 60 to 200; when using the fast render method, 2000 is good.  You may also want to change the size of the averaging buffer in FPSCounter.js.

To switch from slow (requestAnimationFrame()) rendering to fast (postMessage()), comment out line 74 (the requestAnimationFrame(render) call) and un-comment line 73 (the window.postMessage('', '*') call).  Do not leave both of these un-commented, as you will damage the space-time continuum.

Notice that on line 108, in resizeCanvas(), I left the log output that shows the window's inner size; this is so I will notice if my browser(s) get that wrong again.

ShaderUtils.js, Vec.js, FPSCounter.js

I am not going to cover these, as they are rather trivial.
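For readers who want something to start from, here is a rough sketch of what a counter with the same tick() and getValue() calls might look like. To be clear, this is an illustration and not the actual FPSCounter.js: the constructor arguments (buffer size and an injectable clock) and the internals are assumptions made for this example.

```javascript
// Hypothetical stand-in for FPSCounter.js: averages the last `size` frame times.
// `now` is an optional clock function; by default we use performance.now()
// when it exists and fall back to Date.now().
function FPSCounter(size, now) {
  this.size = size || 60;
  this.now = now || function () {
    return (typeof performance !== "undefined") ? performance.now() : Date.now();
  };
  this.deltas = [];   // Most recent frame durations, in milliseconds.
  this.next = 0;      // Index of the slot to overwrite next.
  this.last = null;   // Timestamp of the previous tick, or null before the first.
}

// Call once per rendered frame.
FPSCounter.prototype.tick = function () {
  var t = this.now();
  if (this.last !== null) {
    this.deltas[this.next] = t - this.last;
    this.next = (this.next + 1) % this.size;
  }
  this.last = t;
};

// Average frames per second over the buffered frame durations.
FPSCounter.prototype.getValue = function () {
  var sum = 0;
  for (var i = 0; i < this.deltas.length; i++) {
    sum += this.deltas[i];
  }
  if (sum === 0) return 0;  // No frames measured yet.
  return 1000 * this.deltas.length / sum;
};
```

A larger buffer gives a steadier number but reacts more slowly to changes; a smaller one does the opposite.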

Circle.js

Each instance of this class represents one visible circle. The first instance created loads and prepares all shared assets and stores them in static state.

function Circle(gl, x, y, mass, r, g, b) {
 this.pos = new Vec(x,y); // Position, in pixel coords with (0,0) at center.
 this.vel = new Vec(0,0); // Velocity (in pixels per second).

 this.mass = mass;   // Arbitrary measure; affects circle size.
 this.r = r;
 this.g = g;
 this.b = b;
 this.selected = false;  // Not used at this time.

 // Only create vertex buffer, load program and collect handles once.
 // Store them in static state.
 //
 if( !Circle.program ) {

  Circle.geoBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);

  gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([
   -1, -1,
    1, -1,
   -1,  1,
   -1,  1,
    1, -1,
    1,  1]), gl.STATIC_DRAW);

  Circle.program = laodShaderProgram("circle", "circle");

  gl.useProgram(Circle.program);

  // Vertex attribute handle.
  Circle.geoHandle = gl.getAttribLocation(Circle.program, "a_position");
  // Resolution handle. Resolution is a vec2 in pixels.
  Circle.resHandle = gl.getUniformLocation(Circle.program, "u_resolution");
  // Position handle. Position is the center of the circle in pixel coords.
  Circle.posHandle = gl.getUniformLocation(Circle.program, "u_pos");
  // World Size handle. World Size is a vec2 containing the pixel width/height of the context.
  Circle.worldSizeHandle = gl.getUniformLocation(Circle.program, "u_worldSize");
  // Color is the [RGB] color of the circle.
  Circle.colorHandle = gl.getUniformLocation(Circle.program, "u_color");
 }
}


Circle.prototype.render = function(gl, width, height) {
 // Pipeline state setup.
 //
 gl.useProgram(Circle.program);

 gl.enable(gl.BLEND);
 gl.blendFunc(gl.SRC_ALPHA, gl.ONE_MINUS_SRC_ALPHA);

 gl.bindBuffer(gl.ARRAY_BUFFER, Circle.geoBuffer);
 gl.vertexAttribPointer(Circle.geoHandle, 2, gl.FLOAT, false, 0, 0);
 gl.enableVertexAttribArray(Circle.geoHandle);

 gl.uniform2f(Circle.resHandle, this.mass * 300, this.mass * 300);
 gl.uniform2f(Circle.posHandle, this.pos.x, this.pos.y);
 gl.uniform2f(Circle.worldSizeHandle, width, height);
 gl.uniform3f(Circle.colorHandle, this.r, this.g, this.b);

 // Draw.
 //
 gl.drawArrays(gl.TRIANGLES, 0, 6);

}

On line 14 (the if( !Circle.program ) check) we begin the one-time asset load and setup. However, although I made this code execute only once, I left the other inefficiencies inside render().

In most OpenGL programs, it is best to batch geometry by the shaders they use or other attributes, so that you can make just a few GL calls to set up for a large amount of drawing.  What we have in this experiment is only one type of rendering, so it should all be batched in this way.  In fact, lines 48 through 55 (everything in render() from gl.useProgram() through gl.enableVertexAttribArray()) could be called just once at the beginning of the program and never again; they will set up the GL context state and it will stay that way. However, that is bad practice, as the moment you add something else--perhaps a logo or some other graphic that changes the state--the circles would not render.

We will use a more robust approach: create a static Circle.prepare() method that prepares for rendering circles, then we will just call that once per frame.  It will contain lines 48 through 55, as well as line 59 (the u_worldSize uniform).  That will be seven GL calls per circle that we can skip, which should improve performance.
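If you want to check a call count like this rather than trust my arithmetic, one option is to hand the rendering code a thin wrapper around the context that counts every function call. The sketch below is an assumption-laden illustration: makeCountingContext is a name I made up, and it copies non-function properties (constants such as gl.BLEND) once, at wrap time.

```javascript
// Wrap a single function so that each call bumps wrapper.callCount and is
// then forwarded to the real context.
function makeCounted(wrapper, gl, original) {
  return function () {
    wrapper.callCount++;
    return original.apply(gl, arguments);
  };
}

// Build a stand-in object exposing the same functions and constants as `gl`.
// (Hypothetical helper for measurement only; do not ship it.)
function makeCountingContext(gl) {
  var wrapper = { callCount: 0 };
  for (var name in gl) {
    if (typeof gl[name] === "function") {
      wrapper[name] = makeCounted(wrapper, gl, gl[name]);
    } else {
      wrapper[name] = gl[name];  // Snapshot of constants and other values.
    }
  }
  return wrapper;
}
```

Pass the wrapper to the circles in place of gl, reset callCount to 0 at the top of a frame, and read it at the end. The extra function call per GL call will itself cost some frame rate, so use this for counting, not for timing.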

You might suspect that the GL driver should recognize when we make redundant calls (such as calling gl.enable(gl.BLEND) over and over)--and some may--but in general the lower level an API is and the higher its performance, the fewer safety checks it will do for you, as all those extra if statements would add up and slow down everybody's programs.
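If you do want that safety check, you can add it on your own side of the API. Here is a minimal sketch of a capability cache that only forwards gl.enable() and gl.disable() when the state actually changes. StateCache is a hypothetical helper, not part of the demo, and it only stays correct if every enable/disable goes through it.

```javascript
// Remember which capabilities we have already switched on or off, and skip
// the GL call when the requested state matches what we last set.
function StateCache(gl) {
  this.gl = gl;
  this.enabled = {};  // Maps capability enum -> true/false (undefined = unknown).
}

// Returns true if a GL call was made, false if it was skipped as redundant.
StateCache.prototype.enable = function (cap) {
  if (this.enabled[cap] === true) return false;
  this.gl.enable(cap);
  this.enabled[cap] = true;
  return true;
};

// Returns true if a GL call was made, false if it was skipped as redundant.
StateCache.prototype.disable = function (cap) {
  if (this.enabled[cap] === false) return false;
  this.gl.disable(cap);
  this.enabled[cap] = false;
  return true;
};
```

Whether the lookup is cheaper than the redundant GL call is something to measure rather than assume; for this demo, making the calls once per frame in a prepare step avoids the question.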

Physics.js


// Construct Physics with array of physical objects, and an origin,
// to which all objects will be attracted linearly.
//
function Physics(obs, oX, oY) {
 this.obs = obs;
 this.center = new Vec(oX, oY);
 this.lastrun = performance.now();
}

Physics.prototype.run = function() {
 var sk = 1;   // Spring constant
 var rk = 100000; // Repulsive constant

 var now = performance.now();

 // Compute the delta time for this update.
 // NOTE: we are fixing the value to 16ms for stability and simplicity. More on this later.
 //var dt = (now - this.lastrun) / 1000;
 var dt = 0.016;

 // Walk through all objects and update just their velocity from forces.
 //
 for(var i=0 ; i<this.obs.length ; ++i) {
  var ob = this.obs[i];

  // Spring contribution to velocity (eventually).
  //
  var sv = ob.pos.copy();
  sv.sub(this.center);
  
  var len = sv.len();
  var f = -sk * len;
  sv.normalize();
  var a = f / ob.mass;
  sv.mul(a * dt);

  // Repulsive contribution to velocity.
  //
  var tv = new Vec(0,0);
  var rv = new Vec(0,0);

  // Compute additive repulsion from all other circles.
  //
  for(var j=0 ; j<this.obs.length ; ++j) {
   if( i != j ) { // Ignore self.
    rv.initVec(ob.pos);
    rv.sub(this.obs[j].pos);
    len = rv.len();
    rv.normalize();
    a = (rk * this.obs[j].mass) / (len*len);
    rv.mul(a);
    tv.add(rv);
   }
  }

  // Add the velocities.
  //
  ob.vel.add(sv);
  ob.vel.add(tv);

  // Damping force.
  //
  ob.vel.mul(0.5 + ob.mass/2);

  // Manual damper. This avoids explosive simulations on systems with
  // unexpected specs/performance.
  //
  if( ob.vel.len() > 100 ) {
   ob.vel.normalize().mul(100);
  }
 }

 // Now that all velocities are computed, update positions.
 //
 for(var i=0 ; i<this.obs.length ; ++i) {
  this.obs[i].pos.add(this.obs[i].vel);
 }

 // Remember when we did this iteration.
 //
 this.lastrun = now;
}


There isn't much I can say about this; if you're new to such simulations, play with the constants at the top.
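Since Physics.js leans on Vec.js, which is not listed here, a minimal sketch of a compatible vector class may help if you want to run the physics on its own. This is a guess based only on the calls Physics.js makes (copy, initVec, add, sub, mul, len, normalize); the real Vec.js may differ, and returning this from the mutating methods is an assumption taken from the chained normalize().mul(100) call.

```javascript
// Hypothetical 2D vector matching the calls used in Physics.js.
function Vec(x, y) {
  this.x = x;
  this.y = y;
}

// Return a new, independent vector with the same components.
Vec.prototype.copy = function () { return new Vec(this.x, this.y); };

// Overwrite this vector with the components of v.
Vec.prototype.initVec = function (v) { this.x = v.x; this.y = v.y; return this; };

// In-place component-wise add and subtract.
Vec.prototype.add = function (v) { this.x += v.x; this.y += v.y; return this; };
Vec.prototype.sub = function (v) { this.x -= v.x; this.y -= v.y; return this; };

// In-place multiplication by a scalar.
Vec.prototype.mul = function (s) { this.x *= s; this.y *= s; return this; };

// Euclidean length.
Vec.prototype.len = function () { return Math.sqrt(this.x * this.x + this.y * this.y); };

// Scale to unit length in place; a zero vector is left untouched.
Vec.prototype.normalize = function () {
  var l = this.len();
  if (l > 0) { this.x /= l; this.y /= l; }
  return this;
};
```

Note that two circles at exactly the same position give a zero-length separation, so the repulsion term divides by zero; the sketch does not guard against that any more than the original does.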

Conclusion, and what's next

We have a basic demo working, and we have reliable performance numbers for our baseline.

Our Circle's render() method is inefficient because it makes a lot of redundant GL calls; we will remove those by adding a static Circle.prepare() method to make these calls just once per frame.

Monday, July 7, 2014

WebGL and Performance 2 - A lumpy start

Yes, I meant to say lumpy (as opposed to bumpy).

I have a basic experiment up and running, but I have not organized or cleaned the code and am not quite ready to start our performance journey.  Nonetheless, I wanted to explain my experiences thus far.

 

You will recall from the previous post that I am an experienced OpenGL programmer (as well as DirectX) using several languages (C++, Java, C#). However, I have written very little JavaScript in recent years beyond just a few hours experimenting with HTML5 features (WebGL, WebSockets, WebWorkers, WebRTC).  The point of this is that I expect this to be a slow start as I re-acquaint myself with this language, environment and tools.

A bit about Native development

The first thing I did was consider the over-simplified mental model I use when programming native GL apps. A version, simplified for our purpose here, looks something like this:

 

 The rendering process at this level is simple:
  1. My code sends commands to the GPU's driver
  2. The driver passes them to the GPU (after some likely re-packaging)
  3. The GPU stores my data (when I issue storage commands) and executes shaders, drawing fragments (pixels) into the back buffer
  4. When I am done with a scene, I call a method (such as SwapBuffers, Present, or some other API-specific call) that swaps the two buffers: back becomes front, front becomes back



In this way, while my code is rendering to the back-buffer, the monitor is reading the front buffer.

Note: I keep the driver in this model only to remind myself of the [occasionally] large differences between them. For example, some drivers do direct DMA transfers when you buffer data (send it to the GPU), whereas others stage it in separate memory (resulting in an extra copy); some drivers are more picky about shader syntax, etc.

Buffer swapping and V-sync

The monitor will only read the buffer contents once every time it updates its display, which is typically every 1/60th of a second, or 1/120th on a few high end displays.  So, it is ideal to have a new scene rendered during that time and then swap buffers right before the monitor reads the next frame.

Locking rendering to the monitor's refresh rate is called v-sync,  or "vertical refresh synchronization."

However, it is often useful to render much faster than the display can show.

Why?

There are two reasons:

  1. Gamers will attest that higher FPS (Frames Per Second) often gives an edge in competitive gaming. This seems to make no sense, as the player cannot see pixels that are never shown on the display, but this isn't why it helps.  Many games are naively built with one large game loop in which physics, player movement, networking and other logic operates in lock-step with the screen updates (or at least with the frames being rendered inside the GPU).  So, turning up the frame rate improves the responsiveness of the game even though it does not help the visual experience (and in fact it can hurt it, as sometimes the buffers will be swapped while the monitor is reading a frame, resulting in visual tearing across the screen).
  2. While developing a graphics intensive app, the GPU will often be the performance bottleneck; this is the case with most popular 3D games.  Letting the GPU render frames as fast as possible gives you--the developer--a sense of changes in performance as you change your code.  For example, say my new competitor to Call of Duty is rendering on my development machine at 250 FPS, and after some optimization I see it rise to 400 FPS.  This is much easier than locking updates to 60 FPS and then using external tools to analyze GPU workload during tests.

And now on to the browser

So, with this model in mind I built some quads onto which I would draw circles, then set about seeing how fast they would render.

This is where my intuition malfunctioned because of the JavaScript environment, and I went through what I presume is a common learning process regarding how to get fast frames.

I found Chrome's and Opera's on-screen FPS meters, and turned them on.

I tried a never-ending loop, resulting in a never-responsive browser.

I tried setTimeout(), which worked, but didn't give great performance.

I tried requestAnimationFrame(), which also didn't meet my expectations.

Finally, I found postMessage(), but while my own timing was showing around 1000 FPS, the browsers' FPS meters never showed more than 60 FPS.

And then I understood

At this point I realized the browsers--all of them--are keeping control of the buffer swap, and this is what they are reporting on their FPS meter.  After all, there is no reason to think that my code is the only thing in the browser using OpenGL, so the browser makes sure they are all in sync by keeping strict control over the buffer swap.  What's more, it turns out that if the browser can't consistently render right around 60 FPS, some will deliberately drop to 30 because studies show that a constant rate is more important than a high rate in terms of user experience.

But, what about my 1000 fps?

I wondered: if the browser is taking such control, what is happening with all my additional frames? Is it dropping them, or are they executing fully?

To test this I did two things:
  1. I watched the load of my GPU when rendering 1000 FPS via postMessage, vs. when rendering 60 FPS using requestAnimationFrame.  The result: the 1000 FPS caused much more load.
  2. I turned off double-buffering and rendered squares, animating them diagonally at one pixel per frame and confirmed that (a) they moved 1000 pixels per second, and (b) they all rendered (I could snapshot the screen and see each box just one pixel from the other).
Result: all 1000 frames are being rendered; only the buffer swap is postponed.

This is good news as we can use this "unlimited FPS" approach to see how quickly we can render frames, and in doing so deduce any additional overhead caused by the browser.
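The postMessage() trick can be sketched as a loop that posts a message to itself after every frame, so the next frame runs as soon as the event loop gets back to us. This is an illustration rather than the demo's code: startMessageLoop, its parameters, and the maxFrames safety limit are all inventions for this example, and target would normally be window.

```javascript
// Run `frame` as fast as the event loop allows by posting a message to
// ourselves after every frame. `maxFrames` is a safety limit so that this
// sketch eventually stops.
// Note: any other 'message' event delivered to `target` will also trigger a frame.
function startMessageLoop(target, frame, maxFrames) {
  var rendered = 0;

  function onMessage() {
    if (rendered >= maxFrames) {
      target.removeEventListener('message', onMessage, false);
      return;
    }
    frame();
    rendered++;
    target.postMessage('', '*');  // Schedule the next frame.
  }

  target.addEventListener('message', onMessage, false);
  target.postMessage('', '*');    // Kick off the first frame.
}

// In a page, something like:
//   startMessageLoop(window, drawOneFrame, 10000);
// where drawOneFrame is your own render function.
```

Unlike requestAnimationFrame(), nothing here is tied to the display's refresh, which is exactly why it is useful for measuring and a poor choice for shipping.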

A few words about the physics

My goal in this exploration is to test the hypothesis that it is possible to approach native GL performance in the browser; I had no intention of working with general JavaScript optimization except insofar as it was critical to my goal.

Well, it turns out it's important.  For example, my simple integrator used real time deltas (via performance.now()) for all calculations.  But integrators have the characteristic that the larger the simulated forces are, the higher the frequency of integration must be, and if the integration frequency is even near the lower limit for the simulation, erratic changes in the time delta between steps will cause strange behavior--like pulsing motions.

So it was with my simple demo: the garbage collection cycles changed the integration step just enough to de-stabilize the simulation, giving it a heart-beat appearance, and the browser was so sensitive to other things done on my computer--such as touching another window or scrolling an editor--that those too would cause a "pulse" effect.

So, for now I have set the integrator to use 16 millisecond time increments, no matter the actual time between them; this stabilized it, but forces me to re-think the importance of tackling JavaScript--and in particular, web workers--during this exploration.
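For comparison, a common alternative (not what the demo does) is to keep the fixed step but still follow real time, by accumulating elapsed time and running as many fixed-size steps as fit. A minimal sketch, with FixedStepper and its method names invented for this example:

```javascript
// Advance a simulation in fixed-size steps regardless of how uneven the
// real frame times are. Leftover time is carried over to the next call.
function FixedStepper(stepMs, integrate) {
  this.stepMs = stepMs;        // Size of every integration step, in milliseconds.
  this.integrate = integrate;  // Called once per step with dt in seconds.
  this.accumulator = 0;        // Real time not yet simulated, in milliseconds.
}

// Feed in the real time elapsed since the last call; returns the number of
// integration steps that were run.
FixedStepper.prototype.advance = function (elapsedMs) {
  this.accumulator += elapsedMs;
  var steps = 0;
  while (this.accumulator >= this.stepMs) {
    this.integrate(this.stepMs / 1000);
    this.accumulator -= this.stepMs;
    steps++;
  }
  return steps;
};
```

The integrator always sees the same dt, so a garbage collection pause shows up as a few extra steps rather than one oversized one. A very long pause would trigger a long burst of steps, so a real version would likely cap the number of steps per call.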

Conclusion, and what's next

So perhaps now it's clear why I said this was a "lumpy" start: I got a few lumps along the way, and performance in the browser tends to be lumpy.

The first demo is here (although I cannot promise it will stay there, I promise this page will continue to have a valid link):
http://experiments.uhdcoder.com/circles1/

Next up, I'll clean the code, prepare some performance numbers (different browsers, with and without physics), probably add scaling and then lay out why this implementation is slow and how we will improve it.