Question

There seem to be many abstractions built around WebGL for running parallel processing.

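But I am having a hard time understanding what a simple and complete example of parallelism would look like in plain GLSL code for WebGL. I don't have much experience with WebGL, but I understand that there are fragment and vertex shaders and how to load them into a WebGL context from JavaScript. I don't know how to use the shaders, or which one is supposed to do the parallel processing.

I am wondering if someone could demonstrate a simple hello world example of a parallel add operation: essentially the loop below, but in parallel form using GLSL / WebGL shaders, or however it should be done.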

var array = []
var size = 10000
while(size--) array.push(0)

for (var i = 0, n = 10000; i < n; i++) {
  array[i] += 10
}

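I guess essentially I don't understand:

If WebGL runs everything in parallel automatically.
Or if there is a maximum number of things run in parallel, so that if you have 10,000 things but only 1,000 run in parallel, it would do 1,000 in parallel 10 times sequentially.
Or if you have to manually specify the amount of parallelism you want.
If the parallelism goes into the fragment shader or the vertex shader, or both.
How to actually implement the parallel example.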

Answer 1

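First off, WebGL only rasterizes points, lines, and triangles. Using WebGL for non-rasterizing work (GPGPU) basically means realizing that the inputs to WebGL are data from arrays and that the output, a 2D rectangle of pixels, is really just a 2D array as well. By creatively providing non-graphics data and creatively rasterizing that data, you can do non-graphics math.

WebGL is parallel in 2 ways.

1. It runs on a different processor, the GPU, so while it is computing something your CPU is free to do something else.
2. GPUs themselves compute in parallel. A good example: if you rasterize a triangle that covers 100 pixels, the GPU can process each of those pixels in parallel, up to the limit of that GPU. Without digging too deeply, it looks like an NVidia 1080 GPU has 2560 cores, so assuming they are not specialized and assuming the best case, one of those could compute 2560 things in parallel.

As for an example, all WebGL apps already use parallel processing in the sense of points (1) and (2) above without doing anything special.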
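As a rough mental model only (real GPUs schedule work in hardware-specific groups, and you never set the amount of parallelism yourself in WebGL), the number of sequential passes for a batch of independent items can be sketched like this. `passesNeeded` is a hypothetical helper, not a WebGL API, and 2560 is just the best-case figure guessed above.

```javascript
// Hypothetical helper: how many sequential passes are needed if
// `lanes` items can be processed at the same time.
// This is a simplified model, not something a GPU or driver reports.
function passesNeeded(itemCount, lanes) {
  return Math.ceil(itemCount / lanes);
}

console.log(passesNeeded(10000, 1000)); // 10, the case described in the question
console.log(passesNeeded(10000, 2560)); // 4, using the best-case core count above
```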

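Adding 10 to 10000 elements in place, though, is not what WebGL is good at, because WebGL can't read from and write to the same data during one operation. In other words, your example would need to be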

const size = 10000;
const srcArray = [];
const dstArray = [];
for (let i = 0; i < size; ++i) {
 srcArray[i] = 0;
}

for (let i = 0; i < size; ++i) {
  dstArray[i] = srcArray[i] + 10;
}

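Just like in any programming language, there is more than one way to accomplish this. The fastest would probably be to copy all your values into a texture, then rasterize into another texture, looking up each value from the first texture and writing value + 10 to the destination. But therein lies one of the issues: transferring data to and from the GPU is slow, so you need to factor that in when deciding whether doing the work on the GPU is a win.

Another issue: just as you can't read from and write to the same array, you also can't randomly access the destination array. The GPU is rasterizing a line, a point, or a triangle. It is fastest at drawing triangles, but that means it decides which pixels to write to and in what order, so your problem also has to live within those limits. You can use points as a way to choose a destination at random, but rendering points is much slower than rendering triangles.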
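The two restrictions above can be pictured with a small CPU emulation. This is only a sketch of the model, not how WebGL is implemented: `emulateDraw` and `addTen` are made-up names, and the nested loop stands in for work the GPU would do in parallel and in an order of its own choosing.

```javascript
// CPU emulation of "rasterize a full-size quad into a destination texture".
function emulateDraw(src, width, height, fragmentShader) {
  const dst = new Array(width * height);
  for (let y = 0; y < height; ++y) {
    for (let x = 0; x < width; ++x) {
      // The "shader" is told which destination pixel it is producing.
      // It may read anywhere in `src`, but it can only return a value
      // for (x, y); it cannot write to any other location.
      dst[y * width + x] = fragmentShader(src, x, y, width);
    }
  }
  return dst;
}

const width = 4;
const height = 2;
const src = [0, 1, 2, 3, 4, 5, 6, 7];
const addTen = (srcData, x, y, w) => srcData[y * w + x] + 10;
const dst = emulateDraw(src, width, height, addTen);

console.log(dst); // [10, 11, 12, 13, 14, 15, 16, 17]
```

Note that `src` is never modified: the result always lands in a separate destination, which is why the loop from the question had to be rewritten with a source and a destination array.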

Note that "Compute Shaders" (not yet part of WebGL) add random-access write ability to GPUs.

Example:


const gl = document.createElement("canvas").getContext("webgl");

const vs = `
attribute vec4 position;
attribute vec2 texcoord;

varying vec2 v_texcoord;

void main() {
  gl_Position = position;
  v_texcoord = texcoord;
}
`;

const fs = `
precision highp float;
uniform sampler2D u_srcData;
uniform float u_add;

varying vec2 v_texcoord;

void main() {
  vec4 value = texture2D(u_srcData, v_texcoord);
  
  // We can't choose the destination here. 
  // It has already been decided by however
  // we asked WebGL to rasterize.
  gl_FragColor = value + u_add;
}
`;

// calls gl.createShader, gl.shaderSource,
// gl.compileShader, gl.createProgram, 
// gl.attachShader, gl.linkProgram,
// gl.getAttribLocation, gl.getUniformLocation
const programInfo = twgl.createProgramInfo(gl, [vs, fs]);


const size = 10000;
// Uint8Array values default to 0
const srcData = new Uint8Array(size);
// let's use slightly more interesting numbers
for (let i = 0; i < size; ++i) {
  srcData[i] = i % 200;
}

// Put that data in a texture. NOTE: Textures
// are (generally) 2 dimensional and have a limit
// on their dimensions. That means you can't make
// a 1000000 by 1 texture. Most GPUs limit from
// between 2048 to 16384.
// In our case we're doing 10000 so we could use
// a 100x100 texture. Except that WebGL can
// process 4 values at a time (red, green, blue, alpha)
// so a 50x50 will give us 10000 values
const srcTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, srcTex);
const level = 0;
const width = Math.sqrt(size / 4);
if (width % 1 !== 0) {
  // we need some other technique to fit
  // our data into a texture.
  alert('size does not have integer square root');
}
const height = width;
const border = 0;
const internalFormat = gl.RGBA;
const format = gl.RGBA;
const type = gl.UNSIGNED_BYTE;
gl.texImage2D(
  gl.TEXTURE_2D, level, internalFormat,
  width, height, border, format, type, srcData);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  
// create a destination texture
const dstTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, dstTex);
gl.texImage2D(
  gl.TEXTURE_2D, level, internalFormat,
  width, height, border, format, type, null);

gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);

// make a framebuffer so we can render to the
// destination texture
const fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
// and attach the destination texture
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, dstTex, level);

// calls gl.createBuffer, gl.bindBuffer, gl.bufferData
// to put a 2 unit quad (2 triangles) into
// a buffer with matching texture coords
// to process the entire quad
const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
  position: {
    data: [
      -1, -1,
       1, -1,
      -1,  1,
      -1,  1,
       1, -1,
       1,  1,
    ],
    numComponents: 2,
  },
  texcoord: [
     0, 0,
     1, 0,
     0, 1,
     0, 1,
     1, 0, 
     1, 1,
  ],
});

gl.useProgram(programInfo.program);

// calls gl.bindBuffer, gl.enableVertexAttribArray, gl.vertexAttribPointer
twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);

// calls gl.activeTexture, gl.bindTexture, gl.uniformXXX
twgl.setUniforms(programInfo, {
  u_add: 10 / 255,  // because we're using Uint8
  u_srcData: srcTex,
});

// set the viewport to match the destination size
gl.viewport(0, 0, width, height);

// draw the quad (2 triangles)
const offset = 0;
const numVertices = 6;
gl.drawArrays(gl.TRIANGLES, offset, numVertices);

// pull out the result
const dstData = new Uint8Array(size);
gl.readPixels(0, 0, width, height, format, type, dstData);

console.log(dstData);


<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>
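The `u_add: 10 / 255` in the example deserves a word. With `UNSIGNED_BYTE` textures the shader sees each byte as a value between 0 and 1, and the value it writes is converted back to a byte. The sketch below models that round trip on the CPU; the helper names are made up for illustration and the rounding is a simplified model of what the GPU does.

```javascript
// What the shader sees when it samples an UNSIGNED_BYTE texture.
const byteToNormalized = (b) => b / 255;

// What gets stored when the shader writes to an UNSIGNED_BYTE target:
// clamp to [0, 1], scale by 255, round to the nearest integer.
const normalizedToByte = (v) => Math.round(Math.min(Math.max(v, 0), 1) * 255);

const uAdd = 10 / 255; // same value as the u_add uniform

function addOnGpuModel(b) {
  return normalizedToByte(byteToNormalized(b) + uAdd);
}

console.log(addOnGpuModel(0));   // 10
console.log(addOnGpuModel(199)); // 209
console.log(addOnGpuModel(250)); // 255, clamped rather than 260
```

This is also why the example fills the source with `i % 200`: every result stays within the 0 to 255 range.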


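Making a generic math processor would require a ton more work.

Issues:

Textures are 2D arrays and WebGL only rasterizes points, lines, and triangles, so it is much easier to process data that fits into a rectangle than data that doesn't. In other words, a count such as 10001 values will generally not fill a conveniently sized rectangle exactly. It is usually best to pad your data and ignore the part past the end. For example, a 100x101 texture holds 10100 values, so you would just ignore the last 99.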
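A minimal sketch of that padding step, assuming one 8-bit value per pixel and a width picked by the caller (`padToRectangle` and its return shape are hypothetical, not part of WebGL or twgl):

```javascript
// Pads `values` with zeros so it fills a width x height rectangle.
// `width` is chosen by the caller and must not exceed the GPU's
// maximum texture size.
function padToRectangle(values, width) {
  const height = Math.ceil(values.length / width);
  const data = new Uint8Array(width * height); // zero filled
  data.set(values);
  // validCount tells the reader of the result how many values are real.
  return { data, width, height, validCount: values.length };
}

const values = new Uint8Array(10001).fill(7);
const tex = padToRectangle(values, 100);

console.log(tex.width, tex.height);            // 100 101
console.log(tex.data.length - tex.validCount); // 99 padding values to ignore
```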

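The example above uses 8-bit, 4-channel textures. It would be easier to use 8-bit, 1-channel textures (less math), but also less efficient, since WebGL can process 4 values per operation.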
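The extra math that 4-channel packing brings can be sketched as follows. `locateValue` is a hypothetical helper that reports which pixel and which channel hold the value at a given index.

```javascript
// Maps the index of a value to the pixel and channel that hold it when
// 4 values are packed into each RGBA pixel of a texture `width` pixels wide.
function locateValue(index, width) {
  const pixel = Math.floor(index / 4);
  return {
    x: pixel % width,
    y: Math.floor(pixel / width),
    channel: 'rgba'[index % 4],
  };
}

console.log(locateValue(0, 50));    // { x: 0, y: 0, channel: 'r' }
console.log(locateValue(9999, 50)); // { x: 49, y: 49, channel: 'a' }
```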

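Because it uses 8-bit textures, it can only store integer values from 0 to 255. We could switch to 32-bit floating point textures. Floating point textures are an optional feature of WebGL (you need to enable extensions and check that they succeeded). Rasterizing to a floating point texture is also an optional feature. As of 2018, most mobile GPUs do not support rendering to a floating point texture, so if you want your code to work on those GPUs you have to find creative ways of encoding the results into a format they do support.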
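One way to check both optional features in WebGL1 is to enable `OES_texture_float` and then ask whether a framebuffer with a float texture attached is complete. The function name `canRenderToFloatTexture` is made up for this sketch, and it assumes it is handed a WebGL1 context.

```javascript
// Returns true only if this WebGL1 context can create a 32-bit float
// texture AND render to it.
function canRenderToFloatTexture(gl) {
  // Float textures are an extension in WebGL1.
  // getExtension returns null when the extension is not supported.
  if (!gl.getExtension('OES_texture_float')) {
    return false;
  }

  // Create a 1x1 float texture.
  const tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, tex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, 1, 1, 0, gl.RGBA, gl.FLOAT, null);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);

  // Attach it to a framebuffer and ask whether rendering to it is allowed.
  const fb = gl.createFramebuffer();
  gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
  gl.framebufferTexture2D(
    gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
  const ok =
    gl.checkFramebufferStatus(gl.FRAMEBUFFER) === gl.FRAMEBUFFER_COMPLETE;

  // Clean up the temporary objects.
  gl.bindFramebuffer(gl.FRAMEBUFFER, null);
  gl.bindTexture(gl.TEXTURE_2D, null);
  gl.deleteFramebuffer(fb);
  gl.deleteTexture(tex);
  return ok;
}
```

In a page you would call it with the context from the earlier example, for instance `canRenderToFloatTexture(gl)`, and fall back to an 8-bit encoding when it returns false.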

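Addressing the source data requires math to convert from a 1D index to a 2D texture coordinate. In the example above, since we convert directly from srcData to dstData one to one, no math is needed. If you needed to jump around in srcData, you would need to provide that math: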

WebGL1

vec2 texcoordFromIndex(int ndx) {
  int column = int(mod(float(ndx),float(widthOfTexture)));
  int row = ndx / widthOfTexture;
  return (vec2(column, row) + 0.5) / vec2(widthOfTexture, heightOfTexture);
}

vec2 texcoord = texcoordFromIndex(someIndex);
vec4 value = texture2D(someTexture, texcoord);

WebGL2

ivec2 texcoordFromIndex(int ndx) {
  int column = ndx % widthOfTexture;
  int row = ndx / widthOfTexture;
  return ivec2(column, row);
}

int level = 0;
ivec2 texcoord = texcoordFromIndex(someIndex);
vec4 value = texelFetch(someTexture, texcoord, level);
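When debugging this kind of index math it can help to mirror it in JavaScript, where it is easy to log and test. The functions below are such a mirror (the names are illustrative; only `texcoordFromIndex` corresponds to the shader helper above):

```javascript
// Integer pixel position of the value at `ndx` (matches the WebGL2 helper).
function pixelFromIndex(ndx, width) {
  return { column: ndx % width, row: Math.floor(ndx / width) };
}

// WebGL1 style: normalized coordinates pointing at the pixel's center,
// which is why 0.5 is added before dividing by the texture size.
function texcoordFromIndex(ndx, width, height) {
  const { column, row } = pixelFromIndex(ndx, width);
  return [(column + 0.5) / width, (row + 0.5) / height];
}

// The inverse: from a pixel position back to the 1D index.
function indexFromPixel(column, row, width) {
  return row * width + column;
}

console.log(pixelFromIndex(250, 100));         // { column: 50, row: 2 }
console.log(texcoordFromIndex(250, 100, 100)); // [ 0.505, 0.025 ]
console.log(indexFromPixel(50, 2, 100));       // 250
```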

Let's say we want to sum every 2 numbers. We might do something like this:


const gl = document.createElement("canvas").getContext("webgl2");

const vs = `#version 300 es
in vec4 position;

void main() {
  gl_Position = position;
}
`;

const fs = `#version 300 es
precision highp float;
uniform sampler2D u_srcData;

uniform ivec2 u_destSize;  // x = width, y = height

out vec4 outColor;

ivec2 texcoordFromIndex(int ndx, ivec2 size) {
  int column = ndx % size.x;
  int row = ndx / size.x;
  return ivec2(column, row);
}

void main() {
  // compute index of destination
  ivec2 dstPixel = ivec2(gl_FragCoord.xy);
  int dstNdx = dstPixel.y * u_destSize.x + dstPixel.x; 

  ivec2 srcSize = textureSize(u_srcData, 0);

  int srcNdx = dstNdx * 2;
  ivec2 uv1 = texcoordFromIndex(srcNdx, srcSize);
  ivec2 uv2 = texcoordFromIndex(srcNdx + 1, srcSize);

  float value1 = texelFetch(u_srcData, uv1, 0).r;
  float value2 = texelFetch(u_srcData, uv2, 0).r;
  
  outColor = vec4(value1 + value2);
}
`;

// calls gl.createShader, gl.shaderSource,
// gl.compileShader, gl.createProgram, 
// gl.attachShader, gl.linkProgram,
// gl.getAttribLocation, gl.getUniformLocation
const programInfo = twgl.createProgramInfo(gl, [vs, fs]);


const size = 10000;
// Uint8Array values default to 0
const srcData = new Uint8Array(size);
// let's use slightly more interesting numbers
for (let i = 0; i < size; ++i) {
  srcData[i] = i % 99;
}

const srcTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, srcTex);
const level = 0;
// R8 stores one value per pixel, so 10000 values need a 100x100 texture
const srcWidth = Math.sqrt(size);
if (srcWidth % 1 !== 0) {
  // we need some other technique to fit
  // our data into a texture.
  alert('size does not have integer square root');
}
const srcHeight = srcWidth;
const border = 0;
const internalFormat = gl.R8;
const format = gl.RED;
const type = gl.UNSIGNED_BYTE;
gl.texImage2D(
  gl.TEXTURE_2D, level, internalFormat,
  srcWidth, srcHeight, border, format, type, srcData);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  
// create a destination texture
const dstTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, dstTex);
const dstWidth = srcWidth;
const dstHeight = srcHeight / 2;
// should check srcHeight is evenly
// divisible by 2
gl.texImage2D(
  gl.TEXTURE_2D, level, internalFormat,
  dstWidth, dstHeight, border, format, type, null);

gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);

// make a framebuffer so we can render to the
// destination texture
const fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fb);
// and attach the destination texture
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, dstTex, level);

// calls gl.createBuffer, gl.bindBuffer, gl.bufferData
// to put a 2 unit quad (2 triangles) into
// a buffer
const bufferInfo = twgl.createBufferInfoFromArrays(gl, {
  position: {
    data: [
      -1, -1,
       1, -1,
      -1,  1,
      -1,  1,
       1, -1,
       1,  1,
    ],
    numComponents: 2,
  },
});

gl.useProgram(programInfo.program);

// calls gl.bindBuffer, gl.enableVertexAttribArray, gl.vertexAttribPointer
twgl.setBuffersAndAttributes(gl, programInfo, bufferInfo);

// calls gl.activeTexture, gl.bindTexture, gl.uniformXXX
twgl.setUniforms(programInfo, {
  u_srcData: srcTex,
  // the name must match the uniform declared in the fragment shader
  u_destSize: [dstWidth, dstHeight],
});

// set the viewport to match the destination size
gl.viewport(0, 0, dstWidth, dstHeight);

// draw the quad (2 triangles)
const offset = 0;
const numVertices = 6;
gl.drawArrays(gl.TRIANGLES, offset, numVertices);

// pull out the result
const dstData = new Uint8Array(size / 2);
gl.readPixels(0, 0, dstWidth, dstHeight, format, type, dstData);

console.log(dstData);


<script src="https://twgljs.org/dist/4.x/twgl-full.min.js"></script>
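To check the GPU output, it can help to compute the same result on the CPU and compare. The sketch below uses the same source data as the example; `sumPairs` is a made-up name for the reference implementation.

```javascript
// CPU reference for "sum every 2 values":
// dst[i] = src[2 * i] + src[2 * i + 1]
function sumPairs(src) {
  const dst = new Uint8Array(src.length / 2);
  for (let i = 0; i < dst.length; ++i) {
    dst[i] = src[i * 2] + src[i * 2 + 1];
  }
  return dst;
}

const size = 10000;
const srcData = new Uint8Array(size);
for (let i = 0; i < size; ++i) {
  srcData[i] = i % 99; // same data as the WebGL2 example
}

const expected = sumPairs(srcData);
console.log(Array.from(expected.slice(0, 4))); // [ 1, 5, 9, 13 ]
```

Since the source values never exceed 98, every pair sum fits in a byte, so the comparison against the `dstData` read back from the GPU can be exact.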

＆＃13;

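Note that the example above uses WebGL2. Why? Because WebGL2 supports rendering to R8 format textures, which makes the math easy: one value per pixel instead of 4 values per pixel as in the previous example. Of course it also means it is slower, but making it work with 4 values would have really complicated the math for computing indices, or might have required rearranging the source data to match better. For example, instead of the value indices going 0, 1, 2, 3, 4, 5, 6, 7, 8, ... it would be easier to sum every 2 values if they were arranged 0, 2, 4, 6, 1, 3, 5, 7, 8 .... That way, pulling out 4 values at a time and adding the next group of 4, the values would line up. Yet another way would be to use 2 source textures, putting all the even-indexed values in one texture and the odd-indexed values in the other.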
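The rearrangement idea can be sketched on the CPU. Both function names are hypothetical, and the sketch assumes the number of values is a multiple of 8 (two RGBA pixels).

```javascript
// Reorders each group of 8 values as [evens..., odds...], so indices
// 0,1,2,3,4,5,6,7 become 0,2,4,6,1,3,5,7.
function reorderForPairSums(values) {
  const out = new Array(values.length);
  for (let base = 0; base < values.length; base += 8) {
    for (let k = 0; k < 4; ++k) {
      out[base + k] = values[base + k * 2];         // even-indexed values
      out[base + 4 + k] = values[base + k * 2 + 1]; // odd-indexed values
    }
  }
  return out;
}

// Emulates the shader: read 2 adjacent RGBA pixels (4 values each)
// and add them component by component, like adding two vec4s.
function sumAdjacentPixels(reordered) {
  const sums = [];
  for (let base = 0; base < reordered.length; base += 8) {
    for (let k = 0; k < 4; ++k) {
      sums.push(reordered[base + k] + reordered[base + 4 + k]);
    }
  }
  return sums;
}

const values = [0, 1, 2, 3, 4, 5, 6, 7, 10, 20, 30, 40, 50, 60, 70, 80];
const reordered = reorderForPairSums(values);

console.log(reordered.slice(0, 8));        // [0, 2, 4, 6, 1, 3, 5, 7]
console.log(sumAdjacentPixels(reordered)); // [1, 5, 9, 13, 30, 70, 110, 150]
```

Each output pixel then holds 4 finished pair sums, so one vec4 add does the work of 4 scalar adds.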

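WebGL1 provides both LUMINANCE and ALPHA textures, which are also one channel, but whether or not you can render to them is an optional feature, whereas in WebGL2 rendering to an R8 texture is a required feature.

WebGL2 also provides something called "transform feedback". This lets you write the output of a vertex shader to a buffer. It has the advantage that you just set the number of vertices you want to process (there is no need for the destination data to be a rectangle). It also means you can output floating point values (this is not optional, as it is for rendering to textures). I believe (though I haven't tested it) that it is slower than rendering to textures.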
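A minimal sketch of the transform feedback approach, assuming a WebGL2 context is passed in. The function name and the shader variable names (`addTenWithTransformFeedback`, `a_value`, `v_result`) are made up for illustration, and error handling is kept to the bare minimum.

```javascript
// Adds 10 to every element of `values` (a Float32Array) on the GPU.
// `gl` must be a WebGL2RenderingContext.
function addTenWithTransformFeedback(gl, values) {
  const vsSource = `#version 300 es
  in float a_value;
  out float v_result;
  void main() {
    v_result = a_value + 10.0;
  }`;
  // A fragment shader is still needed to link, even though nothing is drawn.
  const fsSource = `#version 300 es
  precision highp float;
  out vec4 outColor;
  void main() {
    outColor = vec4(0);
  }`;

  function compile(type, source) {
    const shader = gl.createShader(type);
    gl.shaderSource(shader, source);
    gl.compileShader(shader);
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
      throw new Error(gl.getShaderInfoLog(shader));
    }
    return shader;
  }

  const program = gl.createProgram();
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vsSource));
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fsSource));
  // Must be called before linking: names the outputs to capture.
  gl.transformFeedbackVaryings(program, ['v_result'], gl.SEPARATE_ATTRIBS);
  gl.linkProgram(program);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error(gl.getProgramInfoLog(program));
  }

  // Source buffer: one float per vertex.
  const vao = gl.createVertexArray();
  gl.bindVertexArray(vao);
  const srcBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, srcBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, values, gl.STATIC_DRAW);
  const loc = gl.getAttribLocation(program, 'a_value');
  gl.enableVertexAttribArray(loc);
  gl.vertexAttribPointer(loc, 1, gl.FLOAT, false, 0, 0);

  // Destination buffer, same size, filled by transform feedback.
  const dstBuffer = gl.createBuffer();
  gl.bindBuffer(gl.ARRAY_BUFFER, dstBuffer);
  gl.bufferData(gl.ARRAY_BUFFER, values.byteLength, gl.STATIC_READ);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);

  const tf = gl.createTransformFeedback();
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, tf);
  gl.bindBufferBase(gl.TRANSFORM_FEEDBACK_BUFFER, 0, dstBuffer);

  gl.useProgram(program);
  gl.enable(gl.RASTERIZER_DISCARD); // skip rasterization, no pixels needed
  gl.beginTransformFeedback(gl.POINTS);
  gl.drawArrays(gl.POINTS, 0, values.length); // one vertex per value
  gl.endTransformFeedback();
  gl.disable(gl.RASTERIZER_DISCARD);
  gl.bindTransformFeedback(gl.TRANSFORM_FEEDBACK, null);

  // Read the results back to the CPU.
  const results = new Float32Array(values.length);
  gl.bindBuffer(gl.ARRAY_BUFFER, dstBuffer);
  gl.getBufferSubData(gl.ARRAY_BUFFER, 0, results);
  gl.bindBuffer(gl.ARRAY_BUFFER, null);
  return results;
}
```

In a browser you would call it with something like `addTenWithTransformFeedback(canvas.getContext('webgl2'), new Float32Array(10000))`. Note that no texture sizes or rectangles are involved: the vertex count alone decides how many values are processed.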

Since you're new to WebGL, might I suggest these tutorials.
