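I'm trying to stream texture data to the GPU while rendering at the same time. However, every approach I've tried seems to trigger some kind of driver stall (currently on a GeForce 980 Ti with the nVidia 372.54 drivers). In addition, while the stall is happening, other GL calls on other threads also block until the glFenceSync call returns.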
I tried using PBOs to offload the transfer work to the GPU, but that doesn't seem to help.
Right now my code does something like this:
    uint32_t mipSize = (uint32_t)mip->getSize();
    uint32_t blockTransferSize;
    uint32_t lineCount = 1;
    uint32_t transferRemaining = mipSize;
    {
        auto pboSize = _textureTransferHelper->getTransferBlockSize();
        uint32_t lineSize = mipSize / size.y;
        // Make sure we can transfer at least one line at a time
        assert(pboSize >= lineSize);
        assert(0 == (mipSize % size.y));
        // Find how many whole lines fit into one PBO-sized block
        while (((lineCount + 1) * lineSize) < pboSize) {
            ++lineCount;
        }
        blockTransferSize = lineCount * lineSize;
    }

    // Upload the mip level one block of lines at a time
    for (uint32_t l = 0; l < size.y; l += lineCount) {
        auto block = helper->getAvailableTransferBlock();
        block._miplevel = mipLevel;
        block._format = texelFormat.format;
        block._type = texelFormat.type;
        block._offset = ivec3(0, l, 0);
        block._size = uvec3(size.x, std::min(lineCount, size.y - l), 0);
        uint32_t thisTransferSize = std::min(blockTransferSize, transferRemaining);

        // Allocate the PBO storage, then map it and copy the source data in
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, block._pbo);
        glBufferData(GL_PIXEL_UNPACK_BUFFER, thisTransferSize, nullptr, GL_STREAM_DRAW);
        auto ptr = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, thisTransferSize, GL_MAP_WRITE_BIT);
        memcpy(ptr, srcPtr, thisTransferSize);
        glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);

        // With a PBO bound, the last argument is a byte offset into the buffer
        glTextureSubImage2D(_id, block._miplevel, block._offset.x, block._offset.y, block._size.x, block._size.y, block._format, block._type, nullptr);
        glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
        GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
        ...
    }
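To separate the chunking arithmetic from the GL calls, here is a minimal standalone sketch of how the loop above splits a mip level into PBO-sized blocks of whole lines. The `TransferChunk` struct and the `splitMipIntoChunks` function are hypothetical names made up for this illustration and are not part of my abstraction layer. It also assumes that the elided part of the loop decrements `transferRemaining` by each block's size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type: one slice of whole lines from a mip level.
struct TransferChunk {
    uint32_t firstLine;  // y offset of the first line in this chunk
    uint32_t lineCount;  // number of lines in this chunk
    uint32_t byteSize;   // number of bytes to copy into the PBO
};

// Hypothetical helper: mirrors the line-count loop from the code above.
std::vector<TransferChunk> splitMipIntoChunks(uint32_t mipSize, uint32_t height, uint32_t pboSize) {
    assert(height > 0 && 0 == (mipSize % height));
    const uint32_t lineSize = mipSize / height;
    // At least one full line has to fit into a transfer block
    assert(lineSize > 0 && pboSize >= lineSize);

    // Same loop as in the original code: grow the block one line at a time
    uint32_t linesPerBlock = 1;
    while (((linesPerBlock + 1) * lineSize) < pboSize) {
        ++linesPerBlock;
    }

    std::vector<TransferChunk> chunks;
    for (uint32_t l = 0; l < height; l += linesPerBlock) {
        // The last chunk may hold fewer lines than the others
        const uint32_t lines = std::min(linesPerBlock, height - l);
        chunks.push_back({l, lines, lines * lineSize});
    }
    return chunks;
}
```

For example, a mip of 160 bytes with 10 lines (16 bytes per line) and a 64-byte transfer block gives three chunks of 3 lines and a final chunk of 1 line. Because the loop condition uses a strict `<`, a block that would exactly fill the PBO is not used, so the chunks in this example are 48 bytes rather than 64.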
The glFenceSync() call takes 3 ms to return, compared with the usual few tens of microseconds.
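For reference, a CPU-side measurement of this kind can be sketched with `std::chrono`. The `elapsedMicros` helper below is a hypothetical name used only for illustration and is not necessarily how the numbers above were gathered. It measures only how long the call blocks the calling thread, not how long the GPU takes to execute the commands.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: returns how long `call` blocks the calling thread,
// in microseconds, using a monotonic clock.
template <typename F>
int64_t elapsedMicros(F&& call) {
    const auto start = std::chrono::steady_clock::now();
    call();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

Wrapping the fence creation in a lambda, for instance `elapsedMicros([&] { fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0); })`, would give the time spent inside that single call.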
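I've tried to put together a minimal standalone example that reproduces the problem, but extracting the code from my current GPU abstraction layer in a way that keeps it readable is tricky.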
Can anyone tell me what is triggering this cross-context stall?