Question

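So I'm trying to do some multithreading using Windows threads. The problem I'm facing is that the exact same function call is much faster (7x!) in the serial case (calling the function directly) than in the single-threaded case (running it on one spawned thread). While investigating whether this was caused by poor cache utilization, I noticed some numbers in Task Manager. In the serial case, nonpaged kernel memory usage is 2793 MB. In the single-threaded case, that number rises to ~3100 MB. I suspect that in the single-threaded case the spawned thread runs on a different processor, and all the extra time is spent moving data into that thread's memory. My question is twofold: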

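1. I don't fully understand what the nonpaged kernel memory figure in Task Manager means, but given the behavior described, does it support my suspicion? Is there a better way to measure cache misses on Windows? (I wish Valgrind worked on Windows :/)
2. If not, what else could be causing this? The function's code path is exactly the same, its results are exactly the same, and there are no critical sections/mutexes or heap allocations inside it. This has me stumped!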
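One way to narrow this down is to time the function in both configurations and, in the threaded case, to time it both from the caller's side and from inside the new thread, over several runs. The sketch below is a minimal, portable illustration under stated assumptions: C++11 `std::thread` stands in for `_beginthreadex`, and the hypothetical `Work()` and `g_data` stand in for `FunctionBeingProfiled` and the global data it reads.

```cpp
#include <cassert>
#include <chrono>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical shared, read-only data standing in for the global structures
// (hash table, vector of line segments) that FunctionBeingProfiled reads.
static std::vector<int> g_data(1 << 22, 1);

// Hypothetical stand-in for FunctionBeingProfiled: one pass over the shared data.
static long long Work() {
    long long sum = 0;
    for (int v : g_data) sum += v;
    return sum;
}

static double Ms(Clock::duration d) {
    return std::chrono::duration<double, std::milli>(d).count();
}

struct Timings {
    double serial_ms = 0;                // Work() called directly on the calling thread
    double thread_total_ms = 0;          // thread creation + all runs + join, seen by the caller
    std::vector<double> thread_runs_ms;  // each Work() call, timed inside the new thread
};

// Times Work() once serially, then `runs` times on one newly created thread.
Timings CompareSerialAndThreaded(int runs) {
    Timings t;
    const long long expected = static_cast<long long>(g_data.size());

    auto s0 = Clock::now();
    volatile long long serial_result = Work();  // volatile: keep the call from being optimized away
    t.serial_ms = Ms(Clock::now() - s0);
    assert(serial_result == expected);

    auto p0 = Clock::now();
    std::thread worker([&] {
        for (int r = 0; r < runs; ++r) {
            auto i0 = Clock::now();
            volatile long long result = Work();
            t.thread_runs_ms.push_back(Ms(Clock::now() - i0));
            assert(result == expected);  // same code path, same result
        }
    });
    worker.join();  // join() also makes the thread's writes to `t` visible here
    t.thread_total_ms = Ms(Clock::now() - p0);
    return t;
}
```

Reading the numbers: if the per-run times inside the thread are close to `serial_ms` while `thread_total_ms` is much larger, the extra time is thread start-up/join overhead; if only the first run on the thread is slow, that looks like a warm-up (cache-filling) cost; if every run on the thread is about 7x slower, neither explanation fits. Because the serial call runs first here and may warm the caches for the thread, it can be worth swapping the order as a check.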

I can't show any substantial source code, but the code path looks like this:

bool FunctionBeingProfiled() {
    point* p1 = globalPointArray[0];
    point* p2 = globalPointArray[1];
    point* p3 = globalPointArray[2];

    if (PointsAreProtected(p1, p2, p3)) {
          return false;
    }

    linesegment lseg(p1, p2);

    // The following 2 calls read from global data structures populated before this call is first encountered

    tetrahedron tet = FindFirstTet(&lseg); // Reads from a hash table

    EdgeIntersectionTest(&lseg); // Basically takes the given lseg and finds all the intersections from a global vector of line segments

    DoVolumeTest(&lseg, &tet); // uses only the passed in lseg and tet

    DoShapeTest(&lseg, &tet); // uses only the passed in lseg and tet
}

void FuncA() {
    if (serial) {
        FunctionBeingProfiled();
    }

    if (parallel) {
        _beginthreadex(..); // This calls FunctionBeingProfiled
    }
}
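In `FuncA` the `_beginthreadex(..)` arguments are elided, and the snippet does not show the caller waiting for the thread. For the threaded timing to be comparable with the serial one, the measured interval has to cover the thread's whole run. Below is a minimal sketch of that launch-and-wait pattern, assuming a hypothetical `RunOnNewThread` wrapper and a placeholder `FunctionBeingProfiled` that simply returns `true`. On Windows it uses `_beginthreadex` with the entry-point signature that function requires, followed by `WaitForSingleObject` and `CloseHandle`; elsewhere it falls back to `std::thread` so the sketch still compiles.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <process.h>  // _beginthreadex
#else
#include <thread>
#endif

// Hypothetical placeholder for the real FunctionBeingProfiled.
static bool FunctionBeingProfiled() { return true; }

#ifdef _WIN32
// _beginthreadex requires an entry point of type unsigned (__stdcall *)(void*).
// The argument carries a pointer to where the result should be stored.
static unsigned __stdcall ThreadProc(void* arg) {
    *static_cast<bool*>(arg) = FunctionBeingProfiled();
    return 0;
}
#endif

// Runs FunctionBeingProfiled on one new thread and waits for it to finish,
// so the caller observes the complete run (and its result), not just the launch.
bool RunOnNewThread() {
    bool result = false;
#ifdef _WIN32
    HANDLE h = reinterpret_cast<HANDLE>(
        _beginthreadex(nullptr, 0, &ThreadProc, &result, 0, nullptr));
    if (h == nullptr) return false;    // _beginthreadex returns 0 on failure
    WaitForSingleObject(h, INFINITE);  // wait for the thread to finish
    CloseHandle(h);                    // release the thread handle
#else
    std::thread t([&] { result = FunctionBeingProfiled(); });
    t.join();
#endif
    return result;
}
```

Timing a call to `RunOnNewThread()` then includes thread creation and teardown as well as the function itself, which is the cost the caller actually pays for the threaded path.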

Single-threaded code much slower than serial code for the exact same function
