Question

我正在修改考试（仍然）并且遇到了一个让我难过的问题（下面发布）。总而言之，我认为问题是“想想any_old_process必须遍历图形并对它找到的对象做一些工作，包括添加更多工作。”我的问题是，什么数据结构可以并行化以实现问题中提出的目标？

垃圾收集器（GC）的作用是回收未使用的内存。   跟踪收集器必须通过遍历图形来标识所有活动对象   由聚合关系引起的对象。简而言之，GC有   一些要执行的工作列表。它反复（a）获得任务   （例如，要检查的对象），（b）执行任务（例如标记   对象，除非它已被标记），以及（c）生成进一步的任务   （例如，将未标记任务的子项添加到工作列表中）。它是   希望并行化此操作。

在单线程中   在环境中，工作列表通常是单个LIFO堆栈。什么会   你需要做的是为并行GC安全吗？这会是一个   并行GC的合理设计？讨论数据结构的设计   支持可以更好地扩展的并行GC。解释你的原因   期望它们能够更好地扩展。

Answer 1

图形的自然数据结构是图形，即可以引用其他元素的一组图形元素（节点）。但是，为了更好地缓存重用，可以将元素放置/分配在一个或多个数组（通常是向量）中，以便将相邻元素尽可能地放在内存中。通常，每个元素或一组元素都应该有一个互斥锁（spin_mutex）来保护对它的访问，争用意味着其他一些线程忙于处理它，所以不需要等待。但是，如果可能的话，最好在标志/状态字段上进行原子操作，以便在没有锁定的情况下将元素标记为已访问。例如，最简单的数据结构可以是：

struct object {
    vector<object*> references;
    atomic<bool> is_visited; // for simplicity, or epoch counter
                             // if nothing resets it to false
    void inspect();          // processing method
};
vector<object> objects;      // also for simplicity, if it can be for real
                             // things like `parallel_for` would be perfect here

鉴于此数据结构和GC工作方式的描述，它完全适合像divide-and-conquer pattern这样的递归并行：

void object::inspect() {
    if( ! is_visited.exchange(true) ) {
        for( object* o : objects )   // alternatively it can be `parallel_for` in some variants
            cilk_spawn o->inspect(); // for Cilk or `task_group::run` for TBB or PPL
        // further processing of the object
    }
}

如果问题中的数据结构是任务的组织方式。我推荐一个工作窃取调度程序（如tbb或cilk。有很多关于这个主题的论文。简单来说，每个工作线程都有自己但共享的任务副本，并且当双端队列为空时，一个线程会从别人的请求中窃取任务。

可伸缩性来自于每个任务可以添加一些其他任务的属性，这些任务可以在prarallel中运行。

Answer 2

您的问题：

想想any_old_process必须遍历图表并对它找到的对象做一些工作，包括添加更多工作。
...可以并行化哪些数据结构来实现问题中提出的目标？

引用的问题：

关于垃圾收集的一些事情。

由于您对并行化图算法特别感兴趣，我将举例说明一种可以很好地并行化的图遍历。

执行摘要

查找局部最小值（“盆”）或最大值（“峰值”）是数字图像处理中的有用操作。一个具体的例子是地质流域分析。该问题的一种方法将图像中的每个像素或小像素组视为节点，并找到具有局部最小值作为树根的非重叠最小生成树（MST）。

血腥细节

以下是一个简单的例子。这是由web interview question from Palantir Technologies带到Programming Puzzles & Code Golf的AnkitSablok。它通过两个假设（粗体）进行了简化：

像素/单元只有4个邻居，而不是通常的8个。
一个牢房有所有上坡邻居（这是当地的最低点）或者有一个独特的下坡邻居。即，不允许平原。

下面是一些可以解决这个问题的JavaScript。它违反了使用副作用的所有合理编码标准，但说明了并行化的一些机会存在的位置。

在“创建接收器（即根）的循环列表”循环中，请注意，只要高程数据是静态的，就可以完全独立地评估每个单元的相对于其邻居的高程。在顺序程序中，一个执行线程检查每个单元格。在并行程序中，单元格被分割，以便一个且只有一个线程读取和写入本地最小值状态信息（sink[]在下面的程序中）。如果并行生成最小值/根列表，则必须同步堆栈的排队操作。有关如何对堆栈和其他队列执行此操作的讨论，请参阅"Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms", Michael & Scott, 1996。对于现代更新，请按照Google学术搜索中的引文树（无需使用互斥锁:)。
在“每个根探索它的盆地”循环中，注意每个盆地可以平行探测/枚举/淹没。

如果您希望深入了解MST的并行化，请参阅"Scalable Parallel Minimum Spanning Forest Computation", Nobari, Cao, arras, Bressan, 2012。前两页包含对该领域的简明扼要的调查。

简化示例

一群农民有一些海拔数据，我们将帮助他们了解降雨如何流经他们的农田。我们将土地表示为二维高度阵列，并根据水流下坡的想法使用以下模型：

如果一个小区的四个相邻小区都有更高的高度，我们称这个小区为接收器;水汇集在水槽中。否则，水将流向具有最低海拔的相邻小区。 如果一个小区不是一个接收器，你可以假设它有一个唯一的最低邻居，并且该邻居将低于该小区。

直接或间接排入同一水槽的细胞据说是同一盆地的一部分。

您面临的挑战是将地图划分为盆地。特别是，给定一个高程地图，您的代码应该将地图划分为盆地并按降序输出盆地的大小。

假设高程图是方形的。输入将以一个整数S开始，S是地图的高度（和宽度）。接下来的S行将包含一行地图，每行包含S个整数 - 行中S单元格的高程。一些农民有小地块，如下面的例子，而一些农民有较大的地块。但是，在任何情况下，农民都不会有大于S = 5000的土地。

您的代码应按降序输出以空格分隔的盆地大小列表。（忽略尾随空格。）

以下是一个例子：

Input:
5
1 0 2 5 8
2 3 4 7 9
3 5 7 8 9
1 2 5 4 2
3 3 5 2 1

Output:  11 7 7

The basins, labeled with A’s, B’s, and C’s, are:
A A A A A
A A A A A
B B A C C
B B B C C
B B C C C

// lm.js - find the local minima


//  Globalization of variables.

/*
    The map is a 2 dimensional array. Indices for the elements map as:

    [0,0] ... [0,n]
    ...
    [n,0] ... [n,n]

Each element of the array is a structure. The structure for each element is:

Item    Purpose         Range       Comment
----    -------         -----       -------
h   Height of cell      integers
s   Is it a sink?       boolean
x   X of downhill cell  (0..maxIndex)   if s is true, x&y point to self
y   Y of downhill cell  (0..maxIndex)
b   Basin name      ('A'..'A'+# of basins)

Use a separate array-of-arrays for each structure item. The index range is
0..maxIndex.
*/
var height = [];
var sink = [];
var downhillX = [];
var downhillY = [];
var basin = [];
var maxIndex;

//  A list of sinks in the map. Each element is an array of [ x, y ], where
// both x & y are in the range 0..maxIndex.
var basinList = [];

//  An unordered list of basin sizes.
var basinSize = [];


//  Functions.

function isSink(x,y) {
    var myHeight = height[x][y];
    var imaSink = true;
    var bestDownhillHeight = myHeight;
    var bestDownhillX = x;
    var bestDownhillY = y;

    /*
        Visit the neighbors. If this cell is the lowest, then it's the
    sink. If not, find the steepest downhill direction.
    */
    function visit(deltaX,deltaY) {
        var neighborX = x+deltaX;
        var neighborY = y+deltaY;
        if (myHeight > height[neighborX][neighborY]) {
            imaSink = false;
            if (bestDownhillHeight > height[neighborX][neighborY]) {
                bestDownhillHeight = height[neighborX][neighborY];
                bestDownhillX = neighborX;
                bestDownhillY = neighborY;
            }
        }
    }
    if (x !== 0) {
        // upwards neighbor exists
        visit(-1,0);
    }
    if (x !== maxIndex) {
        // downwards neighbor exists
    visit(1,0);
    }
    if (y !== 0) {
        // left-hand neighbor exists
        visit(0,-1);
    }
    if (y !== maxIndex) {
        // right-hand neighbor exists
        visit(0,1);
    }

    downhillX[x][y] = bestDownhillX;
    downhillY[x][y] = bestDownhillY;
    return imaSink;
}

function exploreBasin(x,y,currentSize,basinName) {
    //  This cell is in the basin.
    basin[x][y] = basinName;
    currentSize++;

    /*
        Visit all neighbors that have this cell as the best downhill
    path and add them to the basin.
    */
    function visit(x,deltaX,y,deltaY) {
        if ((downhillX[x+deltaX][y+deltaY] === x) && (downhillY[x+deltaX][y+deltaY] === y)) {
            currentSize = exploreBasin(x+deltaX,y+deltaY,currentSize,basinName);
        }
        return 0;
    }
    if (x !== 0) {
        // upwards neighbor exists
        visit(x,-1,y,0);
    }
    if (x !== maxIndex) {
        // downwards neighbor exists
        visit(x,1,y,0);
    }
    if (y !== 0) {
        // left-hand neighbor exists
        visit(x,0,y,-1);
    }
    if (y !== maxIndex) {
        // right-hand neighbor exists
        visit(x,0,y,1);
    }

    return currentSize;
}

//  Read map from file (1st argument).
var lines = $EXEC('cat "' + $ARG[0] + '"').split('\n');
maxIndex = lines.shift() - 1;
for (var i = 0; i<=maxIndex; i++) {
    height[i] = lines.shift().split(' ');
    //  Create all other 2D arrays.
    sink[i] = [];
    downhillX[i] = [];
    downhillY[i] = [];
    basin[i] = [];
}
for (var i = 0; i<=maxIndex; i++) { print(height[i]); }

//  Everyone decides if they are a sink. Create list of sinks (i.e. roots).
for (var x=0; x<=maxIndex; x++) {
    for (var y=0; y<=maxIndex; y++) a
        if (sink[x][y] = isSink(x,y)) {
            //  This node is a root (AKA sink).
            basinList.push([x,y]);
        }
    }
}
//for (var i = 0; i<=maxIndex; i++) { print(sink[i]); }

//  Each root explores it's basin.
var basinName = 'A';
for (var i=basinList.length-1; i>=0; --i) { // i-- makes Closure Compiler sad
    var x = basinList[i][0];
    var y = basinList[i][5];
    basinSize.push(exploreBasin(x,y,0,basinName));
    basinName = String.fromCharCode(basinName.charCodeAt() + 1);
}
for (var i = 0; i<=maxIndex; i++) { print(basin[i]); }

//  Done.
print(basinSize.sort(function(a, b){return b-a}).join(' '));

并行遍历图表

2 个答案:

执行摘要

血腥细节

简化示例