Efficiently find existing circles in a set of edges

Question

Summary

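Given a set of edges (anywhere from 3 to 1,000,000), efficiently assemble closed loops (I'll call them circles for convenience). I'm running this as part of a larger Python project, so I expect the best solution will be written in C++/CUDA with Python bindings (which is what I've tried).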

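Problem

Assemble rings, given a set of edges (each defined by two vertex indices) that satisfies the following conditions:

There are no stray edges (all edges are used in creating circles),
All rings close (e.g. if there is an edge 1-2, then there will be an edge n-1 that closes it),
Edges have unordered indices (e.g. an edge could be 1-2 or 2-1).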
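
A quick way to sanity-check an input against these conditions is to count vertex degrees: an edge set can split into closed rings only if every vertex touches an even number of edges. The sketch below assumes edges arrive as pairs of ints; `all_degrees_even` is a hypothetical helper name, not part of the code in this question.

```python
from collections import Counter

def all_degrees_even(edges):
    """Return True if every vertex touches an even number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return all(d % 2 == 0 for d in degree.values())

# Two triangles given with unordered indices: every vertex has degree 2.
closed = [(1, 2), (3, 2), (3, 1), (4, 5), (6, 5), (4, 6)]
# Dropping one edge leaves two vertices with odd degree.
broken = closed[:-1]
print(all_degrees_even(closed), all_degrees_even(broken))
```

The check runs in a single pass over the edges, so it is cheap enough to use as a guard before any ring assembly.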

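My general approach

Pick an edge off the back of the list, set one vertex as the start of a new ring (pStart) and the other as the next point in the chain (pEnd), and add both to the new ring list,
Search the edge list for pEnd,
When an edge is found, update pEnd to the vertex of that edge that is not equal to pEnd, and add it to the ring list,
Repeat the two steps above until pStart == pEnd,
If there are no more edges, stop; otherwise repeat from the top.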
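
The steps above can be sketched in Python as follows. This is a minimal illustration that keeps the linear search from the description, so it is quadratic in the number of edges; `rings_from_edges_naive` is a name chosen here, and the input is assumed to satisfy the conditions listed under Problem.

```python
def rings_from_edges_naive(edges):
    """Assemble rings by repeatedly scanning the edge list for pEnd."""
    edges = [tuple(e) for e in edges]  # work on a copy
    rings = []
    while edges:
        # Step 1: take an edge off the back and start a new ring with it.
        p_start, p_end = edges.pop()
        ring = [p_start, p_end]
        # Steps 2-4: extend the ring until it closes on p_start.
        while p_end != p_start:
            for i, (p1, p2) in enumerate(edges):
                if p1 == p_end or p2 == p_end:
                    # The next point is whichever vertex is not p_end.
                    p_end = p2 if p1 == p_end else p1
                    ring.append(p_end)
                    del edges[i]
                    break
            else:
                raise ValueError("open chain: input violates the assumptions")
        rings.append(ring)
    return rings

print(rings_from_edges_naive([(1, 2), (3, 2), (3, 1), (4, 5), (6, 5), (4, 6)]))
# prints [[4, 6, 5, 4], [3, 1, 2, 3]]
```

Each ring repeats its starting vertex at the end, which matches how the serial C++ code below stores a closed ring.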

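My implementation

In C++, I've implemented a serial and a parallel version. I tested them with a set of 45,000 edges and obtained the following results:

Serial (105 seconds)
Parallel - CUDA Thrust (28 seconds)

Serial: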

#include <algorithm>
#include <vector>
#include <string>
#include <boost/algorithm/string.hpp>
#include <fstream>
#include <iostream>
#include <chrono>

std::vector< std::vector<int> > rings_from_edges(std::vector<std::vector<int>> edges)
{
    int pStart, pEnd;

    std::vector<int> temp;
    std::vector<std::vector<int>> rings;

    temp = edges.back();
    edges.pop_back();

    pStart = temp[0];
    pEnd = temp[1];

    int p1, p2;

    while(not edges.empty())
        // Scan edges list until pEnd is found.
        for(auto const& pts: edges)
        {
            p1 = pts[0];
            p2 = pts[1];

            // Check if the start of the edge corresponds with the end of the ring.
            if(p1 == pEnd)
            {
                temp.push_back(p2);
                pEnd = p2;
                edges.erase(std::remove(edges.begin(), edges.end(), pts), edges.end());

                // Check if the beginning of the ring is the same as the end of the newly appended edge.
                if (p2 == pStart)
                {
                    // Add the newly created ring to the rings list.
                    rings.push_back(temp);
                    temp.clear();

                    // If the edges list contains more edges, reset the starting and end points to search for a new ring.
                    if(not edges.empty())
                    {
                        temp = edges.back();
                        edges.pop_back();

                        pStart = temp[0];
                        pEnd = temp[1];
                    }
                }
                break;
            }
                // Check if the end of the edge corresponds with the end of the ring.
            else if(p2 == pEnd)
            {
                temp.push_back(p1);
                pEnd = p1;
                edges.erase(std::remove(edges.begin(), edges.end(), pts), edges.end());

                // Check if the beginning of the ring is the same as the end of the newly appended edge.
                if (p1 == pStart)
                {
                    // Add the newly created ring to the rings list.
                    rings.push_back(temp);
                    temp.clear();

                    // If the edges list contains more edges, reset the starting and end points to search for a new ring.
                    if(not edges.empty())
                    {
                        temp = edges.back();
                        edges.pop_back();

                        pStart = temp[0];
                        pEnd = temp[1];
                    }
                }
                break;
            }
        }
    return rings;
}

int main() {

    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    std::vector< std::vector<int> > vectIN, vectOUT;


    std::string fileName = "PATH TO CSV FILE";

    std::string delimeter = ",";

    std::ifstream file(fileName);

    std::string line = "";

    while (getline(file, line))
    {
        std::vector<std::string> vec;
        boost::algorithm::split(vec, line, boost::is_any_of(delimeter));
        std::vector<int> vec2;
        vec2.emplace_back(std::stoi(vec.data()[0]));
        vec2.emplace_back(std::stoi(vec.data()[1]));

        vectIN.push_back(vec2);
    }

    file.close();

    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    vectOUT = rings_from_edges(vectIN);

    std::chrono::steady_clock::time_point t3 = std::chrono::steady_clock::now();

    for (auto const& ring:vectOUT)
    {
        for(auto const& pt:ring)
        {
            if(pt>=0)
                std::cout << pt << " ";
        }
        std::cout << std::endl;
    }

    std::chrono::steady_clock::time_point t4 = std::chrono::steady_clock::now();

    long t1_t2 = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    long t2_t3 = std::chrono::duration_cast<std::chrono::milliseconds>(t3 - t2).count();
    long t3_t4 = std::chrono::duration_cast<std::chrono::milliseconds>(t4 - t3).count();

    std::cout << "Load csv:      " << t1_t2 << std::endl;
    std::cout << "Ring assembly: " << t2_t3 << std::endl;
    std::cout << "Output:        " << t3_t4 << std::endl;

    std::cout << "----------------- THAT'S ALL FOLKS!!! -----------------" << std::endl;

    return 0;
}

CUDA Thrust version of the above:

#include <thrust/remove.h>
#include <thrust/device_vector.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/tuple.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/find.h>
#include <thrust/functional.h>

#include <iostream>
#include <string>
#include <vector>
#include <fstream>

#include <iterator>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <chrono>
#include <thrust/binary_search.h>

#include <thrust/uninitialized_copy.h>
#include <thrust/device_malloc.h>
#include <thrust/device_vector.h>


int main(){
    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    std::string fileName = "PATH TO CSV HERE";

    std::string delimeter = ",";

    std::ifstream file(fileName);

    std::vector<std::vector<int>> vectIN;

    std::string line = "";

    while (getline(file, line))
    {
        std::vector<std::string> vec;
        boost::algorithm::split(vec, line, boost::is_any_of(delimeter));
        std::vector<int> vec2;
        vec2.emplace_back(std::stoi(vec.data()[0]));
        vec2.emplace_back(std::stoi(vec.data()[1]));

        vectIN.push_back(vec2);
    }

    file.close();
    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();


    std::vector<int> h_edge1, h_edge2;

    h_edge1.reserve(vectIN.size());
    h_edge2.reserve(vectIN.size());

    for(auto const& pts: vectIN)
    {
        h_edge1.emplace_back(pts[0]);
        h_edge2.emplace_back(pts[1]);
    }
    std::chrono::steady_clock::time_point t3 = std::chrono::steady_clock::now();

    thrust::device_vector<int> d_pStart(1);
    thrust::device_vector<int> d_pEnd(1);

    thrust::host_vector<int> h_rings;
    thrust::device_vector<int> d_rings;

    // Initialize edge vectors / pStart / pEnd /  while minimizing copying with CudaMalloc

    thrust::device_vector<int> d_edge1(vectIN.size());
    thrust::device_vector<int> d_edge2(vectIN.size());

    thrust::copy(thrust::make_zip_iterator(thrust::make_tuple(h_edge1.begin(), h_edge2.begin())),
                 thrust::make_zip_iterator(thrust::make_tuple(h_edge1.end(),   h_edge2.end())),
                 thrust::make_zip_iterator(thrust::make_tuple(d_edge1.begin(), d_edge2.begin())));

    // Arrange edges with edge1 as key and edge2 as value
    thrust::sort_by_key(d_edge1.begin(), d_edge1.end(), d_edge2.begin());

    d_rings.push_back(d_edge1.back());
    d_rings.push_back(d_edge2.back());

    d_edge1.pop_back();
    d_edge2.pop_back();

    d_pStart[0] = d_rings[0];
    d_pEnd[0] = d_rings[1];

    thrust::device_vector<int> element(1), p1(1), p2(1);

    while(not d_edge1.empty())
    {
        element.clear();

        int temp = d_pEnd[0];

        auto iter1 = thrust::equal_range(thrust::device, d_edge1.begin(), d_edge1.end(), temp);


        if(iter1.first != iter1.second)
        {
            element[0] = thrust::distance(d_edge1.begin(), iter1.first);
        }
        else
        {
            auto iter2 = thrust::find(thrust::device, d_edge2.begin(), d_edge2.end(), d_pEnd[0]);
            element[0] = thrust::distance(d_edge2.begin(), iter2);
        }

        // EDGE START INDEX (P1) AND END INDEX (P2)
        p1[0] = d_edge1[element[0]];
        p2[0] = d_edge2[element[0]];

        // ERASE THE EDGE FROM DEVICE LIST
        d_edge1.erase(d_edge1.begin()+element[0]);
        d_edge2.erase(d_edge2.begin()+element[0]);

        if(p1[0] == d_pEnd[0])
        {
            d_pEnd[0] = p2[0];

            if( d_pStart[0] == d_pEnd[0])
            {
                d_rings.push_back(-p2[0]);

                if(not d_edge1.empty())
                {
                    d_pStart[0] = d_edge1.back();
                    d_pEnd[0]   = d_edge2.back();

                    d_rings.push_back(d_pStart[0]);
                    d_rings.push_back(d_pEnd[0]);

                    d_edge1.pop_back();
                    d_edge2.pop_back();

                }
            }
            else
            {
                d_rings.push_back(p2[0]);
            }
        }
        else if(p2[0] == d_pEnd[0])
        {
            d_pEnd[0] = p1[0];

            if(d_pStart[0] == d_pEnd[0])
            {
                d_rings.push_back(-p1[0]);

                if(not d_edge1.empty())
                {
                    d_pStart[0] = d_edge1.back();
                    d_pEnd[0]   = d_edge2.back();

                    d_rings.push_back(d_pStart[0]);
                    d_rings.push_back(d_pEnd[0]);

                    d_edge1.pop_back();
                    d_edge2.pop_back();
                }
            }
            else
            {
                d_rings.push_back(p1[0]);
            }
        }
    }
    std::chrono::steady_clock::time_point t4 = std::chrono::steady_clock::now();

    // Copy rings to host and print them.
    h_rings = d_rings;

    for(auto const& pt:h_rings)
    {
        if(pt>=0)
            std::cout << pt << " ";
        else
            std::cout << -pt << std::endl;
    }
    std::cout << std::endl;

    long t1_t2 = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
    long t2_t3 = std::chrono::duration_cast<std::chrono::milliseconds>(t3 - t2).count();
    long t3_t4 = std::chrono::duration_cast<std::chrono::milliseconds>(t4 - t3).count();

    std::cout << "Load csv:      " << t1_t2 << std::endl;
    std::cout << "Create vector: " << t2_t3 << std::endl;
    std::cout << "Ring assembly: " << t3_t4 << std::endl;

    std::cout << "----------------- THAT'S ALL FOLKS!!! -----------------" << std::endl;

    return 0;
}

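Other

I've implemented something similar to the CUDA code above, but with the data organized into buckets so that each search only covers a limited amount of data. Unfortunately, I haven't gotten it fully working yet.

Lately, I've been looking into graph libraries to see whether I could do it that way, but I haven't managed to get that working either. I know the CUDA toolkit has one, and so does Boost.

Final words

I'd like this to run in under 10 seconds at the very least, but ideally in under a second for up to a million edges. I don't know whether that is realistic, but I'm hoping to get there either by accelerating it with CUDA or by finding a different algorithm altogether. I'm reaching out to see if someone can help me achieve this.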
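
One way to limit each search to a handful of candidates, in the spirit of the bucket idea, is to index the edges by vertex up front. The sketch below is an assumption about what such a structure could look like, not the bucketed CUDA code described above; it assumes every vertex belongs to exactly one ring (degree 2), and `rings_by_adjacency` is a hypothetical name.

```python
from collections import defaultdict

def rings_by_adjacency(edges):
    """Assemble rings in O(|E|) using a vertex -> incident-edge-ids map.

    Assumes every vertex has degree exactly 2 (disjoint simple rings).
    """
    incident = defaultdict(list)
    for eid, (a, b) in enumerate(edges):
        incident[a].append(eid)
        incident[b].append(eid)

    used = [False] * len(edges)
    rings = []
    for eid, (start, end) in enumerate(edges):
        if used[eid]:
            continue
        used[eid] = True
        ring = [start, end]
        while end != start:
            # Exactly one of the two edges at `end` is still unused.
            nxt = next(e for e in incident[end] if not used[e])
            used[nxt] = True
            a, b = edges[nxt]
            end = b if a == end else a
            ring.append(end)
        rings.append(ring)
    return rings

print(rings_by_adjacency([(1, 2), (3, 2), (3, 1), (4, 5), (6, 5), (4, 6)]))
# prints [[1, 2, 3, 1], [4, 5, 6, 4]]
```

Because each lookup touches at most two edge ids and nothing is erased from the middle of a list, the total work grows linearly with the number of edges.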

Answer 1

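I'd bet that a serial C++ implementation of Hierholzer's algorithm for finding an Euler cycle would run in under a second on 10^6 edges without any trouble, since its asymptotic running time is O(|E|). Given the tour, we still need to decompose it into simple cycles, which we can do with Python code like this (warning: untested).

def simple_cycles(tour_vertices):
    stack = []
    index = {}  # vertex -> its position in stack
    for v in tour_vertices:
        stack.append(v)
        i = index.get(v)
        if i is None:
            index[v] = len(stack) - 1
            continue
        # v closes a cycle: emit it, then pop back down to the first v.
        yield stack[i:]
        # Skip the last element: it is v again, and index[v] must stay valid.
        for w in stack[i+1:-1]:
            del index[w]
        del stack[i+1:]

Here's the complete C++ code for what I had in mind. It compiles, but is otherwise completely untested. Absolutely no warranty of any kind.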
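
For reference, the Euler-tour step can also be sketched compactly in Python. This is a minimal iterative version of Hierholzer's algorithm written for illustration, not the answerer's C++ code; `euler_tours` and its edge-list input format are assumptions, and it returns one closed tour per connected component so that each can be handed to a decomposition step.

```python
from collections import defaultdict

def euler_tours(edges):
    """Return one closed Euler tour (as a vertex list) per connected component.

    Assumes every vertex has even degree, as the question's conditions imply.
    """
    incident = defaultdict(list)  # vertex -> list of (edge id, other endpoint)
    for eid, (a, b) in enumerate(edges):
        incident[a].append((eid, b))
        incident[b].append((eid, a))

    used = [False] * len(edges)
    tours = []
    for root in list(incident):
        stack, tour = [root], []
        while stack:
            v = stack[-1]
            # Discard edges at v that were already traversed from the other end.
            while incident[v] and used[incident[v][-1][0]]:
                incident[v].pop()
            if incident[v]:
                eid, w = incident[v].pop()
                used[eid] = True
                stack.append(w)
            else:
                # No unused edge left at v: it is final, so emit it.
                tour.append(stack.pop())
        if len(tour) > 1:  # roots already covered by an earlier tour yield [root]
            tours.append(tour)
    return tours

# Two triangles sharing vertex 1 form a single connected component.
print(euler_tours([(1, 2), (2, 3), (3, 1), (1, 4), (4, 5), (5, 1)]))
```

For the two triangles sharing vertex 1, the result is a single tour that passes through vertex 1 more than once, which is exactly the case where the tour still has to be split into simple cycles.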
