Question

I want to read a file like this:

13.3027 29.2191 2.39999
13.3606 29.1612 2.39999
13.3586 29.0953 2.46377
13.4192 29.106 2.37817

It has more than 1 million lines.

My current C++ code is:

bool loadCloud(const string &filename, PointCloud<PointXYZ> &cloud)
{
    print_info("\nLoad the Cloud .... (this takes some time!!!) \n");
    ifstream fs;
    fs.open(filename.c_str(), ios::binary);
    if (!fs.is_open() || fs.fail())
    {
        PCL_ERROR(" Could not open file '%s'! Error : %s\n", filename.c_str(), strerror(errno));
        fs.close();
        return (false);
    }

    string line;
    vector<string> st;

    while (!fs.eof())
    {
        getline(fs, line);
        // Ignore empty lines
        if (line == "") 
        {
            std::cout << "  this line is empty...." << std::endl;
            continue;
        }

        // Tokenize the line
        boost::trim(line);
        boost::split(st, line, boost::is_any_of("\t\r "), boost::token_compress_on);

        cloud.push_back(PointXYZ(float(atof(st[0].c_str())), float(atof(st[1].c_str())), float(atof(st[2].c_str()))));
    }
    fs.close();
    std::cout<<"    Size of loaded cloud:   " << cloud.size()<<" points" << std::endl;
    cloud.width = uint32_t(cloud.size()); cloud.height = 1; cloud.is_dense = true;
    return (true);
}

Reading this file currently takes a long time. Any ideas on how I could speed this step up?

Answer 1

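As long as the numbers always come in groups of three, you can read the numbers directly from the stream instead of reading a whole line and then parsing it.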
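A minimal sketch of that idea, assuming the file contains nothing but whitespace-separated numbers. `Point`, `readPoints` and `loadPointsFromFile` are names made up for this illustration, and `std::array<float, 3>` stands in for `PointXYZ`:

```cpp
#include <array>
#include <fstream>
#include <istream>
#include <string>
#include <vector>

// Stand-in for PointXYZ (assumption: three floats per point).
using Point = std::array<float, 3>;

// Extracts floats three at a time; operator>> skips spaces, tabs and newlines.
// Returns true if reading stopped at end of input, false if a token that is
// not a number stopped it. A trailing incomplete triple is dropped silently.
bool readPoints(std::istream& in, std::vector<Point>& cloud)
{
    float x, y, z;
    while (in >> x >> y >> z)
    {
        cloud.push_back(Point{x, y, z});
    }
    return in.eof();
}

// Convenience wrapper that opens the file and forwards to readPoints.
bool loadPointsFromFile(const std::string& filename, std::vector<Point>& cloud)
{
    std::ifstream fs(filename);
    return fs && readPoints(fs, cloud);
}
```

This drops the per-line `trim` and `split` work, but stream extraction is not necessarily the fastest parser available, so it is worth measuring against the other approaches on the real data.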

@OneToOne

Answer 2

Are you running optimized code? On my machine, your code reads 1 million values in 1800 ms.

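trim and split probably take most of the time. If the string starts with whitespace, trim has to move the entire string contents to erase the leading characters. split creates new string copies for every token; you can optimize this by using string_view to avoid those copies.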
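As a rough illustration of the string_view idea, here is a sketch of a tokenizer that returns views into the line instead of copies. `splitFields` is a hypothetical helper, not part of Boost or the standard library, and it needs C++17:

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Splits `line` on spaces, tabs and carriage returns, collapsing runs of
// delimiters (similar in spirit to boost::token_compress_on).
// The views stored in `fields` point into `line`, so the underlying
// characters must stay alive while the views are used.
// `fields` is cleared and refilled so its capacity is reused across lines.
void splitFields(std::string_view line, std::vector<std::string_view>& fields)
{
    constexpr std::string_view delims = " \t\r";
    fields.clear();
    std::size_t pos = line.find_first_not_of(delims);
    while (pos != std::string_view::npos)
    {
        std::size_t end = line.find_first_of(delims, pos);
        if (end == std::string_view::npos)
        {
            end = line.size();
        }
        fields.push_back(line.substr(pos, end - pos));
        pos = line.find_first_not_of(delims, end);
    }
}
```

Because leading and trailing delimiters are skipped inside the loop, no separate trim pass is needed. Note that a string_view is not null-terminated, so the tokens cannot be handed to `atof` directly.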

Since the delimiters are whitespace, you can avoid all of the copies with the following code:

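// Requires C++17 and a standard library that implements std::from_chars for float.
#include <array>
#include <cctype>
#include <charconv>
#include <fstream>
#include <string>
#include <vector>
using namespace std; // the question's code uses unqualified string and ifstream

bool loadCloud(const string &filename, std::vector<std::array<float, 3>> &cloud)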
{
    ifstream fs;
    fs.open(filename.c_str(), ios::binary);
    if (!fs)
    {
        fs.close();
        return false;
    }

    string line;

    while (getline(fs, line))
    {
        // Ignore empty lines
        if (line == "")
        {
            continue;
        }

        const char* first = &line.front();
        const char* last = first + line.length();
        std::array<float, 3> arr;
        for (float& f : arr)
        {
            auto result = std::from_chars(first, last, f);
            if (result.ec != std::errc{})
            {
                return false;
            }
            first = result.ptr;
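            // Cast to unsigned char: passing a negative char to isspace is undefined
            while (first != last && isspace(static_cast<unsigned char>(*first)))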
            {
                first++;
            }
        }
        if (first != last)
        {
            return false;
        }

        cloud.push_back(arr);
    }
    fs.close();
    return true;
}

On my machine this code runs in 650 ms. getline takes about 35% of the time, parsing the floats about 45%, and push_back the remaining 20%.

A few notes:

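I fixed the while(!fs.eof()) problem by checking the state of the stream after calling getline.
I changed the result to an array because your example is not an MCVE, so I don't have the definitions of PointCloud or PointXYZ. Those types may be the cause of your slowness.
If you know the number of lines in advance (or at least an approximation), reserving the vector's capacity will improve performance.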
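The last note could be sketched as follows when the line count is not known up front. The idea of estimating it from the file size is an assumption of this example, not something measured above; `estimateLineCount`, `makeCloudFor` and the figure of 24 bytes per line (taken from the sample lines in the question) are all illustrative:

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

using Point = std::array<float, 3>; // stand-in for PointXYZ

// Rough line-count estimate: file size divided by a caller-supplied guess
// for the average line length in bytes. Returns 0 if the file cannot be
// opened, is empty, or bytesPerLine is 0.
std::size_t estimateLineCount(const std::string& filename, std::size_t bytesPerLine)
{
    // std::ios::ate positions the stream at the end, so tellg() is the size
    std::ifstream fs(filename, std::ios::binary | std::ios::ate);
    if (!fs || bytesPerLine == 0)
    {
        return 0;
    }
    const std::streamoff size = fs.tellg();
    if (size <= 0)
    {
        return 0;
    }
    return static_cast<std::size_t>(size) / bytesPerLine + 1;
}

// Creates an empty vector whose capacity is reserved from the estimate,
// so push_back rarely has to reallocate while the file is being parsed.
std::vector<Point> makeCloudFor(const std::string& filename)
{
    std::vector<Point> cloud;
    cloud.reserve(estimateLineCount(filename, 24)); // ~24 bytes per sample line
    return cloud;
}
```

An estimate that is too low only costs a few reallocations, and one that is too high only costs some unused memory, so the guess does not need to be precise.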
