Question

I have a text file in a format like this:

ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]

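Is there a way to get the x, y points from possibly 35000+ lines, each of which looks like the ones above? If so, is that the best approach?

Or would it be better to call getline on each line and then parse the line with boost::regex?

I need to get the x, y points and store them in a float array.

I have been using boost::regex for this, but it means running a regex on every single line. I don't know how efficient that is, so I'd like to know whether there is a better solution. If not, I can keep doing what I'm doing.

The solution must be in C++.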

Answer 1

Here is an approach using Boost Spirit X3 and a memory-mapped file.

struct Point { double x, y, z; };

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;

    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}

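Spirit is a parser generator, so it generates the parsing code for you from the expressions you write (e.g. 'x' >> 'y' >> 'z' >> eol to match the header line).

Unlike regular expressions, Spirit knows how to handle and convert values, so you can parse directly into e.g. a vector<Point>: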

int main()
{
    std::vector<Point> v;

    if (parse("input.txt", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}

Full Demo

Here the program parses itself, using the sample data from the question embedded in its own source:

Live On Coliru

#include <iostream>
#include <string>
#include <vector>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/iostreams/device/mapped_file.hpp>

struct Point { double x, y, z; };

BOOST_FUSION_ADAPT_STRUCT(Point,x,y,z)

template <typename Container>
bool parse(std::string const& fname, Container& into) {
    boost::iostreams::mapped_file mm(fname);

    using namespace boost::spirit::x3;

    return phrase_parse(mm.begin(), mm.end(),
            seek[ eps >> 'x' >> 'y' >> 'z' >> eol ] >> // skip contents for about 8 lines
            ('-' >> ('[' >> double_ >> ',' >> double_ >> ',' >> double_ >> omit[',' >> double_] >> ']')) % eol, // parse points
            blank, into);
}

int main()
{
    std::vector<Point> v;

    if (parse("main.cpp", v)) {
        std::cout << "Parsed " << v.size() << " elements\n";
        for (Point& p : v) {
            std::cout << "{" << p.x << ';' << p.y << ';' << p.z << "}\n";
        }
    } else {
        std::cout << "Parse failed\n";
    } 
}

#if DATA
ignore contents for about 8 lines
... 
       x        y         z
 - [7.6515, -10.8271, -28.5806, 123.8]
 - [7.6515, -10.8271, -28.5806, 125.0]
 - [7.6515, -10.8271, -28.5806, 125.9]
 - [7.6515, -10.8271, -28.5806, 126.8]
 - [7.6515, -10.8271, -28.5806, 127.9]
 - [7.6515, -10.8271, -28.5806, 128.9]
 - [7.6515, -10.8271, -28.5806, 130.0]
 - [7.6515, -10.8271, -28.5806, 130.9]
 - [7.6515, -10.8271, -28.5806, 131.8]
#endif

Prints

Parsed 9 elements
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}
{7.6515;-10.8271;-28.5806}

Answer 2

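Nobody has answered yet, so I'll give it a try. You didn't post your regex solution, so I can't compare performance. My guess is that my code may be somewhat faster.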

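#include <algorithm>
#include <istream>
#include <sstream>
#include <string>

struct Point
{
    float x;
    float y;
};

// Strips everything that would confuse operator>>: the leading '-' marker,
// the commas and the brackets. The spaces between the numbers are kept.
void transform_string( std::string& str )
{
    // Only the part before '[' is searched, so the minus signs of the numbers survive.
    auto i { std::find( std::begin( str ), std::end( str ), '[' ) };
    // std::remove only shifts characters; erase() actually drops the leftovers.
    str.erase( std::remove( std::begin( str ), i, '-' ), i );
    str.erase(
        std::remove_if(
            std::begin( str ),
            std::end( str ),
            [] ( char c )
            {
                return c == ',' || c == '[' || c == ']';
            } ),
        std::end( str ) );
}

// Reads one line and extracts its first two numbers into p.
// Returns the stream so that the call can be used as a loop condition.
std::istream& get_point( std::istream& in, Point& p )
{
    std::string str;
    std::getline( in, str );
    if ( !str.empty() )
    {
        transform_string( str );
        std::istringstream iss { str };
        iss >> p.x >> p.y;
    }
    return in;
}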

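The code is self-explanatory (I hope). It reads a line into a string, removes the characters that get in the way, and uses std::istringstream to parse the floats. It depends only on the standard library, is easy to read, and is fast enough for a one-off operation (processing a 50k-line file takes about 300 ms on my laptop). It makes some assumptions about the input and does no validation. You use get_point much like operator>>. Hope this helps.

UPD: test program: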

int main()
{
    std::fstream in_file { "data.txt" };
    std::vector< Point > points;
    // Some code to prepare stream, e.g. skip first 8 lines with
    // std::string tmp; for ( int i = 0; i < 8; ++i ) std::getline( in_file, tmp );
    Point p;
    while ( get_point( in_file, p ) )
        points.emplace_back( p );

    for ( auto& point : points )
        std::cout << point.x << ' ' << point.y << std::endl;
}

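The assumption I made: the input stream contains only data with the structure shown in the question. If there are other characters, empty lines, or anything else, it won't work. If this assumption doesn't hold, please say so in the question.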
