Question

I already parse the file and split its content into enums or enum classes:

std::string sourceString = readFromFile(typesHDestination);
boost::smatch xResults;
std::string::const_iterator Start = sourceString.cbegin();
std::string::const_iterator End = sourceString.cend();

// Step 1: find each enum / enum class and capture its type, name and body.
while (boost::regex_search(Start, End, xResults, boost::regex("(?<data_type>enum|enum\\s+class)\\s+(?<enum_name>\\w+)\\s*\\{(?<content>[^\\}]+?)\\s*\\}\\s*")))
{
    std::cout << xResults["data_type"] << " " << xResults["enum_name"] << "\n{\n";

    std::string::const_iterator ContentStart = xResults["content"].begin();
    std::string::const_iterator ContentEnd = xResults["content"].end();
    boost::smatch xResultsInner;

    // Step 2: walk the body and capture each enumerator's name and optional value.
    while (boost::regex_search(ContentStart, ContentEnd, xResultsInner, boost::regex("(?<name>\\w+)(?:(?:\\s*=\\s*(?<value>[^,\\s]+)(?:(?:,)|(?:\\s*)))|(?:(?:\\s*)|(?:,)))")))
    {
        std::cout << xResultsInner["name"] << ": " << xResultsInner["value"] << std::endl;
        ContentStart = xResultsInner[0].second;
    }

    Start = xResults[0].second;
    std::cout << "}\n";
}

This works fine if the enum has no comments.

I tried to add a named group <comment> to save the comments inside the enums, but it failed every time. <comment> is a comment that starts with a double slash, for example one matched by (\/{2}\s*.+).

I tested with an online regex tester and with boost::regex.

First step - from the *.cpp file to <data_type> <enum_name> <content>, regex:

(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*

From <content> to <name> <value>, regex:

(?'name'\w+)(?:(?:\s*=\s*(?'value'[^\,\s/]+)(?:(?:,)|(?:\s*)))|(?:(?:\s*)|(?:,)))

The last one contains an error. Is there any way to fix it and add the ability to store the comments in a group?

Answer 1

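As some of the comments said, parsing source files with regular expressions is probably not a good idea, except in some simple cases.

Take, for example, this source file, from: http://en.cppreference.com/w/cpp/language/enum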

#include <iostream>

// enum that takes 16 bits
enum smallenum: int16_t
{
    a,
    b,
    c
};


// color may be red (value 0), yellow (value 1), green (value 20), or blue (value 21)
enum color
{
    red,
    yellow,
    green = 20,
    blue
};

// altitude may be altitude::high or altitude::low
enum class altitude: char
{ 
     high='h',
     low='l', // C++11 allows the extra comma
}; 

// the constant d is 0, the constant e is 1, the constant f is 3
enum
{
    d,
    e,
    f = e + 2
};

//enumeration types (both scoped and unscoped) can have overloaded operators
std::ostream& operator<<(std::ostream& os, color c)
{
    switch(c)
    {
        case red   : os << "red";    break;
        case yellow: os << "yellow"; break;
        case green : os << "green";  break;
        case blue  : os << "blue";   break;
        default    : os.setstate(std::ios_base::failbit);
    }
    return os;
}

std::ostream& operator<<(std::ostream& os, altitude al)
{
    return os << static_cast<char>(al);
}

int main()
{
    color col = red;
    altitude a;
    a = altitude::low;

    std::cout << "col = " << col << '\n'
              << "a = "   << a   << '\n'
              << "f = "   << f   << '\n';
}

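The key pattern here is: start at enum and end at ;. You cannot predict the text between enum and ; because there are so many possibilities. For that you can use .*?, the lazy star, which stops at the first ; instead of the last one.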
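To make the difference between the lazy and the greedy star concrete, here is a minimal sketch. It uses std::regex instead of boost::regex so that it builds with the standard library alone. Because '.' does not match a newline in std::regex's default ECMAScript grammar, the sketch spells "any character" as (?:.|\n). The helper name find_all is an assumption made for illustration and does not come from the original answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every full match of `pattern` in `text`.
std::vector<std::string> find_all(const std::string& text, const std::string& pattern)
{
    const std::regex rx(pattern);
    std::vector<std::string> out;
    // sregex_iterator resumes after the previous match, so the input is never copied.
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

With the input "enum A { a };\nint x;\nenum B { b };\n", the lazy pattern enum(?:.|\n)*?; returns the two enums separately, while the greedy pattern enum(?:.|\n)*; returns a single match that runs from the first enum to the last ;.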

So, to extract all the enums, I use:

Note: this is not an efficient way to do it.

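#include <boost/regex.hpp>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Lazy match from an "enum" at the start of a line up to the first ';'.
    // In Boost.Regex's default Perl syntax, '^' matches at the start of each
    // line and '.' also matches a newline.
    boost::regex rx( "^\\s*(enum.*?;)" );

    boost::match_results< std::string::const_iterator > mr; // or boost::smatch

    // Read the whole file into one string.
    std::ifstream ifs( "file.cpp" );
    const std::uintmax_t file_size = ifs.seekg( 0, std::ios_base::end ).tellg();
                                     ifs.seekg( 0, std::ios_base::beg );   // rewind

    std::string whole_file( file_size, ' ' );
    ifs.read( &*whole_file.begin(), file_size );
    ifs.close();

    // Print each match, then keep searching in the text that follows it.
    while( boost::regex_search( whole_file, mr, rx ) ){
        std::cout << mr.str( 1 ) << '\n';
        whole_file = mr.suffix().str();
    }
}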

The output will be:

enum smallenum: int16_t
{
    a,
    b,
    c
};
enum color
{
    red,
    yellow,
    green = 20,
    blue
};
enum class altitude: char
{
     high='h',
     low='l', // C++11 allows the extra comma
};
enum
{
    d,
    e,
    f = e + 2
};

Of course, for something this simple I prefer to use:

perl -lne '$/=undef;print $1 while/^\s*(enum.*?;)/smg' file.cpp

which gives the same output.

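If you want to match each part separately, this pattern might help:

`^\s*(enum[^{]*)\s*({)\s*([^}]+)\s*(};)`

But again, this is not a good idea except for some simple source files. C++ source code is free-form, and not every author follows the usual conventions. For example, in the pattern above I assumed with (};) that } is immediately followed by ;. If someone separates them (which is still valid code), the pattern will fail to match.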
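The limitation described above can be sketched as follows. This is an adaptation and not the answer's exact code: it uses std::regex rather than boost::regex, escapes the braces (as the ECMAScript grammar requires), and drops the leading ^\s* because regex_search scans forward anyway. The names EnumParts and split_enum are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical result type: the text before '{' and the text between the braces.
struct EnumParts {
    bool matched;
    std::string header;
    std::string body;
};

// Split the first enum found in `text` using the four-group pattern.
EnumParts split_enum(const std::string& text)
{
    // group 1: "enum ..." header, group 2: '{', group 3: body, group 4: "};"
    static const std::regex rx(R"re((enum[^{]*)\s*(\{)\s*([^}]+)\s*(\};))re");
    std::smatch m;
    if (!std::regex_search(text, m, rx))
        return {false, "", ""};
    return {true, m[1].str(), m[3].str()};
}
```

Called on "enum color\n{\n    red,\n    blue\n};" this finds the header and the body. Called on "enum color { red }\n;", where a newline sits between } and ;, it reports no match even though the code is still valid C++.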

Answer 2

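I agree that using regular expressions to parse complex data is not the best solution. I left out several important conditions, though. First, I am parsing generated source code that contains enums and enum classes, so there are no surprises in the code and the code is regular. In other words, I am parsing regular code with regular expressions.

Answer (the first step is the same, the second step is fixed): how to parse an enum / enum class with regex.

First step - from the *.cpp file to <data_type> <enum_name> <content>, regex:

(?'data_type'enum|enum\s+class)\s+(?'enum_name'\w+)\s*{\s*(?'content'[^}]+?)\s*}\s*

From <content> to <name> <value> <comment>, regex:

^\s*(?'name'\w+)(?:(?:\s*=\s*(?'value'[^\n/]+))|(?:[^,\s/]*))(?:(?:\s*$)|(?:\s*,\s*$)|(?:[^/]*/{2}\s*(?'comment'.*$)))

All the tests are OK; the matched text is highlighted by color here.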
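As a rough illustration of the second step, the sketch below splits the enum body into lines and extracts name, value and comment from each one. It is a simplified variant and not the regex given above: it uses std::regex, which has no named groups, so the groups are numbered, and it matches one line at a time to avoid relying on multiline anchors. The names Enumerator and parse_enum_body are assumptions for illustration.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// One parsed enumerator; value and comment stay empty when absent.
struct Enumerator {
    std::string name;
    std::string value;
    std::string comment;
};

// Parse the text between the braces of an enum, one enumerator per line.
// Limitation of this sketch: values that contain ',' or '/' are not handled.
std::vector<Enumerator> parse_enum_body(const std::string& content)
{
    // group 1: name, group 2: optional value, group 3: optional // comment
    static const std::regex line_rx(
        R"re(\s*(\w+)\s*(?:=\s*([^,/]*[^,/\s]))?\s*,?\s*(?://\s*(.*))?)re");

    std::vector<Enumerator> result;
    std::istringstream lines(content);
    std::string line;
    while (std::getline(lines, line)) {
        std::smatch m;
        // regex_match must consume the whole line; other lines are skipped.
        if (std::regex_match(line, m, line_rx))
            result.push_back({m[1].str(), m[2].str(), m[3].str()});
    }
    return result;
}
```

For the body of the altitude enum from the first answer, this yields high with value 'h' and no comment, and low with value 'l' and the comment "C++11 allows the extra comma".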

Parsing a *.cpp file that contains enums with boost::regex.
