Regular expression to find complex parameters

Time: 2014-05-28 08:10:24

Tags: c++ regex boost boost-regex

I am trying to find all parameter values in a string that has the following format:

pN  stands for the Nth parameter; it can be composed of the following chars:
    letters, numbers, and any char included in kSuportedNamesCharsRegEx
vNX stands for the Xth component of the value of the Nth parameter
    vNX accepts arithmetic expressions, which is why I constructed kSuportedValuesCharsRegEx. Additionally, the value may be a simple or nested list.

Here is an example of a string to parse:

p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5

I should get "p1", "p2 = (v21 + v22)", "p3 = v31-v32", "p4", "p5 = v5".

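As you can see, a parameter may or may not have a value. I am using the C++ Boost libraries (so I think lookbehind is not available to me). Until now I only had to handle parameters that have values, so I have been using the following: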

static const std::string kSpecialCharsRegEx = "\\.\\{\\}\\(\\)\\\\\\*\\-\\+\\?\\|\\^\\$";
static const std::string kSuportedNamesCharsRegEx = "[A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]+";
static const std::string kSuportedValuesCharsRegEx   = "([\\s\"A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]|(==)|(>=)|(<=))+";
static const std::string kSimpleListRegEx    = "\\[" + kSuportedValuesCharsRegEx + "\\]";
static const std::string kDeepListRegEx  = "\\[(" + kSuportedValuesCharsRegEx + "|(" + kSimpleListRegEx + "))+\\]";
// Main idea
//static const std::string stackRegex = "\\w+\\s*=\\s*[\\w\\s]+(?=\\s+\\w+=)"
//          "|\\w+\\s*=\\s*[\\w\\s]+(?!\\w+=)"
//          "|\\w+\\s*=\\s*\\[[\\w\\s]+\\]";
// + deep listing support

    // Main regex
static const std::string kParameterRegEx = 
      "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx + "(?=\\s+\\b" + kSuportedNamesCharsRegEx + "\\b=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx +"(?!" + kSuportedNamesCharsRegEx + "=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*(" + kDeepListRegEx + ")";

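However, I now also need to handle parameters without values, and I am having trouble building the right regular expression.

Can anyone help me with this? Thanks in advance.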

2 answers:

Answer 0: (score: 2)

As mkaes suggested, you just need to devise a simple grammar here. This is a Boost Spirit approach:

op         = char_("-+/*");

name       = +(graph - '='); // excluding `op` is not even necessary here

simple     = +(graph - op);

expression = raw [
             '(' >> expression >> ')'
            | simple >> *(op >> expression)
            ];

value      = expression;

definition = name >> - ('=' > value);
start      = *definition;

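See it Live On Coliru

The raw[] is there so that we can ignore the whole expression structure and use the grammar only for tokenizing/validation. For names I simply accept every non-whitespace character, except the operator characters.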

Use it like this:

int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It first(input.begin()), last(input.end());

    Definitions defs;
    if (qi::phrase_parse(first, last, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    if (first != last)
        std::cout << "Remaining unparsed input: '" << std::string(first,last) << "'\n";
}

This prints:

Parsed 5 definitions
p1 with no value expression
p2 with value expression '(v21 +  v22)'
p3 with value expression 'v31-v32'
p4 with no value expression
p5 with value expression 'v5'

Full code for reference:

#include <boost/fusion/adapted/struct.hpp>
#include <boost/optional.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

struct Definition {
    std::string name;
    boost::optional<std::string> value;
};

BOOST_FUSION_ADAPT_STRUCT(Definition, (std::string, name)(boost::optional<std::string>, value))

using Definitions = std::vector<Definition>;

template <typename Iterator, typename Skipper = qi::space_type>
struct grammar : qi::grammar<Iterator, Definitions(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;

        name       = +(graph - '=');

        simple     = name;

        expression = raw [
                '(' >> expression >> ')'
              | simple >> *(char_("-+/*") >> expression) // '-' comes first so it is a literal, not a "+-/" range
              ];

        value      = expression;

        definition = name >> - ('=' > value);
        start      = *definition;
    }
  private:
    qi::rule<Iterator> simple;
    qi::rule<Iterator, std::string(), Skipper> expression, value;
    qi::rule<Iterator, std::string()/*no skipper*/> name;
    qi::rule<Iterator, Definition(),  Skipper> definition;
    qi::rule<Iterator, Definitions(), Skipper> start;
};

int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It f(input.begin()), l(input.end());

    Definitions defs;
    if (qi::phrase_parse(f, l, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

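Answer 1: (score: 0)

I think I have found a solution to my problem, working together with a colleague.

The main idea is contained in the following example: http://regexr.com/38tjv

The regular expression: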

(?:^|\s)(\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s\+\(\)]+?\b)(?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b)

Here it is with explanations:

    static const std::string kParameterRegEx = std::string("(?:^|\\s)")                                     // start of string or a preceding space, not captured
        + "("                                                                                               // group of the parameter or parameter-value
            + "\\b" + kSuportedNamesCharsRegEx + "\\b"                                                      //      simple names
            + "|"                                                                                           //      or
            + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*\\b" + kSuportedValuesCharsRegEx + "?\\b"     //      name-value
        + ")"                                                                                               // end group
        + "(?="                                                                                             // followed by group of
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b\\s*="                                             //      new parameter with value
            + "|"                                                                                           //      or
            + "\\s*$"                                                                                       //      end of string
            + "|"                                                                                           //      or
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b"                                                  //      new parameter without value
        + ")";                                                                                              // end of following group
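As a way to try the idea out, here is a minimal sketch that runs the simplified pattern quoted above over an input string. Several things in it are assumptions: it uses `std::regex` from the standard library rather than boost::regex, it uses the ASCII-only character classes of the simplified pattern rather than the kSuported... constants, and the function name `find_parameters` is hypothetical. The characters `+`, `(` and `)` are written unescaped inside the bracket expression, which is equivalent there.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): collects capture group 1 of
// every match of the simplified pattern, scanning the input left to right.
std::vector<std::string> find_parameters(const std::string& input)
{
    // Same three parts as the simplified regex:
    //   1. start of string or a preceding space (not captured)
    //   2. a bare name, or name = value with a lazily matched value
    //   3. lookahead: next parameter with value, end of string,
    //      or next parameter without value
    static const std::regex re(
        R"((?:^|\s))"
        R"((\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s+()]+?\b))"
        R"((?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b))");

    std::vector<std::string> result;
    for (std::sregex_iterator it(input.begin(), input.end(), re), end;
         it != end; ++it)
    {
        result.push_back((*it)[1].str()); // group 1: name or name = value
    }
    return result;
}
```

With the input `"p1 p2 = v21 + v22 p3=v31 p4 p5=v5"` this should yield `p1`, `p2 = v21 + v22`, `p3=v31`, `p4` and `p5=v5`. The input is deliberately simpler than the one in the question: the simplified character class contains no `-`, and the `\b` in front of the value requires it to start with a letter or digit, so a value such as `(v21 + v22)` needs the fuller character sets.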

I hope this is helpful to others who need to parse Cadence Spectre circuits.