Using regular expressions

Date: 2015-11-14 17:36:06

Tags: c++ regex boost

Problem: find the matching string and extract the data from it. There are many command strings that contain keywords and data.

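Command examples:

  1. Ask Name to call me
  2. Notify Name that do this
  3. Message Name that request

Keywords: Ask, Notify, Message, to, that.

Data (input strings):

  1. Ask peter to call me
  2. Notify Jenna that I am going to be away
  3. Message home that I am running late

My question consists of two problems: 1) find the matching command, and 2) extract the data.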

Here is what I am doing: I create multiple regular expressions, e.g. "Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+" or "Ask([^\t\n]+?)to([^\t\n]+?)", and "Notify[[:s:]][[:w:]]+[[:s:]]that[[:s:]][[:w:]]+" or "Notify([^\t\n]+?)that([^\t\n]+?)", and run each one through this function:

      #include <boost/regex.hpp>
      #include <cstdio>
      #include <iostream>
      #include <string>

      void searchExpression(const char *regString)
      {
          std::string str;
          boost::regex callRegEx(regString, boost::regex_constants::icase);
          boost::cmatch im;
      
          while(true) {
             std::cout << "Enter String: ";
             if (!std::getline(std::cin, str))
                 break;  // stop at end of input instead of looping forever
             fprintf(stderr, "str %s regstring %s\n", str.c_str(), regString);
      
             if(boost::regex_search(str.c_str(), im, callRegEx)) {
                   // im.size() is 1 (the whole match) + the number of capture groups
                   int num_var = im.size() + 1;
                   fprintf(stderr, "Matched num_var %d\n", num_var);
                   for(int j = 0; j <= num_var; j++) {
                          fprintf(stderr, "%d) Found %s\n",j, std::string(im[j]).c_str());
                   }
            }
            else {
                fprintf(stderr, "Not Matched\n");
            }
         }
      }
      

I am able to find the matching string, but I am not able to extract the data. Here is the output:

      input_string: Ask peter to call Regex Ask[[:s:]][[:w:]]+[[:s:]]to[[:s:]][[:w:]]+
      Matched num_var 2
      0) Found Ask peter to call
      1) Found
      2) Found
      

I want to extract "peter" and "call" from "Ask peter to call".

2 Answers:

Answer 0 (score: 4)

Since you really want to parse a grammar, you should consider Boost's parser generator, Spirit.

You'd just write the whole thing top-down:

auto sentence  = [](auto&& v, auto&& p) { 
    auto verb     = lexeme [ no_case [  as_parser(v) ] ];
    auto name     = lexeme [ +graph ];
    auto particle = lexeme [ no_case [  as_parser(p) ] ];
    return confix(verb, particle) [ name ]; 
};

auto ask     = sentence("ask",     "to")   >> lexeme[+char_];
auto notify  = sentence("notify",  "that") >> lexeme[+char_];
auto message = sentence("message", "that") >> lexeme[+char_];

auto command = ask | notify | message;

This is Spirit X3 syntax. Read lexeme as "keep the whole word together" (i.e. don't skip whitespace inside it).

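Here, the "name" is whatever single word (lexeme[+graph], so no spaces) sits between the verb and the expected particle¹.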

If you only want to get back the raw matched string, this is enough:

#include <iostream>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>

namespace x3 = boost::spirit::x3;

namespace commands {
    namespace grammar {
        using namespace x3;

        auto sentence  = [](auto&& v, auto&& p) { 
            auto verb     = lexeme [ no_case [  as_parser(v) ] ];
            auto name     = lexeme [ +graph ];
            auto particle = lexeme [ no_case [  as_parser(p) ] ];
            return confix(verb, particle) [ name ]; 
        };

        auto ask     = sentence("ask",     "to")   >> lexeme[+char_];
        auto notify  = sentence("notify",  "that") >> lexeme[+char_];
        auto message = sentence("message", "that") >> lexeme[+char_];

        auto command = ask | notify | message;

        auto parser  = raw [ skip(space) [ command ] ];
    }
}

int main() {
    for (std::string const input : {
            "Ask peter to call me",
            "Notify Jenna that I am going to be away",
            "Message home that I am running late",
            })
    {
        std::string matched;

        if (parse(input.begin(), input.end(), commands::grammar::parser, matched))
            std::cout << "Matched: '" << matched << "'\n";
        else
            std::cout << "No match in '" << input << "'\n";
    }

}

Prints:

Matched: 'Ask peter to call me'
Matched: 'Notify Jenna that I am going to be away'
Matched: 'Message home that I am running late'

Bonus

Of course, you actually want to extract the relevant information.

Here's my take. Let's parse into a struct:

struct Command {
    enum class Type { ask, message, notify } type;
    std::string name;
    std::string message;
};

And let's write main() as:

commands::Command cmd;

if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
    std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
else
    std::cout << "No match in '" << input << "'\n";

#include <iostream>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/directive/confix.hpp>

namespace x3 = boost::spirit::x3;

namespace commands {

    struct Command {
        enum class Type { ask, message, notify } type;
        std::string name;
        std::string message;

        friend std::ostream& operator<<(std::ostream& os, Type t) { return os << static_cast<int>(t); } // TODO
    };

}

BOOST_FUSION_ADAPT_STRUCT(commands::Command, type, name, message)

namespace commands {

    namespace grammar {
        using namespace x3;

        auto sentence  = [](auto type, auto&& v, auto&& p) { 
            auto verb     = lexeme [ no_case [  as_parser(v) ] ];
            auto name     = lexeme [ +graph ];
            auto particle = lexeme [ no_case [  as_parser(p) ] ];
            return attr(type) >> confix(verb, particle) [ name ]; 
        };

        using Type = Command::Type;
        auto ask     = sentence(Type::ask,     "ask",     "to")   >> lexeme[+char_];
        auto notify  = sentence(Type::notify,  "notify",  "that") >> lexeme[+char_];
        auto message = sentence(Type::message, "message", "that") >> lexeme[+char_];

        auto command // = rule<struct command, Command> { }
                     = ask | notify | message;

        auto parser  = skip(space) [ command ];
    }
}

int main() {
    for (std::string const input : {
            "Ask peter to call me",
            "Notify Jenna that I am going to be away",
            "Message home that I am running late",
            })
    {
        commands::Command cmd;

        if (parse(input.begin(), input.end(), commands::grammar::parser, cmd))
            std::cout << "Matched: " << cmd.type << "|" << cmd.name << "|" << cmd.message << "\n";
        else
            std::cout << "No match in '" << input << "'\n";
    }

}

Prints:

Matched: 0|peter|call me
Matched: 2|Jenna|I am going to be away
Matched: 1|home|I am running late

¹ I'm not an English linguist, so I don't know whether that's the correct grammatical term :)

Answer 1 (score: 2)

This code reads command strings from the file "commands.txt", searches each one with a regular expression and, on a match, prints the parts.

#include <iostream>
#include <fstream> 
#include <string>
#include <boost/regex.hpp>

const int NumCmdParts = 4;
std::string CommandPartIds[] = {"Verb", "Name", "Preposition", "Content"};

int main(int argc, char *argv[])
{

    std::ifstream ifs;
    ifs.open ("commands.txt", std::ifstream::in);
    if (!ifs.is_open()) {
      std::cout << "Error opening file commands.txt" << std::endl;
      return 1;  // exit(1) would also need <cstdlib>
    }

    std::string cmdStr;

    // Pieces of regular expression pattern
    // '(?<Verb>' : This is to name the capture group as 'Verb'
    std::string VerbPat = "(?<Verb>(Ask)|(Notify|Message))";
    std::string SeparatorPat = "\\s*";  
    std::string NamePat = "(?<Name>\\w+)";

    // Conditional expression. if (Ask) (to) else (that)
    std::string PrepositionPat = "(?<Preposition>(?(2)(to)|(that)))";
    std::string ContentPat = "(?<Content>.*)";

    // Put the pieces together to compose pattern
    std::string TotalPat = VerbPat + SeparatorPat + NamePat + SeparatorPat
                            + PrepositionPat + SeparatorPat + ContentPat;

    boost::regex actions_re(TotalPat);
    boost::smatch action_match;

    while (getline(ifs, cmdStr)) {
        bool IsMatch = boost::regex_search(cmdStr, action_match, actions_re);
        if (IsMatch) {          
          for (int i=1; i <= NumCmdParts; i++) {     
            std::cout << CommandPartIds[i-1] << ": " << action_match[CommandPartIds[i-1]] << "\n";
          }
        }
    }   

    ifs.close();
}