Question

I am trying to remove C- and C++-style comments from a string using a regular expression. I found a Perl substitution that appears to handle both:

s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//([^\\]|[^\n][\n]?)*?\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $3 ? $3 : ""#gse;

But I am not sure how to use it with boost::regex, or what I would need to change to turn it into a regular expression that boost::regex accepts.

FYI: I found the regular expression in perlfaq6, and it seems to cover every case I need.

I would rather not use boost::spirit::qi for this, because it would add a lot of time to the project's compilation.

Edit:

std::string input = "hello /* world */ world";

boost::regex reg("(/\\*([^*]|(\\*+[^*/]))*\\*+/)|(//.*)");

input = boost::regex_replace(input, reg, "");

So the shorter regular expression does work, but the longer one does not.
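A minimal sketch of how the longer Perl pattern might be carried over to C++. To keep it buildable without Boost, it uses std::regex as a stand-in for boost::regex, which is a deliberate swap; the helper name strip_comments_regex is hypothetical. Because ECMAScript syntax has no /s modifier, Perl's "." is spelled [\s\S]. The pattern is written as raw string literals so that no backslash has to be doubled, and the replacement "$3" plays the role of Perl's defined $3 ? $3 : "". With Boost, the expectation (an assumption, untested here) is that the same pattern and format string can be passed to boost::regex and boost::regex_replace.

```cpp
#include <regex>
#include <string>

// Assumption: std::regex stands in for boost::regex so the sketch builds
// without Boost. strip_comments_regex is a hypothetical helper name.
inline std::string strip_comments_regex(const std::string& input) {
    // Raw string literals: the pattern appears exactly as the regex engine
    // sees it, with no extra backslash doubling for C++.
    // Perl's "." under /s is written as [\s\S] for ECMAScript syntax.
    static const std::regex pattern(
        // Alternative 1: a /* ... */ comment.
        R"re(/\*[^*]*\*+([^/*][^*]*\*+)*/)re"
        // Alternative 2: a // comment, including the newline that ends it.
        R"re(|//([^\\]|[^\n][\n]?)*?\n)re"
        // Alternative 3 (capture group 3): a string literal, a character
        // literal, or a run of ordinary code. This is the text to keep.
        R"re(|("(\\[\s\S]|[^"\\])*"|'(\\[\s\S]|[^'\\])*'|[\s\S][^/"'\\]*))re");

    // "$3" expands to group 3 when it took part in the match and to the
    // empty string otherwise, mirroring  defined $3 ? $3 : ""  in Perl.
    return std::regex_replace(input, pattern, "$3");
}
```

Note that, as in the original Perl pattern, the newline that terminates a // comment is removed together with the comment, while comment markers inside string literals are left alone.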

Answer 1

It seems a little odd to use a regular expression for this when Boost already has a C++ preprocessor library (Boost.Wave) that can be used to strip comments.

#include <string>
#include <boost/wave.hpp>
#include <boost/wave/cpplexer/cpp_lex_token.hpp>     // lex_token
#include <boost/wave/cpplexer/cpp_lex_iterator.hpp>  // lex_iterator

std::string strip_comments(std::string const& input) {
    std::string output;
    typedef boost::wave::cpplexer::lex_token<> token_type;
    typedef boost::wave::cpplexer::lex_iterator<token_type> lexer_type;
    typedef token_type::position_type position_type;

    position_type pos;

    // Tokenize the input as C++ (with long long support enabled).
    lexer_type it = lexer_type(input.begin(), input.end(), pos,
        boost::wave::language_support(
            boost::wave::support_cpp|boost::wave::support_option_long_long));
    lexer_type end = lexer_type();

    // Copy every token except C-style and C++-style comments.
    for (; it != end; ++it) {
        if (*it != boost::wave::T_CCOMMENT
         && *it != boost::wave::T_CPPCOMMENT) {
            output += std::string(it->get_value().begin(), it->get_value().end());
        }
    }
    return output;
}

Answer 2

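If

\*

becomes

\\*

then why wouldn't

[^\\]

become

[^\\\\]

Every backslash in the Perl pattern has to be doubled inside a C++ string literal, not only the ones in front of an asterisk.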
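The doubling rule can be checked with a small sketch. It assumes std::regex in place of boost::regex so that it builds without Boost, and the helper names are hypothetical. It compares the escaped spelling of the character class with a C++11 raw string literal, which avoids the doubling altogether.

```cpp
#include <regex>
#include <string>

// Hypothetical helper names, for illustration only.
// Both functions return the same five-character regex text: [^\\]
// (a character class that excludes a single backslash).

// Ordinary literal: every backslash of the regex is doubled for C++.
inline std::string escaped_form() { return "[^\\\\]"; }

// Raw literal: written exactly as the regex engine sees it.
inline std::string raw_form() { return R"([^\\])"; }

// True when the single character c is accepted by the class above.
inline bool is_not_backslash(char c) {
    return std::regex_match(std::string(1, c), std::regex(escaped_form()));
}
```

If the compiler supports raw string literals, the Perl pattern can be pasted between R"( and )" largely unchanged, which removes one common source of escaping mistakes.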

Removing C/C++-style comments using boost::regex
