Question

In C code, I need to check whether a pattern matches at least one line of a file. My expression needs a caret. This is what I have:

const char *disambiguate_pp(SourceFile *sourcefile) {
        char *p = ohcount_sourcefile_get_contents(sourcefile);
        char *eof = p + ohcount_sourcefile_get_contents_size(sourcefile);

        /* prepare regular expressions */
        pcre *re;
        const char *error;
        int erroffset;
        re = pcre_compile("^\\s*(define\\s+[\\w:-]+\\s*\\(|class\\s+[\\w:-]+(\\s+inherits\\s+[\\w:-]+)?\\s*{|node\\s+\\'[\\w:\\.-]+\\'\\s*{)",
                          PCRE_MULTILINE, &error, &erroffset, NULL);

        for (; p < eof; p++) {
                if (pcre_exec(re, NULL, p, mystrnlen(p, 100), 0, 0, NULL, 0) > -1)
                        return LANG_PUPPET;

        }
        return LANG_PASCAL;
}

For some reason the caret seems to be ignored, because the following line matches the regex (and it should not):

  // update the package block define template (the container for all other

I have tried many things but cannot get it to work. What am I doing wrong?

Answer 1

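If you want to use PCRE_MULTILINE, pass the whole buffer to it as a single string, and it will tell you whether there is a match anywhere in that string. The for loop is not only superfluous: each time you pass PCRE a position in the middle of the buffer, it wrongly assumes it is looking at the beginning of the string (and therefore also the beginning of a line), so the caret matches there.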
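The effect described above can be reproduced without PCRE. The following is a minimal sketch that swaps in POSIX `<regex.h>` (an assumption made only so the example builds with the C library alone), where `REG_NEWLINE` plays roughly the role of PCRE_MULTILINE. The function names `any_line_matches` and `any_offset_matches` and the simplified pattern are hypothetical, not taken from the question.

```c
#include <regex.h>
#include <stddef.h>

/* Simplified stand-in for the question's pattern: a line that starts
   with optional whitespace followed by the word "define". */
#define LINE_PATTERN "^[[:space:]]*define[[:space:]]"

/* Search the whole buffer once. REG_NEWLINE lets ^ match after every
   newline, so only real line starts are candidates. Returns 1 or 0. */
static int any_line_matches(const char *buf) {
    regex_t re;
    int found;
    if (regcomp(&re, LINE_PATTERN, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
        return 0;
    found = (regexec(&re, buf, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}

/* The loop from the question: restart the match at every byte. Each
   restart position is treated as the start of the string, so ^ matches
   there even in the middle of a line. Returns 1 or 0. */
static int any_offset_matches(const char *buf) {
    regex_t re;
    const char *p;
    int found = 0;
    if (regcomp(&re, LINE_PATTERN, REG_EXTENDED | REG_NEWLINE | REG_NOSUB) != 0)
        return 0;
    for (p = buf; *p != '\0' && !found; p++)
        found = (regexec(&re, p, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

For the comment line quoted in the question, `any_line_matches` reports no match, while `any_offset_matches` reports one as soon as the pointer reaches the text just before `define`.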

Answer 2

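For this you can use a mode modifier. Putting (?m) at the start of the pattern makes every caret (^) match at the start of a line, not only at the start of the string.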

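Dot matches newline: Turning this on makes the dot match any character, including line breaks. When off, the dot matches any character except line breaks. This is sometimes called "single line mode" and corresponds to the «(?s)» mode modifier in Perl-style regular expressions.
Case insensitive: By default, regular expressions are case sensitive, and "cat" will not match "CAT". Turn this on to make them match. The corresponding mode modifier is «(?i)».
^$ match at line breaks: When off, the anchors «^» and «$» match only at the start and end of the string, respectively. When on, they also match after and before line breaks within the string (that is, at the start and end of each line). Note that in some regex flavors that lack this option, «^» and «$» always match at line breaks. This option is sometimes called "multi-line mode", and its mode modifier is «(?m)».
Free-spacing: Normally, any spaces and line breaks you type in a regular expression are matched literally. In free-spacing mode, whitespace is ignored and comments can be started with #. This is sometimes called extended mode, and the usual modifier is «(?x)».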
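To see two of these modes toggle behavior, here is a small sketch using POSIX `<regex.h>` rather than PCRE (an assumption, chosen so it compiles without an extra library). POSIX has no inline modifiers such as «(?m)»; instead, `REG_NEWLINE` and `REG_ICASE` are passed as compile flags. Note that `REG_NEWLINE` couples two of the options listed above: it makes «^» and «$» match at line breaks and also stops the dot from matching a newline. The helper name `matches_with_flags` is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 if pattern matches anywhere in text when compiled as an
   extended regex with the given extra regcomp flags, otherwise 0
   (also 0 if the pattern fails to compile). */
static int matches_with_flags(const char *pattern, const char *text, int flags) {
    regex_t re;
    int found;
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB | flags) != 0)
        return 0;
    found = (regexec(&re, text, 0, NULL, 0) == 0);
    regfree(&re);
    return found;
}
```

With this helper, `matches_with_flags("^class", "unit a;\nclass b", 0)` finds nothing because «^» is anchored to the start of the string, whereas passing `REG_NEWLINE` lets it match at the second line. Likewise, `"cat"` matches `"CAT"` only when `REG_ICASE` is given.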

Update

Having looked at your regex properly, I found a missing parenthesis at the end of the pattern and an empty alternative. Change it to:

pcre *myregexp;
const char *error;
const char *result;
int erroroffset;
int offsetcount;
int offsets[(2+1)*3]; // (max_capturing_groups+1)*3
// subject is assumed to be the NUL-terminated buffer to search
// (?m) makes ^ match at the start of every line
myregexp = pcre_compile("(?m)^\\s*(define\\s+[\\w:-]+\\s*\\(|class\\s+[\\w:-]+(\\s+inherits\\s+[\\w:-]+)?\\s*\\{|node\\s+'[\\w:.-]+'\\s*\\{)", 0, &error, &erroroffset, NULL);
if (myregexp != NULL) {
    offsetcount = pcre_exec(myregexp, NULL, subject, (int)strlen(subject), 0, 0, offsets, (2+1)*3);
    while (offsetcount > 0) {
        // match offset = offsets[0];
        // match length = offsets[1] - offsets[0];
        if (pcre_get_substring(subject, offsets, offsetcount, 0, &result) >= 0) {
            // Do something with match we just stored into result
            pcre_free_substring(result);
        }
        // continue searching after the end of the previous match
        offsetcount = pcre_exec(myregexp, NULL, subject, (int)strlen(subject), 0, offsets[1], offsets, (2+1)*3);
    }
    pcre_free(myregexp);
} else {
    // Syntax error in the regular expression at erroroffset
}

Hope this helps.

Answer 3

For the record, I used @tripleee's answer, and this is my final code:

const char *disambiguate_pp(SourceFile *sourcefile) {
        char *p = ohcount_sourcefile_get_contents(sourcefile);
        char *eof = p + ohcount_sourcefile_get_contents_size(sourcefile);

        /* prepare regular expressions */
        pcre *re;
        const char *error;
        int erroffset;
        re = pcre_compile("^\\s*(define\\s+[\\w:-]+\\s*\\(|class\\s+[\\w:-]+(\\s+inherits\\s+[\\w:-]+)?\\s*{|node\\s+\\'[\\w:\\.-]+\\'\\s*{)",
                          PCRE_MULTILINE, &error, &erroffset, NULL);

        /* regexp for checking for define and class declarations */
        if (pcre_exec(re, NULL, p, mystrnlen(p, 10000), 0, 0, NULL, 0) > -1)
                return LANG_PUPPET;

        return LANG_PASCAL;
}

Using PCRE_MULTILINE with a caret
