Question

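The code below takes a keyword and a string of text (with HTML tags stripped) and determines whether the keyword appears in the last sentence of the cleaned content.

There is one glitch I can't figure out. It happens when the end of the content contains whitespace or a paragraph tag holding a non-breaking space, i.e.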

This is the last sentence.<p>&nbsp;</p>

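I get a false negative (no match), even though (1) the keyword is definitely in the last sentence, and (2) the strip_tags() function should make the presence of tags at the end a non-issue.

Does anyone know why this happens?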

function plugin_get_kw_last_sentence($post) {
    $theContent = strip_tags(strtolower($post->post_content));
    $theKeyword = 'test';
    $thePiecesByKeyword = plugin_get_chunk_keyword($theKeyword,$theContent);
    if (count($thePiecesByKeyword)>0) {
        $theCount = $thePiecesByKeyword[count($thePiecesByKeyword)-1];
        $theCount = trim($theCount,'.');
        if (substr_count($theCount,'.')>0) {
            return FALSE;
        } else {
            return TRUE;
        }
    }
    return FALSE;
}

function plugin_get_chunk_keyword($theKeyword, $theContent) {
    if (!plugin_get_kw_in_content($theKeyword,$theContent)) {
        return array();
    }

    $myPieceReturn = preg_split('/\b' . $theKeyword . '\b/i', $theContent);
    return $myPieceReturn;
}
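The failing step can be reproduced without WordPress. The sketch below is only an illustration: the sample string and the variable names are assumptions, and the keyword check is inlined because plugin_get_kw_in_content() is not shown. It walks the sample input through the same strip_tags(), preg_split() and trim() calls to show what is left over at each stage.

```php
<?php
// Hypothetical sample content; the keyword 'test' is in the last sentence.
$raw = 'First sentence. This is the last test sentence.<p>&nbsp;</p>';

// Same cleaning step as plugin_get_kw_last_sentence().
// strip_tags() removes <p> and </p> but leaves the &nbsp; entity untouched.
$content = strip_tags(strtolower($raw));

// Same split as plugin_get_chunk_keyword(), with the keyword hard-coded.
$pieces = preg_split('/\btest\b/i', $content);
$tail = $pieces[count($pieces) - 1];

// trim() with '.' only removes periods at the very start or end of the string.
// Here '&nbsp;' follows the period, so the period is not removed.
$trimmed = trim($tail, '.');

var_dump($content);
var_dump($trimmed);
// true here means plugin_get_kw_last_sentence() would return FALSE.
var_dump(substr_count($trimmed, '.') > 0);
```

In this run the period of the last sentence is still present after trim(), so substr_count() finds it and the function reports no match. A plain trailing space after the period has the same effect.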

Answer 1

If I understand your logic, I think you can do the whole job in a regular expression. Couldn't the entire logic be reduced to:

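function plugin_get_kw_last_sentence($post) {
    // Same hard-coded keyword as in the question.
    $theKeyword = 'test';
    // Keyword, the rest of its sentence, one terminator, then no further terminator before the end.
    $pattern = '/' . preg_quote($theKeyword, '/') . '[^.!?]*[.!?][^.!?]*$/';
    $subject = strip_tags(strtolower($post->post_content));
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $subject) === 1;
}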

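The regular expression matches when your keyword is followed by exactly one sentence-ending punctuation mark (., ! or ?) before the end of the content, that is, when no other sentence terminator sits between the keyword and the end of the last sentence.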
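To try the pattern outside WordPress, here is a minimal sketch. The helper name kw_in_last_sentence() and the sample strings are assumptions made for illustration; the helper takes plain strings instead of a $post object. Because the pattern allows any non-terminator characters after the final punctuation mark, leftover text such as &nbsp; at the end does not prevent a match.

```php
<?php
// Hypothetical standalone helper using the same pattern as the answer.
function kw_in_last_sentence(string $keyword, string $html): bool {
    // preg_quote() escapes regex metacharacters in the keyword.
    $pattern = '/' . preg_quote($keyword, '/') . '[^.!?]*[.!?][^.!?]*$/';
    $subject = strip_tags(strtolower($html));
    return preg_match($pattern, $subject) === 1;
}

// Keyword in the last sentence, followed by the trailing markup from the question.
var_dump(kw_in_last_sentence('test', 'Intro. This is the last test sentence.<p>&nbsp;</p>'));

// Keyword only in an earlier sentence.
var_dump(kw_in_last_sentence('test', 'A test here. No keyword in the last one.'));
```

The first call reports a match and the second does not. Note that the subject is lowercased, so the keyword passed in is assumed to be lowercase as well.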

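Now, this is obviously not bulletproof: it does not guard against titles (e.g. Mr., Mrs.), and any other sentence that contains one of these terminators internally will trip it up. It should still meet your needs, since your code did not account for those cases either.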
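The abbreviation caveat is easy to see with a small sketch. The sample sentences are made up for illustration, and the keyword 'test' is hard-coded into the pattern.

```php
<?php
// Same pattern shape as the answer, with the keyword 'test' hard-coded.
$pattern = '/test[^.!?]*[.!?][^.!?]*$/';

// One sentence containing an abbreviation: the period in 'mr.' is counted
// as a sentence terminator, so the final period is one terminator too many.
$withTitle = 'this test was written by mr. smith.';

// The same sentence without the abbreviation.
$withoutTitle = 'this test was written by smith.';

$resultWithTitle = preg_match($pattern, $withTitle);
$resultWithoutTitle = preg_match($pattern, $withoutTitle);

var_dump($resultWithTitle);    // int(0): false negative caused by 'mr.'
var_dump($resultWithoutTitle); // int(1)
```

Handling such cases would need a real sentence tokenizer or a list of known abbreviations, which is beyond what either version of the code attempts.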
