I am trying to find a way to negate sentences based on POS tagging. Consider:
    include_once 'class.postagger.php';

    function negate($sentence) {
        $tagger = new PosTagger('includes/lexicon.txt');
        $tags = $tagger->tag($sentence);
        // Build "token/TAG" pairs; implode() supplies the separating spaces
        $input = [];
        foreach ($tags as $t) {
            $input[] = trim($t['token']) . "/" . trim($t['tag']);
        }
        $sentence = implode(" ", $input);
        $postagged = $sentence;
        // Prepend "not" to every JJ, MD, RB, VB, VBD or VBN token
        // Todo: ignore negative words (not, never, neither)
        $sentence = preg_replace("/(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/", "not$1/$2", $sentence);
        // Remove all POS tags
        $sentence = preg_replace("/\/[A-Z$]+/", "", $sentence);
        return "$postagged<br>$sentence";
    }
BTW: in this example I am using Ian Barber's POS-tagging implementation and lexicon. An example of running this code would be:
    echo negate("I will never go to their place again");
    I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB
    I notwill notnever notgo to their place notagain
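As you can see (and as the comment in the code also notes), the negating words themselves are being negated as well: never becomes notnever, which obviously shouldn't happen. Since my regex skills aren't all that, is there a way to exclude these words from the regex?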
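[edit] Also, I would very much welcome any other comments or criticism you may have on this negation implementation, since I'm sure it is (still) very much flawed :-)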
Answer 0 (score: 3)
Try this:
    $sentence = preg_replace("/(\s)(?:(?!never|neither|not)(\w*))\/(JJ|MD|RB|VB|VBD|VBN)\b/", "$1not$2", $sentence);
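A variant of the same negative-lookahead idea, sketched under a few assumptions: the input is already in the token/TAG form shown in the question, the list of negation words is only illustrative, and negateTagged is a hypothetical helper name rather than part of the tagger. It uses word boundaries instead of a leading \s, so the first token of the sentence is also eligible, and it skips whole words only, so a token such as notable is not mistaken for not. The tag is kept in the replacement so that the separate tag-stripping step still applies.

```php
<?php
// Hypothetical helper (not part of Ian Barber's tagger). It expects text that
// is already tagged in the "token/TAG" form shown in the question.
function negateTagged(string $tagged): string
{
    // Assumed, illustrative list of words that already express a negation.
    $skip = 'not|never|neither|nor|no';

    // \b(?!(?i:...)\b) rejects a token only when the whole word is in the
    // skip list; (?i:...) makes just that part case-insensitive, so "Never"
    // is skipped too while the tags must still be upper case.
    $pattern = '/\b(?!(?i:' . $skip . ')\b)(\w+)\/(JJ|MD|RB|VB|VBD|VBN)\b/';

    // Prepend "not" but keep the tag; tags are removed in the next step.
    $negated = preg_replace($pattern, 'not$1/$2', $tagged);

    // Remove all POS tags, including ones such as PRP$.
    return preg_replace('/\/[A-Z$]+/', '', $negated);
}

echo negateTagged('I/NN will/MD never/RB go/VB to/TO their/PRP$ place/NN again/RB');
// prints: I notwill never notgo to their place notagain
```

Extending the $skip list is enough to protect further negation words. Whether a word such as no or nor should be in the list depends on how the lexicon tags it.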