Skipping a particular sentence when parsing text

Time: 2012-11-01 09:42:33

Tags: php

I want to parse some text. It contains an odd sentence such as B R I E F I N G S I N B I O I N F O R M A T I C S, and I want to skip that sentence. Here is the code:

<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach(preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence){
    foreach(preg_split('/\s+/', $sentence) as $words){
       if (count(strlen($words)>1)){
        //I don't know what to do
    }
    }
}
?>

However, it is still wrong. How can I recognize a sentence that follows a pattern like B R I E F I N G S I N B I O I N F O R M A T I C S? Thanks.

3 answers:

Answer 0 (score: 1)

How about this? It works when every word in the sentence has a length of 1.

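<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    // Assume the sentence is strange until a word longer than one character turns up.
    $isStrange = true;
    foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Strip trailing punctuation so that the final "S." still counts as one letter.
        if (strlen(rtrim($word, '.!?"\'')) > 1) {
            $isStrange = false;
            break;
        }
    }
    // Report once per sentence, after all of its words have been checked.
    if ($isStrange) echo $sentence.' is very strange!';
}
?>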

Answer 1 (score: 1)

This will work if the string is the same every time:

<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';

$text = str_replace("B R I E F I N G S I N B I O I N F O R M A T I C S. ","",$text); // <--- added this

$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach(preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence){
    foreach(preg_split('/\s+/', $sentence) as $words){
        if (strlen($words) > 1) {
            // process the word here; the spaced-out sentence has already been removed
        }
    }
}
?>

Answer 2 (score: 1)

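Going by the sentence you have shown, I would remove a sentence at the start of the text that consists only of uppercase letters separated by spaces: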

echo preg_replace('/^[A-Z](?:\s[A-Z])+\./', '', $text);
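
If such a sentence can appear anywhere in the text rather than only at the start, the same pattern could be applied to each sentence after splitting. The following is only a minimal sketch under that assumption: the helper names isSpacedOutSentence and dropSpacedOutSentences are hypothetical, and the sentence splitter is a simplified version of the one in the question.

```php
<?php
// Hypothetical helper: true when the sentence is made up only of single
// uppercase letters separated by whitespace, optionally ending in . ! or ?
function isSpacedOutSentence($sentence)
{
    return preg_match('/^[A-Z](?:\s+[A-Z])+[.!?]?$/', trim($sentence)) === 1;
}

// Hypothetical helper: split the text into sentences and keep only those
// that do not match the spaced-out pattern.
function dropSpacedOutSentences($text)
{
    // Simplified splitter: break on whitespace that follows . ! or ?
    $sentences = preg_split('/(?<=[.!?])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $kept = array();
    foreach ($sentences as $sentence) {
        if (!isSpacedOutSentence($sentence)) {
            $kept[] = $sentence;
        }
    }
    return $kept;
}

$text = 'B R I E F I N G S. Many entities have multiple names. A B C D. It helps users.';
foreach (dropSpacedOutSentences($text) as $sentence) {
    echo $sentence, "\n";
}
```

The pattern requires at least two letters, so a one-letter sentence such as "I." is kept. Acronyms written with spaces would be dropped as well, which may or may not be what you want.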