I want to parse some text. It contains a strange sentence like B R I E F I N G S I N B I O I N F O R M A T I C S, and I want to skip that sentence. Here is the code:
<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';
$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    foreach (preg_split('/\s+/', $sentence) as $words) {
        if (strlen($words) > 1) {
            // I don't know what to do
        }
    }
}
?>
However, it still doesn't work. How can I recognize a sentence with a pattern like B R I E F I N G S I N B I O I N F O R M A T I C S? Thanks.
Answer 0 (score: 1)
<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';
$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
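foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    // Treat the sentence as strange until a word longer than one character is found
    $isStrange = true;
    foreach (preg_split('/\s+/', $sentence) as $words) {
        // Ignore trailing punctuation so that "S." counts as a single letter
        if (strlen(rtrim($words, '.!?\'"')) > 1) {
            $isStrange = false;
            break;
        }
    }
    if ($isStrange) echo $sentence . ' is very strange!';
}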
?>
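The question asks to skip the sentence rather than just flag it. A minimal sketch of the same word-length idea as a reusable function that returns only the sentences worth keeping follows. The function name `keepNormalSentences` is hypothetical, and it assumes a "strange" sentence is one where every word, ignoring trailing punctuation, is a single character.

```php
<?php
/**
 * Hypothetical helper: split $text into sentences and drop those whose
 * words are all single characters (e.g. "B R I E F I N G S.").
 */
function keepNormalSentences(string $text): array
{
    // Same splitter as in the question: split on whitespace after . ! or ? (optionally a closing quote)
    $sentences = preg_split('/(?<=[.!?]|[.!?][\'"])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $kept = [];
    foreach ($sentences as $sentence) {
        $isStrange = true;
        foreach (preg_split('/\s+/', $sentence, -1, PREG_SPLIT_NO_EMPTY) as $word) {
            // Strip trailing punctuation so "S." is treated as one character
            if (strlen(rtrim($word, '.!?\'"')) > 1) {
                $isStrange = false;
                break;
            }
        }
        if (!$isStrange) {
            $kept[] = $sentence;
        }
    }
    return $kept;
}

$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';
print_r(keepNormalSentences($text));
```

A side effect of this rule is that a genuine one-word sentence such as "I." would also be dropped.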
Answer 1 (score: 1)
This will work if the string is exactly the same every time:
<?php
$text = 'B R I E F I N G S I N B I O I N F O R M A T I C S. Because many biomedical entities have multiple names and abbreviations, it would be advantageous to have an automated means to collect these synonyms and abbreviations to aid users doing literature searches.';
$text = str_replace("B R I E F I N G S I N B I O I N F O R M A T I C S. ","",$text); // <--- added this
$reg = '/(?<=[.!?]|[.!?][\'"])\s+/';
foreach (preg_split($reg, $text, -1, PREG_SPLIT_NO_EMPTY) as $sentence) {
    foreach (preg_split('/\s+/', $sentence) as $words) {
        if (strlen($words) > 1) {
            // I don't know what to do
        }
    }
}
?>
Answer 2 (score: 1)
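Based on the sentence you showed, I would remove a sentence at the start of the text that consists only of capital letters separated by spaces: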
echo preg_replace('/^[A-Z](?:\s[A-Z])+\.\s*/', '', $text); // \s* also removes the space after the period
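The spaced-out sentence might appear somewhere other than the start of the text. In that case, one option (a sketch, not part of the original answer) is to split the text into sentences first, then discard every sentence that matches a similar pattern using `preg_grep()` with its `PREG_GREP_INVERT` flag. The function name `dropSpacedOutSentences` is hypothetical. The pattern assumes the letters are ASCII capitals separated by single whitespace characters and that the sentence ends with `.`, `!` or `?`.

```php
<?php
// Hypothetical helper: remove every sentence made only of single capital letters
// separated by whitespace, wherever it appears in $text.
function dropSpacedOutSentences(string $text): string
{
    // Split into sentences on whitespace that follows . ! or ? (optionally a closing quote)
    $sentences = preg_split('/(?<=[.!?]|[.!?][\'"])\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    // PREG_GREP_INVERT keeps only the elements that do NOT match the pattern
    $kept = preg_grep('/^[A-Z](?:\s[A-Z])+[.!?]$/', $sentences, PREG_GREP_INVERT);

    // Rejoin the remaining sentences with single spaces
    return implode(' ', $kept);
}

echo dropSpacedOutSentences('Intro text. B R I E F I N G S I N B I O I N F O R M A T I C S. Because many entities have synonyms.');
// prints "Intro text. Because many entities have synonyms."
```

Because `implode()` rejoins the kept sentences with single spaces, any original line breaks between sentences are not preserved.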