How do I remove sentences that start with a lowercase letter?

Date: 2010-04-19 00:47:57

Tags: python regex perl text

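In the example below, all dialogue has already been removed with the regex (".*?"). The next step is to remove every remaining sentence that starts with a lowercase letter, so that only sentences starting with an uppercase letter are kept.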
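To make the first step concrete, here is a minimal Python sketch of the dialogue-removal regex. The quoted input is invented for illustration, because the question only shows the text after dialogue removal. The non-greedy .*? matters: a greedy ".*" would run from the first quote mark to the last one and swallow the narration in between.

```python
import re

# Hypothetical input: this dialogue is made up for the example.
raw = '"Look!" exclaimed Wade. "Villages below," said Arcot.'

# Non-greedy: each quotation is removed on its own.
non_greedy = re.sub(r'".*?"', '', raw)
# Greedy: matches from the first quote to the LAST quote, eating the narration.
greedy = re.sub(r'".*"', '', raw)

print(repr(non_greedy))  # ' exclaimed Wade.  said Arcot.'
print(repr(greedy))      # ' said Arcot.'
```

Either way, what is left begins with a lowercase speech tag such as "exclaimed Wade.", which is exactly what the second step has to remove.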

Example:

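  exclaimed Wade. Indeed, below them were villages, of crude huts made of timber and stone and mud. Rubble work walls, for they needed little shelter here, and the people were but savages.

  asked Arcot, his voice a bit unsteady with suppressed excitement.

  replied Morey without turning from his station at the window. Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them.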

In the example above, only the following should be removed:

  exclaimed Wade.
  asked Arcot, his voice a bit unsteady with suppressed excitement.
  replied Morey without turning from his station at the window.

A working regex or a short Perl or Python script would be much appreciated. I'm using Textpipe version 7.

Thanks.

3 answers:

Answer 0 (score: 3)

This works on the example you posted:

import re
text = re.sub(r'(^|(?<=[.!?])\s+)[a-z].*?[.!?](?=\s|$)', r'\1', text)
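Here is the same one-liner run on an abridged copy of the question's sample text, as a sketch of how the pieces fit. Group 1 is either the start of the string or the whitespace after ., ! or ?, and it is written back with \1, so only the lowercase sentence itself disappears. Because the lookbehind inspects the original string, two lowercase sentences in a row ("asked Arcot..." then "replied Morey...") are both caught in a single pass. The final whitespace clean-up is an extra step of my own, not part of the answer.

```python
import re

# Abridged copy of the question's sample text (after dialogue removal).
text = (
    "exclaimed Wade. Indeed, below them were villages. "
    "Rubble work walls, for they needed little shelter here.\n\n"
    "asked Arcot, his voice a bit unsteady with suppressed excitement.\n\n"
    "replied Morey without turning from his station at the window. "
    "Below them now, men were standing, staring up."
)

# The regex from the answer: remove lowercase-initial sentences, keep the
# whitespace that preceded them.
pattern = r'(^|(?<=[.!?])\s+)[a-z].*?[.!?](?=\s|$)'
result = re.sub(pattern, r'\1', text)

# Optional tidy-up (an addition): collapse the leftover blank lines into one
# paragraph break and trim the leading space.
cleaned = re.sub(r'[ \t]*\n\s*', '\n\n', result).strip()
print(cleaned)
```

The raw result still contains the whitespace around each removed sentence, which is why the clean-up step is there.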

Answer 1 (score: 0)

For your example, this works in Perl:

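my $s = "exclaimed Wade. Indeed, ...";
my $prev;

# Repeat until nothing changes: the trailing \s* swallows the whitespace that a
# directly following lowercase sentence needs as its [.!?]\s+ anchor, so one
# pass can skip the sentence right after a removed one.
do {
  $prev = $s;
  $s =~ s/(^\s*|[.!?]\s+)[a-z][^.!?]*[.!?]\s*/$1/gs;
} until ($s eq $prev);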

Without the do loop, a run of several consecutive lowercase sentences would not be removed completely in one pass.
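A Python port of the Perl substitution makes the effect of the loop visible. The input below is a hypothetical, shortened string with two lowercase sentences in a row, and the function names are illustrative only. One pass removes the first of the pair but leaves the second, because the first match consumed the whitespace the second one needed.

```python
import re

# Python port of the Perl regex from this answer (^ matches only at the start
# of the string here, as it does in Perl without /m).
PATTERN = re.compile(r'(^\s*|[.!?]\s+)[a-z][^.!?]*[.!?]\s*')


def strip_once(s):
    """One substitution pass, like a single s///g in Perl."""
    return PATTERN.sub(r'\1', s)


def strip_until_stable(s):
    """Equivalent of the Perl do { ... } until ($s eq $prev) loop."""
    while True:
        prev = s
        s = strip_once(s)
        if s == prev:
            return s


text = "Indeed, villages. asked Arcot. replied Morey. Below them."
print(strip_once(text))          # Indeed, villages. replied Morey. Below them.
print(strip_until_stable(text))  # Indeed, villages. Below them.
```

The lookbehind-based regex in Answer 0 avoids this problem because it never consumes the whitespace after a removed sentence.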

Note that doing this perfectly is essentially AI-complete. See this question for the kinds of sentences you will never get right: "LaTeX sometimes puts too much or too little space after periods".

Of course, you can borrow LaTeX's heuristics for deciding where a sentence ends and get it right most of the time.
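As a rough illustration, here is a sketch loosely modeled on TeX's default rule that a period directly after an uppercase letter (an initial such as "J.") does not end a sentence. This is a simplified approximation, not LaTeX's actual mechanism: real TeX also looks through closing quotes and parentheses, which this regex ignores. The example strings and function name are hypothetical.

```python
import re

# Sentence end: whitespace after ., ! or ? UNLESS that punctuation directly
# follows an uppercase letter (approximating TeX's treatment of "J.").
SENTENCE_END = re.compile(r'(?<=[^A-Z][.!?])\s+')
# Naive alternative: split after every ., ! or ? followed by whitespace.
NAIVE_END = re.compile(r'(?<=[.!?])\s+')


def drop_lowercase_sentences(text, splitter=SENTENCE_END):
    """Split into sentences, then keep only those not starting lowercase."""
    sentences = [s for s in splitter.split(text.strip()) if s]
    return ' '.join(s for s in sentences if not s[0].islower())


text = "replied J. Morey. Below them were villages."
print(drop_lowercase_sentences(text, NAIVE_END))
# Morey. Below them were villages.   (the initial "J." fools the naive split)
print(drop_lowercase_sentences(text))
# Below them were villages.
print(drop_lowercase_sentences("asked Dr. Arcot. Below them."))
# Arcot. Below them.   ("Dr." still fools it, just as it fools TeX)
```

The last case is the kind of sentence the linked LaTeX question is about: no purely local rule can tell that "Dr." is an abbreviation rather than a sentence end.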

Answer 2 (score: 0)

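Why not use a module like Lingua::EN::Sentence? It makes it easy to split arbitrary English text into reasonably accurate sentences, which you can then filter.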

#!perl

use strict;
use warnings;

use Lingua::EN::Sentence qw( get_sentences );

my $text = <<END;

exclaimed Wade. Indeed, below them were villages, of crude huts made of timber and stone and mud. Rubble work walls, for they needed little shelter here, and the people were but savages.

asked Arcot, his voice a bit unsteady with suppressed excitement.

replied Morey without turning from his station at the window. Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them.
END


my $sentences = matching_sentences( qr/^[^a-z]/, $text );

print map "$_\n", @$sentences;

sub matching_sentences {
    my $re   = shift;
    my $text = shift;

    my $s = get_sentences( $text );

    @$s = grep /$re/, @$s;

    return $s;
}

Output:

Indeed, below them were villages, of crude huts made of timber and stone and mud.
Rubble work walls, for they needed little shelter here, and the people were but savages.
Below them now, less than half a mile down on the patchwork of the Nile valley, men were standing, staring up, collecting in little groups, gesticulating toward the strange thing that had materialized in the air above them.