"替代品中的广泛特征"来自Lingua的消息:: EN :: Sentence

Time: 2016-06-10 14:37:15

Tags: perl utf-8

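I have been using Lingua::EN::Sentence for a while without problems. Now, all of a sudden, it has started giving me "Wide character in substitution" messages such as: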

  

Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373, chunk 55.

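Any idea why these have started appearing, and how I can get rid of them? The input file is set to :utf8, and the output looks fine. I would just like it to stop emitting the warning messages.

Here is some code from the program: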

local $/ = undef; 
binmode TXT, ":utf8"; 
my $txtdat = <TXT>;
my @paras = split("\n", $txtdat); 
foreach my $para (@paras) { 
    my $sentences = get_sentences($para);
}

2 answers:

Answer 0 (score: 3)

I think this must be a bug in Lingua::EN::Sentence

Part of its initialisation sets the locale like this

setlocale(LC_CTYPE, "fr_CA.ISO8859-1");

which is ISO-8859-1-encoded Canadian French. Nothing wrong with that, but it's a strange default

The module also exports a set_locale function, which calls POSIX::setlocale to set the locale to what you say, so in theory you could write

use Lingua::EN::Sentence qw/ get_sentences set_locale /;

set_locale('en_US.UTF-8');

and everything should work

But it doesn't, and I don't have time today to work out why it doesn't

I've been able to reproduce your problem with this

Perl code

use strict;
use warnings 'all';

use Data::Dump 'pp';

use Lingua::EN::Sentence qw/ get_sentences set_locale /;

set_locale('en_US.UTF-8');

my @paragraphs = do {
    open my $fh, '<:encoding(utf-8)', 'unicode.txt' or die "Cannot open unicode.txt: $!";
    <$fh>;
};

printf "Paragraphs: %s\n\n", pp \@paragraphs;

my $n;
for my $para ( @paragraphs ) {
    printf "Sentences in paragraph %d: %s\n\n",
            ++$n,
            pp get_sentences($para);
}

and this

input file

Here's a ‘quoted’ string. (The quotes are "wide" characters.) Another sentence in the same line. And another.

Here's another paragraph. With a second sentence.

(Why did I have to do that? You're in a much better position to recreate the problem with non-sensitive data.)

output

Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 432.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 432.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 371.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 371.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 376.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 376.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 380.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 380.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.

And, as I say, that should work if I just call set_locale('en_US.UTF-8'). But it doesn't

Even no warnings 'locale' doesn't suppress these warnings, and the only way I have found to get this working is to patch the Lingua/EN/Sentence.pm file and comment out the use locale statement at line 194 (I am looking at version 0.29 of the module, which is the latest at the time of writing)

With that change, I get this

output

Paragraphs: [
  "Here's a \x{2018}quoted\x{2019} string. (The quotes are \"wide\" characters.) Another sentence in the same line. And another.\n",
  "\n",
  "Here's another paragraph. With a second sentence.\n",
  "\n",
  "\n",
  "\n",
]

Sentences in paragraph 1: [
  "Here's a \x{2018}quoted\x{2019} string.",
  "(The quotes are \"wide\" characters.)",
  "Another sentence in the same line.",
  "And another.",
]

Sentences in paragraph 2: undef

Sentences in paragraph 3: ["Here's another paragraph.", "With a second sentence."]

Sentences in paragraph 4: undef

Sentences in paragraph 5: undef

Sentences in paragraph 6: undef

Note that the undef values are correct, and correspond to blank lines in the input
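If the undef results are unwanted, the blank lines could be filtered out before they reach get_sentences. This is only a minimal sketch using core Perl; the sample data is made up for illustration, and in the real script the filtered list would feed the get_sentences loop

```perl
use strict;
use warnings;

# Made-up sample data: two paragraphs separated by blank lines.
my @paragraphs = (
    "Here's a paragraph. With a second sentence.\n",
    "\n",
    "Here's another paragraph.\n",
    "\n",
);

# Keep only lines that contain at least one non-whitespace character.
my @non_blank = grep { /\S/ } @paragraphs;

printf "%d of %d lines would be passed to get_sentences\n",
    scalar @non_blank, scalar @paragraphs;
```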

I will raise this with Kim Ryan, the maintainer of this module, and see what she (he?) has to say. In the meantime I hope this helps

Answer 1 (score: 1)

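I am the author of this module. As far as I can tell, the problem is not related to the locale setting. The error messages also appear when set_locale('en_US.UTF-8'); is used.

The problem is that you have a few utf8 characters (left quotes and so on) mixed in with ASCII. When the module performs substitutions on the text, it issues a warning to indicate that it is processing multi-byte data. This does not affect the result; the sentences are still split correctly.

If you can convert your input data to ASCII only, the errors will not occur. This can be done by saying the following (and likewise for any characters other than curly quotes that you want to map):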

$txtdat =~ s/[\x{2018}\x{2019}]/'/g;

before calling get_sentences.
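To map more than the single quotes, the same idea could be extended with a lookup table. The following is a minimal sketch, not part of the module: the helper name to_ascii_punctuation and the particular set of characters are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Illustrative mapping (an assumption; extend it to suit your own data).
my %ascii_for = (
    "\x{2018}" => "'",    # left single quotation mark
    "\x{2019}" => "'",    # right single quotation mark
    "\x{201C}" => '"',    # left double quotation mark
    "\x{201D}" => '"',    # right double quotation mark
    "\x{2013}" => '-',    # en dash
    "\x{2014}" => '-',    # em dash
);

# Hypothetical helper: return a copy of the text with every mapped
# character replaced by its ASCII stand-in.
sub to_ascii_punctuation {
    my ($text) = @_;
    my $class = join '', keys %ascii_for;    # characters to look for
    $text =~ s/([$class])/$ascii_for{$1}/g;
    return $text;
}

my $txtdat = "Here\x{2019}s a \x{2018}quoted\x{2019} string.";
my $clean  = to_ascii_punctuation($txtdat);
print "$clean\n";
```

Characters that are not in the table are left untouched, so any remaining non-ASCII text would still trigger the warning.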

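A workaround is to comment out the 'use warnings;' line in Sentence.pm. I think issuing these warnings is reasonable, since other people may want to know that their data stream contains mixed encodings.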
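If editing Sentence.pm is not an option, one possible alternative is to filter the warnings in the calling program. The following is a minimal sketch rather than anything the module provides: the helper name without_wide_char_warnings is hypothetical, it relies only on Perl's standard $SIG{__WARN__} hook, and the simulated warn stands in for a real call to get_sentences.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::Sentence): run a code block
# while discarding only the "Wide character ... in substitution" warnings.
sub without_wide_char_warnings {
    my ($code) = @_;
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if $msg =~ /^Wide character \(U\+[0-9A-Fa-f]+\) in substitution/;
        warn $msg;    # pass every other warning through unchanged
    };
    return $code->();
}

# Usage sketch: in the real program the block would call get_sentences($para).
my $result = without_wide_char_warnings(sub {
    warn "Wide character (U+2019) in substitution (s///) at Sentence.pm line 373.\n";
    return 'done';
});
print "$result\n";
```

Because the handler is installed with local, it is removed again as soon as the helper returns, so warnings elsewhere in the program are unaffected.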