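I have been using Lingua::EN::Sentence for some time without any problems. Now, all of a sudden, it has started giving me lots of "Wide character in substitution" messages, such as: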
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373, chunk 55.
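Any idea why these have started appearing, and how I can get rid of them? The input file is set to :utf8, and the output looks fine. I would just like it to stop issuing the warning messages.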
Here is some of the code from the program:
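use Lingua::EN::Sentence qw/ get_sentences /;
# TXT is opened earlier in the program (not shown)
local $/ = undef;                    # slurp the whole file in one read
binmode TXT, ":utf8";
my $txtdat = <TXT>;
my @paras = split("\n", $txtdat);    # one element per input line
foreach my $para (@paras) {
    my $sentences = get_sentences($para);
}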
Answer 0 (score: 3)
I think this must be a bug in Lingua::EN::Sentence.
Part of its initialisation sets the locale like this:
setlocale(LC_CTYPE, "fr_CA.ISO8859-1");
which is ISO-8859-1-encoded Canadian French. Nothing wrong with that, but it's a strange default.
The module also exports a set_locale function, which calls POSIX::setlocale to set the locale to whatever you specify, so in theory you could write
use Lingua::EN::Sentence qw/ get_sentences set_locale /;
set_locale('en_US.UTF-8');
and everything should work.
But it doesn't, and I don't have time today to work out why.
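A reasonable first check when set_locale seems to have no effect is to look at what POSIX::setlocale actually returns: it returns the new locale string on success and undef when the requested locale is not available. The minimal sketch below uses only the core POSIX module; the helper name try_locale is hypothetical, the two locale names are simply the ones mentioned above, and locale naming differs between platforms (Windows uses its own conventions), so an undef result is a hint rather than a diagnosis.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Hypothetical helper: try to switch LC_CTYPE to $name.
# POSIX::setlocale returns the new locale string on success, undef on failure.
sub try_locale {
    my ($name) = @_;
    return setlocale(LC_CTYPE, $name);
}

for my $name ('en_US.UTF-8', 'fr_CA.ISO8859-1') {
    my $got = try_locale($name);
    printf "%-16s => %s\n", $name, defined $got ? $got : 'undef (not available)';
}

# Called with a single argument, setlocale only reports the current setting.
printf "LC_CTYPE is now: %s\n", setlocale(LC_CTYPE);
```

Running this on the affected machine shows whether the locale the module asks for, or the one passed to set_locale, is actually being applied.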
I've been able to reproduce your problem with this program:
use strict;
use warnings 'all';
use Data::Dump 'pp';
use Lingua::EN::Sentence qw/ get_sentences set_locale /;
set_locale('en_US.UTF-8');
my @paragraphs = do {
    open my $fh, '<:encoding(utf-8)', 'unicode.txt'
        or die "Can't open unicode.txt: $!";
    <$fh>;    # list context: one element per input line
};
printf "Paragraphs: %s\n\n", pp \@paragraphs;
my $n;
for my $para ( @paragraphs ) {
    printf "Sentences in paragraph %d: %s\n\n",
        ++$n,
        pp get_sentences($para);
}
and this input file, unicode.txt:
Here's a ‘quoted’ string. (The quotes are "wide" characters.) Another sentence in the same line. And another.
Here's another paragraph. With a second sentence.
(Why did I have to do that? You're in a much better position to recreate the problem with non-sensitive data.)
Output:
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 432.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 432.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 433.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 371.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 371.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 373.
[... the line 373 warning above appears 142 times in total ...]
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 376.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 376.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 380.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 380.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 390.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 391.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 395.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.
Wide character (U+2018) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.
Wide character (U+2019) in substitution (s///) at C:/Strawberry/perl/site/lib/Lingua/EN/Sentence.pm line 422.
And, as I say, that should work if I just call set_locale('en_US.UTF-8'). But it doesn't.
Even no warnings 'locale' doesn't affect these warnings, and the only way I have found to get this working is to patch the Lingua/EN/Sentence.pm file and comment out the use locale on line 194. (I am looking at version 0.29 of the module, which is the latest at the time of writing.)
With that change, I get this:
Paragraphs: [
  "Here's a \x{2018}quoted\x{2019} string. (The quotes are \"wide\" characters.) Another sentence in the same line. And another.\n",
  "\n",
  "Here's another paragraph. With a second sentence.\n",
  "\n",
  "\n",
  "\n",
]
Sentences in paragraph 1: [
  "Here's a \x{2018}quoted\x{2019} string.",
  "(The quotes are \"wide\" characters.)",
  "Another sentence in the same line.",
  "And another.",
]
Sentences in paragraph 2: undef
Sentences in paragraph 3: ["Here's another paragraph.", "With a second sentence."]
Sentences in paragraph 4: undef
Sentences in paragraph 5: undef
Sentences in paragraph 6: undef
Note that the undef values are correct: they correspond to blank lines in the input.
I will raise this with Kim Ryan, the maintainer of this module, and see what she (he?) has to say. In the meantime, I hope this helps.
Answer 1 (score: 1)
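I am the author of this module. As far as I can tell, the problem has nothing to do with the locale setting: the messages still appear when using set_locale('en_US.UTF-8');
The problem is that you have several UTF-8 characters (left quotes etc.) mixed in with ASCII. When the module performs substitutions on the text, it emits a warning that it is processing multi-byte data. This doesn't affect the results; it still splits the sentences correctly.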
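To see exactly which characters in a given input are non-ASCII (and therefore candidates for these warnings), a small sketch can list the distinct code points outside the ASCII range. The helper name non_ascii_code_points is hypothetical, and the sample string is the first sentence of the test file from the first answer, with the curly quotes written as escapes.

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct non-ASCII code points in $text,
# in order of first appearance.
sub non_ascii_code_points {
    my ($text) = @_;
    my %seen;
    return grep { !$seen{$_}++ } map { ord($_) } $text =~ /([^\x00-\x7F])/g;
}

my $sample = "Here's a \x{2018}quoted\x{2019} string.";
printf "U+%04X\n", $_ for non_ascii_code_points($sample);
```

For the sample this reports U+2018 and U+2019, the same two code points named in the warnings above.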
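If you can convert your input data to ASCII only, the warnings will go away. This can be done with something like the following (plus mappings for any other characters, apart from curly quotes, that you want to convert):
$txtdat =~ s/[\x{2018}\x{2019}]/'/g;    # curly single quotes -> ASCII '
before calling get_sentences.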
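The one-line substitution above can be extended to cover more characters by driving it from a lookup table. This is a minimal sketch: the hash %ascii_for and the helper to_ascii are hypothetical names, and the list of mappings is an assumed, non-exhaustive set of common typographic characters that you would adapt to your own data.

```perl
use strict;
use warnings;

# Hypothetical, non-exhaustive mapping of typographic characters to ASCII.
my %ascii_for = (
    "\x{2018}" => "'",   "\x{2019}" => "'",    # curly single quotes
    "\x{201C}" => '"',   "\x{201D}" => '"',    # curly double quotes
    "\x{2013}" => '-',   "\x{2014}" => '--',   # en dash, em dash
    "\x{2026}" => '...',                       # horizontal ellipsis
);

# Character class built from the hash keys so the two never drift apart.
my $class = join '', keys %ascii_for;

sub to_ascii {
    my ($text) = @_;
    $text =~ s/([$class])/$ascii_for{$1}/g;
    return $text;
}

my $para = "Here\x{2019}s a \x{2018}quoted\x{2019} string\x{2026}";
print to_ascii($para), "\n";    # prints: Here's a 'quoted' string...
```

In the question's program, $txtdat = to_ascii($txtdat); before the split would apply it to the whole file. Note that this changes the text handed back by get_sentences, which may or may not be acceptable.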
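A workaround is to comment out the 'use warnings;' line in Sentence.pm. I feel it is reasonable to issue these warnings, as other users may have mixed encodings in their data streams.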
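A less invasive alternative to editing Sentence.pm, not suggested in either answer and so best treated as an untested sketch, is a temporary $SIG{__WARN__} handler around the call. no warnings is lexically scoped and cannot reach code compiled inside Sentence.pm, which fits the observation above that no warnings 'locale' had no effect; a __WARN__ hook is dynamically scoped and sees every warning raised while it is installed. The helper names below are hypothetical, and the pattern assumes the message format shown in the question.

```perl
use strict;
use warnings;

# True for the "Wide character (U+XXXX) in substitution" messages shown above.
sub is_wide_char_warning {
    my ($msg) = @_;
    return $msg =~ /^Wide character \(U\+[0-9A-Fa-f]+\) in substitution/ ? 1 : 0;
}

# Hypothetical helper: call $code->(@args) with a temporary __WARN__ hook
# that drops wide-character warnings and passes every other warning on.
sub without_wide_char_warnings {
    my ($code, @args) = @_;
    local $SIG{__WARN__} = sub {
        my ($msg) = @_;
        return if is_wide_char_warning($msg);
        warn $msg;    # the hook is disabled while it runs, so this reaches STDERR
    };
    return $code->(@args);
}

# Stand-in for get_sentences that emits one filtered and one unfiltered warning.
my $result = without_wide_char_warnings(sub {
    warn "Wide character (U+2019) in substitution (s///) at Sentence.pm line 373.\n";
    warn "Some unrelated warning\n";
    return [ split /(?<=\.)\s+/, $_[0] ];
}, "First sentence. Second sentence.");

print scalar(@$result), " sentences\n";
```

In the question's loop, get_sentences($para) would become without_wide_char_warnings(\&get_sentences, $para). Unlike commenting out lines in the installed module, this survives module upgrades and still lets unrelated warnings through.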