Question

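I want to search for a word (in Japanese) in some files that contain Japanese text.

I tried handling them the same way as ordinary files, but I get errors like "Wide character in print at line -".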

我用过

use Unicode::Japanese; use Unicode::Japanese qw(PurePerl);

as suggested on some websites.

This is the code I am using:

my $dr="My_Directory" ;
opendir DIR, $dr ;
my @txtfiles=grep { /\.txt$/ } readdir(DIR) ;
foreach $file(@txtfiles) {
    my $count=0;
    my @words=();
    open(FILE, $dr.$file);
    while (<FILE>) {
        push(@words, split(/\s+/));
    }
    foreach $word (@words) {
        if($word=~ m/$word_to_search/i) {
            $count++;
        }
    }
    print "$word_to_search occurs $count times in $file file\n";
}

Any ideas would be very helpful.

Thanks in advance.

PNVR

Answer 1

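Please read http://p3rl.org/UNI first and apply the advice given there. The topic of encoding has come up many times on Stack Overflow, and it is not specific to Japanese at all. (Google, SO tags, SO search)

You mentioned that your files are saved as UTF-8. To get you started quickly, here is one way to read them: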

open my $fh, '<:encoding(UTF-8)', 'filename.txt';
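
Building on that one-liner, here is a minimal sketch of how the whole search from the question might look once every input and output is decoded or encoded explicitly. The sub name count_matches and the two command-line arguments (directory and search word) are made up for illustration, and the sketch assumes that both the files and the terminal use UTF-8. Under those assumptions Unicode::Japanese should not be needed.

```perl
use strict;
use warnings;
use utf8;                      # string literals in this script are UTF-8
use Encode qw(decode);
use File::Spec;

# Hypothetical helper: count the whitespace-separated tokens in a UTF-8
# file that contain $needle (same counting rule as the question's code).
sub count_matches {
    my ($path, $needle) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "Cannot open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        for my $word (split /\s+/, $line) {
            # \Q...\E treats the search word as literal text, not a regex
            $count++ if $word =~ /\Q$needle\E/i;
        }
    }
    close $fh;
    return $count;
}

# Runs only when called as: perl script.pl DIRECTORY WORD
if (@ARGV == 2) {
    # Encoding the output is what avoids "Wide character in print".
    binmode STDOUT, ':encoding(UTF-8)';

    my ($dir, $raw_term) = @ARGV;
    # Assumption: the shell passes the search word as UTF-8 bytes.
    my $term = decode('UTF-8', $raw_term);

    opendir my $dh, $dir or die "Cannot open $dir: $!";
    # Assumption: file names are plain ASCII; they are not decoded here.
    for my $file (sort grep { /\.txt$/ } readdir $dh) {
        my $n = count_matches(File::Spec->catfile($dir, $file), $term);
        print "$term occurs $n times in $file\n";
    }
    closedir $dh;
}
```

Note that Japanese is normally written without spaces between words, so splitting on whitespace counts matching tokens rather than every occurrence of the word. Counting matches per line with a global regex match may be closer to what you want.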
