Question

Below is a code snippet that I have been trying to speed up.

use strict;
use warnings;
use Encode;

open(IN,"<:utf8",$ARGV[0]) or die "Cannot open $ARGV[0]:$!\n"; ## treat it as a huge data set of 35,000 lines in Devanagari script.
my @in = <IN>;
close(IN);

my $key = "अच्छा";  #key to be matched contains devanagari script as a string 

foreach my $in(@in) {
    chomp($in);
    $key = decode_utf8($key);
    $in = decode_utf8($in);
    if($key eq $in) {
        print "$key: matched\n";
    }
    else {
        print "Not matched\n";
    }
}

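I am trying to match lines in the file against the key. I profiled my code and got the following result.

It turns out that decode_utf8 consumes 34% of the time. I used decode_utf8 because my data is in UTF-8.

What can I do to improve the speed? Is there any other approach that could replace decode_utf8 in the code for matching Unicode data?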

Answer 1

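It turns out that decode_utf8 consumes 34% of the time.

Well, that is basically your entire program.

More importantly, your code is buggy. You are decoding strings that have already been decoded!

You decode the file's contents as you read it (via :utf8), and then you decode the already-decoded content again inside the loop.
You decode the contents of $key on every pass of the loop, so on the fourth pass you are effectively using decode_utf8(decode_utf8(decode_utf8(decode_utf8($key)))).

Fixed: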
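
To illustrate the point about decoding exactly once, here is a minimal sketch. The variable names are made up for illustration, and it relies only on encode_utf8 and decode_utf8 from Encode. Decoding an already-decoded string a second time is not behaviour worth relying on, so the sketch deliberately avoids doing it.

```perl
use strict;
use warnings;
use utf8;                                  # this source file is UTF-8
use Encode qw(encode_utf8 decode_utf8);

my $key   = "अच्छा";                        # character string, 5 code points
my $bytes = encode_utf8($key);             # what a raw, undecoded read would yield

# Decode once, at the input boundary, and never again.
my $line = decode_utf8($bytes);

my $key_chars  = length($key);             # counts characters
my $byte_count = length($bytes);           # counts bytes
my $matches    = ($line  eq $key) ? 1 : 0; # decoded text equals the literal
my $raw_match  = ($bytes eq $key) ? 1 : 0; # undecoded bytes do not
```

When the file is opened with a decoding layer, the layer already plays the role of the single decode_utf8 call here, so no Encode call is needed in the loop at all.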

use utf8;                             # Source code encoded using UTF-8.    
use open ':std', ':encoding(UTF-8)';  # Term provides and expects UTF-8. Default for files.

use strict;
use warnings;

my $key = "अच्छा";

my $found = 0;
while (my $line = <>) {
    chomp($line);
    if ($line eq $key) {
        $found = 1;
        last;
    }
}

if ($found) {
    print "Match found\n";
} else {
    print "No match\n";
}

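This also fixes other problems:

Encodes the output (by way of use open ':std').
Doesn't needlessly use global variables. (Uses open my $IN instead of open IN.)
Doesn't needlessly load the entire file into memory.
Doesn't needlessly read the entire file.
Doesn't print "Not matched" 34,999 times when the key is found.
Avoids :utf8 in favour of :encoding(UTF-8).
Doesn't reinvent <>.
Doesn't hide the die in the middle of a line. (Put a line break before or die.)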
Doesn't use "Cannot". (Use "Can't"!)
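
Several items in this list refer to an explicit open, which the fixed code sidesteps by reading through <>. As a rough sketch of what that explicit variant could look like, the following uses a lexical filehandle, the :encoding(UTF-8) layer, an or die on its own line, and an early exit on the first match. The helper name file_has_line and the sample file contents are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use utf8;                          # string literals below are UTF-8 source text
use File::Temp qw(tempfile);

# Hypothetical helper: returns 1 if any line of $path equals $key, else 0.
sub file_has_line {
    my ($path, $key) = @_;
    open(my $IN, '<:encoding(UTF-8)', $path)   # lexical handle, decoded on read
        or die "Can't open $path: $!\n";
    my $found = 0;
    while (my $line = <$IN>) {                 # one line at a time, no slurping
        chomp($line);
        if ($line eq $key) {
            $found = 1;
            last;                              # stop reading at the first match
        }
    }
    close($IN);
    return $found;
}

# Demo data: write a small UTF-8 file, then search it.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':encoding(UTF-8)');
print {$fh} "नमस्ते\nअच्छा\nधन्यवाद\n";
close($fh);

my $hit  = file_has_line($path, "अच्छा");      # key is the second line
my $miss = file_has_line($path, "नहीं");       # key is absent from the file
```

Because both the key (via use utf8) and the lines (via the I/O layer) are already character strings, the comparison needs no decode_utf8 call.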
