Question

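My regex pattern in Perl correctly matches strings whose case is identical to the search term, but it does not match strings that differ in case. I am parsing a CSV file in which the first row holds country names and the remaining rows hold abbreviations or other common spellings of each country.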

Example: column 1 of the CSV holds United States and its variants (such as US and USA). Column 2 holds: Mexico, MX, MEX.

Here is the complete code::

    #!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper qw(Dumper);

my $filename = 'countrycodes.csv';
my $line;
my @rowStrings;
my @rows;
my @columns;

Here is the string I am using to test the code:

my $string = "Mex, MEX, USA, usa, US, MX, CAN, Canada";

open(my $fh, '<', $filename) or die "Can't open $filename: $!";

$line = <$fh>;
@rowStrings = split("\r", $line);

#make rows strings into arrays
foreach my $i (0..$#rowStrings){
    $rows[$i] = [split(",",$rowStrings[$i])];
}


my $columnCount = values scalar $rows[0];

print "column count: $columnCount \n";

#create array for each column from CSV
foreach my $column (0..$columnCount){
    foreach my $row (0..$#rows){
        $columns[$column][$row] =  $rows[$row][$column];
        if ($columns[$column][$row]) {
        }
    }

}

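Here I walk through the arrays of abbreviations/spellings and look for matches. I search the string for each abbreviation in the arrays and replace it with the heading/country name from the CSV file ($head).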

for my $col (0..$#columns-1){
    my $head = $columns[$col][0];
    for my $ro (1..$#rows){
        if ($columns[$col][$ro]){
            $string =~ s/\s$columns[$col][$ro],/ $head,/i;
            print $string . "\n";
        }
    }

}

Here is the final terminal output:

Mex, Mexico, United States, usa, United States, Mexico, Canada, Canada

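As you can see, MEX is matched correctly because its case is the same as the term being searched for, but Mex is not, even though I am using the /i modifier. What am I doing wrong?

Edit: USA is matched, but usa is not.

For reference, the regex pattern is $string =~ s/\s$columns[$col][$ro],/ $head,/i

Thanks!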

Answer 1

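I don't fully understand what you are doing, but maybe this will help: the \s in your regex tries to match a whitespace character, and it will not match the absence of one. Because your "Mex" is at the beginning of the line, there is no whitespace in front of it. As an experiment, try moving "Mex" to a different position in the line.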
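A minimal sketch of that point, using a shortened, hypothetical test string rather than the asker's CSV data. The `(^|\s)` alternative shown second is one possible workaround, not something proposed in the question; it accepts either the start of the string or a whitespace character, and `$1` puts back whatever was matched there.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, shortened test string; "Mex" sits at the very start.
my $string = "Mex, MEX, USA";

# \s must consume a whitespace character, so the leading "Mex" is skipped
# and the first (and only) replacement hits " MEX,".
(my $with_space = $string) =~ s/\sMex,/ Mexico,/i;
print "$with_space\n";    # Mex, Mexico, USA

# (^|\s) also allows the start of the string, so the leading "Mex" matches.
(my $with_anchor = $string) =~ s/(^|\s)Mex,/${1}Mexico,/i;
print "$with_anchor\n";   # Mexico, MEX, USA
```

In both substitutions the /i modifier is working as intended; the difference in output comes only from what is required in front of the name.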

Answer 2

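It seems that parsing the CSV is not your problem. (I would still recommend Text::CSV.)

Assuming you have each country and its alternatives in an array, and you have an array of these country-with-alternatives arrays, you can simply compare the input against them. You should strip leading and trailing whitespace and compare case-insensitively, but you do not need a regex for the comparison: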

#!/usr/bin/perl
use strict;
use warnings;

my @countries = (   
    ['United States of America', 'US', 'USA', 'US of A', 'United States'],
    ['Mexico', 'MX', 'Mex'], 
);

my @input = ('US ', '  mx   ', ' Mexico', ' us of a');

foreach my $input (@input) {  
    $input =~ s/^\s+//;
    $input =~ s/\s+$//; 
    my $found = 0;
    foreach my $country (@countries) {  
        foreach my $alternative (@$country) {
            if (lc($input) eq lc($alternative)) {  
                print "$input is ${$country}[0]\n";
                $found = 1;
            }
        } 
    }   
    print "did not find $input\n" unless($found);
}
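If the list of countries grows, the nested loops above could be replaced by a single hash lookup. The following is only a sketch of that variation under the same data layout; the `%canonical` hash, the `canonical_name` helper, and the extra `'CAN'` input are hypothetical names and values introduced here for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @countries = (
    ['United States of America', 'US', 'USA', 'US of A', 'United States'],
    ['Mexico', 'MX', 'Mex'],
);

# Map every lowercased spelling to the canonical name (the first element).
my %canonical;
foreach my $country (@countries) {
    $canonical{lc $_} = $country->[0] for @$country;
}

# Hypothetical helper: trim, lowercase, look up; returns undef when unknown.
sub canonical_name {
    my ($input) = @_;
    $input =~ s/^\s+|\s+$//g;
    return $canonical{lc $input};
}

foreach my $input ('US ', '  mx   ', ' Mexico', ' us of a', 'CAN') {
    my $name = canonical_name($input);
    print defined $name ? "'$input' is $name\n" : "did not find '$input'\n";
}
```

The trade-off is that the hash is built once up front, after which each input costs one lookup instead of a scan over every alternative.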

Answer 3

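The problem was that I had not included the "g" modifier, which meant that once one instance of a country-name alternative had been found and replaced, the substitution stopped looking for further instances.

After changing $string =~ s/\s$columns[$col][$ro],/ $head,/i to $string =~ s/\s$columns[$col][$ro],/ $head,/ig, the matches work correctly.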
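The effect of /g can be sketched with the test string from the question. To keep the example self-contained, the literal `USA` stands in for `$columns[$col][$ro]` and `United States` for `$head`; both stand-ins are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "Mex, MEX, USA, usa, US, MX, CAN, Canada";

# Without /g the substitution stops after the first match (" USA,").
(my $once = $string) =~ s/\sUSA,/ United States,/i;
print "$once\n";   # Mex, MEX, United States, usa, US, MX, CAN, Canada

# With /g every case-insensitive match is replaced, including " usa,".
(my $all = $string) =~ s/\sUSA,/ United States,/ig;
print "$all\n";    # Mex, MEX, United States, United States, US, MX, CAN, Canada
```

Note that an item at the very start of the string still has no whitespace before it, so a pattern beginning with `\s` will not match it with or without /g.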

Case-insensitive regex matching not working in Perl

3 answers: