Question

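I previously posted a question about a don't-care symbol (X) in Perl. I now have working code, but it stops working when the data is read from files.

Suppose I have a 50-bit binary input and a database. If the input matches an entry in the database, I return a predefined value.

Suppose the entry in the database is 11001100100010110111110110101001000010110101111101.

If the input is 11XX11001000101101111101101010010000101101011111X1, I want to treat it as a match, because X can be either 1 or 0. I know one way is to split the 50 bits into 50 single bits and handle the exceptions, but I would prefer to process all 50 bits together.

In my code (dontcare.pl), the first part uses an internally defined input and database. However, I want to read an input file (input_test.txt) and a database file (database.txt), which also contain other information, and do the same thing.

dontcare.pl: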

#!/usr/bin/perl 

####### 1st part, Internal string input and database
my $input = '11XX11001000101101111101101010010000101101011111X1';
( my $mask = $input ) =~ tr/X01/\x00\xFF\xFF/;
( my $targ = $input ) =~ tr/X/\x00/;

for my $num_bin (qw(
   11001100100010110111110110101001000010110101111101
   10101100100010110111110110101001000010110101111101
)) {
   if (($num_bin & $mask) eq $targ) {
      print "$num_bin matches\n";
   } else {
      print "$num_bin doesn't match\n";      
   }
}


####### 2nd part, Reading input and database files 
        print " Reading files\n";      
##### Read input
my @Dinput=do{
    open my $Dfh,"<","input_test.txt" or die("Cannot open an input file $!");
    <$Dfh>;
};

##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;

while(<CSV>){
    my @row=split(/\t/,$_);
    push(@Ddb,\@row);
}
close CSV || die $!;


for (my $n=0; $n < (scalar @Dinput); $n +=1) {

for (my $i=0; $i < (scalar @Ddb); $i +=2) {
    (my $Dmask = $Dinput[$n]) =~ tr/X01/\x00\xFF\xFF/;
    (my $Dtarg = $Dinput[$n]) =~ tr/X/\x00/;

    if (( $Ddb[$i][1] & $Dmask) eq $Dtarg) {
        print "$Ddb[$i][1] matched\n";
    } else {
        print "$Ddb[$i][1] didn't match\n";      
    }
}

}

input_test.txt (input file containing two inputs):

11XX11001000101101111101101010010000101101011111X1
1000011000111101001011110111001100100101111000010X

database.txt (database file; the second column holds the 50-bit binary values, and the file contains other information as well):

0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011  3.923510310023e-06  3.19470818154393e-08    7.05437377900141e-10    7.05437377900141e-10    4.89200539851702e-17    5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111  0.052908822741908   2.7185508579738e-05

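I guess this is a type-casting problem. The first part has a string input and a string database, so it works. The second part, however, seems to read the input and data from the files as integers automatically. I searched for type casting and realized that Perl has no cast function (or am I wrong?). Please let me know any ideas or suggestions for solving this.

In short, I want to match with don't-care conditions against an input file and a database file. If you have another way to solve this, please let me know. (I used temporary values in the input file.)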

Answer 1

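Well, type casting doesn't really exist in the way you are thinking of it, because Perl doesn't much care whether something is a string or a number; it does the right thing based on context.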
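A small sketch of what "based on context" means in practice. The variable names are illustrative only. The bitwise & operator is the one place relevant to this question where the distinction is visible: with two string operands it works character by character, with numbers it works on integers. This assumes the bitwise feature is not enabled (it is turned on by use v5.28 or later, which this sketch does not use).

```perl
use strict;
use warnings;

my $sum    = "10" + 5;    # + forces numeric context
my $joined = "10" . 5;    # . forces string context

# Both operands are strings: AND is applied to each pair of characters,
# and the result is as long as the shorter operand.
my $str_and = "12" & "3";

# Both operands are numbers: ordinary integer AND.
my $num_and = 12 & 3;

print "sum=$sum joined=$joined str_and=$str_and num_and=$num_and\n";
```

The mask trick in the question relies on the string form of &, so it only works while both operands are still plain strings.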

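There are, however, pack and unpack, which convert raw binary data into more useful representations, e.g. from (raw) binary to hex and back again. They don't seem applicable here, because your input isn't binary; it's just text.

But I have to say, I think you are making this harder than it needs to be (unless I have misunderstood your problem), and you don't actually need to do any binary conversion at all: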
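For reference, a minimal sketch of what pack and unpack do with a text bit string. The 8-bit value is made up for illustration and is not taken from the database.

```perl
use strict;
use warnings;

# A text bit string: eight characters, each '0' or '1'.
my $text = '11001100';

# pack 'B8' turns the eight characters into one raw byte.
my $raw = pack( 'B8', $text );

# unpack 'H2' shows that byte as two hex digits; 'B8' converts it back to text.
my $hex  = unpack( 'H2', $raw );
my $back = unpack( 'B8', $raw );

printf "text=%s (%d chars), raw=%d byte, hex=%s, back=%s\n",
    $text, length($text), length($raw), $hex, $back;
```

A pattern containing X has no such raw form, which is why the conversion does not help with the don't-care match.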

#!/usr/bin/perl

use warnings;
use strict;

#or read this from a file
my @input = qw ( 11XX11001000101101111101101010010000101101011111X1
                 1000011000111101001011110111001100100101111000010X );
#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
my $search = join ( "|", @input );
   $search = qr/$search/; 

print "Compiled input patterns into a regex of: \n";
print $search,"\n";

#iterate database (pasted in 'data' block for illustrative purposes)
while ( <DATA> ) {
    my ( $id, $target, @rest ) = split; #split on whitespace. 
              # you are using tab sep, so you might prefer split /\t/;
    #field 1 = ID
    #field 2 = $target
    #everything else = @rest
    #compare $target with the regex we compiled above, and print the 
    #current line if it matches. 
    print if $target =~ /$search/;
}


__DATA__
0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 11101100110010011011001101100111001001100000010011  3.923510310023e-06  3.19470818154393e-08    7.05437377900141e-10    7.05437377900141e-10    4.89200539851702e-17    5.01433479478681e-19
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
0.1 01110111010010000000101001000001100011011100011111  0.052908822741908   2.7185508579738e-05

For your database, this prints:

0.1 11001100100010110111110110101001000010110101111101  rml_irf_old_e_cwp_e[1]  rml_irf_new_e_cwp_e[1]  rml_irf_swap_even_e rml_irf_old_e_cwp_e[0]  rml_irf_new_e_cwp_e[0]  rml_irf_swap_odd_e
0.1 10000110001111010010111101110011001001011110000100  rml_irf_new_e_cwp_e[1]  rml_irf_new_e_cwp_e[0]
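The combined regex is unanchored and does not report which pattern matched. If that matters, one possible variant is sketched below; it assumes every target is exactly one bit string, and the names @regexes and %matched_by are illustrative rather than part of the original code.

```perl
use strict;
use warnings;

my @input = qw( 11XX11001000101101111101101010010000101101011111X1
                1000011000111101001011110111001100100101111000010X );

# One anchored regex per pattern, so a match must cover the whole target.
my @regexes;
for my $pattern (@input) {
    ( my $p = $pattern ) =~ tr/X/./;
    push @regexes, qr/\A$p\z/;
}

my @targets = qw( 11001100100010110111110110101001000010110101111101
                  11101100110010011011001101100111001001100000010011
                  10000110001111010010111101110011001001011110000100 );

my %matched_by;    # target => index of the first pattern that matched
for my $target (@targets) {
    for my $i ( 0 .. $#regexes ) {
        if ( $target =~ $regexes[$i] ) {
            $matched_by{$target} = $i;
            last;
        }
    }
}

print "$_ matched pattern $matched_by{$_}\n" for sort keys %matched_by;
```

This brings back an inner loop, so it trades some of the speed of the single alternation for the extra information.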

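As for reading the patterns from a file: forgetting to chomp the patterns as you read them is the most likely reason it broke.

So you would load them like this (tested with the data above):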

#!/usr/bin/perl

use warnings;
use strict;

#Read patterns from file
open ( my $input_fh, '<', 'patterns.txt' ) or die $!; 
chomp ( my @input = <$input_fh> );
close ( $input_fh );
#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
my $search = join ( "|", @input );
   $search = qr/$search/; 

#iterate database (read from database.txt this time)
open ( my $data, '<', 'database.txt' ) or die $!;
while ( <$data> ) {
    my ( $id, $target, @rest ) = split;
    #print if the target line matches
    print if $target =~ /$search/;
}
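To see what the missing chomp does to the bitmask approach from the question, here is a small sketch. The helper name mask_match is hypothetical, and the input line is simulated in memory instead of being read from input_test.txt.

```perl
use strict;
use warnings;

# True if $bits matches $pattern, where 'X' in $pattern means "don't care".
# Both arguments must be plain strings (never used as numbers) of equal length.
sub mask_match {
    my ( $pattern, $bits ) = @_;
    ( my $mask = $pattern ) =~ tr/X01/\x00\xFF\xFF/;    # X -> 0x00, digits -> 0xFF
    ( my $targ = $pattern ) =~ tr/X/\x00/;              # X -> 0x00, digits kept
    return ( $bits & $mask ) eq $targ;
}

# A line as <$fh> returns it: the trailing newline is still attached.
my $line = "11XX11001000101101111101101010010000101101011111X1\n";
my $db   = '11001100100010110111110110101001000010110101111101';

my $before = mask_match( $line, $db ) ? 'match' : 'no match';

chomp( my $pattern = $line );
my $after = mask_match( $pattern, $db ) ? 'match' : 'no match';

print "without chomp: $before\n";
print "with chomp:    $after\n";
```

Without chomp the target string is 51 characters long while the masked database value is 50, so eq can never be true. No type conversion is involved.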

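On your code specifically (and your answer):

Turn on use strict; use warnings; they are really important for troubleshooting.
You don't need the double loop, because turning the input patterns into an alternation regex does that for you (more efficiently).
Always use 3-arg open with lexical filehandles: open ( my $input_fh, '<', 'patterns.txt' ) or die $!; A bareword filehandle such as CSV is global (and isn't closed automatically when it goes out of scope, the way a lexical one is).
$i < (scalar @Ddb) is redundant. < imposes scalar context, so $i < @Ddb gives the same result.
perltidy is a good thing for code formatting. perltidy -pbp formats according to "Perl Best Practices".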
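A short sketch of the points about lexical filehandles and scalar context. To keep it self-contained it opens an in-memory string instead of patterns.txt, and the data in @Ddb is made up for illustration.

```perl
use strict;
use warnings;

# 3-arg open with a lexical filehandle. A reference to a string is used
# here as an in-memory file, standing in for a real file name.
my $text = "11XX\n10X0\n";
open( my $fh, '<', \$text ) or die "Cannot open in-memory file: $!";
chomp( my @patterns = <$fh> );
close($fh) or die "Cannot close: $!";

my @Ddb = ( [ '0.1', '1100' ], [ '0.1', '1110' ], [ '0.1', '1000' ] );

# An array in a numeric comparison is already in scalar context,
# so both loops stop at the same count.
my $explicit = 0;
$explicit++ while $explicit < ( scalar @Ddb );
my $implicit = 0;
$implicit++ while $implicit < @Ddb;

# Most loops over rows do not need an index at all.
my @second_column = map { $_->[1] } @Ddb;

print "patterns=@patterns explicit=$explicit implicit=$implicit\n";
print "second column: @second_column\n";
```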

Answer 2

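Thank you for your help, @Sobrique.

My original approach made the code more complicated than necessary. What I actually wanted was ".", the regex don't-care symbol, and a way to handle it. I also needed to read CSV files as the input and the database. @Sobrique helped me solve all of these problems, and my final code is below.

My code: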

#!/usr/bin/perl 

##### Read input

open my $input_fh, '<', 'input_test.txt' or die $!;
chomp ( my @input = <$input_fh> );

#replace 'X' with '.' which is the regex "don't care" character.                 
s/X/./g for @input;
#compile a regex made of these two patterns. 
#my $search = join ( "|", @input ); 
#   $search = qr/$search/;      
my $search = join ( "|", $input[0] ); 
   $search = qr/$search/;   

##### Read database
open(CSV,'database.txt')||die("Cannot open db file $!");
my @Ddb;
while(<CSV>){
    my @row=split(/\t/,$_);
    push(@Ddb,\@row);
}
close(CSV) || die $!;   # parentheses needed: "close CSV || die" never reaches die


#check the database rows read above against the compiled pattern
for (my $n=0; $n < (scalar @input); $n +=2) {

for (my $i=0; $i < (scalar @Ddb); $i +=2) {
    if ($Ddb[$i][1] =~ /$search/) {
        print "$Ddb[$i][1] matched\n";
        print "$Ddb[$i][2] \n";
    } 
#else {
#       print "$Ddb[$i][1] didn't match\n";      
#       }
}

}

input_test.txt:

10001000110010001001110111000011001010110010000011
10111101010011000101001011110000001110101110010011

Perl - matching with don't-care conditions and reading a CSV file
