我想比较A列中的字符串与B列中每行的字符串,并打印第3列,突出显示差异。
Column A Column B
uuaaugcuaauugugauaggggu uuaaugcuaauugugauaggggu
uuaaugcuaauugugauagggguu uuaaugcuaauugugauaggggu
uuaaugcuaauugugauagggguuu uuaaugcuaauugugauaggggu
期望的结果:
Column A Column B Column C
uuaaugcuaauugugauaggggu uuaaugcuaauugugauaggggu ********************
uuaaugcuaauugugauagggguu uuaaugcuaauugugauaggggu ********************u
uuaaugcuaauugugauagggguuu uuaaugcuaauugugauaggggu ********************uu
我有一个可能有效的示例脚本,但我该如何为数据框中的每一行执行此操作?
use strict;
use warnings;
my $string1 = 'AAABBBBBCCCCCDDDDD';
my $string2 = 'AEABBBBBCCECCDDDDD';
my $result = '';
for(0 .. length($string1)) {
my $char = substr($string2, $_, 1);
if($char ne substr($string1, $_, 1)) {
$result .= "**$char**";
} else {
$result .= $char;
}
}
print $result;
答案 0 :(得分:2)
使用强力和substr
use strict;
use warnings;
while (<DATA>) {
my ($str1, $str2) = split;
my $len = length $str1 < length $str2 ? length $str1 : length $str2;
for my $i (0..$len-1) {
my $c1 = substr $str1, $i, 1;
my $c2 = substr $str2, $i, 1;
if ($c1 eq $c2) {
substr $str1, $i, 1, '*';
substr $str2, $i, 1, '*';
}
}
printf "%-30s %s\n", $str1, $str2;
}
__DATA__
Column_A Column_B
uuaaugcuaauugugauaggggu uuaaugcuaauugugauaggggu
uuaaugcuaauugugauagggguu uuaaugcuaauugugauaggggu
uuaaugcuaauugugauagggguuu uuaaugcuaauugugauaggggu
AAABBBBBCCCCCDDDDD AEABBBBBCCECCDDDDD
输出:
*******A *******B
*********************** ***********************
***********************u ***********************
***********************uu ***********************
*A********C******* *E********E*******
替代使用XOR
也可以使用^
来查找两个字符串之间的交集。
以下内容与上述内容相同:
while (<DATA>) {
my ($str1, $str2) = split;
my $intersection = $str1 ^ $str2;
while ($intersection =~ /(\0+)/g) {
my $len = length $1;
my $pos = pos($intersection) - $len;
substr $str1, $pos, $len, '*' x $len;
substr $str2, $pos, $len, '*' x $len;
}
printf "%-30s %s\n", $str1, $str2;
}
答案 1 :(得分:0)
我无法抗拒用正则表达式提供改良的米勒解决方案
use strict;
use warnings;
while (<DATA>) {
my $masked_str1 ="";
my $masked_str2 ="";
my ($str1, $str2) = split;
my $intersection = $str1 ^ $str2;
while ($intersection =~ /(\x00+)/g) {
my $mask = $intersection;
$mask =~ s/\x00/1/g;
$mask =~ s/[^1]/0/g;
while ( $mask =~ /\G(.)/gc ) { # traverse the mask
my $bit = $1;
if ( $str1 =~ /\G(.)/gc ) { # traverse the string1 to be masked
$masked_str1 .= $bit ? '_' : $1;
}
if ( $str2 =~ /\G(.)/gc ) { # traverse the string2 to be masked
$masked_str2 .= $bit ? '_' : $1;
}
}
}
print "=" x 80;
printf "\n%-30s %s\n", $str2, $str1; # Minimum length 30 char, left-justified
printf "%-30s %s\n", $str1, $str2;
printf "%-30s %s\n\n", $masked_str1, $masked_str2;
}