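I'm sure there must be a simpler way to tell whether two given sample names are different. I have a script that colors the match percentage in a table when two different samples match very closely.
But I want to fine-tune it so that a cell is colored only when the samples really are different samples.
For example, if SD0098a matches SD0098[b-z], no special color is needed, but if SD0098a matches SD0097[a-z], the cell should get an alert color.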
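A minimal sketch of that naming rule follows. It assumes that every sample name starts with a base of the form `SD` plus digits, followed by one lowercase suffix letter, as in the sample data. The helpers `base_name` and `same_base` are hypothetical and do not come from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a sample name to its base, e.g.
# SD0098a or SD0098a_SD8r9345843_07-APR-13 -> SD0098.
# Assumes names look like SD<digits><lowercase letter>...
sub base_name {
    my ($name) = @_;
    my ($base) = $name =~ /^(SD\d+)[a-z]/;
    return defined $base ? $base : $name;   # fall back to the full name
}

# Two samples count as "the same sample" when they share a base name.
sub same_base {
    my ($x, $y) = @_;
    return base_name($x) eq base_name($y);
}

for my $pair ([qw(SD0098a SD0098b)], [qw(SD0098a SD0097a)]) {
    my ($x, $y) = @$pair;
    printf "%s vs %s: %s\n", $x, $y,
        same_base($x, $y) ? 'same sample, no alert' : 'different samples, alert';
}
```

This prints `same sample, no alert` for the first pair and `different samples, alert` for the second.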
My code looks like this:
```perl
use strict;
use warnings;
use POSIX qw(floor);    # floor() is called below, so it must be imported

my @input = get_data();
my $mode  = $ARGV[0] // '';    # no argument => use all samples
# A slice past the end of the array yields undef entries; grep drops them.
@input = grep { defined } @input[28..37] if $mode eq 'titchy';
@input = grep { defined } @input[0..49]  if $mode eq 'small';
my $t0 = time;

# Encode each genotype call as a single letter (N -> A, NN -> B, A -> C, ...).
my %map;
my $tv = 'A';
$map{$_} = $tv++ foreach qw(N NN A C G T AC AG AT CA CG CT GA GC GT TA TC TG);
my %inv_map = reverse %map;

my $t_store = {};
my $res;
my @index;
my @columns = ( { 'key' => 'code', 'label' => 'Sample name', 'class' => q({'enc':'[[r:enc]]'}) } );
my %col_info;
foreach (@input) {
    my ($k, @v) = split m{:}mxs;
    push @index, $k;
    $t_store->{$k} = {
        'code'  => $k,
        # '`' where the locus has a real call (A/C/G/T), ' ' for N/NN
        'flags' => ( join q(), map { $_ =~ /[ACGT]/ ? q(`) : q( ) } @v ),
        "i_$k"  => q(-),
        'enc'   => join q(), map { $map{$_} || '@' } @v,
    };
    $col_info{$k} = {
        'key'          => "i_$k",
        'label'        => $k,
        'rotate'       => 1,
        'header_class' => q({'enc':') . $t_store->{$k}{'enc'} . q('}),
        'format'       => [ [ 'r', 'exact', q(-) ], [ 'p0' ] ],
        'class'        => "[[r:c_$k]]",
    };
}
@index = sort @index;
push @columns, map { $col_info{$_} } @index;

my @t_index = @index;
my $N = 0;
# Take the first sample ($sa) and compare it with every remaining one ($sb).
# ($a/$b are renamed because they are Perl's special sort variables.)
while ( my $sa = shift @t_index ) {
    foreach my $sb (@t_index) {
        $N++;
        my $count_flag = $t_store->{$sa}{'flags'} & $t_store->{$sb}{'flags'};
        my $mismatches = ( $t_store->{$sa}{'enc'} ^ $t_store->{$sb}{'enc'} ) | $count_flag;
        my $n_match = $mismatches =~ tr{`}{`};    # loci called in both and identical
        my $n_count = $count_flag =~ tr{`}{`};    # loci called in both samples
        my $n_total = length $t_store->{$sa}{'flags'};
        #$t_store->{$sb}{"o_$sa"} = $t_store->{$sa}{"o_$sb"} = $n_total ? $n_count/$n_total : 0;
        #$t_store->{$sb}{"m_$sa"} = $t_store->{$sa}{"m_$sb"} = $n_count ? 1-$n_match/$n_count : 0;
        my $x = $t_store->{$sb}{"i_$sa"} = $t_store->{$sa}{"i_$sb"} = $n_count ? $n_match/$n_count : q(-);
        # This is where the check should go:
        $t_store->{$sb}{"c_$sa"} = $t_store->{$sa}{"c_$sb"} = "{'m':$n_match,'i':$n_count,'n':$n_total} gt"
            . ( $x eq '-' || $x < 0.7 ? '' : ' id_' . floor( $x * 20 ) );
    }
}

sub get_data {
    return qw(
        SD0098a_SD8r9345843_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:NN:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:NN:TC:TA:G:TC:AG:NN:T:GC
        SD0098b_SD8r9345844_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:NN:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:TC:TA:G:TC:AG:NN:T:GC
        SD0098c_SD8r9345845_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:C:TA:G:TC:AG:NN:T:NN
        SD0097a_SD8r9345842_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:C:CT:TC:GA:C:C:T:C:G:C:AG:C:NN:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
        SD0097b_SD8r9345841_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:NN:CT:TC:GA:C:C:T:C:G:C:AG:C:GA:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
    );
}
```
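The match counting in the comparison loop above relies on Perl's bitwise string operators. The stand-alone sketch below uses small made-up inputs, not the real data, to show why the trick works. Identical characters XOR to `"\0"`, and OR-ing with the per-locus flag string turns exactly those positions into a backtick when both samples have a call.

```perl
use strict;
use warnings;

# Toy encoded genotypes (one character per locus) and call flags:
# '`' (0x60) = locus called, ' ' (0x20) = no call (N/NN). Made-up values.
my %enc   = (s1 => 'ABCA', s2 => 'ABDA');
my %flags = (s1 => '````', s2 => '``` ');

# Positions called in both samples stay '`'; otherwise the AND yields ' '.
my $both_called = $flags{s1} & $flags{s2};

# Equal characters XOR to "\0", and "\0" | '`' is '`'. Different upper-case
# letters (or '@') XOR to a non-zero value below 0x20, so the OR is not '`'.
my $cmp = ($enc{s1} ^ $enc{s2}) | $both_called;

my $n_match = $cmp =~ tr/`//;           # identical and called in both
my $n_count = $both_called =~ tr/`//;   # called in both
printf "%d of %d comparable loci match (%.0f%%)\n",
    $n_match, $n_count, $n_count ? 100 * $n_match / $n_count : 0;
```

This prints `2 of 3 comparable loci match (67%)`. The fourth locus is ignored because the second sample has no call there.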
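So, based on this example: if any of the samples SD0098[a-c] matches any of SD0097[a-b] by more than 90%, I want to give that percentage a warning color. Thanks for any hints and suggestions.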
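One way to express this rule at the "This is where the check should go" comment is sketched below. It makes several assumptions: the base name is the `SD` plus digits prefix, the threshold is 0.9, and `alert` is a hypothetical CSS class name. `cell_class` is a hypothetical helper, and its result would be appended to the `c_...` class string.

```perl
use strict;
use warnings;

# Hypothetical threshold for flagging a suspiciously close match.
my $ALERT_THRESHOLD = 0.9;

# Hypothetical helper: SD0098a... -> SD0098 (full name if no match).
sub base_name {
    my ($name) = @_;
    my ($base) = $name =~ /^(SD\d+)[a-z]/;
    return defined $base ? $base : $name;
}

# Extra CSS class for one cell: ' alert' only when the two samples have
# different base names AND their match fraction exceeds the threshold.
# $x is the match fraction, or '-' when no loci were comparable.
sub cell_class {
    my ($name1, $name2, $x) = @_;
    return '' if $x eq '-';
    return '' if base_name($name1) eq base_name($name2);
    return $x > $ALERT_THRESHOLD ? ' alert' : '';
}

print "[", cell_class('SD0098a', 'SD0098b', 0.98), "]\n";   # same base
print "[", cell_class('SD0098a', 'SD0097a', 0.97), "]\n";   # different, close
print "[", cell_class('SD0098a', 'SD0097a', 0.50), "]\n";   # different, far
```

In the comparison loop, you would call it with the two sample names and `$x`, then concatenate the result onto the class string. The existing `id_` classes can stay as they are.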
The output is shown as a table on a web page (this is only a mock-up, not the exact output):
```
Sample   SD0098a  SD0098b  SD0098c  SD0097a  SD0097b
SD0098a  -        98%      99%      97%      97%
SD0098b  99%      -        95%      97%      99%
SD0098c  97%      97%      -        100%     100%
SD0097a  97%      97%      100%     -        100%
SD0097b  97%      99%      100%     100%     -
```
Answer 0 (score: 0)
I think String::Similarity can help you. If I understand you correctly, you need approximate string matching.
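To illustrate what approximate string matching means, here is a plain-Perl Levenshtein edit distance. This is a swapped-in technique and not how String::Similarity computes its score; that module's `similarity` function returns a value between 0 and 1. `edit_distance` is a hypothetical helper.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Dynamic-programming Levenshtein distance: the minimum number of
# single-character insertions, deletions or substitutions turning $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length $t);          # row for the empty prefix of $s
    for my $i (1 .. length $s) {
        my @cur = ($i);
        for my $j (1 .. length $t) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,          # deletion
                           $cur[$j - 1] + 1,       # insertion
                           $prev[$j - 1] + $cost); # substitution
        }
        @prev = @cur;
    }
    return $prev[-1];
}

printf "%s vs %s: %d\n", @$_, edit_distance(@$_)
    for [qw(SD0098a SD0098b)], [qw(SD0098a SD0097a)];
```

Both pairs in the question are one edit apart, so the distance alone cannot separate "same sample, different suffix" from "different sample". Comparing the `SD` plus digits prefix directly targets that distinction.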