Perl: finding the difference between two names

Time: 2013-01-15 11:48:18

Tags: perl

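I'm sure there is a simpler way to tell whether two given names refer to different samples. I have a script that colours the match percentage in a table when two different samples match very closely.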

But I want to fine-tune it so that the colour is applied only when the samples are completely different.

For example: if SD0098a matches SD0098[b-z], no special colour is needed, but if SD0098a matches SD0097[a-z], the result should get an alert colour.
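One way to express this rule is to reduce each name to its base ID (the letters and digits before the replicate letter) and compare those. This is only a minimal sketch: the helper names `base_id` and `same_sample` are hypothetical, and the pattern assumes names shaped like `SD0098a`, optionally followed by `_` and more text.

```perl
use strict;
use warnings;

# Hypothetical helper: drop the trailing replicate letter (and anything after
# the first underscore) so that SD0098a and SD0098b both reduce to SD0098.
# Assumes names look like <letters><digits><optional lowercase letter>.
sub base_id {
    my ($name) = @_;
    return $name =~ /\A([A-Za-z]+\d+)[a-z]?(?:_|\z)/ ? $1 : $name;
}

# True when two sample names share the same base ID.
sub same_sample {
    my ($name_a, $name_b) = @_;
    return base_id($name_a) eq base_id($name_b);
}

for my $pair ( [ 'SD0098a', 'SD0098b' ], [ 'SD0098a', 'SD0097a' ] ) {
    printf "%s vs %s: %s\n", @$pair,
        same_sample(@$pair) ? 'same sample' : 'different sample';
}
```

Names that do not fit the assumed pattern are returned unchanged by `base_id`, so they only count as the same sample when the full names are identical.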

I have code like this:

    use POSIX qw(floor);    # floor() is used below when building the class string

    my @input = get_data();
    @input = @input[28..37] if $ARGV[0] eq 'titchy';
    @input = @input[0..49]  if $ARGV[0] eq 'small';
    my $t0 = time;
    my %map;
    my $tv  = 'A';
    $map{$_}=$tv++ foreach qw(N NN A C G T AC AG AT CA CG CT GA GC GT TA TC TG);
    my %inv_map = reverse %map;
    my $t_store = {};
    my $res;

    my @index;
    my @columns = ( { 'key'   => 'code', 'label' => 'Sample name', 'class'=>q({'enc':'[[r:enc]]'}) });

    my %col_info;
    foreach(@input){
      my ($k,@v) = split m{:}mxs;
      push @index, $k;

      $t_store->{$k} = {
        'code'  => $k,
        'flags' => ( join q(), map { $_ =~ /[ACGT]/ ? q(`) : q( ) } @v ),
        "i_$k"  => q(-),
        'enc'   => join q(), map { $map{$_}||'@' } @v,
      };

      $col_info{$k} = { 'key' => "i_$k", 'label' => $k, 'rotate' => 1,
        'header_class' => q({'enc':').$t_store->{$k}{'enc'}.q('}),
        'format' => [ [ 'r', 'exact', q(-) ], [ 'p0' ] ],
        'class'  => "[[r:c_$k]]",
      };
    }

    @index = sort @index;

    push @columns, map { $col_info{$_} } @index;

    my @t_index = @index;

    my $N = 0;
    # Take the first remaining sample as $a and compare it with every later sample $b.
    while( my $a = shift @t_index) {
      foreach my $b (@t_index) {
        $N++;
        my $count_flag = $t_store->{$a}{'flags'} & $t_store->{$b}{'flags'};
        my $mismatches = ($t_store->{$a}{'enc'} ^ $t_store->{$b}{'enc'}) | $count_flag;

        my $n_match = $mismatches =~ tr{`}{`};
        my $n_count = $count_flag =~ tr{`}{`};
        my $n_total = length $t_store->{$a}{'flags'};

        #$t_store->{$b}{"o_$a"} = $t_store->{$a}{"o_$b"} = $n_total ? $n_count/$n_total   : 0;
        #$t_store->{$b}{"m_$a"} = $t_store->{$a}{"m_$b"} = $n_count ? 1-$n_match/$n_count : 0;
        my $x = $t_store->{$b}{"i_$a"} = $t_store->{$a}{"i_$b"} = $n_count ? $n_match/$n_count   : q(-);
        # This is where the check should go:
        $t_store->{$b}{"c_$a"} = $t_store->{$a}{"c_$b"} = "{'m':$n_match,'i':$n_count,'n':$n_total} gt".
          ( $x eq '-' || $x < 0.7 ? '' : ' id_'.floor($x*20) );
      }
    }

    sub get_data { 
      return  qw(
    SD0098a_SD8r9345843_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:NN:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:NN:TC:TA:G:TC:AG:NN:T:GC
    SD0098b_SD8r9345844_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:NN:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:TC:TA:G:TC:AG:NN:T:GC
    SD0098c_SD8r9345845_07-APR-13:T:C:AG:GA:C:A:AG:CA:A:CT:T:GT:AG:G:G:TC:TC:T:A:GA:T:T:GA:T:A:G:CT:CT:GT:CT:A:TC:CT:AG:TC:GA:T:T:T:G:TC:AG:C:T:G:CT:G:CA:C:CG:AT:G:CG:GA:G:G:C:CT:CT:GA:GT:GA:CG:G:G:G:C:C:A:NN:T:T:TC:G:CT:CA:T:C:G:C:AG:T:GA:A:G:NN:C:G:G:C:TA:G:TC:AG:NN:T:NN
    SD0097a_SD8r9345842_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:C:CT:TC:GA:C:C:T:C:G:C:AG:C:NN:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
    SD0097b_SD8r9345841_07-APR-13:CT:C:AG:G:CA:GA:G:A:G:C:T:GT:AG:G:G:C:T:TC:GA:GA:C:T:A:C:CA:G:T:T:T:C:A:C:T:G:TC:G:CT:GT:TC:T:C:A:CT:T:GA:C:G:CA:C:CG:AT:A:G:GA:AG:AG:T:CT:CT:GA:GT:GA:C:A:G:G:TC:T:G:NN:NN:CT:TC:GA:C:C:T:C:G:C:AG:C:GA:A:NN:NN:CT:T:GA:C:A:AG:TC:AG:NN:GT:GC
    );
    }

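So, in this example, if any SD0098[a-c] sample matches any SD0097[a-b] sample at more than 90%, I want to give the resulting percentage a warning colour. Thanks for any hints and suggestions.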
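The requirement above could be checked inside the pairwise loop roughly as follows. This is a sketch under assumptions, not a drop-in fix: `alert_class` and the CSS class name `alert` are hypothetical, the 0.9 threshold is taken from the 90% figure in the question, and `$ratio` stands in for the `$x` value the script computes (either a number or `-`).

```perl
use strict;
use warnings;

# Hypothetical helper: returns an extra CSS class for one pair of samples.
# The class is added only when the match ratio exceeds 0.9 AND the two names
# have different base IDs (leading letters plus digits, e.g. SD0098).
sub alert_class {
    my ($name_a, $name_b, $ratio) = @_;
    return q() if $ratio eq q(-);      # no comparable positions
    return q() if $ratio <= 0.9;       # not close enough to matter
    my ($id_a) = $name_a =~ /\A([A-Za-z]+\d+)/;
    my ($id_b) = $name_b =~ /\A([A-Za-z]+\d+)/;
    return q() unless defined $id_a && defined $id_b;
    return $id_a eq $id_b ? q() : q( alert);
}

my @pairs = (
    [ 'SD0098a', 'SD0098b', 0.98 ],    # same sample, close match
    [ 'SD0098a', 'SD0097a', 0.97 ],    # different sample, close match
    [ 'SD0098a', 'SD0097a', 0.50 ],    # different sample, poor match
);
for my $p (@pairs) {
    printf "%s vs %s (%.0f%%): '%s'\n",
        $p->[0], $p->[1], $p->[2] * 100, alert_class(@$p);
}
```

In the script, the returned string could be appended to the class value stored under `c_$a` and `c_$b`, leaving the existing `id_` classes as they are.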

The output is displayed as a table on a web page (this is only a mock-up, not the exact output):

    Sample   SD0098a  SD0098b  SD0098c  SD0097a  SD0097b
    SD0098a  -        98%      99%      97%      97%
    SD0098b  99%      -        95%      97%      99%
    SD0098c  97%      97%      -        100%     100%
    SD0097a  97%      97%      100%     -        100%
    SD0097b  97%      99%      100%     100%     -

1 answer:

Answer 0 (score: 0)

I think String::Similarity can help you. If I understand correctly, you need approximate string matching.
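A minimal sketch of what that suggestion might look like, assuming String::Similarity is installed from CPAN (it is not part of core Perl). The module's `similarity` function returns a score between 0 and 1; the wrapper name `name_similarity` is hypothetical.

```perl
use strict;
use warnings;

# String::Similarity is a CPAN module; load it at run time so that a missing
# installation produces a clear message instead of a compile error.
my $have_module = eval { require String::Similarity; 1 };

# Hypothetical wrapper around String::Similarity::similarity(), which
# returns 1 for identical strings and lower values for less similar ones.
sub name_similarity {
    my ($name_a, $name_b) = @_;
    return String::Similarity::similarity($name_a, $name_b);
}

if ($have_module) {
    for my $pair ( [ 'SD0098a', 'SD0098b' ], [ 'SD0098a', 'SD0097a' ] ) {
        printf "%s vs %s: %.2f\n", @$pair, name_similarity(@$pair);
    }
}
else {
    print "String::Similarity is not installed\n";
}
```

Both example pairs differ by a single character, so they are likely to score about the same. A similarity score on its own may therefore not separate "same sample, different replicate" from "different sample"; comparing the part of the name before the replicate letter may be needed as well.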