Question

I have two files:

1) A tab-delimited file with the following contents. Let's call it the reference file:

V$HMGIY_01_rc   Ncor=0.405
V$CACD_01   Ncor=0.405
V$GKLF_02   Ncor=0.650
V$AML2_Q3   Ncor=0.792
V$WT1_Q6    Ncor=0.607
V$KID3_01   Ncor=0.668
V$CNOT3_01  Ncor=0.491
V$KROX_Q6   Ncor=0.423
V$ETF_Q6_rc Ncor=0.547
V$E2F_Q2_rc Ncor=0.653
V$SP1_Q6_01_rc  Ncor=0.650
V$SP4_Q5    Ncor=0.660

2) A second tab-delimited file that contains the search strings in column X, as shown below. Let's call this file search_string:

       A                 X
    NF-E2_SC-22827    NF-E2
    NRSF              NRSF
    NFATC1_SC-17834   NFATC1
    NFKB              NFKB
    TCF3_SC-349       TCF3
    MEF2A             MEF2A

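What I have done so far: take the first search term (from the search_string file, column X) and check whether it occurs in the first column of the reference file. Example: the first search term is NF-E2. I check whether this string occurs in the first column of the reference file. If it does, the score is 1, otherwise 0. I also count how many times the pattern matches. My output currently looks like this: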

     Keyword     Keyword in file?     Number of times keyword occurs in file
      NF-E2          1                            3
      NRSF           0                            0
      NFATC1         0                            0
      NFKB           1                            7
      TCF3           0                            0

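Now, in addition to this, I want to add the highest Ncor value found for each string in its file. For example: when I search for NF-E2 in NF-E2.txt, the Ncor values present are 3.02, 2.87 and 4.59. I then want the value 4.59 printed in the next column. So my output should look like this: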

  Keyword    Keyword in file?   Number of times keyword occurs in file  Ncor
  NF-E2          1                            3                         4.59
  NRSF           0                            0
  NFATC1         0                            0
  NFKB           1                            7                         1.66
  TCF3           0                            0

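Please note: each string has to be searched for in a different file, i.e. the first string (NF-E2) should be searched for in the file NF-E2.tab, the second string (NRSF) in the file NRSF.tab, and so on.

This is my code: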

perl -lanE '$str=$F[1];  $f="/home/$str/list/$str.txt"; $c=`grep -c "$str" "$f"`;chomp($c);$x=0;$x++ if $c;say "$str\t$x\t$c"' file2

Please help!

Answer 1

This should work:

#!/usr/bin/perl

use strict;
use warnings;

while (<>) {
    chomp;
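    # split ' ' skips leading whitespace, so column X is always field 1.
    my $keyword = (split ' ')[1];
    next unless defined $keyword;   # skip blank or one-column lines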
    my $file = "/home/$keyword/list/${keyword}.txt";
    open my $reference, '<', "$file" or die "Cannot open $file: $!";

    my $key_cnt = 0;
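    my $max_ncor;    # stays undefined until the first match is seen
    while (my $line = <$reference>) {
        # A reference line looks like: V$NAME<whitespace>Ncor=0.405
        my ($string, undef, $ncor) = split /\s+|=/, $line;
        next unless defined $string;
        # \Q...\E matches the keyword literally rather than as a regex.
        if ($string =~ /\Q$keyword\E/) {
            $key_cnt++;
            $max_ncor = $ncor
                if defined $ncor && (!defined $max_ncor || $max_ncor < $ncor);
        }
    }
    close $reference;
    print join("\t", $keyword, $key_cnt ? 1 : 0, $key_cnt, defined $max_ncor ? $max_ncor : ''), "\n";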
}

Run it like this:

perl t.pl search_string.txt
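
If you want to check the counting and maximum logic without touching the files under /home, here is a minimal sketch that works on in-memory lines. The helper name summarize and the sample reference lines are made up for illustration; they are not taken from your data. It assumes each reference line has the form "name, whitespace, Ncor=value".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given a literal keyword and a list of reference
# lines, return (number of matching lines, highest Ncor among them).
# The maximum is undef when nothing matched.
sub summarize {
    my ($keyword, @lines) = @_;
    my ($count, $max) = (0, undef);
    for my $line (@lines) {
        # First field is the name, second looks like "Ncor=0.405".
        my ($name, $field) = split ' ', $line;
        next unless defined($name) && index($name, $keyword) >= 0;
        $count++;
        next unless defined($field) && $field =~ /^Ncor=(-?[\d.]+)$/;
        my $ncor = $1;
        $max = $ncor if !defined($max) || $ncor > $max;
    }
    return ($count, $max);
}

# Made-up sample lines, only to exercise the helper.
my @ref = (
    'V$NFKB_Q6   Ncor=1.20',
    'V$NFKB_C    Ncor=1.66',
    'V$SP1_Q6    Ncor=0.65',
);
my ($count, $max) = summarize('NFKB', @ref);
print join("\t", 'NFKB', $count ? 1 : 0, $count, defined $max ? $max : ''), "\n";
```

Using index() instead of a regex match keeps keywords such as NF-E2 literal, and leaving the maximum undefined until the first match means a file whose Ncor values are all negative would still report the right one.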
