Question

My first file looks like this:

CHR id position                                                                                                                                                                
1 rs58108140 10583                                                                                                                                                             
1 rs189107123 10611                                                                                                                                                            
1 rs180734498 13302                                                                                                                                                            
1 rs144762171 13327                                                                                                                                                            
1 chr1:13957:D 13957

My second file looks like this:

CHR SNP POS RiskAl OTHER_ALLELE RAF logOR Pval                                                                                                                                 
10 rs1999138 110140096 T C 0.449034245446375 0.0924443 1.09e-06                                                                                                                
6 rs7741604 20839503 C A 0.138318264238111 0.127947 1.1e-06                                                                                                                    
8 rs1486006 82553172 G C 0.833130882716561 0.147456 1.12727730194884e-06

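My script reads the first file and stores it in an array. I then want to find the rsIDs from column 2 of the first file in column 2 of the second file. I think my problem is in how I match the expressions. Here is my script: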

#! perl -w                                                                                                                                                                      
use strict;
use warnings;

my $F = shift @ARGV;
my @snps;
open IN, "$F";
while (<IN>) {
  next if m/CHR/;
  my @L = split;
  push @snps, [$L[0], $L[1], $L[2]] if $L[0] !~ m/[XY]/;
}
close IN;

open IN, "DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps-     CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt";
while (<IN>) {
  my @L = split;
  next if m/CHR/;

  foreach (@snps) {
    next if ($L[0] != ${$_}[0]);

    # if not on same chromosome
    if ($L[0] = ${$_}[0]) {

      # if on same chromosome
      if ($L[1] =~ ${$_}[1]) {
        print "$L[0] $L[1] ${$_}[2]\n";
        last;
      }
    }
  }
}

Answer 1

Your code doesn't appear to match your description. You are comparing both the first and second columns of the files, not just the second.

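The main problems are:

You use $L[0] = ${$_}[0] to compare the first columns. This performs an assignment instead of a comparison. You should use $L[0] == ${$_}[0] instead, or better, $L[0] == $_->[0]
You use $L[1] =~ ${$_}[1] to compare the second columns. This treats ${$_}[1] as a regex pattern and checks whether it matches anywhere in $L[1], so in practice it tests whether ${$_}[1] is a substring of $L[1]. You could add anchors, as in $L[1] =~ /^${$_}[1]$/, but it is simpler to do a string comparison: $L[1] eq $_->[1]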
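
A small sketch may make the two pitfalls above concrete. The chromosome numbers and the IDs rs1234 and rs123 are invented for illustration and do not come from either file.

```perl
use strict;
use warnings;

# Hypothetical values: one chromosome from each file, and two similar IDs.
my ($chr_file2, $chr_file1) = (10, 1);
my ($id_long, $id_short)    = ('rs1234', 'rs123');

# "=" assigns: $copy becomes 1, and the condition tests that new value (true).
my $copy          = $chr_file2;
my $assign_result = ($copy = $chr_file1) ? 'true' : 'false';

# "==" compares numerically and changes nothing.
my $compare_result = ($chr_file2 == $chr_file1) ? 'true' : 'false';

# "=~" with a plain string matches anywhere, so rs123 "matches" rs1234.
my $substring_result = ($id_long =~ $id_short) ? 'true' : 'false';

# Anchoring (with \Q...\E to quote metacharacters) or "eq" requires the whole ID.
my $anchored_result = ($id_long =~ /^\Q$id_short\E$/) ? 'true' : 'false';
my $eq_result       = ($id_long eq $id_short)         ? 'true' : 'false';

print "assignment:  $assign_result (copy is now $copy)\n";
print "comparison:  $compare_result\n";
print "unanchored:  $substring_result\n";
print "anchored:    $anchored_result\n";
print "string eq:   $eq_result\n";
```

Only the assignment and the unanchored match report true, which is why the original loop can print rows that should not match.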

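The simplest approach is to read the second file first and build a list of the values you want to include from the first file. I have written it so that it does what your code looks like it is meant to do, which is to match on the first two columns.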

It looks like this

use strict;
use warnings;
use autodie;

my ($f1, $f2) = @ARGV;  # first and second file names from the command line

my %include;
open my $fh2, '<', $f2;
while (<$fh2>) {
  my @fields = split;
  my $key = join '|', @fields[0,1];
  ++$include{$key};
}
close $fh2;

open my $fh1, '<', $f1;
while (<$fh1>) {
  my @fields = split;
  my $key = join '|', @fields[0,1];
  print "@fields[0,1,2]\n" if $include{$key};
}
close $fh1;

Output

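Unfortunately, the sample data you chose doesn't include any record in the first file whose key also appears in the second file, so there is no output!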
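
To see the hash lookup produce a match, here is a minimal sketch that runs the same logic on in-memory lines. The helper names build_include and select_records, and all of the records, are invented for illustration; they are not taken from the real files.

```perl
use strict;
use warnings;

# Build a lookup of "CHR|SNP" keys from the lines of the second file.
sub build_include {
    my @lines = @_;
    my %include;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields >= 2;
        $include{ join '|', @fields[0, 1] } = 1;
    }
    return \%include;
}

# Return "CHR SNP POS" for each first-file line whose key is in the lookup.
sub select_records {
    my ($include, @lines) = @_;
    my @selected;
    for my $line (@lines) {
        my @fields = split ' ', $line;
        next unless @fields >= 3;
        push @selected, "@fields[0, 1, 2]"
            if $include->{ join '|', @fields[0, 1] };
    }
    return @selected;
}

# Invented records: only "1 rs0000001" appears in both lists.
my @file2_lines = (
    'CHR SNP POS RiskAl OTHER_ALLELE RAF logOR Pval',
    '1 rs0000001 1000 T C 0.5 0.1 1e-06',
    '2 rs0000002 2000 G A 0.2 0.2 1e-06',
);
my @file1_lines = (
    'CHR id position',
    '1 rs0000001 1000',
    '1 rs0000003 3000',
);

my @matches = select_records(build_include(@file2_lines), @file1_lines);
print "$_\n" for @matches;    # prints: 1 rs0000001 1000
```

The header lines need no special handling here because their keys ("CHR|SNP" and "CHR|id") differ, so they never match each other.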

Update

This is a corrected version of your own program. It should work, but using a hash, as above, is more efficient and more concise

use strict;
use warnings;
use autodie;

my ($filename) = @ARGV;
my @snps;
open my $in_fh, '<', $filename;
<$in_fh>; # Discard header line
while (<$in_fh>) {
  my @fields = split;
  push @snps, \@fields unless $fields[0] =~ /[XY]/;
}
close $in_fh;

open $in_fh, '<', 'DIAGRAMv3sansWTCCCqc0clumpd_noTCF7L2regOrLeadOrPlt1em6clumps-     CHR_SNP_POS_RiskAl_OtherAl_RAF_logOR_Pval.txt';
<$in_fh>; # Discard header line
while (<$in_fh>) {
  my @fields = split;
  for my $snp (@snps) {
    next unless $fields[0] == $snp->[0] and $fields[1] eq $snp->[1];
    print "$fields[0] $fields[1] $snp->[2]\n";
    last;
  }
}
close $in_fh;
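
For comparison, the nested loop can be replaced by a hash that maps each chromosome and ID to its position, which keeps the output in second-file order without rescanning @snps for every line. This is only a sketch: the helper names index_positions and annotate and the records are invented, and it compares chromosomes as strings, so "01" and "1" would not match as they would with ==.

```perl
use strict;
use warnings;

# Map "CHR|SNP" to the position in the first file, skipping X and Y
# as the original program does.
sub index_positions {
    my @lines = @_;
    my %position_for;
    for my $line (@lines) {
        my ($chr, $id, $position) = split ' ', $line;
        next unless defined $position;
        next if $chr =~ /[XY]/;
        $position_for{"$chr|$id"} = $position;
    }
    return \%position_for;
}

# For each second-file line with a known key, emit "CHR SNP POS",
# where POS is the position recorded from the first file.
sub annotate {
    my ($position_for, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        my ($chr, $id) = split ' ', $line;
        next unless defined $id;
        my $key = "$chr|$id";
        push @rows, "$chr $id $position_for->{$key}"
            if exists $position_for->{$key};
    }
    return @rows;
}

# Invented records for illustration only.
my @first_file = (
    'CHR id position',
    '1 rs0000001 1000',
    'X rs0000009 9000',
);
my @second_file = (
    'CHR SNP POS RiskAl OTHER_ALLELE RAF logOR Pval',
    '2 rs0000002 2000 G A 0.2 0.2 1e-06',
    '1 rs0000001 1000 T C 0.5 0.1 1e-06',
    'X rs0000009 9000 A G 0.3 0.1 1e-06',
);

my @rows = annotate(index_positions(@first_file), @second_file);
print "$_\n" for @rows;    # prints: 1 rs0000001 1000
```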
