Question

I am parsing a CSV file in which each line looks like the following.

10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

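Every line seems to end with a run of trailing commas.

I want to get the first term, in this case "10998", together with the number of GO terms associated with it. So my output in this case should be:

Output: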

10998,7

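But instead it shows 299. I realized that each line has 303 commas in total, and I can't find a simple way to remove the trailing ones. Can anyone help me solve this?

Thanks!

My code: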

use strict;
use warnings;

open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();

my $mf;
foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}

sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}

Answer 1

I would use juanrpozo's solution for the counting, but if you still want to do it your own way, remove the trailing commas with a regex substitution.

$line =~ s/,+$//;
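
As a minimal sketch of that substitution in isolation, assuming a shortened version of the question's sample line (the variable names `$line` and `$trimmed` are only illustrative):

```perl
use strict;
use warnings;

# Shortened sample line modelled on the question's data,
# ending in a run of trailing commas.
my $line = '10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,,,,,,';

# Copy the line, then strip one or more commas anchored at the end.
(my $trimmed = $line) =~ s/,+$//;

print "$trimmed\n";   # 10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403
```

If the file happens to have Windows line endings, a carriage return left after `chomp` would sit behind the commas and stop this pattern from matching. A variant such as `s/[,\s]+\z//` should tolerate that case, though whether it applies to the asker's file is only a guess.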

Answer 2

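I suggest this more concise way of writing your program.

Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so the result is the same whether or not the trailing commas are removed beforehand.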

use strict;
use warnings;

open my $in, '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";

foreach my $line (<$in>) {
  chomp $line;
  my @data = split /,/, $line;
  printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
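
The point about split dropping trailing empty fields can be checked with a small sketch. The input string here is made up for illustration; the negative limit argument is standard split behaviour for keeping trailing empty fields.

```perl
use strict;
use warnings;

# Made-up input: three real fields followed by three trailing commas.
my $line = 'a,b,c,,,';

# Default split: trailing empty fields are discarded.
my @default = split /,/, $line;

# A negative limit tells split to keep trailing empty fields.
my @all = split /,/, $line, -1;

printf "%d %d\n", scalar @default, scalar @all;   # 3 6
```

So with the default form there is normally nothing left to trim after the split, which is why the program above never touches the trailing commas.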

Answer 3

You can apply grep to @array:

my $mf = grep { /^GO:/ } @array;

This assumes $array[0] never matches /^GO:/.
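
A self-contained sketch of this approach on the question's sample line (with the trailing commas shortened) might look like the following. In scalar context grep returns the number of matching elements, which is what gets assigned to `$mf`.

```perl
use strict;
use warnings;

# The question's sample line, with the run of trailing commas shortened.
my $line = '10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,'
         . 'GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,';

my @array = split /,/, $line;

# grep in scalar context yields the count of fields starting with "GO:".
my $mf = grep { /^GO:/ } @array;

print "$array[0],$mf\n";   # 10998,7
```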

Answer 4

For each of your lines:

foreach my $line (<$IN>) {
    my ($first_term) = ($line =~ /(\d+),/);
    my @tmp = split('GO', " $line ");
    my $nr_of_GOs = @tmp - 1;
    print CSV "$first_term,$nr_of_GOs\n";
}
