Perl: read a file from a specified line to the end

Time: 2017-04-17 02:23:24

Tags: perl

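I'm new to Perl. I'm trying to read a large comma-separated file, split each line, and grab only some of the columns. With some help from the internet I managed to write the code below, but I'm having trouble changing it so that it starts reading at a specific line and continues to the end of the file. What I need is: open the file, start reading at line 12, split on ',', grab columns 0, 3, 10 and 11, and join the selected columns with '\t'.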

Here is my code:

#!/usr/bin/perl
my $filename = 'file_to_read.csv';
open(FILER, $filename) or die "Could not read $filename.";
open(FILEW, ">$filename.txt")     || die "couldn't create the file\n";
while(<FILER>) {
  chomp;
  my @fields = split(',', $_);
  print FILEW "$fields[0]\t$fields[3]\t$fields[10]\t$fields[11]\n";
}
close FILER;
close FILEW;

Here is a sample of the file:

[Header]
GSGT Version: X
Processing Date:12/01/2010 7:20 PM
Content:
Num SNPs:
Total SNPs:
Num Samples:
Total Samples:
Sample:
[Data]

SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
1:100011159-T-G,1,100011159,0.4259,0.972,0.859,0.036,0.822,807,3648,0.9942,-0.0304,C,C
1:10002775-GA,1,10002775,0.4234,0.977,1.271,0.043,1.228,809,5140,0.9892,0.0111,G,G

2 answers:

Answer 0 (score: 0)

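Rather than skipping a fixed number of lines (which may vary from file to file), it is better to keep track of which section of the file you are in, as marked by [Header], [Data], etc.

This solution keeps a state variable, $section, which is updated to the current section name each time a [Section] label is encountered in the file. Every non-empty line inside the Data section is split on commas and the selected columns are printed.

Something similar could be done with the column headers, choosing the fields to output by name instead of by number, but I chose to keep the complexity down.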

use strict;
use warnings 'all';
use feature 'say';

my $filename = 'file_to_read.csv';

open my $fh, '<', $filename or die qq{Unable to open "$filename" for input: $!};

my $section = "";

while ( <$fh> ) {

    next unless /\S/;            # Skip empty lines

    if ( $section eq 'Data' ) {  # Only process lines inside the [Data] section
        chomp;
        my @fields = split /,/;
        say join ',', @fields[0,3,10,11];
    }
    elsif ( /\[(\w+)\]/ ) {
        $section = $1;
    }
}

Output

SNP Name,GC Score,B Allele Freq,Log R Ratio
1:10001102-G-T,0.4159,0.0000,-0.2581
1:100011159-T-G,0.4259,0.9942,-0.0304
1:10002775-GA,0.4234,0.9892,0.0111
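The column-header variant mentioned in this answer could look roughly like the sketch below. It is not the answerer's code: the helper name select_by_name is hypothetical, it assumes the first non-blank line of the [Data] section is the header row, and it joins the output with tabs as the question asks.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: track the current [Section] like the answer above,
# treat the first non-blank line of [Data] as the header row, and print
# the columns whose names are listed in @wanted, joined with tabs.
sub select_by_name {
    my ( $in, $out, @wanted ) = @_;
    my $section = '';
    my @index;                          # column numbers for @wanted, once known
    while ( my $line = <$in> ) {
        next unless $line =~ /\S/;      # skip empty lines
        chomp $line;
        if ( $section eq 'Data' ) {
            my @fields = split /,/, $line;
            if ( !@index ) {            # first Data line: map header names to positions
                my %col;
                @col{@fields} = 0 .. $#fields;
                for my $name (@wanted) {
                    die "No column named '$name'\n" unless exists $col{$name};
                    push @index, $col{$name};
                }
            }
            print {$out} join( "\t", @fields[@index] ), "\n";
        }
        elsif ( $line =~ /^\[(\w+)\]/ ) {
            $section = $1;              # entered a new [Section]
        }
    }
}

# Usage (assumed): perl script.pl file_to_read.csv
if (@ARGV) {
    my $filename = $ARGV[0];
    open my $in, '<', $filename
        or die qq{Unable to open "$filename" for input: $!};
    select_by_name( $in, \*STDOUT,
        'SNP Name', 'GC Score', 'B Allele Freq', 'Log R Ratio' );
}
```

Because the header row itself goes through the same print, the output starts with the selected column names, just like the answer's output. Selecting by name also keeps working if the column order changes between files.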

Answer 1 (score: -1)

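Declare a variable to count the lines processed, such as my $line_count = 0;

then increment it at the very start of the while loop body with $line_count++;

and skip lines while the count is still below 12, i.e. next if $line_count < 12;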
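A minimal sketch of this counter approach applied to the question's script, assuming the data really starts at line 12. Instead of a hand-maintained $line_count it uses Perl's built-in $. variable (the current line number of the last filehandle read), and it writes columns 0, 3, 10 and 11, tab-separated, to "$filename.txt" as the question's code does. The helper name copy_from_line is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: from line $start_line onwards, split each line on
# commas and print columns 0, 3, 10 and 11 joined with tabs.
# $. is Perl's built-in line number for the last filehandle read,
# so it replaces the hand-maintained $line_count counter.
sub copy_from_line {
    my ( $in, $out, $start_line ) = @_;
    while ( my $line = <$in> ) {
        next if $. < $start_line;      # skip lines 1 .. $start_line - 1
        next unless $line =~ /\S/;     # skip blank lines
        chomp $line;
        my @fields = split /,/, $line;
        print {$out} join( "\t", @fields[ 0, 3, 10, 11 ] ), "\n";
    }
}

# Usage (assumed): perl script.pl file_to_read.csv
# writes the selected columns to file_to_read.csv.txt
if (@ARGV) {
    my $filename = $ARGV[0];
    open my $in,  '<', $filename       or die "Could not read $filename: $!";
    open my $out, '>', "$filename.txt" or die "Could not create $filename.txt: $!";
    copy_from_line( $in, $out, 12 );
    close $out or die "Could not write $filename.txt: $!";
}
```

A fixed line number only works while every file has exactly the same number of header lines; if that can vary, tracking the [Data] marker as in Answer 0 is more robust.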