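I am new to Perl. I am trying to read a large comma-separated file, split each line, and keep only some of the columns. With some help from the internet I managed to put the code below together, but I am struggling to change it so that it starts reading at a specific line and continues to the end of the file. What I need is to open the file, start reading at line 12, split on ',', grab columns 0, 2, 10 and 11, and join the wanted columns with '\t'.
Here is my code: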
#!/usr/bin/perl
my $filename = 'file_to_read.csv';
open(FILER, $filename) or die "Could not read $filename.";
open(FILEW, ">$filename.txt") || die "couldn't create the file\n";
while(<FILER>) {
    chomp;
    my @fields = split(',', $_);
    print FILEW "$fields[0]\t$fields[3]\t$fields[10]\t$fields[11]\n";
}
close FILER;
close FILEW;
Here is a sample of the file:
[Header]
GSGT Version: X
Processing Date:12/01/2010 7:20 PM
Content:
Num SNPs:
Total SNPs:
Num Samples:
Total Samples:
Sample:
[Data]
SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
1:100011159-T-G,1,100011159,0.4259,0.972,0.859,0.036,0.822,807,3648,0.9942,-0.0304,C,C
1:10002775-GA,1,10002775,0.4234,0.977,1.271,0.043,1.228,809,5140,0.9892,0.0111,G,G
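Answer 0 (score: 0)
Rather than skipping a specific number of lines, which may vary from file to file, it is better to keep track of the current section of the file, as marked by [Header], [Data] and so on.
This solution keeps a state variable $section, which is updated to the current section name each time a [Section] label is encountered in the file. Every line in the Data section is split, reduced to the wanted fields, and printed.
Something similar could be done with the column headers, using names instead of numbers to select the fields to output, but I have chosen to keep the complexity down.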
use strict;
use warnings 'all';
use feature 'say';
my $filename = 'file_to_read.csv';
open my $fh, '<', $filename or die qq{Unable to open "$filename" for input: $!};
my $section = "";
while ( <$fh> ) {
    next unless /\S/; # Skip empty lines
    if ( $section eq 'Data' ) { # Skip unless we're in the [Data] section
        chomp;
        my @fields = split /,/;
        say join ',', @fields[0,3,10,11];
    }
    elsif ( /\[(\w+)\]/ ) {
        $section = $1;
    }
}
Output:
SNP Name,GC Score,B Allele Freq,Log R Ratio
1:10001102-G-T,0.4159,0.0000,-0.2581
1:100011159-T-G,0.4259,0.9942,-0.0304
1:10002775-GA,0.4234,0.9892,0.0111
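The header-based variant mentioned in this answer, selecting fields by column name rather than by number, might look roughly like the sketch below. It is only an illustration under stated assumptions: the helper name extract_columns is hypothetical, the first non-empty line after [Data] is assumed to be the header row, and the output is joined with tabs as the question asked. The sample is read from an in-memory string so the block runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns the wanted
# columns of the [Data] section, selected by header name and joined with tabs.
sub extract_columns {
    my ($fh, @wanted) = @_;
    my $section = '';
    my $index;    # array ref of column positions, set once the header row is seen
    my @rows;

    while ( my $line = <$fh> ) {
        next unless $line =~ /\S/;       # skip empty lines
        $line =~ s/\r?\n\z//;            # strip the line ending (LF or CRLF)

        if ( $line =~ /^\[(\w+)\]/ ) {   # a [Section] label updates the state
            $section = $1;
            next;
        }
        next unless $section eq 'Data';

        my @fields = split /,/, $line;
        if ( not defined $index ) {
            # Assumption: the first line of the Data section is the header row
            my %position;
            @position{@fields} = 0 .. $#fields;
            for my $name (@wanted) {
                die qq{Column "$name" not found in header row\n}
                    unless exists $position{$name};
            }
            $index = [ @position{@wanted} ];
        }
        push @rows, join "\t", @fields[@$index];
    }
    return @rows;
}

# Usage on a shortened copy of the sample file, read from a string
my $sample = <<'END';
[Header]
Content:
[Data]
SNP Name,Chromosome,Pos,GC Score,Theta,R,X,Y,X Raw,Y Raw,B Allele Freq,Log R Ratio,Allele1 - TOP,Allele2 - TOP
1:10001102-G-T,1,10001102,0.4159,0.007,0.477,0.472,0.005,6281,126,0.0000,-0.2581,A,A
END

open my $in, '<', \$sample or die "Unable to open in-memory sample: $!";
print "$_\n" for extract_columns($in, 'SNP Name', 'GC Score', 'B Allele Freq', 'Log R Ratio');
close $in;
```

Looking columns up by name means the script keeps working if the column order changes, at the cost of failing loudly when a header is renamed.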
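Answer 1 (score: -1)
Declare a variable to count the lines processed, such as my $line_count = 0;
then increment it at the start of the while loop with $line_count++;
and skip the line while the count is below 12, i.e. next if $line_count < 12;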
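Put together, the line-counting approach might look like the sketch below. The helper name columns_from_line and its parameters are hypothetical, chosen only for illustration, and the demo reads from an in-memory string instead of the real file. For the sample file in the question, the first line to keep would be 12 and the columns would be the ones the asker listed.

```perl
use strict;
use warnings;

# Hypothetical helper: skips every line before $first_line (1-based), then
# returns the requested columns of each remaining line, joined with tabs.
sub columns_from_line {
    my ($fh, $first_line, @cols) = @_;
    my $line_count = 0;
    my @rows;

    while ( my $line = <$fh> ) {
        $line_count++;                          # count every line read
        next if $line_count < $first_line;      # skip until the first wanted line
        $line =~ s/\r?\n\z//;                   # strip the line ending
        my @fields = split /,/, $line;
        push @rows, join "\t", @fields[@cols];
    }
    return @rows;
}

# Small demonstration: keep columns 0 and 2, starting at line 3
my $demo = "skip me\nskip me too\na,b,c\nd,e,f\n";
open my $in, '<', \$demo or die "Unable to open in-memory demo: $!";
print "$_\n" for columns_from_line($in, 3, 0, 2);
close $in;
```

Perl's built-in $. variable also holds the current line number of the most recently read filehandle, so the explicit counter could be replaced by a test on $. instead. Either way, a fixed line number only works while every input file has a preamble of the same length.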