My data looks like this:
-1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824
1 1:-0.469432 2:-0.897436 3:-1 4:-0.871341 5:0.32668 6:0.302529
-1 1:-0.241547 2:-0.538462 3:-1 4:-0.871341 5:0.9994 6:0.987166
1 1:-0.757233 2:-0.948718 3:-1 4:-0.871341 5:-0.33904 6:0.915401
1 1:-0.167147 2:-0.589744 3:-1 4:-0.871341 5:0.95078 6:0.991566
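The first column is the class, and the next 6 columns are features. I want to create 6 files, one per individual feature. For example,
my_input_feat1.txt would contain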
-1 1:-0.394668
1 1:-0.463641
...
1 1:-0.757233
1 1:-0.167147
my_input_feat2.txt would contain
-1 2:-0.794872
...
1 2:-0.589744
And so on. I have Perl code that does this, but it is very slow. Is there a faster way? Typically the input file will contain 100K lines.
use strict;
use Data::Dumper;
use Carp;
my $input = $ARGV[0] || "myinput.txt";
my $INFILE_file_name = $input; # input file name
open ( INFILE, '<', $INFILE_file_name )
or croak "$0 : failed to open input file $INFILE_file_name : $!\n";
my $out1 = $input."_feat_1.txt";
my $out2 = $input."_feat_2.txt";
my $out3 = $input."_feat_3.txt";
my $out4 = $input."_feat_4.txt";
my $out5 = $input."_feat_5.txt";
my $out6 = $input."_feat_6.txt";
unlink($out1);
unlink($out2);
unlink($out3);
unlink($out4);
unlink($out5);
unlink($out6);
print "$out1\n";
while ( <INFILE> ) {
chomp;
my @els = split(/\s+/,$_);
my $lbl = $els[0];
my $OUTFILE1_file_name = $out1; # output file name
open ( OUTFILE1, '>>', $OUTFILE1_file_name )
or croak "$0 : failed to open output file $OUTFILE1_file_name : $!\n";
print OUTFILE1 "$lbl $els[1]\n";
close ( OUTFILE1 ); # close output file
my $OUTFILE2_file_name = $out2; # output file name
open ( OUTFILE2, '>>', $OUTFILE2_file_name )
or croak "$0 : failed to open output file $OUTFILE2_file_name : $!\n";
print OUTFILE2 "$lbl $els[2]\n";
close ( OUTFILE2 ); # close output file
# Etc.. until OUTFILE 6
}
close (INFILE);
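Answer 0 (score: 3)
You should move the opening and closing of the output files outside the while loop. As written, the script reopens and closes every output file once per input line, so a 100K-line input triggers hundreds of thousands of open/close calls.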
Answer 1 (score: 2)
Would a shell script be acceptable?
awk '{print $1" "$2}' data.txt > feat1_file.txt
awk '{print $1" "$3}' data.txt > feat2_file.txt
awk '{print $1" "$4}' data.txt > feat3_file.txt
awk '{print $1" "$5}' data.txt > feat4_file.txt
awk '{print $1" "$6}' data.txt > feat5_file.txt
awk '{print $1" "$7}' data.txt > feat6_file.txt
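The six commands above each read data.txt from start to finish. As a possible variation, a single awk invocation can write all the feature files in one pass, since awk keeps a file opened with `>` open for the rest of the run. This is only a sketch: the file name sample_data.txt and its two rows (copied from the question) are stand-ins for the real input, and the output names follow the featN_file.txt pattern used above.

```shell
#!/bin/sh
# Hypothetical sample input: the first two rows from the question.
cat > sample_data.txt <<'EOF'
-1 1:-0.394668 2:-0.794872 3:-1 4:-0.871341 5:0.9365 6:0.75597
1 1:-0.463641 2:-0.897436 3:-1 4:-0.871341 5:0.44378 6:0.121824
EOF

# One pass over the input: for each feature column (fields 2..NF),
# write "class feature" to that feature's own file.
# In awk, "> file" truncates the file on first use only; later prints
# to the same name in the same run append to the already-open file.
awk '{
    for (i = 2; i <= NF; i++) {
        out = "feat" (i - 1) "_file.txt"
        print $1, $i > out
    }
}' sample_data.txt

# Show the first feature file.
cat feat1_file.txt
```

With only six columns the number of simultaneously open files is small; for inputs with a very large number of columns, some awk implementations may hit a limit on open files, in which case calling close(out) after each print and switching to `>>` would be one workaround.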
Answer 2 (score: 2)
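#!/usr/bin/sh
# Usage: script.sh NUM_FEATURES INPUT_FILE
# The data is separated by single spaces, so cut needs -d ' ' (its default
# delimiter is a tab). Field 1 is the class; feature i is field i+1.
for i in $(seq 1 "$1"); do
    cut -d ' ' -f "1,$((i + 1))" "$2" > "${2}_$i"
done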
Or
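#!/usr/bin/perl
use warnings; use strict;
my $input_file = $ARGV[0];
my %handles;    # feature index => output filehandle, opened once and reused
while (<>) {
    my ($class, @features) = split /\s+/;
    for my $i (1 .. @features) {
        # Open each output file only the first time its feature index is seen.
        open $handles{$i}, '>', $input_file . "_$i" or die $!
            unless exists $handles{$i};
        print {$handles{$i}} join( ' ', $class, $features[$i - 1] ), "\n";
    }
}
# Close every output file once, after all input has been processed.
while (my (undef, $handle) = each %handles) {
    close $handle or die $!;
}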