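I've been looking for a way to transpose a large CSV file with Perl, but I can't get my loop to work correctly.
There is a header row with 7 columns (in reality I have more than 200 columns). The first three columns are fixed and the remaining columns are numbers. If an account amount is 0, it should be skipped and not transposed.
Source data: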
Name,Age,Gender,Acct1,Acct2,Acct3,Acct4
Jack,12,M,10,20,0,999
Mary,20,F,40,50,0,111
Transposed data:
Column_ID,Name,Age,Gender,Acct
4,Jack,12,M,10
5,Jack,12,M,20
7,Jack,12,M,999
4,Mary,20,F,40
5,Mary,20,F,50
7,Mary,20,F,111
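Strictly speaking, this is a wide-to-long reshape (often called "unpivot") rather than a matrix transpose: each source row becomes one output row per non-zero account column. A minimal sketch of that per-row rule, assuming the row has already been split into fields and that Column_ID is the 1-based position of the account column (as the expected output suggests). The helper name `unpivot_row` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one already-split source row into zero or
# more output lines. Fields 0..2 (Name, Age, Gender) are copied onto
# every output line; each later field becomes its own line unless its
# value is 0. Column_ID is the 1-based position of the account field.
sub unpivot_row {
    my @cols  = @_;
    my @fixed = @cols[0 .. 2];
    my @out;
    for my $i (3 .. $#cols) {
        next if $cols[$i] == 0;    # zero amounts are not transposed
        push @out, join(',', $i + 1, @fixed, $cols[$i]);
    }
    return @out;
}

print "Column_ID,Name,Age,Gender,Acct\n";
print "$_\n" for unpivot_row('Jack', 12, 'M', 10, 20, 0, 999);
```

Run as-is, it prints the header followed by the three Jack rows shown above (Column_IDs 4, 5 and 7). Because the loop runs to `$#cols`, the same helper works unchanged for 200+ columns.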
Answer 0 (score: 1)
I assume the source data is in a file rather than hard-coded as Perl assignments.
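#!/usr/bin/perl
use strict;
use warnings;

print "Column_ID,Name,Age,Gender,Acct\n";
foreach my $file (@ARGV) {
    open my $FH, '<', $file
        or do { warn "Couldn't open $file: $!\n"; next };
    my $header = <$FH>;    # read and discard the header row
    while (<$FH>) {
        chomp;
        my @cols = split /,/;
        # Name, Age and Gender are repeated on every output row.
        my @retained = @cols[0 .. 2];
        foreach my $col (3 .. $#cols) {
            # Column_ID is the 1-based column number; zero amounts are skipped.
            print join(',', 1 + $col, @retained, $cols[$col]), "\n"
                if $cols[$col];
        }
    }
    close $FH;
}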
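To try the loop without creating a file, Perl can open a filehandle on a string. The sketch below mirrors the answer's loop (header skipped, fields 0..2 retained, falsy amounts dropped) but collects the lines instead of printing them, so the result can be checked. The sub name `transpose_fh` is hypothetical, and like the answer it assumes the header is always the first line and that no field contains a quoted comma.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input from the question, held in a string instead of a file.
my $data = <<'CSV';
Name,Age,Gender,Acct1,Acct2,Acct3,Acct4
Jack,12,M,10,20,0,999
Mary,20,F,40,50,0,111
CSV

# Hypothetical helper: read rows from an open handle and return the
# transposed lines (without the output header).
sub transpose_fh {
    my ($fh) = @_;
    my @out;
    my $header = <$fh>;    # assumed: the first line is always the header
    while (my $line = <$fh>) {
        chomp $line;
        my @cols     = split /,/, $line;
        my @retained = @cols[0 .. 2];
        for my $col (3 .. $#cols) {
            push @out, join(',', 1 + $col, @retained, $cols[$col])
                if $cols[$col];    # "0" is false in Perl, so it is skipped
        }
    }
    return @out;
}

# Opening a reference to a scalar reads from the string in memory.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
print "Column_ID,Name,Age,Gender,Acct\n";
print "$_\n" for transpose_fh($fh);
close $fh;
```

The printed result matches the "Transposed data" listing in the question. Note that the truthiness test also skips an empty field, which may or may not be what you want for missing amounts.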
Answer 2 (score: 0)
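Assuming you have already read the CSV into an array of arrays with one of the CSV modules (please don't parse CSV yourself), I would do it like this: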
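#!/usr/bin/perl
use strict;
use warnings;

# Sample rows; in practice these would come from a CSV module.
my @rows = (
    ['Jack', 12, 'M', 10, 20, 0, 999],
    ['Mary', 20, 'F', 40, 50, 0, 111],
);

my @output;
foreach my $row (@rows) {
    # Columns 3 and up are the account amounts.
    foreach my $col (3 .. $#{$row}) {
        # Skip zero amounts; Column_ID is the 1-based column number.
        if ($row->[$col] != 0) {
            push(@output, [$col + 1, @{$row}[0, 1, 2, $col]]);
        }
    }
}

foreach my $row (@output) {
    print join(',', @{$row}), "\n";
}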
Example output:
$ perl dummy.pl
4,Jack,12,M,10
5,Jack,12,M,20
7,Jack,12,M,999
4,Mary,20,F,40
5,Mary,20,F,50
7,Mary,20,F,111
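The advice not to parse CSV yourself matters because a quoted CSV field may legally contain a comma, and a plain `split /,/` does not understand quoting. A small illustration, using a hypothetical row whose name field is quoted:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical row: the name is quoted because it contains a comma,
# which is valid CSV. A real CSV parser would return 7 fields.
my $line = '"Smith, Jack",12,M,10,20,0,999';

# split knows nothing about quotes, so the name is cut in two and
# every later field shifts one position to the right.
my @naive = split /,/, $line;

print scalar(@naive), " fields: ", join('|', @naive), "\n";
print "index 3 (first account column) holds '$naive[3]'\n";
```

This prints 8 fields instead of 7, and the slot where the first account amount should be holds `M`. With the truthiness test from Answer 0 that `M` would be emitted as an amount; with the `!= 0` test above it would trigger a "not numeric" warning. A CSV module such as Text::CSV handles the quoting and avoids this shift.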