从csv文件中的数据创建多个csv文件

时间:2010-04-12 11:12:58

标签: python perl bash sed awk

系统OSX或Linux

我正在尝试自动化我的工作流程,每周我收到一个excel文件,我将其转换为csv。

一个例子是:

,,L1,,,L2,,,L3,,,L4,,,L5,,,L6,,,L7,,,L8,,,L9,,,L10,,,L11,
Title,r/t,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,neede d,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst,needed,actual,Inst
EXAMPLEfoo,60,6,6,6,0,0,0,0,0,0,6,6,6,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLEbar,30,6,6,12,6,7,14,6,6,12,6,6,12,6,8,16,6,7,14,6,7.5,15,6,6,12,6,8,16,6,0,0,6,7,14
EXAMPLE1,60,3,3,3,3,5,5,3,4,4,3,3,3,3,6,6,3,4,4,3,3,3,3,4,4,3,8,8,3,0,0,3,4,4
EXAMPLE2,120,6,6,3,0,0,0,6,8,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
EXAMPLE3,60,6,6,6,6,8,8,6,6,6,6,6,6,0,0,0,0,0,0,6,8,8,6,6,6,0,0,0,0,0,0,0,10,10
EXAMPLE4,30,6,6,12,6,7,14,6,6,12,6,6,12,3,5.5,11,6,7.5,15,6,6,12,6,0,0,6,9,18,6,0,0,6,6.5,13

因此,您可以了解它在excel中的外观: alt text http://i42.tinypic.com/2dt2glt.png

我需要做的是为第1行中的每个实例创建多个csv文件,因此L1,L2,L3,L4 ......

在每个csv文件中,它需要包含标题,r / t,

因此,对于L1,一个例子输出看起来像:

EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

对于L2:

EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

等等。

我尝试过使用sed和awk并点击谷歌,但我发现没有什么可以解决这个问题。

我认为perl特别适合这个或者python,所以我非常乐意接受用户的建议。

那么,有什么建议吗?

提前致谢。

6 个答案:

答案 0 :(得分:2)

Perl“one-liner”

perl -MText::CSV_XS -e'$c=Text::CSV_XS->new({binary=>1,eol=>"\n"});%a=map{$i++;/^L\d+$/?($_=>$i):()}@{$c->getline(*ARGV)};open$b{$_},">$_"for keys%a;while($f=$c->getline(*ARGV)){$c->print($b{$_},[@$f[0,1,$a{$_}]])for keys%a}'

对于有阅读问题的人:

$ echo '$c=Te...' | perltidy
$c = Text::CSV_XS->new( { binary => 1, eol => "\n" } );
%a = map { $i++; /^L\d+$/ ? ( $_ => $i ) : () } @{ $c->getline(*ARGV) };
open $b{$_}, ">$_" for keys %a;
while ( $f = $c->getline(*ARGV) ) {
    $c->print( $b{$_}, [ @$f[ 0, 1, $a{$_} ] ] )
      for keys %a;
}

答案 1 :(得分:2)

仅使用AWK:

awk -F, -vOFS=, -vc=1 '
    NR == 1 {
        for (i=1; i<NF; i++) {
            if ($i != "") {
                g[c]=i;
                f[c++]=$i
            }
        }
    }
    NR>2 {
        for (i=1; i < c; i++) {
            print $1,$2, $g[i] > "output_"f[i]".csv"
        }
    }' data.csv

作为一个单行:

awk -F, -vOFS=, -vc=1 'NR == 1 {for (i=1; i<NF; i++) {if ($i != "") {g[c]=i; f[c++]=$i}}} NR>2 { for (i=1; i < c; i++) {print $1,$2, $g[i] > "file_"f[i]".csv" }}' data.csv

示例输出:

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6
$ cat file_L11.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,0
EXAMPLE4,30,6

答案 2 :(得分:1)

试试这个

#!/bin/bash
awk 'BEGIN{ OFS=FS="," }
NR==1{
 for(i=1;i<=NF;i++){
   if($i){ f[i]=$i }
 }
}
NR>2{ for(o in f){ print $1,$2, $o > "file_"f[o]".csv" } } ' file

输出

$ cat file_L1.csv
EXAMPLEfoo,60,6
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,6
EXAMPLE3,60,6
EXAMPLE4,30,6

$ cat file_L2.csv
EXAMPLEfoo,60,0
EXAMPLEbar,30,6
EXAMPLE1,60,3
EXAMPLE2,120,0
EXAMPLE3,60,6
EXAMPLE4,30,6

答案 3 :(得分:1)

use strict;
use warnings;

use Text::CSV;
my $csv = Text::CSV->new;

sub parse_line {
    $csv->parse(shift) or die $!;
    return $csv->fields;
}

my @metadata;
my @files  = parse_line(scalar <>);
my @header = parse_line(scalar <>); # Ignore.
for my $i (0 .. $#files){
    next unless length $files[$i];
    open(my $h, '>', "$files[$i].csv") or die $!;
    push @metadata, {column => $i, handle => $h};
}

while (my $line = <>){
    my @fields = parse_line($line);
    for my $m (@metadata){
        $csv->print($m->{handle}, [ @fields[0, 1, $m->{column}] ]);
        print {$m->{handle}} "\n";
    }
}

答案 4 :(得分:0)

在Python中,有点hacky和未经测试,但应该做的工作:

import csv
r = csv.reader(open(r'file.csv'), dialect='excel')
topline = r.next()
headerline = r.next()

lastcell = ''
for i, cell in enumerate(topline): #Copy cells forwards in the top line, so L1 for example goes across all cells
    if cell == '':
        topline[i] = lastcell
    else:
        lastcell = cell

for i in range(len(headerline)): #Copy the topline cells into the header line, so the headerline cells should be unique
    headerline[i] = '-'.join((topline[i], headerline[i]))

rows = [dict(zip(headerline, line)) for line in r]

# Rows should now consist of dicts of the form {'Title': 'EXAMPLEfoo', 'r/t': '60', 'L1-needed': '6' ...}

for lval in frozenset(topline): #Use frozenset to ensure we only have unique values.
    if lval != '': #Make sure we don't look at the blank value
        w = csv.writer(open(r'%s.csv' % lval, 'w'), dialect='excel')
        for row in rows:
            line = [row['Title'], row['r/t'], row['-'.join((lval, 'needed'))]]
            w.writerow(line)

答案 5 :(得分:0)

查看perl模块Text::CSV_XS - 逗号分隔值操作例程。我发现这个模块在使用CSV文件进行操作时非常有用。