Reorder columns by string variable

Asked: 2015-06-23 22:55:52

Tags: python bash perl awk

I have a CSV file like this:

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97

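The first 5 columns are always present in the same positions; the remaining columns (up to 5) appear in a different order on each line. I need to keep the first 5 columns as they are and reorder the last 5 so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra

If one of the strings is missing, leave its field empty (just add the comma)

So the end result looks like this: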

Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53

If they were always in the same order, I could do this:

awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'

If they were at least always in the same column, I could do a

awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'

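But every line has a different order, and each field ends with a number that varies, which is throwing me for a loop.

I would like to do this in AWK, but I am open to anything at this point.

Logically, I think I need to do something like: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*

then awk {print $6j, $7i, $8k, $9l, $10m}

But my Google searches have returned poor results. So even a comment saying "look here", "search for this", or "see this answer" would be greatly appreciated.

Note: I did my best to make sure the input and output are correct. I have posted another question similar to this one, but in that case the columns were always in the same order, so this is a different requirement.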

4 answers:

Answer 0 (score: 5)

Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with the paths to your CSV files.

import csv 

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

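# Python 3: text mode with newline='' (the original 'wb'/'rb' modes only work on Python 2)
with open('output.csv', 'w', newline='') as outfile, \
     open('input.csv', newline='') as infile: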
    writer = csv.writer(outfile)
    reader = csv.reader(infile) 

    for row in reader: 
        head = row[:5]
        tail = []
        for label in labels:
            tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
        writer.writerow(head + tail)

Here is another, shorter solution that works with pipes:

#!/usr/bin/python    
from sys import stdin, stdout

labels = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra"
]

for line in stdin: 
    values = line.strip().split(',')
    stdout.write(','.join(values[:5]))
    for label in labels:
        stdout.write(',')
        stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
    stdout.write('\n')
stdout.flush()

If you save this code in a file, named reorder for example, and make that file executable, you can reformat the CSV file like this:

$ cat input.csv | ./reorder

The reformatted CSV content is then written to standard output.
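
One caveat with splitting on ',' by hand is that it would break if a field ever contained a quoted comma. The sample data has none, so this is only a precaution. Below is a minimal sketch of the same filter built on the csv module, assuming Python 3; the function name reorder_rows and its parameters are illustrative and not part of the original answer.

```python
import csv
import io

LABELS = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]

def reorder_rows(infile, outfile, labels=LABELS, fixed=5):
    """Copy CSV rows from infile to outfile with the score columns in label order."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        head, scores = row[:fixed], row[fixed:]
        # For each label, take the first score column that starts with it, or "" if absent.
        tail = [next((s for s in scores if s.startswith(label)), "") for label in labels]
        writer.writerow(head + tail)

# Small in-memory demonstration using one line from the question.
demo_in = io.StringIO(
    "Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82\n"
)
demo_out = io.StringIO()
reorder_rows(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

To use it in the same pipeline as above, the script could end with reorder_rows(sys.stdin, sys.stdout) after an import sys.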

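Answer 1 (score: 3)

It looks like you answered this yourself, but since I had already written all of this up (and since it does not require the first word to be unique the way the awk solution does, only that no category is a substring of any other category):

In Perl, this can be solved as follows.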

use strict;
use warnings;

my @categories = ('Reading Comprehension', 'Sentence Skills', 'Arithmetic', 'College Level Math', 'Elementary Algebra');

while(<ARGV>) {
    chomp;
    my @columns = split(/,/);
    print join(',', @columns[0 .. 4], map { my $c = $_; (grep { /$c/ } @columns)[0] || '' } @categories)."\n";
}

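This accepts filenames as arguments, or reads standard input if no arguments are given.

The join line says: take the first 5 columns, followed by, for each category, the first column that matches it, or an empty string if no column matches.

map { my $c = $_; ... } @categories: do this for each category ($c holds the category in place of $_)
grep { /$c/ } @columns: all columns that match the given category
(...)[0] || '': the first match, or an empty string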
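
The substring caveat mentioned at the top of this answer follows from the unanchored /$c/ match. The following Python sketch contrasts an unanchored search with a prefix match. The "Algebra 77" column and the "Algebra" label are hypothetical and are not among the real categories; the helper names are illustrative only.

```python
import re

# Hypothetical columns: "Algebra" is a substring of "Elementary Algebra".
columns = ["Elementary Algebra 41", "Algebra 77"]

def first_match_anywhere(label, cols):
    """Roughly (grep { /$c/ } @columns)[0] || '': first column containing label.

    re.escape treats the label literally; the Perl code interpolates it as a
    regex, which is equivalent for labels made of letters and spaces.
    """
    return next((c for c in cols if re.search(re.escape(label), c)), "")

def first_match_prefix(label, cols):
    """Stricter variant: the column must start with the label."""
    return next((c for c in cols if c.startswith(label)), "")

print(first_match_anywhere("Algebra", columns))  # Elementary Algebra 41
print(first_match_prefix("Algebra", columns))    # Algebra 77
```

With the five real categories neither problem arises, since none of them contains another.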

As a one-liner, this can be expressed as follows:

perl -nalF, -e 'print join(",", @F[0 .. 4], map { my $c = $_; (grep { /$c/ } @F)[0] || "" } ("Reading Comprehension", "Sentence Skills", "Arithmetic", "College Level Math", "Elementary Algebra"));' inputfile.txt

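-n: implicitly wraps the supplied code in while (<ARGV>) { }
-a: autosplits each line and puts the result in @F
-l: automatically strips the newline from the input and adds it back on output
-F,: splits on commas instead of the default whitespace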

Answer 2 (score: 2)

Another Perl solution.

#!/usr/bin/env perl

use warnings;
use strict;

my @column_order = (
   'Reading Comprehension',
   'Sentence Skills',
   'Arithmetic',
   'College Level Math',
   'Elementary Algebra',
);

my $csv = 'foo.csv'; # CHANGEME

# Open the File
open my $fh, '<', $csv
    or die "Unable to open $csv : $!";

# Read through the file, line-by-line
while (<$fh>) {
    my @columns = split /,/; # Split each line by ','
    my $first_five = join ',', splice @columns, 0, 5; # Remove the first 5 columns
    my %data = map { $_ => '' } @column_order;  # default to empty for each column

    # iterate over remaining columns
    for my $col (@columns) {
        # if we match any of our desired columns
        if (my ($match) = grep { $col =~ m|^$_| } @column_order) {
            $col =~ s|\s*$||; # delete any trailing whitespace
            $data{$match} = $col; # store it in a hash
        }
    }
    my $remaining_columns = join ',', @data{@column_order}; # join the hash values
    print $first_five . ',', $remaining_columns . "\n";
}

Answer 3 (score: 1)

The code that @Glenn Jackson posted in "Creating an AWK For Loop out of piped commands", reproduced here:

awk -F, -v OFS=, '
{
    delete val                 # clear the previous values if any
    for (i=6; i<=NF; i++) {
        split($i, a, " ")
        val[a[1]] = $i         # a[1] is the first space-separated word
    }
    print $1,$2,$3,$4,$5, val["Reading"],    # null values are OK
                          val["Sentence"], 
                          val["Arithmetic"], 
                          val["College"], 
                          val["Elementary"]
}
' input

does exactly what I need, works perfectly, and makes enough sense that I can adapt it.
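
For comparison with the Python answers, the same first-word lookup might look like this in Python. This is only a sketch and not part of the code quoted above; the function name reorder_line and its parameters are illustrative. Like the awk version, it assumes that the first word of each category is unique.

```python
FIRST_WORDS = ["Reading", "Sentence", "Arithmetic", "College", "Elementary"]

def reorder_line(line, first_words=FIRST_WORDS, fixed=5):
    """Return one CSV line with its score columns ordered by first word."""
    fields = line.rstrip("\n").split(",")
    # Index each score column by its first whitespace-separated word,
    # mirroring val[a[1]] = $i in the awk script.
    by_word = {}
    for field in fields[fixed:]:
        words = field.split()
        if words:
            by_word[words[0]] = field
    # Missing categories become empty fields, as with unset awk array entries.
    tail = [by_word.get(word, "") for word in first_words]
    return ",".join(fields[:fixed] + tail)

print(reorder_line("Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51"))
```

If two categories ever shared a first word, the later column would overwrite the earlier one in by_word, which is the limitation Answer 1 points out.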