I have a CSV file like this:
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 67,Reading Comprehension 59,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 44,Reading Comprehension 40
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 41,Sentence Skills 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 104,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 96,Reading Comprehension 88
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Elementary Algebra 53,Sentence Skills 97
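The first 5 columns are always the same, and the last 5 columns appear in a different order on every line. I need to keep the first 5 columns as they are and reorder the last 5 so that they always appear in this order: Reading Comprehension, Sentence Skills, Arithmetic, College Level Math, Elementary Algebra.
If one of these strings is missing from a line, its column should be left empty (just the comma).
So the end result would look like this: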
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 59,Sentence Skills 67,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 40,Sentence Skills 44,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 39,,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 82,,,Elementary Algebra 41
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 104,,,Elementary Algebra 82
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 85,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,,,,Elementary Algebra 51
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 71,Sentence Skills 54,,,Elementary Algebra 33
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 70,Sentence Skills 75,Arithmetic 42,,Elementary Algebra 23
Last,First,A00XXXXXX,1492-12-03,2015-06-23,Reading Comprehension 88,Sentence Skills 96,,,
Last,First,A00XXXXXX,1492-12-03,2015-06-23,,Sentence Skills 97,,,Elementary Algebra 53
If they were always in the same order, I could do this:
awk -F, -v OFS=, '!/Reading Comprehension/ { $5 = $5 "," } 1'
If they were at least always in the same columns, I could do something like:
awk -F, -v OFS=, '{print $1,$2,$3,$4,$5,$7,$8,$6,$9,$10}'
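But every line has a different order, and each field has a variable number at the end, which is throwing me for a loop.
I would like to do this with AWK, but at this point I am open to anything.
Logically, I think I need to do something like: j = Reading*, i = Sentence*, k = Arithmetic*, l = College*, m = Elementary*, and then awk {print $6j, $7i, $8k, $9l, $10m}
But my Google searches have not turned up anything useful. So even a comment saying "look here", "search for this", or "see this answer" would be greatly appreciated.
Note: I did my best to make sure the input and output are correct. I have posted another question similar to this one, but in that case the columns were always in the same order, so this is a different requirement.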
Answer 0 (score: 5)
Here is a simple, clean solution written in Python. You have to replace input.csv and output.csv with your own CSV files.
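    import csv

    labels = [
        "Reading Comprehension", "Sentence Skills", "Arithmetic",
        "College Level Math", "Elementary Algebra"
    ]

    # Python 3: open CSV files in text mode with newline=''
    # (under Python 2 the modes would be 'wb' and 'rb' instead)
    with open('output.csv', 'w', newline='') as outfile, \
            open('input.csv', newline='') as infile:
        writer = csv.writer(outfile)
        reader = csv.reader(infile)
        for row in reader:
            head = row[:5]  # the five fixed columns
            tail = []
            for label in labels:
                # first score column starting with this label, or "" if absent
                tail.append(next((i for i in row[5:] if i.startswith(label)), ""))
            writer.writerow(head + tail)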
Here is another, shorter solution that works with pipes:
    #!/usr/bin/python
    from sys import stdin, stdout

    labels = [
        "Reading Comprehension", "Sentence Skills", "Arithmetic",
        "College Level Math", "Elementary Algebra"
    ]

    for line in stdin:
        values = line.strip().split(',')
        # the five fixed columns first
        stdout.write(','.join(values[:5]))
        for label in labels:
            stdout.write(',')
            # first score column starting with this label, or '' if absent
            stdout.write(next((i for i in values[5:] if i.startswith(label)), ''))
        stdout.write('\n')
    stdout.flush()
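If you save this code in a file, for example one named reorder, and make that file executable, you can reformat a CSV file like this:
$ cat input.csv | ./reorder
The reformatted CSV content is then written to standard output.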
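Answer 1 (score: 3)
It looks like you answered this yourself, but since I had already written all of this up (and since it does not require the first word of each category to be unique the way the awk solution does, only that no category is a substring of any other category):
In Perl, this can be solved as follows.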
    use strict;
    use warnings;

    my @categories = ('Reading Comprehension', 'Sentence Skills', 'Arithmetic', 'College Level Math', 'Elementary Algebra');

    while(<ARGV>) {
        chomp;
        my @columns = split(/,/);
        # first five columns, then the first column matching each category (or '')
        print join(',', @columns[0 .. 4], map { my $c = $_; (grep { /$c/ } @columns)[0] || '' } @categories)."\n";
    }
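This accepts file names as arguments, or reads standard input if no arguments are given.
The join line says: take the first 5 columns, followed by, for each category, the first column that matches it, or an empty string if no column matches.
map { my $c = $_; ... } @categories : do this for each category ($c holds the category instead of $_)
grep { /$c/ } @columns : all columns that match the given category
(...)[0] || '' : the first match, or an empty string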
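For readers more comfortable with Python, the logic of that join line can be sketched roughly as follows. This is only an illustrative translation, not part of the original answer: the names CATEGORIES and reorder are made up here, and a plain substring test stands in for the Perl regex match, which is why no category may be a substring of another.

```python
CATEGORIES = [
    "Reading Comprehension", "Sentence Skills", "Arithmetic",
    "College Level Math", "Elementary Algebra",
]

def reorder(line):
    """Return the line with its score columns placed in CATEGORIES order."""
    columns = line.rstrip("\n").split(",")
    # For each category, take the first column containing it, else "".
    tail = [next((col for col in columns if cat in col), "") for cat in CATEGORIES]
    return ",".join(columns[:5] + tail)

print(reorder("Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
              "Elementary Algebra 41,Sentence Skills 82"))
```

For that sample line, the two scores end up in the second and fifth score slots, with the other three left empty.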
As a Perl one-liner, the script above can be expressed as follows:
perl -nalF, -e 'print join(",", @F[0 .. 4], map { my $c = $_; (grep { /$c/ } @F)[0] || "" } ("Reading Comprehension", "Sentence Skills", "Arithmetic", "College Level Math", "Elementary Algebra"));' inputfile.txt
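-n : implicitly wraps the supplied code in a while (<ARGV>) { } block
-a : automatically splits each line and puts the result in @F
-l : automatically strips the newline from the input and adds it back on output
-F, : splits on commas instead of the default whitespace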
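As a rough analogy only, here is what those four switches amount to if spelled out by hand in Python. The function name first_five and the sample line are made up for this sketch, and the body just prints the first five fields rather than doing the full reordering. Python's split also keeps trailing empty fields, which Perl's default split drops.

```python
import io

def first_five(stream):
    """Mimic `perl -nalF, -e 'print join(",", @F[0 .. 4])'` on a text stream."""
    out = []
    for line in stream:              # -n: implicit loop over input lines
        line = line.rstrip("\n")     # -l: drop the trailing newline on input
        F = line.split(",")          # -a plus -F,: split on commas into F
        out.append(",".join(F[:5]) + "\n")  # -l: put the newline back on output
    return "".join(out)

sample = io.StringIO("Last,First,A00XXXXXX,1492-12-03,2015-06-23,Sentence Skills 85\n")
print(first_five(sample), end="")
```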
Answer 2 (score: 2)
Another Perl solution.
    #!/usr/bin/env perl
    use warnings;
    use strict;

    my @column_order = (
        'Reading Comprehension',
        'Sentence Skills',
        'Arithmetic',
        'College Level Math',
        'Elementary Algebra',
    );

    my $csv = 'foo.csv'; # CHANGE ME

    # Open the file for reading (three-argument open)
    open my $fh, '<', $csv
        or die "Unable to open $csv : $!";

    # Read through the file, line by line
    while (<$fh>) {
        chomp;                                              # drop the trailing newline
        my @columns = split /,/;                            # Split each line by ','
        my $first_five = join ',', splice @columns, 0, 5;   # Remove the first 5 columns
        my %data = map { $_ => '' } @column_order;          # default to empty for each column

        # iterate over remaining columns
        for my $col (@columns) {
            # if we match any of our desired columns
            if (my ($match) = grep { $col =~ m|^$_| } @column_order) {
                $col =~ s|\s*$||;        # delete any trailing whitespace
                $data{$match} = $col;    # store it in a hash
            }
        }

        my $remaining_columns = join ',', @data{@column_order}; # join the hash values
        print $first_five, ',', $remaining_columns, "\n";
    }
Answer 3 (score: 1)
The code that @Glenn Jackson posted here: Creating an AWK For Loop out of piped commands, reproduced below:
    awk -F, -v OFS=, '
        {
            delete val                       # clear the previous values if any
            for (i=6; i<=NF; i++) {
                split($i, a, " ")
                val[a[1]] = $i               # a[1] is the first space-separated word
            }
            print $1,$2,$3,$4,$5, val["Reading"],   # null values are OK
                                  val["Sentence"],
                                  val["Arithmetic"],
                                  val["College"],
                                  val["Elementary"]
        }
    ' input
does exactly what I need, works perfectly, and makes enough sense to me that I can adapt it.
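For comparison, the same first-word lookup idea can be sketched in Python. This is only an illustration of the technique, not code from the linked answer: ORDER and reorder_by_first_word are names invented here, and the approach assumes, as the awk version does, that the first word of each category is unique.

```python
ORDER = ["Reading", "Sentence", "Arithmetic", "College", "Elementary"]

def reorder_by_first_word(line):
    """Place each score column in the slot named by its first word."""
    fields = line.rstrip("\n").split(",")
    val = {}
    for field in fields[5:]:
        words = field.split()
        if words:
            val[words[0]] = field    # key on the first space-separated word
    # Missing categories fall back to an empty string.
    return ",".join(fields[:5] + [val.get(key, "") for key in ORDER])

print(reorder_by_first_word(
    "Last,First,A00XXXXXX,1492-12-03,2015-06-23,"
    "Reading Comprehension 70,Elementary Algebra 23,Arithmetic 42,Sentence Skills 75"
))
```

If two categories ever shared a first word, the later field on the line would overwrite the earlier one in the dictionary, so the full-label matching used in the other answers would be the safer choice in that case.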