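I really hope you can help. I am completely new to (g)awk and have been struggling with it for the past two weeks.
My original file looks like the one below: one column holds a unique ID and another a unique name. The subsequent columns are courses, and each field contains (when not blank) the mark for that course and student. So there is exactly one mark per student per course: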
Id  Name        Course1 Course2 Course3 Course4 Course5
1   John        55
2   George                                      63
4   Alex                        64
1   John                                74
3   Emma        63
2   George              64
4   Alex                                60
2   George      29
3   Emma                                        69
1   John                67
3   Emma                80
4   Alex        57
2   George                              91
1   John                        81
1   John                                        34
3   Emma                        75
2   George                      89
4   Alex                                        49
3   Emma                                78
4   Alex                69
5   TERRY               67
6   HELEN                       39
This is what I am trying to achieve: transpose the data (the marks) by unique ID and place each mark under its corresponding course, like this:
Id  Name        Course1 Course2 Course3 Course4 Course5
1   John        55      69      64      60      49
2   George      29      64      89      91      63
3   Emma        63      80      75      78      69
4   Alex        57      69      64      60      49
5   TERRY               67
6   HELEN                       39
This is what I have managed so far:
Id  Name        Course1 Course2 Course3 Course4 Course5
1   John        55
2   George      29
3   Emma        63
4   Alex        57
5   TERRY
6   HELEN
1   John                69
2   George              64
3   Emma                80
4   Alex                69
5   TERRY               67
6   HELEN
1   John                        64
2   George                      89
3   Emma                        75
4   Alex                        64
5   TERRY
6   HELEN                       39
...and so on
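Doing this with what I already know about awk is proving rather tricky (please note that I am not interested in solutions based on sed, perl, etc.). If you can offer some help (preferably not a one-liner), I would ask you to be a bit descriptive, because I am as interested in the approach as in the solution itself.
Any help is greatly appreciated.
EDIT: This is the code I wrote to get to the last stage (and where I ran into problems):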
#!/bin/bash
files3="*.csv"
for j in $files3
do
#echo "processing $j..."
fi13=$(awk -F" " '(NR==1){field13=$13;}{print field13}' ./work1/test1YA.csv)
fi14=$(awk -F" " '(NR==1){field14=$14;}{print field14}' ./work1/test1YA.csv)
fi15=$(awk -F" " '(NR==1){field15=$15;}{print field15}' ./work1/test1YA.csv)
fi16=$(awk -F" " '(NR==1){field16=$16;}{print field16}' ./work1/test1YA.csv)
# awk -F" " 'BEGIN{OFS=" ";RS="\n"}{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' "$j" >> ./work1/test2YA.csv
awk -F" " -v f13="$fi13" -v f14="$fi14" -v f15="$fi15" -v f16="$fi16" '{if($13==f13){$13=$6;$14=$15=$16=""}if($13==f14){$14=$6;$13=$15=$16=""}if($13==f15){$15=$6;$13=$14=$16=""}if($13==f16){$16=$6;$13=$14=$15=""}{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16}}' "$j" >> ./work1/test2YA.csv
done;
awk -F" " 'BEGIN{print "ID","Title","FirstName","MiddleName","LastName","FinalMarks","Status","Username","Campus","Code","Programme","Year","course1","course2","course3","course4"}{print}' ./work1/test2YA.csv >> ./work1/test3YA.csv
Answer 0 (score: 1)
Here is a solution for GNU awk:
course.awk
BEGIN { # setup field width for constant field splitting
    FIELDWIDTHS = "2 2 12 7 1 7 1 7 1 7 1 7"
    # setup sort order (by id)
    PROCINFO["sorted_in"] = "@ind_num_asc"
}
NR == 1 { # print header
    print
    next
}
{
    # add ids to names
    names[ $1 ] = $3
    # store under id and course number the mark if it is present
    for( c = 1; c <= 5; c++ ) {
        field = 2 + (c*2)
        if( $(field) !~ /^ *$/ ) {
            marks[ $1, c ] = $(field)
        }
    }
}
END {
    # output
    for( id in names ) {
        printf("%-4s%-12s%7s %7s %7s %7s %7s\n", id, names[ id ], marks[ id, 1], marks[ id, 2], marks[ id, 3], marks[ id, 4], marks[ id, 5])
    }
}
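FIELDWIDTHS is a GNU awk extension. As a rough illustration of what the fixed-width splitting in course.awk amounts to, here is a minimal sketch in POSIX awk that cuts the same columns with substr(). The function name split_fixed and the id|name|course|mark output format are made up for this example, and the column positions are an assumption read off the FIELDWIDTHS string above (a 16-character Id/Name prefix followed by five 8-character course columns).

```shell
# Split fixed-width records with POSIX substr() instead of FIELDWIDTHS.
# Assumed layout: Id in columns 1-2, Name in columns 5-16, then five
# 8-character course columns starting at column 17.
split_fixed() {
    awk '{
        id   = substr($0, 1, 2)
        name = substr($0, 5, 12)
        sub(/ +$/, "", id)              # trim the padding
        sub(/ +$/, "", name)
        for (c = 1; c <= 5; c++) {
            mark = substr($0, 17 + (c - 1) * 8, 8)
            gsub(/ /, "", mark)
            if (mark != "")             # skip columns that hold only blanks
                printf "%s|%s|Course%d|%s\n", id, name, c, mark
        }
    }'
}

# Build one padded sample row (George, mark 63 in the fifth course column).
printf '%-4s%-12s%-8s%-8s%-8s%-8s%-8s\n' 2 George '' '' '' '' 63 | split_fixed
# prints: 2|George|Course5|63
```

Each input row yields one output line per non-blank mark, which is the same information course.awk stores in marks[id, c].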
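Use course.awk like this: awk -f course.awk your_file
It is not obvious at first sight that the input is not tab-separated but has fixed column widths. That is why the solution uses FIELDWIDTHS and the %Ns formats in printf, where N is derived from the field widths.
The condition if( $(field) !~ /^ *$/ ) is true only when the field does not consist entirely of spaces, so blank fields are skipped.
Answer 1 (score: 0)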
This might be an approximation in awk:
NR==1{
    for(x=1;x<=NF;x++)
    {
        head=head $x"\t";
    }
    print head
}
NR>1{
    for(i=3;i<=NF;i++)
    {
        students[$1"\t"$2]=students[$1"\t"$2] "\t"$i;
    }
}
END{
    for (stu in students)
    {
        print stu,students[stu];
    }
}
Id Name Course1 Course2 Course3 Course4 Course5
5 TERRY 67
4 Alex 64 60 57 49 69
1 John 55 74 67 81 34
6 HELEN 39
3 Emma 63 69 80 75 78
2 George 63 64 29 91 89
Answer 2 (score: 0)
Same idea, perhaps simpler:
$ awk 'BEGIN { FIELDWIDTHS = "16 8 8 8 8 8" }
       NR == 1 { print; next }
       NR > 1  { keys[$1]
                 for (i = 2; i <= 6; i++) {
                     gsub(" ", "", $i)
                     if ($i) a[$1, i] = $i
                 }
               }
       END     { for (k in keys) {
                     printf "%16s", k
                     for (i = 2; i <= 6; i++) printf "%-8s", a[k, i]
                     print ""
                 }
               }' file
Id  Name        Course1 Course2 Course3 Course4 Course5
3   Emma        63      80      75      78      69
4   Alex        57      69      64      60      49
6   HELEN                       39
5   TERRY               67
1   John        55      67      81      74      34
2   George      29      64      89      91      63
You can also pipe the output to sort -n:
... | sort -n
Id  Name        Course1 Course2 Course3 Course4 Course5
1   John        55      67      81      74      34
2   George      29      64      89      91      63
3   Emma        63      80      75      78      69
4   Alex        57      69      64      60      49
5   TERRY               67
6   HELEN                       39
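Piping everything through sort -n leaves the header on top only because a line without a leading number compares as zero and all ids here are positive. If that cannot be relied on, one option is to pass the header through untouched and sort only the remaining lines. The helper name sort_keep_header below is hypothetical; this is a sketch of the idea, not part of the answer above.

```shell
# Print the first line (the header) unchanged, then sort the rest numerically.
sort_keep_header() {
    IFS= read -r header || return 0    # empty input: nothing to do
    printf '%s\n' "$header"
    sort -n
}

printf '%s\n' 'Id  Name' '2   George' '1   John' | sort_keep_header
```

It would be used in place of the plain sort, for example ... | sort_keep_header.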
Answer 3 (score: 0)
Using GNU awk for FIELDWIDTHS, true 2D arrays, and sorted_in:
$ cat tst.awk
NR==1 {
    print
    split($0,f,/\S+\s*/,s)
    for (i=1;i in s;i++) {
        w[i] = length(s[i])
        FIELDWIDTHS = FIELDWIDTHS (i>1?" ":"") w[i]
    }
    next
}
{
    sub(/\s*$/," ")
    for (i=1;i<=NF;i++) {
        if ($i ~ /\S/) {
            val[$1][i] = $i
        }
    }
}
END {
    PROCINFO["sorted_in"] = "@ind_num_asc"
    for (id in val) {
        for (i=1;i<=NF;i++) {
            printf "%*s", w[i], val[id][i]
        }
        print ""
    }
}
$ awk -f tst.awk file
Id  Name        Course1 Course2 Course3 Course4 Course5
1   John        55      67      81      74      34
2   George      29      64      89      91      63
3   Emma        63      80      75      78      69
4   Alex        57      69      64      60      49
5   TERRY               67
6   HELEN                       39
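tst.awk reads the field widths off the header line with gawk's four-argument split(). As a rough sketch of the same idea in POSIX awk, match() can walk the header and report where each column starts and how wide it is. The name header_widths is hypothetical, and the sketch assumes the header is padded with spaces rather than tabs.

```shell
# Report the 1-based start column and the width of every header field.
header_widths() {
    awk 'NR == 1 {
        rest  = $0
        start = 1
        # Each match is one header word plus the blanks that follow it.
        while (match(rest, /[^ ]+ */)) {
            printf "%d %d\n", start + RSTART - 1, RLENGTH
            start += RSTART - 1 + RLENGTH
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

printf '%-4s%-12s%-8s%s\n' Id Name Course1 Course2 | header_widths
# prints:
# 1 4
# 5 12
# 17 8
# 25 7
```

The widths found this way could then drive substr() calls on the data rows, in the same way that tst.awk feeds them into FIELDWIDTHS.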
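Answer 4 (score: 0)
Here is my take on this. It works in plain awk (no FIELDWIDTHS) and adjusts automatically to a different number of fields (i.e. add a Course7 column and you should be fine). You can also point it at multiple files, and each file should be processed separately.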
#!/usr/bin/awk -f

# Print the rows collected for the current file (in arbitrary order),
# then reset the state for the next file.
function emit_rows(    id, i) {
    for (id in name) {                    # Step through the records...
        printf("%-16s", id)
        for (i = 0; i < max; i++) {       # Step through the fields...
            printf("%-8s", (score[id, i] == 0) ? "" : score[id, i])
        }
        printf("\n")
    }
    delete name
    delete score
    max = 0
}

# On the first record of each input file, emit the previous file's
# rows (if any) and print the header.
FNR == 1 {
    emit_rows()
    print
    next
}

# Process each line.
{
    id = substr($0, 1, 16)    # Columns 1-16 hold the Id and the Name
    name[id]                  # Store the unique identifier in an array
    pos = 0

    # Step through the 8-character score fields until we hit the end
    # of the line, storing scores in another array.
    do {
        score[id, pos] += substr($0, 17 + pos * 8, 8) + 0
    } while (17 + (++pos) * 8 <= length())
}

# Keep track of our maximum number of fields
pos > max { max = pos }

# Finally, generate the (arbitrarily ordered) output for the last file.
END {
    emit_rows()
}
It is a bit longer, but I think it makes it easier to understand what it does.
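A for (id in name) loop visits the ids in an unspecified order, which is why the output above comes out shuffled. One common plain-awk workaround, sketched here under the same assumed layout (16-character Id/Name prefix, 8-character course columns), is to record each id the first time it is seen and replay that list in END. The function name merge_marks and the seen/order arrays are hypothetical. The result follows the order of first appearance in the input, not numeric order.

```shell
# Merge fixed-width rows per student and print them in first-seen order.
merge_marks() {
    awk '
    NR == 1 { print; next }                 # pass the header through
    {
        id = substr($0, 1, 16)              # assumed: Id and Name fill columns 1-16
        if (!(id in seen)) {                # remember each student once,
            seen[id] = 1                    # in order of first appearance
            order[++n] = id
        }
        ncols = int((length($0) - 16 + 7) / 8)
        if (ncols > max) max = ncols
        for (c = 1; c <= ncols; c++) {
            mark = substr($0, 17 + (c - 1) * 8, 8)
            gsub(/ /, "", mark)
            if (mark != "") marks[id, c] = mark
        }
    }
    END {
        for (k = 1; k <= n; k++) {
            line = sprintf("%-16s", order[k])
            for (c = 1; c <= max; c++)
                line = line sprintf("%-8s", marks[order[k], c])
            sub(/ +$/, "", line)            # drop trailing padding
            print line
        }
    }'
}
```

It would be run as merge_marks < your_file. For numeric order, the result can still be piped through sort -n as suggested in another answer.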