Say I have this file1.csv, made up of column A and column A_score, as follows:
fefa68e,312
wernjnn,432
ew443fs,300
and file2.csv, made up of a pair of columns, B and A, as follows:
dfaefew,fefa68e
dfaefew,wernjnn
vzcxvvz,ew443fs
ewrwefd,wernjnn
ewrwefd,ew443fs
How can I get file3.csv, which holds, for each value in column B, the maximum score over all of its column-A pairs, as follows:
dfaefew,432
vzcxvvz,300
ewrwefd,432
and file4.csv, which holds, for each value in column B, the average score over all of its column-A pairs, as follows:
dfaefew,372
vzcxvvz,300
ewrwefd,366
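For reference, the expected numbers follow from replacing each A value in file2.csv with its score from file1.csv and then aggregating per B value. For dfaefew (scores 312 and 432) and ewrwefd (scores 432 and 300):

```latex
\begin{aligned}
\text{dfaefew:}\quad & \max(312, 432) = 432, \qquad \tfrac{312 + 432}{2} = 372 \\
\text{ewrwefd:}\quad & \max(432, 300) = 432, \qquad \tfrac{432 + 300}{2} = 366
\end{aligned}
```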
Can awk or any other tool do this? I'm on Ubuntu.
Thanks in advance!
Update
What if file2.csv looks like this:
dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0
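The third column is either 1 or 0, and it is always the same for a given column-1 value (dfaefew, vzcxvvz, etc.). I want to keep the third column and get output like this (file3.csv first, then file4.csv):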
dfaefew,432,1
vzcxvvz,300,1
ewrwefd,432,0
dfaefew,372,1
vzcxvvz,300,1
ewrwefd,366,0
Answer (score: 3)
Here is one way to do it in awk.
Contents of script.awk:
# Set the input and output field separators to ","
BEGIN { FS = OFS = "," }
# Processing first file
# Load the first file into an array keyed by column1, with column2 (the score) as the value
NR==FNR { value[$1] = $2; next }
# Processing second file
# Keep a counter of column1 and add values for column2
{ count[$1]++; values[$1]+=value[$2] }
# Find the max for each entry of column1
{ entry[$1] = (($1 in entry) && entry[$1]>value[$2] ? entry[$1] : value[$2]) }
# In the END block traverse through array and print desired output.
END {
for (max in entry) print (max, entry[max]) > "file3.csv";
for (key in entry) print (key, values[key]/count[key]) > "file4.csv";
}
$ awk -f script.awk file1.csv file2.csv
$ cat file3.csv
vzcxvvz,300
ewrwefd,432
dfaefew,432
$ cat file4.csv
vzcxvvz,300
ewrwefd,366
dfaefew,372
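The script above does not cover the update (the third column in file2.csv). Below is a minimal sketch of one way to extend the same awk approach, assuming, as the question states, that column 3 is constant for each B value. It carries that flag through to both outputs and prints rows in the order the B values first appear in file2.csv, because plain `for (key in array)` order is unspecified in awk (which is why the output above comes out shuffled). The scratch directory, the sample heredocs, and the array names `order`, `best`, `sum`, `flag` are illustrative choices, not part of the original answer.

```shell
#!/bin/sh
# Work in a scratch directory so existing file1.csv/file2.csv are not overwritten.
cd "$(mktemp -d)" || exit 1

# Sample data from the question (file2.csv in its updated, three-column form).
cat > file1.csv <<'EOF'
fefa68e,312
wernjnn,432
ew443fs,300
EOF

cat > file2.csv <<'EOF'
dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0
EOF

awk 'BEGIN { FS = OFS = "," }
# First file: map each A value to its score.
NR == FNR { score[$1] = $2; next }
# Second file: record the order in which each B value first appears.
!($1 in count) { order[++n] = $1 }
{
    # "+ 0" forces a numeric value; an A missing from file1.csv scores 0.
    s = score[$2] + 0
    # The first score seen for a B value starts its max; later ones replace it if larger.
    if (!($1 in count) || s > best[$1]) best[$1] = s
    count[$1]++
    sum[$1] += s
    # Assumes column 3 is constant per B value, as stated in the question.
    flag[$1] = $3
}
END {
    for (i = 1; i <= n; i++) {
        b = order[i]
        print b, best[b], flag[b] > "file3.csv"
        print b, sum[b] / count[b], flag[b] > "file4.csv"
    }
}' file1.csv file2.csv

cat file3.csv file4.csv
```

To run it on real data, drop the `cd` and heredoc setup lines and run only the `awk ... file1.csv file2.csv` command in the directory that holds your files.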