Question

Say I have this file1.csv, consisting of column A and column A_score, as follows:

fefa68e,312
wernjnn,432
ew443fs,300

and file2.csv, consisting of column B paired with column A, as follows:

dfaefew,fefa68e
dfaefew,wernjnn
vzcxvvz,ew443fs
ewrwefd,wernjnn
ewrwefd,ew443fs

How can I produce file3.csv with, for each B value, the maximum score among all the A values paired with it, as follows:

dfaefew,432
vzcxvvz,300
ewrwefd,432

and file4.csv with, for each B value, the average score of all the A values paired with it, as follows:

dfaefew,372
vzcxvvz,300
ewrwefd,366
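As a check on the expected values: each B row takes the maximum and the mean of the file1.csv scores of the A values it is paired with in file2.csv.

```latex
\begin{aligned}
\text{dfaefew}: &\quad \max(312, 432) = 432, &\quad \tfrac{312 + 432}{2} = 372 \\
\text{vzcxvvz}: &\quad \max(300) = 300, &\quad \tfrac{300}{1} = 300 \\
\text{ewrwefd}: &\quad \max(432, 300) = 432, &\quad \tfrac{432 + 300}{2} = 366
\end{aligned}
```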

Can awk, or any other tool, do this? I am using Ubuntu.

Thanks in advance!

Update

What if file2.csv looks like this:

dfaefew,fefa68e,1
dfaefew,wernjnn,1
vzcxvvz,ew443fs,1
ewrwefd,wernjnn,0
ewrwefd,ew443fs,0

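The third column is either 1 or 0, and it is the same for every row with the same column-1 value (dfaefew, vzcxvvz, etc.). I want to keep the third column and get output like this (file3.csv first, then file4.csv):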

dfaefew,432,1
vzcxvvz,300,1
ewrwefd,432,0

dfaefew,372,1
vzcxvvz,300,1
ewrwefd,366,0
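The answer below handles only the two-column file2.csv. Here is a minimal sketch for this three-column version, assuming (as stated above) that the flag in column 3 is identical for every row with the same B value, so keeping the last one seen is enough. It recreates the sample files in a scratch directory from `mktemp -d` so it runs standalone; the array names `score`, `best`, `sum`, `cnt` and `flag` are illustrative.

```shell
#!/bin/sh
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1

# Sample inputs copied from the question (three-column file2.csv).
printf '%s\n' 'fefa68e,312' 'wernjnn,432' 'ew443fs,300' > file1.csv
printf '%s\n' 'dfaefew,fefa68e,1' 'dfaefew,wernjnn,1' 'vzcxvvz,ew443fs,1' \
    'ewrwefd,wernjnn,0' 'ewrwefd,ew443fs,0' > file2.csv

awk '
BEGIN { FS = OFS = "," }
# file1.csv: remember the score of each A value
NR == FNR { score[$1] = $2; next }
# file2.csv: $1 = B, $2 = A, $3 = flag (assumed constant for a given B)
{
    s = score[$2]
    if (!($1 in best) || s > best[$1]) best[$1] = s
    sum[$1] += s
    cnt[$1]++
    flag[$1] = $3
}
END {
    # for-in order is unspecified, so rows may come out in any order
    for (b in best) {
        print b, best[b], flag[b] > "file3.csv"
        print b, sum[b] / cnt[b], flag[b] > "file4.csv"
    }
}' file1.csv file2.csv

sort file3.csv
sort file4.csv
```

`sort` is used at the end only to make the printed listing stable, since awk may write the rows in any order.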

Answer 1

Here is one way to do it in awk:

Contents of `script.awk`:

# Set the input and output field separators to ","
BEGIN { FS = OFS = "," }

# Process the first file (file1.csv)
# Store each score (column 2) in an array keyed by column 1
NR==FNR { value[$1] = $2; next }

# Process the second file (file2.csv)
# Count the rows for each column-1 value and sum the scores of its column-2 values
{ count[$1]++; values[$1] += value[$2] }

# Track the maximum score seen so far for each column-1 value
{ if (!($1 in entry) || value[$2] > entry[$1]) entry[$1] = value[$2] }

# In the END block, loop over the arrays and write the desired output.
END {
     for (max in entry) print (max, entry[max]) > "file3.csv";
     for (key in entry) print (key, values[key]/count[key]) > "file4.csv";
}

Run it like this:

awk -f script.awk file1.csv file2.csv

Output:

$ cat file3.csv
vzcxvvz,300
ewrwefd,432
dfaefew,432

$ cat file4.csv
vzcxvvz,300
ewrwefd,366
dfaefew,372
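The rows above come out in a different order from the question because the traversal order of `for (key in array)` in awk is unspecified. A sketch that keeps the first-appearance order of B values from file2.csv instead, using a hypothetical `order` array indexed 1..n; like the script above, it writes file3.csv and file4.csv, and it recreates the sample inputs in a scratch directory so it runs standalone:

```shell
#!/bin/sh
# Work in a scratch directory so existing files are not overwritten.
cd "$(mktemp -d)" || exit 1

# Sample inputs copied from the question (two-column file2.csv).
printf '%s\n' 'fefa68e,312' 'wernjnn,432' 'ew443fs,300' > file1.csv
printf '%s\n' 'dfaefew,fefa68e' 'dfaefew,wernjnn' 'vzcxvvz,ew443fs' \
    'ewrwefd,wernjnn' 'ewrwefd,ew443fs' > file2.csv

awk '
BEGIN { FS = OFS = "," }
NR == FNR { score[$1] = $2; next }
{
    # Remember each B value the first time it appears in file2.csv.
    if (!($1 in cnt)) order[++n] = $1
    s = score[$2]
    if (!($1 in best) || s > best[$1]) best[$1] = s
    sum[$1] += s
    cnt[$1]++
}
END {
    # Walk B values in first-appearance order instead of using for-in.
    for (i = 1; i <= n; i++) {
        b = order[i]
        print b, best[b] > "file3.csv"
        print b, sum[b] / cnt[b] > "file4.csv"
    }
}' file1.csv file2.csv

cat file3.csv
cat file4.csv
```

If an average is not a whole number, `print` formats it with `OFMT` (default `%.6g`); use `printf` for a fixed number of decimals.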
