Question

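I posted this before, but I have made some changes since then and the format has changed considerably, so I am asking again to match the new format.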

I have 2 CSV spreadsheets, each with 2 columns. They are in the following format:

File1.csv

string,Value
string1,4
string2,5
string3,6
string4,7
string5,8
string6,9

The second file is:

File2.csv

string,Value
string4,8
string5,7
string1,3
string2,7
string3,4
string7,5

Note that the row order in both files is random. There are also entries in file1 that do not exist in file2; note that string6 appears only in file1.

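What I want to do is use Bash (or Python, if that is easier) to go through file 1, find the same string in file 2, and write it to a third file with the values side by side, for example: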

output.csv
string1,4,3
string2,5,7
string3,6,4
string4,7,8
string5,8,7
string6,9,
string7,,5

Again, note that string6 is present but has nothing to compare against.

I have tried the following, but with no luck:

#!/bin/bash  
awk  'BEGIN   {FS=OFS=","}  
        NR==FNR {a[$1]=$2; next}   
        FNR==1  {print $1,$2"1",a[$1]"2"; next}   
                {print $1,$2,a[$1]}' File1.csv File2.csv

When I run this, I get some strange output:

4tring,4  
,3string2,5  
,7string3,6  
,4string4

If this is not the best approach, I am happy to try something else.

Answer 1

In awk:

$ awk '
BEGIN { FS=OFS="," }                # set delimiters
NR==FNR { a[$1]=$2; next }          # hash first file
{ print $1,a[$1],$2; delete a[$1] } # output second file matches and non-matches
END { for(i in a) print i,a[i] }    # output leftovers from file 1
' file1 file2                       # mind the order
string,Value,Value
string4,7,8
string5,8,7
string1,4,3
string2,5,7
string3,6,4  
string7,,5
string6,9
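
Since the question also allows Python, here is a minimal sketch of the same full outer join using only the standard library. Several things in it are assumptions rather than anything stated in the thread: the function name merge_csv, the output header names Value1 and Value2, and sorting the keys alphabetically. It also strips whitespace and carriage returns from every field; the garbled output shown in the question looks like what Windows-style CRLF line endings typically cause, although the thread does not confirm that this is the reason.

```python
import csv
import io
import sys


def merge_csv(first, second):
    """Full outer join of two 2-column CSV streams on the first column."""
    def load(stream):
        reader = csv.reader(stream)
        next(reader, None)  # skip the "string,Value" header row
        # strip() removes stray spaces and carriage returns around each field
        return {row[0].strip(): row[1].strip() for row in reader if row}

    left, right = load(first), load(second)
    # Header names are an assumption; the thread does not specify them.
    rows = [("string", "Value1", "Value2")]
    # Union of keys keeps entries that exist in only one of the files.
    for key in sorted(left.keys() | right.keys()):
        rows.append((key, left.get(key, ""), right.get(key, "")))
    return rows


# In-memory copies of the sample data, written with CRLF line endings
# to mimic files exported on Windows (an assumption about the real files).
FILE1 = "\r\n".join(["string,Value", "string1,4", "string2,5", "string3,6",
                     "string4,7", "string5,8", "string6,9", ""])
FILE2 = "\r\n".join(["string,Value", "string4,8", "string5,7", "string1,3",
                     "string2,7", "string3,4  ", "string7,5", ""])

rows = merge_csv(io.StringIO(FILE1), io.StringIO(FILE2))
csv.writer(sys.stdout, lineterminator="\n").writerows(rows)
```

With real files, the two streams would be opened with open("File1.csv", newline="") and open("File2.csv", newline=""), and the writer pointed at open("output.csv", "w", newline="") instead of sys.stdout. Unlike the awk answer, this version emits the trailing comma for keys found only in the first file (string6,9,), which matches the output the question asks for.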
