Combining 2 CSV files on a shared column using join

Time: 2018-05-07 13:08:49

Tags: bash csv join merge

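I have been trying to combine two CSV files based on a column they share, which holds alphanumeric data, so that I can perform something like a join on them from the terminal.

Here is what I tried (the first column is exactly the same in both files):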

  

join -t, -1 1 -2 1 file_1.csv file_2.csv > file_3.csv

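The merge runs without errors and the columns are combined, but not in the format I want.

Problem: file_3 contains the rows from both files, separated by a comma, but each joined row is split across two lines.

Example: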

     Columns from file_1
     ,Columns from file_2
     Row1 from file_1
     ,Row1 from file_2
     Row2 from file_1
     ,Row2 from file_2

How can I get each merged row of file_3 onto a single line? Any pointers would be appreciated.

file_1.csv (sample data):

Id,Age,Employment,Education,Marital,Occupation,Income,Gender,Deductions,Hours,Adjusted
1,38,Private,College,Unmarried,Service,81838,Female,0,72,0  
2,35,Private,Associate,Absent,Transport,72099,Male,0,30,0  
3,32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0

file_2.csv (sample data):

Id,Adjusted,Predicted_Adjusted,Probability_0,Probability_1
1,0,0,0.952957896225136,0.0470421037748636
2,0,0,0.973664421132328,0.0263355788676716
3,0,0,0.966224074718457,0.0337759252815426

Incorrect join output:

Id,Age,Employment,Education,Marital,Occupation,Income,Gender,Deductions,Hours,Adjusted
,Adjusted,Predicted_Adjusted,Probability_0,Probability_1
1,38,Private,College,Unmarried,Service,81838,Female,0,72,0
,0,0,0.952957896225136,0.0470421037748636
2,35,Private,Associate,Absent,Transport,72099,Male,0,30,0
,0,0,0.973664421132328,0.0263355788676716
3,32,Private,HSgrad,Divorced,Clerical,154676.74,Male,0,40,0
,0,0,0.966224074718457,0.0337759252815426

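Expected output: each pair of lines above is really one row, so the output should not split a row across two lines. It should be a plain row-by-row merge of the two CSV files, file_1 and file_2.

2 Answers:

Answer 0: (score: 2)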

Do the files have Windows line endings (\r)? You could try dos2unix file_1.csv and dos2unix file_2.csv.
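
If dos2unix is not installed, the same idea can be sketched with standard tools. This is only a minimal sketch, assuming stray carriage returns really are the cause: it recreates small CRLF sample files in a scratch directory (the data is shortened for illustration, and the names clean_1.csv and clean_2.csv are hypothetical), counts the lines that contain a \r, strips them with tr, and then runs the same join.

```shell
set -eu

# Work in a scratch directory so no real files are touched (illustrative setup).
workdir=$(mktemp -d)
cd "$workdir"

# Recreate shortened sample files with Windows (CRLF) line endings.
printf 'Id,Age,Adjusted\r\n1,38,0\r\n2,35,0\r\n' > file_1.csv
printf 'Id,Predicted_Adjusted\r\n1,0\r\n2,0\r\n' > file_2.csv

# Count lines containing a carriage return; a non-zero count suggests CRLF endings.
cr=$(printf '\r')
cr_lines=$(grep -c "$cr" file_1.csv || true)
echo "lines with CR in file_1.csv: $cr_lines"

# Strip every carriage return, then join on the first column of each file.
tr -d '\r' < file_1.csv > clean_1.csv
tr -d '\r' < file_2.csv > clean_2.csv
join -t, -1 1 -2 1 clean_1.csv clean_2.csv > file_3.csv

cat file_3.csv
```

With the carriage returns gone, each joined row comes out on one line. Without this step, the \r at the end of every file_1 row sits in the middle of the joined line, which is why many viewers show the file_2 columns on a separate line starting with a comma.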

Answer 1: (score: 1)

This should work:

join -t , -1 1 -2 1 file_1.csv file_2.csv|paste -d' ' - - > file_3.csv
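
To see what the paste step does on its own, here is a small sketch on made-up input (the sample helper and its data are hypothetical). It assumes each record really is split across two lines by a newline; if the break is a stray \r inside a single line, paste would instead glue two different records together.

```shell
# Hypothetical sample: each logical record is split across two physical lines.
sample() { printf 'a,b\n,c,d\ne,f\n,g,h\n'; }

# "paste - -" reads standard input twice per output line, so consecutive
# pairs of lines are glued together with the chosen delimiter.
with_space=$(sample | paste -d' ' - -)

# In POSIX paste, the delimiter '\0' stands for the empty string.
no_gap=$(sample | paste -d'\0' - -)

printf '%s\n' "$with_space"
printf '%s\n' "$no_gap"
```

The space delimiter leaves a space before the comma of the second half (a,b ,c,d), while the empty delimiter gives a,b,c,d.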