Question

I have 2 CSV files:

file_1 columns: id,user_id,message_id,rate
file_2 columns: id,type,timestamp

The relationship between the files is file_1.message_id = file_2.id.

I want to create a third file with the following columns:

file_1.id,file_1.user_id,file_1.message_id,file_1.rate,file_2.timestamp

Any ideas on how to do this on Linux?

Answer 1

You can use the join command, like this:

join -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4,2.3 <(sort -t, -k 3,3 file1) <(sort -t, -k 1,1 file2)

This first sorts both files (file1 on its 3rd field), then joins them using the 3rd field of file1 and the 1st field of file2, and finally outputs the fields you need.
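
For reference, here is a minimal sketch of the same approach without process substitution, so it should also run in a plain POSIX shell. It assumes both files start with a header line (as in the sample data in Answer 3), which is stripped before sorting. The sample rows and the file1.sorted, file2.sorted and file3 names are made up for illustration.

```shell
# Hypothetical sample data in the layout from the question
printf '%s\n' 'id,user_id,message_id,rate' '1,3334,424,44' '2,3335,425,10' > file1
printf '%s\n' 'id,type,timestamp' '424,rr,22222' '425,qq,33333' > file2

# Drop the header line, then sort each file on its join key
tail -n +2 file1 | sort -t, -k 3,3 > file1.sorted   # key: message_id (field 3)
tail -n +2 file2 | sort -t, -k 1,1 > file2.sorted   # key: id (field 1)

# Join on file1 field 3 = file2 field 1 and keep only the wanted columns
join -t, -1 3 -2 1 -o 1.1,1.2,1.3,1.4,2.3 file1.sorted file2.sorted > file3

cat file3
```

Note that join only emits rows whose key appears in both inputs, so rows of file1 with no matching message are dropped.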

Answer 2

This looks like a job for SQLite. Using the SQLite shell:

 create table f1(id,user_id,message_id,rate);
 create table f2(id,type,timestamp);

 .separator ,
 .import 'file_1.txt' f1
 .import 'file_2.txt' f2

 CREATE INDEX i1 ON f1(message_id ASC); -- optional
 CREATE INDEX i2 ON f2(id ASC);         -- optional

 .output 'output.txt'
 .separator ,

 SELECT f1.id, f1.user_id, f1.message_id, f1.rate, f2.timestamp
   FROM f1
   JOIN f2 ON f2.id = f1.message_id;

 .output stdout
 .q

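Note that if even a single line has the wrong number of commas, the import stage will fail. You can prevent the rest of the script from running after such an error by putting .bail on at the beginning of the script.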

If you want the unmatched ids, you can try:

SELECT f1.* FROM f1 LEFT JOIN f2 on f2.id = f1.message_id WHERE f2.id IS NULL

This selects every row from f1 for which no corresponding row is found in f2.

Answer 3

With awk you could try something like this:

awk -F, 'NR==FNR{a[$3]=$0;next} ($1 in a){print a[$1]","$3 > "file_3"}' file_1 file_2

Test:

[jaypal:~/Temp] cat file_1     # Contents of File_1
id,user_id,message_id,rate
1,3334,424,44

[jaypal:~/Temp] cat file_2     # Contents of File_2
id,type,timestamp
424,rr,22222

[jaypal:~/Temp] awk -F, 'NR==FNR{a[$3]=$0;next} ($1 in a){print a[$1]","$3 > "file_3"}' file_1 file_2

[jaypal:~/Temp] cat file_3     # Contents of File_3 made by the script
1,3334,424,44,22222
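
One thing to watch for: the command above indexes file_1 by message_id, so if several rows of file_1 share the same message_id, only the last one is kept. A possible variant, sketched below, reads file_2 first and looks up the timestamp for each row of file_1 instead. It assumes ids in file_2 are unique and that both files have a header line. The extra sample rows and the ts array name are made up for illustration.

```shell
# Hypothetical sample data: two ratings for message 424, one for an unknown message
printf '%s\n' 'id,user_id,message_id,rate' '1,3334,424,44' '2,3335,424,50' '3,3336,999,7' > file_1
printf '%s\n' 'id,type,timestamp' '424,rr,22222' > file_2

awk -F, '
  FNR == 1  { next }                  # skip the header line of each file
  NR == FNR { ts[$1] = $3; next }     # first file (file_2): map id -> timestamp
  $3 in ts  { print $0 "," ts[$3] }   # second file (file_1): append the timestamp
' file_2 file_1 > file_3

cat file_3
```

Rows of file_1 whose message_id is not present in file_2 (message 999 here) are skipped, as in the original command.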

Answer 4

You can try this:
1. Change all lines so that they start with the key:

awk -F',' '{ print $3 " file1 " $1 " " $2 " " $4 }' < file1 >  temp
awk -F',' '{ print $1 " file2 " $2 " " $3 }'        < file2 >> temp

Now the lines look like:

message_id file1 id user_id rate
id file2 type timestamp

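2. Sort temp by the first two columns. Related lines are now adjacent, with the file1 lines first:

sort -k 1,1 -k 2,2 < temp > temp2

3. Run awk to read these lines. On file1 lines, save the fields; on file2 lines, print them together with the saved fields.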
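
The answer does not show that last awk step. A rough sketch of the whole pipeline might look like the following. It assumes header-less input and fields that contain no spaces, since temp is space-separated. The sample rows, the array names and the file3 output name are made up for illustration.

```shell
# Hypothetical header-less sample data
printf '%s\n' '1,3334,424,44' '2,3335,424,50' '3,3336,999,7' > file1
printf '%s\n' '424,rr,22222' '500,qq,33333' > file2

# Step 1: rewrite every line so it starts with the join key and a source tag
awk -F',' '{ print $3 " file1 " $1 " " $2 " " $4 }' < file1 >  temp
awk -F',' '{ print $1 " file2 " $2 " " $3 }'        < file2 >> temp

# Step 2: sort by key, then by tag, so file1 lines precede the matching file2 line
sort -k 1,1 -k 2,2 < temp > temp2

# Step 3: buffer file1 lines per key and emit them when the file2 line arrives
awk '
  $2 == "file1" {
      if ($1 != key) { key = $1; n = 0 }    # new key: reset the buffer
      n++
      id[n] = $3; user[n] = $4; rate[n] = $5
      next
  }
  $2 == "file2" && $1 == key {              # $4 is the timestamp on file2 lines
      for (i = 1; i <= n; i++)
          print id[i] "," user[i] "," key "," rate[i] "," $4
  }
' temp2 > file3

cat file3
```

Buffering per key means several file1 rows with the same message_id all get the timestamp. Keys that appear in only one of the files produce no output.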
