Is there an algorithm or diff-like utility for finding the differences between two CSV files? For example:
file1
-------
key1,value1
key2,value2
key3,value3
key5,value5
key7,value7
file2
-------
key1,value1
key3,value3
key4,value4
key5,value5
key6,value6
Such a diff-like utility would output 3 types of records: those only in file1, those only in file2, and those in both files.
Answer 0 (score: 6)
diff can do what you want:
diff file1.csv file2.csv --old-line-format="< %L" --new-line-format="> %L" --unchanged-line-format="= %L"
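If GNU diff is not available, the same line-by-line marking can be approximated in Python with the standard difflib module. This is only a sketch, not a reimplementation of diff: the function name mark_lines and the inline sample data are my own, and difflib's matching may pair lines differently from diff on larger inputs.

```python
import difflib

def mark_lines(old_lines, new_lines):
    """Prefix lines with '< ' (only in old), '> ' (only in new) or '= ' (in both)."""
    marked = []
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            marked.extend("= " + line for line in old_lines[i1:i2])
        else:
            # 'delete', 'insert' and 'replace' all reduce to these two slices;
            # one of them is simply empty for a pure delete or insert.
            marked.extend("< " + line for line in old_lines[i1:i2])
            marked.extend("> " + line for line in new_lines[j1:j2])
    return marked

# Sample data copied from the question.
file1 = ["key1,value1", "key2,value2", "key3,value3", "key5,value5", "key7,value7"]
file2 = ["key1,value1", "key3,value3", "key4,value4", "key5,value5", "key6,value6"]

for line in mark_lines(file1, file2):
    print(line)
```

Like diff, this compares whole lines, so a key whose value changed shows up as one "<" line and one ">" line rather than as a single modified record.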
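Answer 1 (score: 2)
Answer 2 (score: 2)
Take a look at http://sourceforge.net/projects/csvdiff/
csvdiff is a Perl script for diffing/comparing two CSV files, with the option to choose the separator. Differences are reported in the form: "Column XYZ in record 999" is different. After that, the actual and expected values for that column are shown.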
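Answer 3 (score: 1)
You can do this with the Unix 'join' command. It is also available in Cygwin for Windows. Both files must be sorted on the join field, as they are in your example.
Example: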
$ join -t ',' -v 1 file1 file2
key2,value2
key7,value7
$ join -t ',' -v 2 file1 file2
key4,value4
key6,value6
$ join -t ',' file1 file2
key1,value1,value1
key3,value3,value3
key5,value5,value5
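The reason sorted input matters is that it allows a single pass over both files. Below is a rough Python sketch of that kind of merge; it is an illustration of the idea, not join's actual source. The function name merge_join is made up, and I am assuming each key appears at most once per file.

```python
def merge_join(rows1, rows2):
    """Walk two key-sorted lists of (key, value) pairs in one pass.

    Returns (only_in_1, only_in_2, in_both). Assumes keys are unique per list.
    """
    only1, only2, both = [], [], []
    i = j = 0
    while i < len(rows1) and j < len(rows2):
        key1, key2 = rows1[i][0], rows2[j][0]
        if key1 == key2:
            both.append((key1, rows1[i][1], rows2[j][1]))
            i += 1
            j += 1
        elif key1 < key2:
            only1.append(rows1[i])  # key1 cannot appear later in rows2
            i += 1
        else:
            only2.append(rows2[j])  # key2 cannot appear later in rows1
            j += 1
    # Whatever is left over in either list has no partner.
    only1.extend(rows1[i:])
    only2.extend(rows2[j:])
    return only1, only2, both

# Sample data copied from the question.
file1 = [("key1", "value1"), ("key2", "value2"), ("key3", "value3"),
         ("key5", "value5"), ("key7", "value7")]
file2 = [("key1", "value1"), ("key3", "value3"), ("key4", "value4"),
         ("key5", "value5"), ("key6", "value6")]

only1, only2, both = merge_join(file1, file2)
print(only1)
print(only2)
print(both)
```

The three returned lists correspond to `join -v 1`, `join -v 2`, and plain `join` respectively.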
Answer 4 (score: 1)
The open-source DiffKit can do this:
www.diffkit.org
Answer 5 (score: 0)
You can use hashes in Perl. Read each file into a separate hash, e.g.
my %File1 = ();
my %File2 = ();
# Filehandles FP1 and FP2 are already open for reading
while (<FP1>) {
    if (/^([^,]+),(.+)$/) {
        my ($key, $value) = ($1, $2);
        $File1{$key} = $value;
    }
}
# Repeat for FP2
To print the results, you can iterate over the hashes and check whether each key/value is the same, different, or missing. For example:
for my $key (keys %File1) {
    if (defined($File1{$key}) && defined($File2{$key})) {
        print("$key exists in both files\n");
    } elsif (defined($File1{$key})) {
        print("$key exists only in file1\n");
    }
}
# Repeat for %File2
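For comparison, here is roughly the same hash approach sketched in Python, extended to also flag keys whose values differ. The helper names load_pairs and compare, the category labels, and the inline sample data are all my own choices for illustration. Note that it keeps only the second CSV field as the value, whereas the Perl regex above keeps everything after the first comma.

```python
import csv
import io

def load_pairs(text):
    """Parse 'key,value' CSV text into a dict; a repeated key keeps its last value."""
    reader = csv.reader(io.StringIO(text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def compare(d1, d2):
    """Sort every key into one of four buckets."""
    report = {"only_in_file1": [], "only_in_file2": [], "same": [], "changed": []}
    for key in sorted(d1.keys() | d2.keys()):
        if key not in d2:
            report["only_in_file1"].append(key)
        elif key not in d1:
            report["only_in_file2"].append(key)
        elif d1[key] == d2[key]:
            report["same"].append(key)
        else:
            report["changed"].append(key)
    return report

# Sample data copied from the question.
FILE1 = "key1,value1\nkey2,value2\nkey3,value3\nkey5,value5\nkey7,value7\n"
FILE2 = "key1,value1\nkey3,value3\nkey4,value4\nkey5,value5\nkey6,value6\n"

report = compare(load_pairs(FILE1), load_pairs(FILE2))
for category, keys in report.items():
    print(category, keys)
```

Unlike join, this does not need sorted input, at the cost of holding both files in memory.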
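Answer 6 (score: 0)
You could take a look at my FOSS CSV stream editor CSVfix, which can do what you want with its join command, no programming required.
Answer 7 (score: 0)
How about an example using SQLite?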
DROP TABLE IF EXISTS 'file1';
DROP TABLE IF EXISTS 'file2';
CREATE TABLE 'file1' (
key_field VARCHAR primary key,
value_field VARCHAR
);
CREATE TABLE 'file2' (
key_field VARCHAR primary key,
value_field VARCHAR
);
.bail off
.separator ,
.import file1.csv file1
.import file2.csv file2
.output stdout
.header on
SELECT col1 AS 'In file1.csv, not in file2.csv' FROM (
SELECT file1.key_field AS col1,
file2.key_field AS col2
FROM file1 LEFT OUTER JOIN file2
ON file1.key_field == file2.key_field
)
WHERE col2 IS NULL
;
SELECT col2 AS 'In file2.csv, not in file1.csv' FROM (
SELECT file1.key_field AS col1,
file2.key_field AS col2
FROM file2 LEFT OUTER JOIN file1
ON file2.key_field == file1.key_field
) WHERE col1 IS NULL
;
SELECT file1.key_field AS 'In both file1.csv and file2.csv'
FROM file1 INNER JOIN file2
WHERE file1.key_field == file2.key_field
;
Here is the output:
C:\Temp> sqlite3 test.db < t.sql
In file1.csv, not in file2.csv
key2
key7
In file2.csv, not in file1.csv
key4
key6
In both file1.csv and file2.csv
key1
key3
key5
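If you would rather not use the sqlite3 shell, roughly the same queries can be run from Python's built-in sqlite3 module. This is a sketch under my own assumptions: the function name classify_keys is made up, the rows are passed in directly instead of using .import, and the database lives in memory rather than in test.db.

```python
import sqlite3

def classify_keys(rows1, rows2):
    """Load two lists of (key, value) rows into SQLite and classify keys with joins."""
    con = sqlite3.connect(":memory:")
    for table, rows in (("file1", rows1), ("file2", rows2)):
        con.execute(
            f"CREATE TABLE {table} (key_field TEXT PRIMARY KEY, value_field TEXT)"
        )
        con.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)

    def keys(sql):
        # Every query below selects a single key column.
        return [row[0] for row in con.execute(sql)]

    only1 = keys(
        "SELECT f1.key_field FROM file1 AS f1 "
        "LEFT OUTER JOIN file2 AS f2 ON f1.key_field = f2.key_field "
        "WHERE f2.key_field IS NULL ORDER BY f1.key_field"
    )
    only2 = keys(
        "SELECT f2.key_field FROM file2 AS f2 "
        "LEFT OUTER JOIN file1 AS f1 ON f2.key_field = f1.key_field "
        "WHERE f1.key_field IS NULL ORDER BY f2.key_field"
    )
    both = keys(
        "SELECT f1.key_field FROM file1 AS f1 "
        "INNER JOIN file2 AS f2 ON f1.key_field = f2.key_field "
        "ORDER BY f1.key_field"
    )
    con.close()
    return only1, only2, both

# Sample data copied from the question.
file1 = [("key1", "value1"), ("key2", "value2"), ("key3", "value3"),
         ("key5", "value5"), ("key7", "value7")]
file2 = [("key1", "value1"), ("key3", "value3"), ("key4", "value4"),
         ("key5", "value5"), ("key6", "value6")]

only1, only2, both = classify_keys(file1, file2)
print("In file1.csv, not in file2.csv:", only1)
print("In file2.csv, not in file1.csv:", only2)
print("In both file1.csv and file2.csv:", both)
```

Because key_field is a primary key here too, a file with duplicate keys would raise sqlite3.IntegrityError on insert, so deduplicate first if that can happen.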