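I am working with large files and need to find the differences between two of them. I don't need the differing content itself, only the number of differences.
To count the number of differing lines, I came up with: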
diff --suppress-common-lines --speed-large-files -y File1 File2 | wc -l
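It works, but is there a better way?
How can I count the exact number of differences, using standard tools such as bash, diff, awk, sed, or some old version of perl?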
Answer 0 (score: 44)
If you want to count the number of lines that differ, use:
diff -U 0 file1 file2 | grep ^@ | wc -l
Doesn't John's answer count the differing lines twice?
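One thing worth checking with the command above: with -U 0, each line starting with @@ introduces a hunk, and a hunk can cover several adjacent changed lines, so the result may be lower than the number of lines that actually differ. A minimal sketch, assuming a diff that supports -U (the function name count_hunks is made up for illustration):

```shell
# count_hunks FILE1 FILE2
# Prints the number of hunks in a zero-context unified diff.
# (Hypothetical helper name; not part of any standard tool.)
count_hunks() {
    # With -U 0 there are no context lines, so only hunk headers
    # can start with "@@". awk prints 0 when the files are identical.
    diff -U 0 "$1" "$2" | awk '/^@@/ { n++ } END { print n + 0 }'
}
```

For example, if lines 2, 3 and 5 of a five-line file are changed, lines 2 and 3 are adjacent and share one hunk, so this reports 2 rather than 3.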
Answer 1 (score: 41)
diff -U 0 file1 file2 | grep -v ^@ | wc -l
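Subtract 2 from that result for the two file-name lines at the top of the diff listing. Unified format is probably a bit faster than side-by-side format.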
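The subtract-2 step and the double-counting concern can both be handled in one pass. The sketch below skips the two header lines by position and counts removed and added lines separately; a modified line shows up once in each count. The function name count_removed_added is an assumption for illustration, and awk is used only because it is among the tools the question allows.

```shell
# count_removed_added FILE1 FILE2
# Prints "REMOVED ADDED": lines present only in FILE1 / only in FILE2,
# as reported by a zero-context unified diff.
# (Hypothetical helper name.)
count_removed_added() {
    diff -U 0 "$1" "$2" | awk '
        NR <= 2 { next }        # skip the "---" and "+++" file-name lines
        /^-/    { removed++ }   # line removed from FILE1
        /^\+/   { added++ }     # line added in FILE2
        END     { printf "%d %d\n", removed, added }'
}
```

Skipping the headers by line number rather than by pattern means a removed line whose own text begins with "--" is still counted.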
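Answer 2 (score: 6)
If you are on Linux/Unix, what about comm -23 file1 file2 to print the lines in file1 that are not in file2, comm -23 file1 file2 | wc -l to count them, and similarly comm -13 ... for the other direction? Note that comm expects both files to be sorted.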
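A rough sketch of the comm approach, with two caveats made explicit: comm expects sorted input, and comm -1 on its own only hides the first column, so two columns are suppressed at once (-23 leaves the lines unique to the first file, -13 those unique to the second). The function name count_unique and the use of temporary files are choices made for this example.

```shell
# count_unique COLS FILE1 FILE2
# COLS is -23 (count lines only in FILE1) or -13 (count lines only in FILE2).
# (Hypothetical helper name.)
count_unique() {
    cols=$1
    s1=$(mktemp) || return 1
    s2=$(mktemp) || { rm -f "$s1"; return 1; }
    # Sort both inputs with the same collation that comm will use.
    LC_ALL=C sort "$2" > "$s1"
    LC_ALL=C sort "$3" > "$s2"
    # awk prints the number of lines read, 0 for empty input.
    LC_ALL=C comm "$cols" "$s1" "$s2" | awk 'END { print NR }'
    rm -f "$s1" "$s2"
}
```

Because the files are sorted first, this counts lines by content and ignores their position, which is a different notion of "difference" than the one diff uses.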
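Answer 3 (score: 5)
Since every differing line in the output starts with a < or > character, I would suggest the following:
diff file1 file2 | grep '^[<>]' | wc -l
If you use only ^< or only ^> as the pattern, you count the differences in just one of the files.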
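Both per-file counts can be obtained from a single diff run. This is only a sketch under the assumption that diff emits its default (normal) output format, where lines from the first file start with < and lines from the second with >; the function name count_by_side is made up.

```shell
# count_by_side FILE1 FILE2
# Prints "LEFT RIGHT": differing lines from FILE1 and from FILE2
# in diff's default output format.
# (Hypothetical helper name.)
count_by_side() {
    diff "$1" "$2" | awk '
        /^</ { left++ }     # line that appears only in FILE1
        /^>/ { right++ }    # line that appears only in FILE2
        END  { printf "%d %d\n", left, right }'
}
```

The "---" separator and the change commands such as 2c2 match neither pattern, so they are not counted.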
Answer 4 (score: 1)
I believe the correct solution is the one in this answer:
$ diff -y --suppress-common-lines a b | grep '^' | wc -l
1
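Answer 6 (score: 0)
Here is a way to count differences of any kind between two files, using a regular expression to define the unit of difference. Here . is used, which matches any character except a newline:
git diff --patience --word-diff=porcelain --word-diff-regex=. file1 file2 | pcre2grep -M "^@[\s\S]*" | pcre2grep -M --file-offsets "(^-.*\n)(^\+.*\n)?|(^\+.*\n)" | wc -l
Excerpt from man git-diff: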
--patience
Generate a diff using the "patience diff" algorithm.
--word-diff[=<mode>]
Show a word diff, using the <mode> to delimit changed words. By default, words are delimited by whitespace; see --word-diff-regex below.
porcelain
Use a special line-based format intended for script consumption. Added/removed/unchanged runs are printed in the usual unified diff
format, starting with a +/-/` ` character at the beginning of the line and extending to the end of the line. Newlines in the input
are represented by a tilde ~ on a line of its own.
--word-diff-regex=<regex>
Use <regex> to decide what a word is, instead of considering runs of non-whitespace to be a word. Also implies --word-diff unless it
was already enabled.
Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!)
for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches
all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline.
For example, --word-diff-regex=. will treat each character as a word and, correspondingly, show differences character by character.
pcre2grep is part of the pcre2-utils package on Ubuntu 20.04.