How can I find whether two consecutive lines of a fixed-width file differ, and at which positions?
Sample file:
cat test.txt
1111111111111111122211111111111111
1111111111111111132211111111111111
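Output:
It should tell the user that the two lines differ and where: at the 18th character (as in the example above).
It would be very useful if it could list every position when there are multiple changes. For example: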
11111111111111111211113111
11111111111111111311114111
It should say: differences found at the 18th and 23rd characters.
I am trying something along the following lines, but I seem to be lost.
while read line
do
    echo $line |sed 's/./ &/g' |xargs -n1 # Not able to apply diff (stupid try)
done <test.txt
Answer 0: (score: 2)
Perl to the rescue:
$ echo '11131111111111111211113111
11111111111111111211114111' \
| perl -le '$d = <> ^ <>;
print pos $d while $d =~ /[^\0]/g'
4
23
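It XORs the two input strings and reports every position where the result is not a null byte, i.e. every position where the strings differ. Because pos returns the offset just past the match, the numbers printed are 1-based character positions.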
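The one-liner above compares a single pair of lines. As a rough sketch, the same XOR idea can be wrapped in a shell function that walks a file pair by pair (lines 1-2, 3-4, ...). The function name xor_diff_pairs and the output format are assumptions made for illustration, not part of the answer, and perl is assumed to be installed.

```shell
# xor_diff_pairs [FILE...]: hypothetical helper wrapping the XOR idea above.
# Reads lines in pairs and prints the 1-based positions where each pair differs.
xor_diff_pairs() {
    perl -e '
        while (defined(my $x = <>)) {
            my $y = <>;
            last unless defined $y;      # odd line count: skip the unpaired last line
            my $n = $.;                  # line number of the second line of the pair
            chomp($x, $y);
            my $d = $x ^ $y;             # string XOR: null byte wherever both agree
            my @p;
            push @p, pos($d) while $d =~ /[^\0]/g;
            print "lines ", $n - 1, "-", $n, ": ", (@p ? "@p" : "identical"), "\n";
        }' "$@"
}
```

Called as xor_diff_pairs test.txt, it prints one summary line per pair. When the two lines differ in length, the surplus characters of the longer line XOR to non-null bytes, so they are reported as differences too.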
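Answer 1: (score: 1)
You can use an empty field separator to make each character a field in awk, then compare the fields of each even-numbered record with those of the odd-numbered record before it: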
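awk 'BEGIN { FS = "" }       # empty FS: each character becomes a field
NR % 2 {                     # odd record: store its characters in a[]
    split($0, a)
    next
}
{                            # even record: compare with the stored one
    print "line #", NR
    for (i = 1; i <= NF; i++)
        if ($i != a[i])
            print "difference spotted in position:", i
}' test.txt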
line # 2
difference spotted in position: 18
line # 4
difference spotted in position: 18
difference spotted in position: 23
The input data is:
cat test.txt
1111111111111111122211111111111111
1111111111111111132211111111111111
11111111111111111211113111
11111111111111111311114111
PS: This only works in awk versions that split a record into individual characters when FS is empty, e.g. GNU awk, OSX awk, etc.
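For an awk that does not split on an empty FS, the same pairwise comparison can be sketched with substr(), which every POSIX awk provides. This variant also prints all positions of a pair on one line, as the question asks. The function name diff_positions and the output format are assumptions made for illustration.

```shell
# diff_positions [FILE...]: hypothetical helper, not part of the answer above.
# Reads lines in pairs (1-2, 3-4, ...) and prints the differing
# 1-based character positions of each pair on a single line.
diff_positions() {
    awk '
    NR % 2 { prev = $0; next }           # odd record: remember it
    {
        # walk up to the longer of the two lines
        n = (length($0) > length(prev)) ? length($0) : length(prev)
        out = ""; sep = ""
        for (i = 1; i <= n; i++)
            if (substr($0, i, 1) != substr(prev, i, 1)) {
                out = out sep i
                sep = " "
            }
        printf "lines %d-%d: %s\n", NR - 1, NR, (out == "" ? "identical" : out)
    }' "$@"
}
```

Running diff_positions test.txt on the input data above should report position 18 for the first pair and positions 18 and 23 for the second. A trailing unpaired line is silently ignored.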
Answer 2: (score: 1)
$ cat tst.awk
{ curr = $0 }
(NR%2)==0 {
    currLgth = length(curr)
    prevLgth = length(prev)
    maxLgth = (currLgth > prevLgth ? currLgth : prevLgth)
    print "Comparing:"
    print prev
    print curr
    for (i=1; i<=maxLgth; i++) {
        prevChar = substr(prev,i,1)
        currChar = substr(curr,i,1)
        if ( prevChar != currChar ) {
            printf "Difference: char %d line %d = \"%s\", line %d = \"%s\"\n", i, NR-1, prevChar, NR, currChar
        }
    }
    print ""
}
{ prev = curr }
$ cat file
1111111111111111122211111111111111
1111111111111111132211111111111111
11111111111111111111111111
11111111111111111111111
$ awk -f tst.awk file
Comparing:
1111111111111111122211111111111111
1111111111111111132211111111111111
Difference: char 18 line 1 = "2", line 2 = "3"
Comparing:
11111111111111111111111111
11111111111111111111111
Difference: char 24 line 3 = "1", line 4 = ""
Difference: char 25 line 3 = "1", line 4 = ""
Difference: char 26 line 3 = "1", line 4 = ""