I have two files, test1.txt and test2.txt.
test1.txt contains:
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
and test2.txt contains:
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
I want to compare the two files in bash and get output like this:
In both:
abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
Only in test1.txt:
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt
Only in test2.txt:
10111.2222.txt
Answer 0 (score: 3)
In both:
grep -F -f test2.txt test1.txt
Output:
abc.cde.ccd.eed.12345.5678.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
Only in test1.txt:
grep -v -F -f test2.txt test1.txt
Output:
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
aabc.cdve.cncd.ened.19945.2345.txt
Only in test2.txt:
grep -v -x -F -f <(grep -Eo '[0-9]+\.[0-9]+\.txt$' test1.txt) test2.txt
Output:
10111.2222.txt
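The three commands above can be wrapped into small shell functions. This is only a sketch: the function names in_both, only_in_first, and only_in_second are made up for illustration, and it assumes every line of the first file ends in a "digits.digits.txt" suffix, as in the question. It passes -F to grep so that the dots in the pattern file are matched literally rather than as regex wildcards, and it uses a temporary file in place of process substitution.

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical.

# Lines of $1 that contain some line of $2 as a literal substring.
in_both() { grep -F -f "$2" "$1"; }

# Lines of $1 that contain no line of $2.
only_in_first() { grep -v -F -f "$2" "$1"; }

# Lines of $2 that are not the trailing "digits.digits.txt" part
# of any line of $1.
only_in_second() {
    local suffixes
    suffixes=$(mktemp)
    # -o prints only the matched part, i.e. the suffix of each line.
    grep -Eo '[0-9]+\.[0-9]+\.txt$' "$1" > "$suffixes"
    # -x requires a whole-line match, -v inverts the selection.
    grep -v -x -F -f "$suffixes" "$2"
    rm -f "$suffixes"
}
```

With the files from the question, each function would be called as, for example, in_both test1.txt test2.txt.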
Answer 1 (score: 0)
File1 :
abc.cde.ccd.eed.12345.5678.txt
abcd.cdde.ccdd.eaed.12346.5688.txt
aabc.cade.cacd.eaed.13345.5078.txt
abzc.cdae.ccda.eaed.29345.1678.txt
abac.cdae.cacd.eead.18145.2678.txt
aabc.cdve.cncd.ened.19945.2345.txt
File2 :
12345.5678.txt
29345.1678.txt
18145.2678.txt
10111.2222.txt
#!/bin/bash
# Start from clean output files; Both.txt must exist for the sort below.
rm -f Both.txt File1.txt File2.txt
touch Both.txt

while IFS= read -r f2line
do
    found=0
    while IFS= read -r f1line
    do
        # -F treats the File2 line as a literal string, not a regex.
        if Both=$(printf '%s\n' "$f1line" | grep -F -- "$f2line")
        then
            found=1
            printf '%s\n' "$Both" >> Both.txt
        fi
    done < File1
    # No line of File1 contained this line of File2.
    if [ "$found" -eq 0 ]
    then
        printf '%s\n' "$f2line" >> File2.txt
    fi
done < File2

# Lines of File1 that never matched: File1 minus Both.txt.
sort Both.txt > s_Both.txt
sort File1 > s_File1
comm -23 s_File1 s_Both.txt > File1.txt
rm s_File1 s_Both.txt
Output files: Both.txt, File1.txt, File2.txt
Answer 2 (score: 0)
The following AWK script, script.awk, can also do the job:
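# First file: store every line, in input order.
NR == FNR { lines[++i] = $0 }
# Second file: store every line as a literal pattern.
NR > FNR { patterns[++j] = $0 }
END {
    # Mark every line/pattern pair where the pattern is a substring of the line.
    for (p_index = 1; p_index <= j; p_index++)
        for (l_index = 1; l_index <= i; l_index++)
            if (index(lines[l_index], patterns[p_index]) > 0) {
                lines_match[l_index] = 1
                patterns_match[p_index] = 1
            }
    # Numeric loops keep the output in input order; "for (k in array)"
    # visits elements in an unspecified order.
    print "Lines only in first file:"
    for (l_index = 1; l_index <= i; l_index++)
        if (!(l_index in lines_match))
            print lines[l_index]
    print "Lines only in second file:"
    for (p_index = 1; p_index <= j; p_index++)
        if (!(p_index in patterns_match))
            print patterns[p_index]
    print "Lines in both files:"
    for (l_index = 1; l_index <= i; l_index++)
        if (l_index in lines_match)
            print lines[l_index]
}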
It can be invoked as follows:
awk -f script.awk test1.txt test2.txt
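Note that the script makes no assumptions about the structure of the data in the two files. It only assumes that the lines of test2.txt are potential substrings of the lines of test1.txt.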
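To illustrate the substring assumption: awk's index() performs a plain substring search, so a dot only matches a literal dot, whereas a regex match treats a dot as "any character". The sketch below uses a made-up input string in which one dot has been replaced by an x; the variable names literal and regex are only for illustration.

```shell
#!/bin/bash
# Hypothetical input: the dot between 12345 and 5678 is replaced by "x".
# index() looks for the exact substring and does not find it.
literal=$(awk 'BEGIN { print index("abc.cde.12345x5678.txt", "12345.5678.txt") }')

# A dynamic regex treats "." as a wildcard, so the same text matches.
regex=$(awk 'BEGIN { print ("abc.cde.12345x5678.txt" ~ "12345.5678.txt") }')

echo "index(): $literal, regex match: $regex"
```

This is why a substring search with index() is the safer choice for file names that contain dots.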
Answer 3 (score: 0)
This problem can be solved with comm from GNU Coreutils.
First, sort the second file:
sort -o test2.txt test2.txt
Then use these commands to display the lines:
# unique to test1.txt (only the part after the first four fields is printed)
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -23 - test2.txt
# unique to test2.txt
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -13 - test2.txt
# that appear in both files
cut -d '.' -f 1-4 --complement test1.txt | sort | comm -12 - test2.txt
Explanation:
# 1. Extract all but first four fields from test1.txt
cut -d '.' -f 1-4 --complement test1.txt
# 2. Here '-' stands for standard input, so comm reads the output of the pipeline
comm -3 - test2.txt
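The same idea can be sketched as a helper that leaves test2.txt unmodified. The function name compare_suffixes is hypothetical, and the sketch makes two substitutions: it uses cut -f 5-, which selects the same fields as --complement -f 1-4, and it sets LC_ALL=C so that sort and comm agree on the ordering. As with the commands above, only the trailing fields are printed, not the full lines of the first file.

```shell
#!/bin/bash
# Sketch only: compare_suffixes is a hypothetical helper name.
# Usage: compare_suffixes MODE FILE1 FILE2
# MODE is a comm option: -12 (in both), -23 (only in FILE1), -13 (only in FILE2).
compare_suffixes() {
    local mode=$1 a b
    a=$(mktemp)
    b=$(mktemp)
    # Fields 5 onward are what remains after dropping the first four.
    cut -d '.' -f 5- "$2" | LC_ALL=C sort > "$a"
    # Sort a copy of the second file instead of rewriting it in place.
    LC_ALL=C sort "$3" > "$b"
    LC_ALL=C comm "$mode" "$a" "$b"
    rm -f "$a" "$b"
}
```

For example, compare_suffixes -13 test1.txt test2.txt lists the lines that appear only in test2.txt.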