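Disclaimer: I'm new to programming and scripting, so please excuse the lack of technical terms.
I have two text files, each containing a list of names: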
First File | Second File
bob | bob
mark | mark
larry | bruce
tom | tom
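I'd like to run a script (preferably Python) that writes the lines the two files have in common to one text file and the lines that differ to another, for example: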
matches.txt :
bob
mark
tom
differences.txt :
bruce
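How can I do this in Python? Or with the Unix command line, if that is easy enough?
Answer 0 (score: 16)
sort | uniq is fine, but comm might be even better. See "man comm" for more information. Note that comm expects both input files to be sorted.
From the man page: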
EXAMPLES
comm -12 file1 file2
Print only lines present in both file1 and file2.
comm -3 file1 file2
Print lines in file1 not in file2, and vice versa.
You could also use Python's set type, but comm is easier.
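As a rough illustration of the set-based alternative, the sketch below assumes one name per line and reports only the names that appear in the second file but not in the first, which is what the question's sample output shows. The helper names `split_names` and `compare_files` are made up for this example.

```python
def split_names(first_lines, second_lines):
    """Hypothetical helper: return (matches, differences) as sorted lists."""
    # Strip line endings and skip blank lines so 'bob\n' and 'bob' compare equal.
    first = {line.strip() for line in first_lines if line.strip()}
    second = {line.strip() for line in second_lines if line.strip()}
    matches = sorted(first & second)        # names present in both inputs
    differences = sorted(second - first)    # names only in the second input
    return matches, differences


def compare_files(first_path, second_path,
                  matches_path="matches.txt",
                  differences_path="differences.txt"):
    """Hypothetical wrapper: read two files and write the two result files."""
    with open(first_path) as f1, open(second_path) as f2:
        matches, differences = split_names(f1, f2)
    with open(matches_path, "w") as out:
        out.writelines(name + "\n" for name in matches)
    with open(differences_path, "w") as out:
        out.writelines(name + "\n" for name in differences)
    return matches, differences
```

Sets do not keep the original line order, so the results are sorted here. To list the names unique to either file, as `comm -3` does, `first ^ second` could be used in place of `second - first`.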
Answer 1 (score: 9)
Unix shell solution:
# duplicate lines
sort text1.txt text2.txt | uniq -d
# unique lines
sort text1.txt text2.txt | uniq -u
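For comparison, here is roughly the same idea in Python: count how often each line occurs across both files combined, then keep the lines seen more than once (like `uniq -d`) and the lines seen exactly once (like `uniq -u`). The function name `duplicate_and_unique_lines` is invented for this sketch, and unlike the shell pipeline it strips surrounding whitespace and skips blank lines.

```python
from collections import Counter


def duplicate_and_unique_lines(*line_sources):
    """Hypothetical helper mimicking `sort ... | uniq -d` and `uniq -u`."""
    counts = Counter()
    for lines in line_sources:
        # Count every non-empty line, ignoring surrounding whitespace.
        counts.update(line.strip() for line in lines if line.strip())
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    uniques = sorted(name for name, n in counts.items() if n == 1)
    return duplicates, uniques


# Sample data taken from the question.
first = ["bob\n", "mark\n", "larry\n", "tom\n"]
second = ["bob\n", "mark\n", "bruce\n", "tom\n"]
duplicates, uniques = duplicate_and_unique_lines(first, second)
print(duplicates)  # ['bob', 'mark', 'tom']
print(uniques)     # ['bruce', 'larry']
```

As with the shell version, the unique list contains names from both files (larry as well as bruce), and a name repeated inside a single file counts as a duplicate even if the other file does not contain it.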
Answer 2 (score: 4)
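# Read each file into a set of whitespace-separated words
with open("some1.txt") as f1, open("some2.txt") as f2:
    words1 = set(f1.read().split())
    words2 = set(f2.read().split())
# Words found in both files
duplicates = words1.intersection(words2)
# Words found in exactly one of the two files
uniques = words1.difference(words2).union(words2.difference(words1))
print("Duplicates(%d):%s" % (len(duplicates), duplicates))
print("\nUniques(%d):%s" % (len(uniques), uniques))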
Something like this, at least.
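Answer 3 (score: 1)
Python dictionary lookups are O(1), or very close to it; in other words, they are very fast (though they use a lot of memory if the files you are indexing are large). So first read the first file into a list, which will then supply the dictionary's keys: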
left = [x.strip() for x in open('left.txt').readlines()]
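The list comprehension and strip() are needed because readlines returns each line with its trailing newline intact. This creates a list of all the items in the file, assuming one per line (use .split if they are all on one line).
Now build a dictionary: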
ldi = dict.fromkeys(left)
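This builds a dictionary with the items of the list as keys. It also takes care of duplicates, since dictionary keys are unique. Now iterate through the second file and check whether each line is a key in the dict: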
matches = open('matches.txt', 'w')
uniq = open('uniq.txt', 'w')
for l in open('right.txt').readlines():
    if l.strip() in ldi:
        # write to matches
        matches.write(l)
    else:
        # write to uniq
        uniq.write(l)
matches.close()
uniq.close()
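A slightly more compact take on the same idea, offered only as a sketch: a set gives the same near-constant-time membership test as dictionary keys, and `with` closes the files even if an error occurs. The names `partition_lines` and `partition_files` are hypothetical, and the one-item-per-line layout is assumed as above.

```python
def partition_lines(left_lines, right_lines):
    """Hypothetical helper: split right_lines by membership in left_lines.

    Returns (matches, uniq), each in the order seen in right_lines.
    """
    seen = {line.strip() for line in left_lines}  # average O(1) lookups
    matches, uniq = [], []
    for line in right_lines:
        name = line.strip()
        (matches if name in seen else uniq).append(name)
    return matches, uniq


def partition_files(left_path, right_path, matches_path, uniq_path):
    """Hypothetical wrapper: apply partition_lines to files on disk."""
    with open(left_path) as left, open(right_path) as right:
        matches, uniq = partition_lines(left, right)
    with open(matches_path, "w") as out:
        out.writelines(name + "\n" for name in matches)
    with open(uniq_path, "w") as out:
        out.writelines(name + "\n" for name in uniq)
```

Only the first file is held in memory; the second is read line by line, and the output keeps the second file's order. Writing `name + "\n"` also ensures every output line ends with a newline, even if the last input line did not.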
Answer 4 (score: 0)
>>> with open('first.txt') as f1, open('second.txt') as f2:
...     w1 = set(f1)
...     w2 = set(f2)
...
>>> with open('matches.txt','w') as fout1, open('differences.txt','w') as fout2:
...     fout1.writelines(w1 & w2)
...     fout2.writelines(w2 - w1)
...
>>> with open('matches.txt') as f:
...     print(f.read())
...
bob
mark
tom
>>> with open('differences.txt') as f:
...     print(f.read())
...
bruce
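One caveat with building sets straight from file objects: each element keeps its trailing newline, so a last line without one ("tom") would not match "tom\n" from the other file, and set iteration order is arbitrary. A minimal sketch of one way around both issues follows; the helper name `normalized_lines` is an assumption made for this example.

```python
def normalized_lines(lines):
    """Hypothetical helper: strip line endings before comparing."""
    return {line.rstrip("\r\n") for line in lines}


# The first input deliberately lacks a newline after its last line.
w1 = normalized_lines(["bob\n", "mark\n", "larry\n", "tom"])
w2 = normalized_lines(["bob\n", "mark\n", "bruce\n", "tom\n"])
print(sorted(w1 & w2))  # ['bob', 'mark', 'tom']
print(sorted(w2 - w1))  # ['bruce']
```

Sorting the results gives a stable order; when writing them back to a file, a newline has to be appended to each name again.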
Answer 5 (score: 0)
Here is one that prints a horizontal line between the two files:
file_1_list = []
with open(input('Enter the first file name: ')) as file:
    file_1 = file.read()
    file.seek(0)
    lines = file.readlines()
    for line in lines:
        line = line.strip()
        file_1_list.append(line)
with open(input('Enter the second file name: ')) as file:
    file_2 = file.read()
    file.seek(0)
    lines = file.readlines()
    for line in lines:
        line = line.strip()
if file_1 == file_2:
    print("Yes")
else:
    print("No")
    print(file_1)
    print("--------------")
    print(file_2)