I need to open multiple files and find the elements they have in common. The way I do it now is messy, and I would like a more elegant approach.
My code is as follows:
sample_1 = []
sample_2 = []
sample_3 = []
sample_4 = []
for line in open("sample_EC1.Regions", "r"):
    line = line.strip()
    sample_1.append(line)
for line in open("sample_EC2.Regions", "r"):
    line = line.strip()
    sample_2.append(line)
for line in open("sample_EC3.Regions", "r"):
    line = line.strip()
    sample_3.append(line)
for line in open("sample_EC4.Regions", "r"):
    line = line.strip()
    sample_4.append(line)
CommonRegions = list(set(sample_2) & set(sample_3) & set(sample_4) & set(sample_1))
print(CommonRegions)
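This code is messy: whenever the number of files grows I have to change the code, and with more than 50 files, editing it each time becomes impractical.
Answer 0: (score: 2)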
total = 4
# Seed the result with the lines of the first file.
with open("sample_EC1.Regions", "r") as f:
    commonregions = {line.strip() for line in f}
for i in range(2, total + 1):
    # str(i) is required: concatenating a str and an int raises TypeError.
    with open("sample_EC" + str(i) + ".Regions", "r") as f:
        # set comprehension
        sample = {line.strip() for line in f}
    commonregions = commonregions & sample
print(commonregions)
Instead of writing a separate loop for each file, process the files in a single loop and intersect as you go.
Improvement:
with open(...) as f: s = {l.strip() for l in f}
The part containing "for l in f" is called a set comprehension: an expression that builds a set on the fly.
Answer 1: (score: 1)
The most elegant approach is to use list comprehensions and set comprehensions:
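def file_to_set(path):
    # Read one file into a set of its whitespace-stripped lines.
    with open(path, "r") as f:
        return {line.strip() for line in f}

PATHS = ["sample_EC{0}.Regions".format(x) for x in range(1, 5)]
# Unpack the list of sets so set.intersection receives one argument per file.
CommonRegions = set.intersection(*[file_to_set(path) for path in PATHS])
print(CommonRegions)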
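Thanks to @user3789032 for reminding me about set.intersection.
With this, you can set PATHS to whatever collection of files you need to process. To read the paths from command-line arguments, use: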
import sys
PATHS = sys.argv[1:]
To read the paths from standard input:
import sys
PATHS = [line.strip() for line in sys.stdin.readlines()]
To read the paths from a file named on the command line:
import sys
with open(sys.argv[1]) as f:
    PATHS = [line.strip() for line in f]
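One caveat with the approach above: set.intersection(*[...]) raises a TypeError when the list of paths is empty, and it builds every set before intersecting any of them. The following is a minimal sketch of a variant that intersects incrementally and returns an empty set when there are no paths. The function name common_lines and the glob pattern are assumptions made for illustration (the pattern is guessed from the file names in the question); neither comes from the answers here.

```python
import glob


def common_lines(paths):
    """Return the set of stripped lines that appear in every file in `paths`."""
    common = None
    for path in paths:
        with open(path, "r") as f:
            lines = {line.strip() for line in f}
        if common is None:
            common = lines      # the first file seeds the result
        else:
            common &= lines     # in-place intersection with each later file
        if not common:
            break               # already empty; later files cannot add anything
    # No paths at all: return an empty set instead of raising.
    return common if common is not None else set()


if __name__ == "__main__":
    # Assumed naming pattern, based on the file names in the question.
    paths = sorted(glob.glob("sample_EC*.Regions"))
    print(common_lines(paths))
```

Because the files are discovered by pattern, adding a fifth or fiftieth file should not require touching the code, provided the new files follow the same naming scheme.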
Answer 2: (score: 0)
Here t is a list of file names.
b = []
for fn in t:
    a = []
    for line in open(fn, "r"):
        line = line.strip()
        a.append(line)
    b.append(set(a))
CommonRegions = set.intersection(*b)
print(CommonRegions)