I have a question about finding common data. I have three text files containing data in the following format:
cli= 111
mon= 45
cli= 584
mon= 21
cli= 23
mon= 417
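I have the following program, which gives me all the matching cli lines when I run it. In other words, it gives me the cli values that appear in all 3 text files.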
with open('/home/user/Desktop/text1.txt', 'r') as file1:
    with open('/home/user/Desktop/text2.txt', 'r') as file2:
        with open('/home/user/Desktop/text3.txt', 'r') as file3:
            same = set(file1).intersection(file2).intersection(file3)
            same.discard('\n')

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
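To see why this set-of-lines approach cannot keep each cli paired with its mon, here is a small sketch that runs the same intersection on in-memory data. `io.StringIO` stands in for the three files, and the sample contents are made up for illustration.

```python
import io

# Made-up contents standing in for text1.txt, text2.txt and text3.txt
file1 = io.StringIO("cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n")
file2 = io.StringIO("cli= 111\nmon= 98\n\ncli= 23\nmon= 21\n")
file3 = io.StringIO("cli= 111\nmon= 32\n\ncli= 584\nmon= 21\n")

# Same idea as the program above: intersect the sets of raw lines
same = set(file1).intersection(file2).intersection(file3)
same.discard('\n')

# Any line present in all three files survives, whether it is a cli or a mon
# line, and nothing records which mon belonged to which cli.
print(sorted(same))  # ['cli= 111\n', 'mon= 21\n']
```

Here `mon= 21` survives only because it happens to appear in every file, not because it belongs to cli= 111, while the mon values 45, 98 and 32 are dropped.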
My question is: can I also output the mon values (for example mon= 45) that go with cli= 111? Assume cli= 111 appears in all 3 text files. I want a result like this:
cli= 111
mon= 45
mon= 98
mon= 32
Thanks in advance. PS: The sample data above shows only one text file; assume there are 3 text files. Thanks!
Answer 0 (score: 0)
You can group the data in a dict, then pull the line that follows each cli that appears in all the files:
with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
    inter = set(file1).intersection(file2).intersection(file3)
    # create a dict using lists as values to group the mons; keeping only cli
    # lines removes empty lines and any mon lines that happen to be shared
    d = {k: [] for k in inter if k.startswith("cli=")}
    # don't need the set anymore, dict lookups are also O(1)
    del inter
    # reset the file pointers
    for f in (file1, file2, file3):
        f.seek(0)
    # iterate over the files again
    for f in (file1, file2, file3):
        for line in f:
            if line in d:
                # pull the next line (the mon) if we get a match
                d[line].append(next(f))
Then write out the dict contents:
with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        file_out.write(k)
        for line in v:
            file_out.write(line)
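The two-pass idea above can be wrapped in a function that works for any number of files. This is a minimal sketch, not the answerer's code: the name `group_mons` is hypothetical, and `io.StringIO` objects stand in for real files so the example runs on its own.

```python
import io

def group_mons(files):
    """Intersect the raw lines of all files, keep the cli lines, then
    rewind every file and collect the line that follows each common cli."""
    first, *rest = files
    common = set(first).intersection(*rest)
    # keeping only cli lines also drops blank lines and shared mon lines
    d = {line: [] for line in common if line.startswith("cli=")}
    for f in files:
        f.seek(0)  # rewind so each file can be read a second time
        for line in f:
            if line in d:
                d[line].append(next(f))  # the mon line right after the match
    return d

files = [
    io.StringIO("cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n"),
    io.StringIO("cli= 111\nmon= 98\n"),
    io.StringIO("cli= 111\nmon= 32\n"),
]
result = group_mons(files)
for cli, mons in result.items():
    print(cli + "".join(mons), end="")
```

On this sample data it prints the cli= 111 line followed by mon= 45, mon= 98 and mon= 32, the output format the question asks for.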
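If you are only interested in specific lines, i.e. those starting with cli=, another approach is to build the dict from file1's data first, then iterate over the remaining files, and when writing, only write the entries whose value list has length 3: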
with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
    # create a dict from the initial file, storing the line after each cli= line inside a list as the value
    d = {k: [next(file1)] for k in file1 if k.startswith("cli=")}
    for f in [file2, file3]:
        for line in f:
            if line in d:
                d[line].append(next(f))

with open('/home/user/Desktop/common.txt', 'w') as file_out:
    for k, v in d.items():
        # if len is three we have one mon from each file
        if len(v) == 3:
            file_out.write(k)
            for line in v:
                file_out.write(line)
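This "build from the first file, then count" approach generalizes by comparing each list's length to the number of files instead of a hard-coded 3. A sketch under the same assumption the answer makes (each cli= line appears at most once per file); `common_cli_mons` is a hypothetical name and the data is made up.

```python
import io

def common_cli_mons(files):
    first, *rest = files
    # map each cli line in the first file to a list holding the line after it
    d = {line: [next(first)] for line in first if line.startswith("cli=")}
    for f in rest:
        for line in f:
            if line in d:
                d[line].append(next(f))
    # a cli is common only if it collected one mon line from every file
    return {k: v for k, v in d.items() if len(v) == len(files)}

files = [
    io.StringIO("cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n"),
    io.StringIO("cli= 584\nmon= 7\n\ncli= 111\nmon= 98\n"),
    io.StringIO("cli= 111\nmon= 32\n"),
]
result = common_cli_mons(files)
print(result)
```

Here cli= 584 is dropped because the third file has no entry for it, so only cli= 111 remains with its three mon lines.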
The only way this would fail is if one or more of your files contained a repeated cli= ... line.
Answer 1 (score: 0)
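You seem to be throwing away data that you want to access later. Rather than parsing the files again, you need to capture that data somehow so you don't have to look through the files a second time. One way to do this (assuming each 'cli' has exactly one corresponding 'mon' per file) is to use dictionaries.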
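I've written a function that builds a dictionary from a given file, where the keys are the 'cli' lines and the values are the 'mon' lines that follow them. From there, you can build a set() from the dictionary keys and find the intersection. Everything in the intersection is guaranteed to be a key in every dictionary, so just concatenate each key and its values into an 'out' string and write that to your output file :)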
def buildDict(f):
    dic = {}
    # stop one line early so f[i+1] never runs past the end of the list
    for i in range(len(f) - 1):
        if "cli" in f[i]:
            dic[f[i]] = f[i+1]
    return dic

with open('1.txt', 'r') as file1:
    f1_dic = buildDict(file1.readlines())
with open('2.txt', 'r') as file2:
    f2_dic = buildDict(file2.readlines())
with open('3.txt', 'r') as file3:
    f3_dic = buildDict(file3.readlines())

same = set(f1_dic.keys()).intersection(f2_dic.keys()).intersection(f3_dic.keys())
out = ''
for i in same:
    out += i
    out += f1_dic[i]
    out += f2_dic[i]
    out += f3_dic[i]

with open('common.txt', 'w') as file_out:
    file_out.write(out)
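A more compact sketch of the same dictionary idea, generalized to any number of files. Pairing each line with its successor via `zip(lines, lines[1:])` avoids indexing past the end of the list. The names `build_dict` and `common_output` are hypothetical, and the lists of lines below stand in for what `readlines()` would return.

```python
def build_dict(lines):
    # pair every line with the one after it; keep pairs whose first line is a cli line
    return {cur: nxt for cur, nxt in zip(lines, lines[1:]) if "cli" in cur}

def common_output(all_lines):
    dicts = [build_dict(lines) for lines in all_lines]
    same = set(dicts[0]).intersection(*dicts[1:])
    out = ''
    for key in sorted(same):  # sorted only to make the output order deterministic
        out += key + ''.join(d[key] for d in dicts)
    return out

all_lines = [
    ["cli= 111\n", "mon= 45\n", "cli= 584\n", "mon= 21\n"],
    ["cli= 111\n", "mon= 98\n"],
    ["cli= 111\n", "mon= 32\n"],
]
out = common_output(all_lines)
print(out, end="")
```

The resulting `out` string can be written to the output file exactly as in the answer above.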
Answer 2 (score: 0)
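That's an interesting hack you've got there, intersecting sets of lines; but as you can see, it's a little too clever, because the cli lines get separated from their mon lines. So let's read the files more carefully so that doesn't happen: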
import re

def getfile(fname):
    with open(fname) as file1:
        text = file1.read()
    # records are separated by blank lines, each one "cli= N\nmon= M"
    records = text.strip().split("\n\n")
    return dict(re.search(r"cli= *(\d+)\nmon= *(\d+)", rec).groups() for rec in records)

d1 = getfile('/home/user/Desktop/text1.txt')
d2 = getfile('/home/user/Desktop/text2.txt')
d3 = getfile('/home/user/Desktop/text3.txt')

same = set(d1).intersection(d2).intersection(d3)
# same is a set, so loop over it instead of concatenating or indexing with it
for cli in same:
    print("cli= " + cli)
    print("mon= " + d1[cli])
    print("mon= " + d2[cli])
    print("mon= " + d3[cli])
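I convert each file into a dictionary that maps cli values to mon values, since they come in pairs. Then we intersect the cli values and use them to look up the mon values.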
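If the files are not reliably separated by blank lines, a regex that matches each cli=/mon= pair directly avoids the `split("\n\n")` step. This is a sketch, not the answerer's code: it assumes every cli= line is followed by its mon= line with only whitespace in between, and `PAIR`, `parse_pairs` and `format_common` are hypothetical names.

```python
import re

# cli= N, then any whitespace (newlines, blank lines), then mon= M
PAIR = re.compile(r"cli=\s*(\d+)\s+mon=\s*(\d+)")

def parse_pairs(text):
    """Return {cli: mon} for every cli= line followed by a mon= line."""
    return dict(PAIR.findall(text))

def format_common(texts):
    dicts = [parse_pairs(t) for t in texts]
    same = set(dicts[0]).intersection(*dicts[1:])
    lines = []
    for cli in sorted(same, key=int):  # numeric order of the cli values
        lines.append("cli= " + cli)
        lines.extend("mon= " + d[cli] for d in dicts)
    return "\n".join(lines)

texts = [
    "cli= 111\nmon= 45\ncli= 584\nmon= 21\n",
    "cli= 111\nmon= 98\n\ncli= 23\nmon= 417\n",
    "cli= 584\nmon= 1\ncli= 111\nmon= 32",
]
result = format_common(texts)
print(result)
```

In practice each string in `texts` would come from reading one of the three files.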