Output matching lines from 3 text files, plus the line below each match

Date: 2016-05-10 20:38:50

Tags: python linux ubuntu

I have a question about finding common data. I have three text files containing data in the following format:

    cli= 111
    mon= 45

    cli= 584
    mon= 21

    cli= 23
    mon= 417

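I have the following program which, when I run it, gives me all the matching cli lines. In other words, it gives me the cli lines that appear in all 3 text files.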

    with open('/home/user/Desktop/text1.txt', 'r') as file1:
        with open('/home/user/Desktop/text2.txt', 'r') as file2:
            with open('/home/user/Desktop/text3.txt', 'r') as file3:
                same = set(file1).intersection(file2).intersection(file3)
    same.discard('\n')

    with open('/home/user/Desktop/common.txt', 'w') as file_out:
        for line in same:
            file_out.write(line)
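Why this approach can never report the mon values: intersecting sets of lines keeps only lines that are textually identical in every file, and the mon lines differ between files. A minimal sketch under stated assumptions: the hypothetical helper `common_lines` wraps the same intersection, and `io.StringIO` objects with made-up contents stand in for the three files.

```python
import io

def common_lines(*files):
    """Return the set of lines present in every file, minus blank lines."""
    first, *rest = files
    same = set(first).intersection(*rest)
    same.discard("\n")
    return same

# Hypothetical in-memory stand-ins for text1.txt .. text3.txt
same = common_lines(
    io.StringIO("cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n"),
    io.StringIO("cli= 111\nmon= 98\n\ncli= 23\nmon= 417\n"),
    io.StringIO("cli= 111\nmon= 32\n"),
)
print(same)  # {'cli= 111\n'}
```

Only the cli line survives; each mon line appears in just one file, so the pairing with its cli is lost as soon as the lines go into a set.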

My question is: can I also output the value (mon= 45) along with cli= 111? Assume cli= 111 is present in all 3 text files. I want a result like this:

    cli= 111
    mon= 45
    mon= 98
    mon= 32

Thanks in advance. PS: The sample data above is from just 1 text file; assume there are 3 text files. Thanks!

3 answers:

Answer 0 (score: 0)

You can group the data in a dict, then pull the line that follows each cli line found in all the files:

    with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
        inter = set(file1).intersection(file2).intersection(file3)

        # create a dict with lists as values to group the mons, skipping empty lines
        d = {k: [] for k in inter if k.strip()}
        # the set isn't needed anymore; dict lookups are also O(1)
        del inter
        # rewind the file pointers to the start
        file1.seek(0), file2.seek(0), file3.seek(0)

        # iterate over the files again
        for f in [file1, file2, file3]:
            for line in f:
                if line in d:
                    # pull the next line when we get a match
                    d[line].append(next(f))

Then write out the dict contents:

    with open('/home/user/Desktop/common.txt', 'w') as file_out:
        for k, v in d.items():
            file_out.write(k)
            for line in v:
                file_out.write(line)
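The same two-pass idea (intersect the lines, rewind, then collect the line after each shared line) can be wrapped in a function and tried on in-memory data. This is a sketch, not the answer's exact code: the name `group_following_lines` is hypothetical, `io.StringIO` objects stand in for the files, and `next(f, "")` is swapped in so a match on the very last line doesn't raise `StopIteration`.

```python
import io

def group_following_lines(files):
    """Map each non-blank line shared by all files to the lines that follow it."""
    common = set(files[0]).intersection(*files[1:])
    d = {k: [] for k in common if k.strip()}
    for f in files:
        f.seek(0)  # building the set consumed the file, so rewind before pass two
        for line in f:
            if line in d:
                # next(f, "") avoids StopIteration if the match is the last line
                d[line].append(next(f, ""))
    return d

# Hypothetical in-memory stand-ins for text1.txt .. text3.txt
files = [
    io.StringIO("cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n"),
    io.StringIO("cli= 111\nmon= 98\n\ncli= 23\nmon= 417\n"),
    io.StringIO("cli= 111\nmon= 32\n"),
]
d = group_following_lines(files)
print("".join(k + "".join(v) for k, v in d.items()), end="")
```

The printed text is the cli line followed by one mon line per file, which is the output format asked for in the question.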

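If you are looking for a specific line, i.e. one starting with cli=, then another approach is to build the dict from file1's data first and then iterate over the remaining files; when writing, only write the entries whose value list has length 3: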

    with open('text1.txt', 'r') as file1, open('text2.txt', 'r') as file2, open('text3.txt', 'r') as file3:
        # create a dict from the initial file, storing the line after each cli=... line in a list as the value
        d = {k: [next(file1)] for k in file1 if k.startswith("cli=")}

        for f in [file2, file3]:
            for line in f:
                if line in d:
                    d[line].append(next(f))

    with open('/home/user/Desktop/common.txt', 'w') as file_out:
        for k, v in d.items():
            # if the length is 3, we have one mon line from each file
            if len(v) == 3:
                file_out.write(k)
                for line in v:
                    file_out.write(line)

The only way this fails is if one or more of your files contains a repeated cli= ... line.
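That failure mode is easy to reproduce. The sketch below is a hypothetical function `common_cli` that follows the second approach on file-like objects (`io.StringIO` stands in for the real files, with made-up contents) and generalises the `== 3` check to "one mon line per file". When file 2 repeats cli= 111, that key collects four mon lines and is silently dropped.

```python
import io

def common_cli(first, *others):
    """Index cli lines from the first file, then append the mon lines found in the others."""
    # the value starts as the line right after each cli line in the first file
    d = {k: [next(first)] for k in first if k.startswith("cli=")}
    for f in others:
        for line in f:
            if line in d:
                d[line].append(next(f))
    expected = 1 + len(others)
    # keep only cli lines that collected exactly one mon line per file
    return {k: v for k, v in d.items() if len(v) == expected}

t1 = "cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n"
t3 = "cli= 111\nmon= 32\n"
ok = common_cli(io.StringIO(t1), io.StringIO("cli= 111\nmon= 98\n"), io.StringIO(t3))
# file 2 repeats cli= 111, so that key's list grows to 4 entries and is dropped
dup = common_cli(io.StringIO(t1), io.StringIO("cli= 111\nmon= 98\n\ncli= 111\nmon= 99\n"), io.StringIO(t3))
print(ok)
print(dup)  # {}
```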

Answer 1 (score: 0)

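It looks like you're throwing away data you want to access later. Rather than parsing the files a second time, you need to capture that data as you go so you don't have to look at the files again. One way to do this (assuming each 'cli' has exactly one corresponding 'mon' per file) is to use a dictionary.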

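I've written a function that builds a dictionary from a given file, where the keys are the 'cli' lines and the values are the 'mon' lines. From there, you can build a set from each dictionary's keys and find the intersection. Every value in the intersection is guaranteed to be a key in each dictionary, so just concatenate the cli line and its mon lines into an 'out' string and write it to your output file :)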

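    def buildDict(f):
        dic = {}
        # stop one line early so f[i+1] always exists
        for i in range(len(f) - 1):
            if "cli" in f[i]:
                # key: the cli line, value: the mon line right after it
                dic[f[i]] = f[i+1]
        return dic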

    with open ('1.txt', 'r') as file1:
        f1_dic = buildDict(file1.readlines())
        with open ('2.txt', 'r') as file2:
            f2_dic = buildDict(file2.readlines())
            with open ('3.txt', 'r') as file3:
                f3_dic = buildDict(file3.readlines())
                same = set(f1_dic.keys()).intersection(f2_dic.keys()).intersection(f3_dic.keys())

    out = ''
    for i in same:
        out += i
        out += f1_dic[i]
        out += f2_dic[i]
        out += f3_dic[i]


    with open ('common.txt', 'w') as file_out:
        file_out.write(out)
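A more compact sketch of the same dictionary idea, assuming the lines are already in a list (as `readlines()` returns): pairing each line with its successor via `zip(lines, lines[1:])` removes the index arithmetic. The names `build_dict` and `merge_common` are hypothetical, and the sample lines are made up.

```python
def build_dict(lines):
    """Map each line containing 'cli' to the line that follows it."""
    # zip stops at the shorter sequence, so a 'cli' on the last line is simply skipped
    return {cur: nxt for cur, nxt in zip(lines, lines[1:]) if "cli" in cur}

def merge_common(*dicts):
    """Build the output text: each shared cli line followed by its mon line from every dict."""
    same = set(dicts[0]).intersection(*dicts[1:])
    out = ""
    for key in sorted(same):  # sorted() makes the output order repeatable
        out += key + "".join(d[key] for d in dicts)
    return out

# Hypothetical stand-ins for file.readlines() on 1.txt .. 3.txt
f1 = build_dict(["cli= 111\n", "mon= 45\n", "\n", "cli= 584\n", "mon= 21\n"])
f2 = build_dict(["cli= 111\n", "mon= 98\n", "\n", "cli= 23\n", "mon= 417\n"])
f3 = build_dict(["cli= 111\n", "mon= 32\n"])
print(merge_common(f1, f2, f3), end="")
```

Sorting the shared keys is a design choice: set iteration order is arbitrary, so without it common.txt could list the cli blocks in a different order from run to run.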

Answer 2 (score: 0)

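Interesting hack you've got there, building a set of lines; but as you can see, it's a bit too clever, because the cli lines get separated from their mon lines. So let's read the files more carefully so that doesn't happen:

    import re

    def getfile(fname):
        with open(fname) as file1:
            text = file1.read()
        # records are separated by blank lines; strip() avoids an empty trailing record
        records = text.strip().split("\n\n")
        # map each cli value to the mon value paired with it
        return dict(re.search(r"cli= *(\d+)\nmon= *(\d+)", rec).groups() for rec in records)

    d1 = getfile('/home/user/Desktop/text1.txt')
    d2 = getfile('/home/user/Desktop/text2.txt')
    d3 = getfile('/home/user/Desktop/text3.txt')

    same = set(d1).intersection(d2).intersection(d3)
    # same is a set, so loop over it rather than using it as a key
    for cli in same:
        print("cli= " + cli)
        print("mon= " + d1[cli])
        print("mon= " + d2[cli])
        print("mon= " + d3[cli])

I convert each file into a dictionary that maps cli values to mon values, since they come in pairs. Then we intersect the cli values and use them to look up the mon values.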
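A variation on this answer, offered as a sketch: instead of splitting on blank lines and calling `re.search` per record (which fails with `AttributeError` if any record has no match), `re.findall` can scan each file's whole text for cli/mon pairs. The pattern `PAIR` and the helpers `parse` and `common_records` are hypothetical names, and the sample texts are made up.

```python
import re

# a cli value, then whitespace (the line break), then the mon value that follows it
PAIR = re.compile(r"cli= *(\d+)\s+mon= *(\d+)")

def parse(text):
    """Return {cli: mon} for every cli/mon pair found in text."""
    return dict(PAIR.findall(text))

def common_records(texts):
    """For each cli present in every text, its cli line followed by one mon line per text."""
    dicts = [parse(t) for t in texts]
    same = set(dicts[0]).intersection(*dicts[1:])
    lines = []
    for cli in sorted(same, key=int):
        lines.append("cli= " + cli)
        lines.extend("mon= " + d[cli] for d in dicts)
    return lines

# Hypothetical contents standing in for text1.txt .. text3.txt
texts = [
    "cli= 111\nmon= 45\n\ncli= 584\nmon= 21\n",
    "cli= 111\nmon= 98\n\ncli= 23\nmon= 417\n",
    "cli= 111\nmon= 32\n",
]
print("\n".join(common_records(texts)))
```

With real files, each string would come from `open(path).read()`; extra blank lines or a missing final newline then make no difference, because `findall` only returns complete pairs.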