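Joining files on common columns in Python

Asked: 2014-04-18 16:01:38

Tags: python sorting join merge

I am having trouble joining two large files that share 5 common columns and returning the results, i.e. the rows whose 5-tuples are identical. This is what I mean: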

File 1:

132.227 49202 107.21 80
132.227 49202 107.21 80
132.227 49200 107.220 80
132.227 49200 107.220 80
132.227 49222 207.171 80
132.227 49339 184.730 80
132.227 49291 930.184 80
............
............
............

The file contains many more lines than just these.

File 2:

46.109498000 132.227 49200 107.220 80 17 48 
46.927339000 132.227 49291 930.184 80 17 48 
47.422919000 253.123 1985 224.300 1985 17 48
48.412761000 132.253 1985 224.078 1985 17 48
48.638454000 132.127 1985 232.123 1985 17 48
48.909658000 132.227 49291 930.184 80 17 65
48.911360000 132.227 49200 107.220 80 17 231
............
............
............

Output file:

46.109498000 132.227 49200 107.220 80 17 48 
46.927339000 132.227 49291 930.184 80 17 48 
48.909658000 132.227 49291 930.184 80 17 65
48.911360000 132.227 49200 107.220 80 17 231
............
............
............

This is the code I wrote:

with open('log1', 'r') as fl1:
    f1 = [i.split(' ') for i in fl1.read().split('\n')]

with open('log2', 'r') as fl2:
    f2 = [i.split(' ') for i in fl2.read().split('\n')]

def merging(x,y):
    list=[]
    for i in x:
        for j in range(len(i)-1):
            while i[j]==[a[b] for a in y]:
                list.append(i)
                j=j+1
    return list

f3=merging(f1,f2)

for i in f3:
    print i

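2 Answers:

Answer 0: (score: 0)

As I understand it, file2 is to be filtered by file1. Correct?

I assume file1 is not sorted. (If it were sorted, there would be another, more efficient solution.)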

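with open('file1') as file1, open('file2') as file2:
    # Each file1 line becomes a list of its whitespace-separated fields.
    my_filter = [line.strip().split() for line in file1]
    # Keep the file2 lines whose fields 2-5 (indices 1-4) appear in file1.
    f3 = [line.strip() for line in filter(lambda x: x.strip().split()[1:5] in my_filter, file2)]

# to see f3
for line in f3:
    print(line)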

First, build the filter with

my_filter = [line.strip().split() for line in file1]

which contains:

[['132.227', '49202', '107.21', '80'], ['132.227', '49202', '107.21', '80'], ['132.227', '49200', '107.220', '80'], ['132.227', '49200', '107.220', '80'], ['132.227', '49222', '207.171', '80'], ['132.227', '49339', '184.730', '80'], ['132.227', '49291', '930.184', '80']]

Then use filter to select the matching lines of file2. This code works on Python 2.7+.
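For large files, testing membership in a list rescans the whole list for every line of file2. A minimal sketch of the same filter using a set of tuples follows. The name filter_by_keys and its parameters are made up for illustration, and it assumes whitespace-separated fields with the key in fields 2-5 of file2, as in the sample data.

```python
def filter_by_keys(key_lines, data_lines, start=1, width=4):
    """Return the stripped data lines whose fields [start:start+width]
    equal the full field list of some line in key_lines."""
    keys = set()
    for line in key_lines:
        fields = line.split()
        if fields:  # ignore blank lines
            keys.add(tuple(fields))  # tuples are hashable, lists are not

    matches = []
    for line in data_lines:
        fields = line.split()
        # Slicing a short line just yields a shorter tuple, never an error.
        if tuple(fields[start:start + width]) in keys:
            matches.append(line.strip())
    return matches


# Small in-memory sample taken from the question; real use would pass open files.
sample_file1 = ["132.227 49200 107.220 80\n", "132.227 49291 930.184 80\n"]
sample_file2 = [
    "46.109498000 132.227 49200 107.220 80 17 48 \n",
    "47.422919000 253.123 1985 224.300 1985 17 48\n",
    "48.909658000 132.227 49291 930.184 80 17 65\n",
]
for line in filter_by_keys(sample_file1, sample_file2):
    print(line)
```

Because open file objects iterate line by line, something like filter_by_keys(open('file1'), open('file2')) should work the same way without loading file2 into memory.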

Answer 1: (score: 0)

I wrote the following, which seems to work:

with open('file1', 'r') as fl1:
    f1 = [i.split(' ') for i in fl1.read().split('\n')]

with open('file2', 'r') as fl2:
    f2 = [i.split(' ') for i in fl2.read().split('\n')]

for i in f2:
    for j in f1:
        if i[1]==j[0] and i[2]==j[1] and i[3]==j[2] and i[4]==j[3]:
            print(i)

I tried to replace

if i[1]==j[0] and i[2]==j[1] and i[3]==j[2] and i[4]==j[3]:

with:

for k in range(4):
    if i[k+1]==j[k]:
        print(i)

But it gives me this error:

  

Traceback (most recent call last):
  File "MERGE.py", line 10, in <module>
    if i[k+1]==j[k]:
IndexError: list index out of range
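A likely cause, assuming file1 ends with a newline: fl1.read().split('\n') then yields a final empty string, which split(' ') turns into [''], so j[1] does not exist. The original `and` chain never reaches j[1] for that row because i[1]==j[0] is already false and the expression short-circuits, whereas the loop indexes j[k] for every k. The loop would also print i whenever any single column matches rather than requiring all four. Below is a sketch that compares the four columns at once using slices; the helper name rows_matching is hypothetical.

```python
def rows_matching(rows2, rows1):
    """Return the rows of rows2 whose columns 1-4 equal columns 0-3
    of at least one row in rows1."""
    matched = []
    for i in rows2:
        for j in rows1:
            # Slicing a short row gives a shorter list instead of raising
            # IndexError; the length check stops blank rows from matching.
            if len(j) >= 4 and i[1:5] == j[:4]:
                matched.append(i)
                break  # report each row once, even if rows1 has duplicates
    return matched


# Rows built the same way as in the answer, including the trailing [''] row
# produced by a final newline.
f1 = [r.split(' ') for r in "132.227 49200 107.220 80\n132.227 49200 107.220 80\n".split('\n')]
f2 = [r.split(' ') for r in
      "46.109498000 132.227 49200 107.220 80 17 48\n"
      "47.422919000 253.123 1985 224.300 1985 17 48\n".split('\n')]
for row in rows_matching(f2, f1):
    print(row)
```

The break is a design choice: the nested loops in the answer print a row once per matching line of file1, while the expected output in the question lists each row once.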