Question

I hope you are well.

I have two txt files: data.txt and to_remove.txt

data.txt has many lines, and each line contains several integers separated by spaces. A line in data.txt looks like this: 1001 1229 19910

to_remove.txt has many lines, and each line contains a single integer. A line in to_remove.txt looks like this: 1229

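I want to write a new txt file that contains the contents of data.txt with the integers from to_remove.txt removed. I know that the first element of each line in data.txt is never one of the integers in to_remove.txt, so I only need to check the non-first elements of each line against every integer in to_remove.txt.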
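To make the desired transformation concrete, here is a minimal sketch of what should happen to a single line, assuming the numbers are whitespace-separated and compared as strings. `filter_line` is a hypothetical helper name, not part of the original code.

```python
def filter_line(line, to_remove):
    """Drop every non-first number of `line` that appears in `to_remove`.

    `to_remove` is assumed to be a set of strings, e.g. {"1229"}.
    """
    numbers = line.split()
    # The first element is never in to_remove, so it is kept unconditionally
    kept = numbers[:1] + [n for n in numbers[1:] if n not in to_remove]
    return " ".join(kept)


print(filter_line("1001 1229 19910", {"1229"}))  # prints "1001 19910"
```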

I wrote code to do this, but it is far too slow: data.txt has more than a million lines, and to_remove.txt has several hundred thousand lines.

If you can suggest a faster approach, that would be very helpful.

Here is my code:

with open('new.txt', 'w') as new:
    with open('data.txt') as data:
        for line in data:
            currentline = line.split()
            # every element after the first is a connection to check
            connections = [int(x) for x in currentline[1:]]
            # to_remove.txt is re-opened and re-read for every line of data.txt
            with open('to_remove.txt') as to_remove:
                for ID in to_remove:
                    ID = int(ID)
                    # list membership and list.remove() both scan the whole list
                    if ID in connections:
                        connections.remove(ID)
            # put the first element back in front and write the line out
            connections.insert(0, int(currentline[0]))
            new.write(' '.join(str(c) for c in connections) + '\n')
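A rough operation count shows why this approach is slow. Write N for the number of lines in data.txt, M for the number of lines in to_remove.txt, and k for the average number of integers per line (these symbols are introduced here for illustration only). For every data line, the inner loop re-reads all of to_remove.txt, and each `ID in connections` test scans the list, whereas a set built once needs M insertions and then about k average-O(1) lookups per data line:

$$
\text{cost}_{\text{list}} \approx N \cdot M \cdot k, \qquad \text{cost}_{\text{set}} \approx M + N \cdot k
$$

$$
N \cdot M \ge 10^{6} \times 10^{5} = 10^{11}
$$

Taking N ≥ 10^6 and M ≥ 10^5 as lower bounds from the file sizes stated in the question, the original loop performs at least 10^11 line reads and `int()` calls before even counting the list scans.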

Answer 1

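Your code is a bit messy, so I rewrote it instead of editing it. The main way to speed it up is to store the numbers to remove in a set(), which allows efficient (average-case) O(1) membership tests: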

with open('data.txt') as data, open('to_remove.txt') as to_remove, open('new.txt', 'w') as new:
    nums_to_remove = {item.strip() for item in to_remove} # create a set of strings to check for removing
    for line in data:
        numbers = line.rstrip().split() # create numbers list (note: these are stored as strings)
        if not any(num in nums_to_remove for num in numbers[1:]): # keep the line only if none of its non-first numbers are in the set
            new.write(line) # write the unchanged line to the new file
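The code in this answer skips any line that contains a number from to_remove.txt, whereas the question asks to remove only the matching numbers and keep the rest of the line. A minimal sketch that applies the same set-based lookup to each number instead; `remove_ids` is a hypothetical name, and it takes already-open file objects so it can be tried with `io.StringIO`.

```python
import io


def remove_ids(data_file, remove_file, out_file):
    """Copy data_file to out_file, dropping every non-first number listed in remove_file.

    All arguments are open text-file objects (real files or io.StringIO).
    """
    # Build the set once; each membership test is then O(1) on average
    ids = {line.strip() for line in remove_file if line.strip()}
    for line in data_file:
        numbers = line.split()
        if not numbers:
            continue  # skip blank lines
        # Keep the first element, filter the rest against the set
        kept = numbers[:1] + [n for n in numbers[1:] if n not in ids]
        out_file.write(" ".join(kept) + "\n")


# Usage with in-memory files; with real files, pass open('data.txt') etc.
data = io.StringIO("1001 1229 19910\n1002 5 1229\n")
remove = io.StringIO("1229\n5\n")
out = io.StringIO()
remove_ids(data, remove, out)
print(out.getvalue())  # "1001 19910" then "1002"
```

The design keeps only the set in memory and streams data.txt line by line, so memory use grows with to_remove.txt rather than with data.txt.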

Answer 2

I wrote code to answer my own question, using some of the code from the answers as well as suggestions from the comments on the question.

def return_nums_remove():
    # read the numbers to remove into a set of strings for fast lookups
    with open('to_remove.txt') as to_remove:
        nums_to_remove = {item.strip() for item in to_remove}
    return nums_to_remove

with open('data.txt') as data, open('new.txt', 'w') as new:
    nums_to_remove = return_nums_remove()
    for line in data:
        numbers = line.split()
        # build a new list rather than calling numbers.remove() while iterating,
        # which would skip the element that follows each removed one
        numbers = numbers[:1] + [n for n in numbers[1:] if n not in nums_to_remove]
        # only write lines that still have at least one connection
        if len(numbers) > 1:
            new.write(' '.join(numbers) + '\n')
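Calling `list.remove()` on a list while a `for` loop is iterating over it skips the element that follows each removal, so two adjacent numbers that both need removing are not both caught. A small demonstration with made-up values, contrasting it with building a new list:

```python
nums_to_remove = {"1229", "19910"}

# Anti-pattern: removing from a list while iterating over it.
# The iterator advances by index, so after a removal the next element
# shifts into the slot that was just visited and is never checked.
buggy = ["1001", "1229", "19910", "300"]
for n in buggy:
    if n in nums_to_remove:
        buggy.remove(n)
print(buggy)  # ['1001', '19910', '300']: "19910" survived

# Safer: build a new list instead of mutating the one being iterated
numbers = ["1001", "1229", "19910", "300"]
filtered = [n for n in numbers if n not in nums_to_remove]
print(filtered)  # ['1001', '300']
```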

Python: Removing the elements of one txt file from another txt file
