Question

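So I have two files, file1 and file2, of different sizes, each with at least a million newline-separated lines. I want to match the contents of file1 against file2 and, wherever a line matches, delete that line from file1. For example: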

+------------+-----------+--------------------------+
| file1      | file2     | after processing - file1 |
+------------+-----------+--------------------------+
| google.com | in.com    | google.com               |
+------------+-----------+--------------------------+
| apple.com  | quora.com | me.com                   |
+------------+-----------+--------------------------+
| me.com     | apple.com |                          |
+------------+-----------+--------------------------+
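Put as code, the rule in the question is an order-preserving filter: keep each line of file1 whose text (ignoring the trailing newline) is not among file2's lines. A minimal in-memory sketch on the example's data, assuming Python 3; remove_common_lines is a hypothetical name:

```python
def remove_common_lines(file1_lines, file2_lines):
    """Return file1_lines minus any line whose text also appears in file2_lines.

    Order and duplicates in file1 are preserved; comparison ignores trailing
    whitespace, like line.rstrip() in the question's code.
    """
    exclude = {line.rstrip() for line in file2_lines}  # set: fast membership tests
    return [line for line in file1_lines if line.rstrip() not in exclude]


file1 = ["google.com\n", "apple.com\n", "me.com\n"]
file2 = ["in.com\n", "quora.com\n", "apple.com\n"]
print(remove_common_lines(file1, file2))  # ['google.com\n', 'me.com\n']
```

Building a set from file2 once keeps each lookup roughly constant-time, which matters with a million lines per file.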

My code looks like this:

import fileinput

with open(file2) as fin:
    exclude = set(line.rstrip() for line in fin)

for line in fileinput.input(file1, inplace=True):
    if line.rstrip() not in exclude:
        print
        line,

It just wipes out everything in file1. How do I fix this? Thanks.

Answer 1

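Your print statement and its argument are on separate lines. As written, the bare print writes an empty line into file1, and line, on the next line is a separate expression that does nothing, so every kept line comes out blank. Put them back on one line: print line,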
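In Python 3, where print is a function, the equivalent fix is print(line, end=""). A minimal sketch of the question's loop rewritten that way, wrapped in a hypothetical filter_in_place function so the question's file1 and file2 placeholders become parameters:

```python
import fileinput


def filter_in_place(file1, file2):
    """Remove from file1 every line whose text also appears in file2."""
    # Collect file2's lines, minus trailing whitespace, into a set.
    with open(file2) as fin:
        exclude = {line.rstrip() for line in fin}

    # With inplace=True, fileinput redirects stdout into file1, so whatever
    # is printed inside the loop becomes file1's new content.
    with fileinput.input(file1, inplace=True) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                # line already ends in "\n"; end="" prevents an extra blank line.
                print(line, end="")
```

The single print call is the whole fix; everything else matches the question's code.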

Answer 2

If working memory is not an issue, I'd suggest a crude solution: load FILE2 into memory, then iterate over FILE1 and write out only the lines that don't match:

import os
import shutil

FILE1 = "file1"  # path to file1
FILE2 = "file2"  # path to file2

# first load up FILE2 in the memory
with open(FILE2, "r") as f:  # open FILE2 for reading
    file2_lines = {line.rstrip() for line in f}  # use a set for FILE2 for fast matching

# open FILE1 for reading and a FILE1.tmp file for writing
with open(FILE1, "r") as f_in, open(FILE1 + ".tmp", "w") as f_out:
    for line in f_in:  # loop through the FILE1 lines
        if line.rstrip() not in file2_lines:  # no match, keep the line in the temporary file
            f_out.write(line)

# finally, overwrite FILE1 with the temporary FILE1.tmp
os.remove(FILE1)
shutil.move(FILE1 + ".tmp", FILE1)
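A variant of the temporary-file approach, sketched as a Python 3 function under the same assumption that file2 fits in memory; remove_lines_found_in is a hypothetical name. It lets tempfile choose the temporary name in the same directory and uses os.replace, which overwrites the destination in one call, instead of the os.remove plus shutil.move pair. On POSIX that rename is atomic when both paths are on the same filesystem.

```python
import os
import shutil
import tempfile


def remove_lines_found_in(path1, path2):
    """Rewrite path1, keeping only lines whose text does not appear in path2."""
    # Load path2 into a set for fast lookups (all of path2 must fit in memory).
    with open(path2) as f:
        exclude = {line.rstrip() for line in f}

    # Create the temporary file in path1's directory so the final
    # os.replace does not cross filesystems.
    directory = os.path.dirname(os.path.abspath(path1))
    with open(path1) as f_in, tempfile.NamedTemporaryFile(
        "w", dir=directory, delete=False
    ) as f_out:
        tmp_path = f_out.name
        for line in f_in:
            if line.rstrip() not in exclude:
                f_out.write(line)

    # On POSIX, NamedTemporaryFile creates files with mode 0600;
    # copy path1's permission bits so the replacement keeps them.
    shutil.copymode(path1, tmp_path)
    # Swap the filtered copy over the original in one call.
    os.replace(tmp_path, path1)
```

If the loop raises, the temporary file is left behind in that directory; wrapping the body in try/finally to delete it is a straightforward extension.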

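EDIT: Apparently fileinput.input() does pretty much the same thing, so your problem really is just a typo. Oh well, I'll leave the answer here for posterity, since it gives you more control over the whole process.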
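The EDIT's comparison can be made concrete: with inplace=True, fileinput renames the original file, writes a fresh file under the original name, and by default deletes the renamed copy at the end. Its backup keyword (part of the standard fileinput.input signature) keeps that copy instead, which recovers some of the control the temporary-file approach offers. A minimal Python 3 sketch; filter_with_backup and its arguments are hypothetical names:

```python
import fileinput


def filter_with_backup(path1, exclude, backup_ext=".bak"):
    """Filter path1 in place, keeping the untouched original as path1 + backup_ext."""
    # backup= tells fileinput to keep the renamed original under this
    # extension instead of deleting it once the new file is written.
    with fileinput.input(path1, inplace=True, backup=backup_ext) as lines:
        for line in lines:
            if line.rstrip() not in exclude:
                print(line, end="")  # stdout is redirected into path1 here
    return path1 + backup_ext
```

Here exclude is any set of line texts, for example one built from file2 as in the code above.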

Delete lines from the first file that are contained in the second file
