Question

I want to compare two similar .txt files, giving priority to the one that may contain extra data at the end of each line.

For example:

file1.txt

  userID:userBalance:userType:userCountry
  userID1:userBalance1:userType1:userCountry
  userID2:userBalance2:userType2:userCountry
  userID3:userBalance3:userType3:userCountry

file2.txt

  userID:userBalance
  userID1:userBalance1
  userID2:userBalance2

output.txt

  userID:userBalance:userType:userCountry
  userID1:userBalance1:userType1:userCountry
  userID2:userBalance2:userType2:userCountry

I want the output to contain the lines from file1 that include matching text from file2.

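I've tried several approaches, but they only work when the lines are completely identical. As soon as extra strings are appended, they fail, even when the first two fields match as in the example above.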

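From what I've found, I need some way to compare only the leading ":"-separated strings of each line, and to output the line from file1 if a match is found in file2.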
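A minimal sketch of that idea, assuming a file1 line should be kept when its first two ":"-separated fields (userID and userBalance in the example) exactly equal the fields of some line in file2, and that ":" never occurs inside a field. The names filter_by_key, filter_files and the key_fields parameter are hypothetical. It collects the file2 keys into a set, so each file1 line needs only one lookup:

```python
def filter_by_key(lines1, lines2, key_fields=2):
    """Return the lines of lines1 whose first `key_fields` ':'-separated
    fields equal those of some line in lines2 (hypothetical helper)."""
    # Build the set of keys from file2, skipping blank lines
    keys = {
        tuple(line.strip().split(":")[:key_fields])
        for line in lines2
        if line.strip()
    }
    result = []
    for line in lines1:
        line = line.strip()
        # Keep the full file1 line if its leading fields form a known key
        if line and tuple(line.split(":")[:key_fields]) in keys:
            result.append(line)
    return result


def filter_files(path1, path2, out_path, key_fields=2):
    """Write every matching line of path1 to out_path (hypothetical helper)."""
    with open(path1) as f1, open(path2) as f2:
        matches = filter_by_key(f1, f2, key_fields)
    with open(out_path, "w") as out:
        for line in matches:
            out.write(line + "\n")
```

With the sample files this would be called as filter_files("file1.txt", "file2.txt", "output.txt"). Unlike a plain substring test, matching whole fields means "userID1" cannot accidentally match inside "userID11".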

Answer 1

In Python, you can check whether one string is contained in another simply by using the in keyword:

str2 in str1
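For example, with one line taken from each sample file (the variable names follow the snippet above), the check looks like this. Note that in matches the substring anywhere in the line, not only at the start:

```python
str1 = "userID1:userBalance1:userType1:userCountry"  # a line from file1.txt
str2 = "userID1:userBalance1"                         # a line from file2.txt

print(str2 in str1)                    # True: str2 is a substring of str1
print("userID2:userBalance2" in str1)  # False
print("Balance1" in str1)              # True: a match in the middle also counts
```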

So you can do the following:

# Read both files, stripping the newline (and surrounding whitespace) from each line
with open('file1.txt', 'r') as f1:
    lines1 = [l1.strip() for l1 in f1]

with open('file2.txt', 'r') as f2:
    lines2 = [l2.strip() for l2 in f2]

# Write each file1 line that contains some file2 line as a substring
with open('output.txt', 'w') as out:
    for elt in [l1 for l2 in lines2 for l1 in lines1 if l2 in l1]:
        out.write('{}\n'.format(elt))

The most important part is:

[l1 for l2 in lines2 for l1 in lines1 if l2 in l1]

This builds a new list of every l1 (a line of text from file1.txt) for which some l2 (a line of text from file2.txt) is contained in that l1.
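To see what the comprehension produces without touching any files, here is a sketch that runs it on in-memory copies of the sample lines, with a blank line added to the file2 data for illustration. The extra "if l2" condition in the second comprehension is an addition, not part of the answer: because the empty string is contained in every string, a blank line in file2.txt would otherwise make every line of file1.txt match.

```python
lines1 = [
    "userID:userBalance:userType:userCountry",
    "userID1:userBalance1:userType1:userCountry",
    "userID2:userBalance2:userType2:userCountry",
    "userID3:userBalance3:userType3:userCountry",
]
# Sample file2 lines plus one blank line (added for illustration)
lines2 = ["userID:userBalance", "userID1:userBalance1", "userID2:userBalance2", ""]

# The answer's comprehension: every l1 that contains some l2
naive = [l1 for l2 in lines2 for l1 in lines1 if l2 in l1]

# Same comprehension, but skipping empty l2 strings
filtered = [l1 for l2 in lines2 if l2 for l1 in lines1 if l2 in l1]

print(len(naive))  # 7: the blank line matched all four file1 lines
print(filtered)    # only the first three file1 lines
```

Because the outer loop runs over lines2, a file1 line that contains several file2 lines is also emitted once per match.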

Answer 2

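My first thought would be to use .split(":") to break each line into an array of strings and then, for the example you gave, compare only the first two indices of the arrays. For each line of the file, the pseudocode might look something like this: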

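# fullStringFromFile1 / fullStringFromFile2: the current line from each file
stringArray = fullStringFromFile1.split(":")
stringArray2 = fullStringFromFile2.split(":")
duplicateStringList = []
# Compare only as many fields as the shorter array has
for i in range(min(len(stringArray), len(stringArray2))):
    if stringArray[i] == stringArray2[i]:
        duplicateStringList.append(stringArray[i])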
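A runnable sketch of that per-line comparison, wrapped in hypothetical functions matching_fields and keep_line so it can be applied to whole lines. zip stops at the shorter list, which plays the role of comparing only up to the smaller array's length. The rule in keep_line (keep a file1 line if every field of some file2 line matches at the same position) is an assumption based on the example:

```python
def matching_fields(line1, line2):
    """Return the ':'-separated fields that are equal at the same position."""
    string_array = line1.strip().split(":")
    string_array2 = line2.strip().split(":")
    duplicates = []
    # zip stops at the shorter list, so no index goes out of range
    for field1, field2 in zip(string_array, string_array2):
        if field1 == field2:
            duplicates.append(field1)
    return duplicates


def keep_line(line1, lines2):
    """Keep line1 if, for some line2, all of line2's fields match line1."""
    return any(
        len(matching_fields(line1, line2)) == len(line2.strip().split(":"))
        for line2 in lines2
    )


file2_lines = ["userID:userBalance", "userID1:userBalance1", "userID2:userBalance2"]
print(matching_fields("userID1:userBalance1:userType1:userCountry",
                      "userID1:userBalance1"))  # ['userID1', 'userBalance1']
print(keep_line("userID3:userBalance3:userType3:userCountry", file2_lines))  # False
```

This compares every file1 line against every file2 line, which is simple but scales with the product of the two file sizes.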

Hope this helps point you in the right direction.

How to compare similar text files and output the duplicates?
