Question

I have two files and two scenarios:

Both files have the same content, but in a different order. For example:
- File 1: tom albert jim
- File 2: albert jim tom
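Both files have the same important content (for example jim, albert and tom) plus additional unimportant content (for example jack or jason) that should be excluded. For example:
- File 1: tom albert jim jason
- File 2: jack albert jim tom

A simple true or false would do. Of course, in both of these examples the output should be true. Any ideas?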

Answer 1

You can try this: sort both word lists alphabetically, then compare them item by item. Hope this helps.

# f1 and f2 stand for the strings read from file 1 and file 2
f1 = 'tom albert jim jason'
f2 = 'jack albert jim tom'

unimportant_list = ['jack', 'jason']  # words to ignore; this should be defined somewhere

def same_content(f1, f2):
    # split each string on whitespace and drop the words in unimportant_list
    list1 = [x for x in f1.split() if x not in unimportant_list]
    list2 = [x for x in f2.split() if x not in unimportant_list]

    # sort both lists so they can be compared position by position
    list1.sort()
    list2.sort()

    # compare the lengths first: it is cheaper and keeps the loop below in range
    if len(list1) != len(list2):
        return False

    # compare the two lists item by item
    for i in range(len(list1)):
        if list1[i] != list2[i]:  # any mismatch means the lists differ
            return False
    return True

print(same_content(f1, f2))  # True
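
The sort-and-compare idea above can also be written more compactly, because Python compares two lists element by element with `==`. This is only a minimal sketch and not part of the original answer: the function name `same_words` and the `ignored` parameter are hypothetical, and it assumes that jack and jason are the words to ignore.

```python
def same_words(text1, text2, ignored=("jack", "jason")):
    """Return True if both texts hold the same words, ignoring order and `ignored` words."""
    # sorted() returns a new sorted list, so the inputs are left untouched
    words1 = sorted(w for w in text1.split() if w not in ignored)
    words2 = sorted(w for w in text2.split() if w not in ignored)
    # list equality already checks the length and every position
    return words1 == words2


print(same_words("tom albert jim", "albert jim tom"))             # True
print(same_words("tom albert jim jason", "jack albert jim tom"))  # True
print(same_words("tom albert", "albert jim tom"))                 # False
```

Sorting keeps duplicates, so a word that appears twice in one file and once in the other still counts as a difference.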

Answer 2

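This answer assumes that your input is logically a bag of values, i.e. the values are counted but their position does not matter. It also assumes that it is fine for the other file to contain a value more often than the initiator file does, but not the other way round. Finally, it assumes that only values from the initiator file are allowed to appear in the other file.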
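
To make these assumptions concrete, here is a small sketch of how `collections.Counter` behaves as a bag. The sample words are illustrative assumptions borrowed from the question, not data from the answer.

```python
from collections import Counter

initiator = Counter("tom albert jim".split())
other = Counter("tom tom albert jim".split())

# Counter subtraction keeps only positive counts, so the extra "tom"
# in `other` leaves no unmet requirement behind.
unmet = initiator - other
# Values that occur in `other` but never in `initiator`.
unexpected = set(other) - set(initiator)

print(unmet)       # Counter()
print(unexpected)  # set()

# The other way round: the initiator needs "tom" twice, the other file has it once.
print(Counter("tom tom".split()) - Counter("tom".split()))  # Counter({'tom': 1})
```

An empty `Counter` and an empty `set` are both falsy, which is what allows the final check to be written with `not`.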

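① Read both files, ② split the contents of each file (probably on whitespace?) into a bag (we use collections.Counter for this), ③ check whether any requirement of the initiator file is left unmet, ④ check whether the other file contains unexpected values.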

① Read both files:

with open('initiator') as f:
  contentsI = f.read()
with open('other') as f:
  contentsO = f.read()

② Split the contents into bags, dropping everything unwanted along the way:

from collections import Counter
tokensI = Counter(value for value in contentsI.split()
                        if value not in [ 'unwanted1', 'unwanted2' ])
tokensO = Counter(value for value in contentsO.split()
                        if value not in [ 'unwanted1', 'unwanted2' ])

③ & ④ Compare the bags:

# True if nothing from the initiator is missing and nothing unexpected was added
same = not (tokensI - tokensO) and not (set(tokensO) - set(tokensI))
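
Steps ① to ④ can be gathered into a single function. The following is a sketch under stated assumptions, not the answer's own code: the names `bags_match` and `files_match` and the `unwanted` parameter are hypothetical, and tokens are assumed to be separated by whitespace.

```python
from collections import Counter


def bags_match(initiator_text, other_text, unwanted=("unwanted1", "unwanted2")):
    """Compare two texts as bags of whitespace-separated tokens."""
    tokens_i = Counter(v for v in initiator_text.split() if v not in unwanted)
    tokens_o = Counter(v for v in other_text.split() if v not in unwanted)
    unmet = tokens_i - tokens_o                 # initiator counts the other text fails to reach
    unexpected = set(tokens_o) - set(tokens_i)  # values the initiator never mentions
    return not unmet and not unexpected


def files_match(initiator_path, other_path, unwanted=("unwanted1", "unwanted2")):
    """Read both files and compare their contents with bags_match."""
    with open(initiator_path) as f:
        initiator_text = f.read()
    with open(other_path) as f:
        other_text = f.read()
    return bags_match(initiator_text, other_text, unwanted)


ignore = ("jack", "jason")
print(bags_match("tom albert jim jason", "jack albert jim tom", ignore))  # True
print(bags_match("tom albert jim", "albert jim", ignore))                 # False: tom is unmet
print(bags_match("tom albert", "tom albert jim", ignore))                 # False: jim is unexpected
```

Keeping the comparison in `bags_match` separate from the file reading makes it possible to try the logic on plain strings before pointing it at real files.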

How to compare two files with the same unsorted (and additional unimportant) content?
