Question

I have a text file that looks like this:

0010000110
1111010111
0000110111

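I want to import these lines into Python as lists, then compare each element of a list with the corresponding element of another list, and do this for every combination of lists. If both elements are 1, a counter is increased by 1, and at the end the counter is divided by the length of the list. I tried to write the code, but it does not work properly: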

with open("D:/test/Vector.txt", "r") as f1:
   for a in f1:
      with open("D:/test/Vector.txt", "r") as f2:
         for b in f2:
            for i in range(10):
                result = 0;
                counter = 0;
                if int(a[i]) == int(b[i]) == 1:
                    counter = counter+1
            result = counter / 10;
            print(a, b, result)

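Edit: when I create the text file with Python, each entry ends with a \n that moves the next entry onto a new line, and I don't know how to remove it.

Expected output: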

0010000110 0010000110 1
0010000110 1111010111 0.3
0010000110 0000110111 0.2
1111010111 0010000110 0.3
1111010111 1111010111 1
1111010111 0000110111 0.4
0000110111 0010000110 0.2
0000110111 1111010111 0.4
0000110111 0000110111 1

Answer 1

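Use the str.strip method to remove whitespace from the strings.

If you have multiple sequences and want to do something with the corresponding elements of each, you can use zip to aggregate those elements. (Note that zip returns an iterator, so list is used in the examples to show its result.)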
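As a minimal sketch of the str.strip advice, the snippet below writes a small sample file and reads it back without the trailing newlines. The temporary path and the name `vectors` are assumptions made for illustration; in practice you would open your own Vector.txt.

```python
import os
import tempfile

# Hypothetical sample data standing in for the contents of Vector.txt.
sample = "0010000110\n1111010111\n0000110111\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Vector.txt")
    with open(path, "w") as f:
        f.write(sample)

    # Iterating over a file yields each line with its trailing newline;
    # str.strip() removes it along with any other surrounding whitespace.
    # Lines that are empty after stripping are skipped.
    with open(path) as f:
        vectors = [line.strip() for line in f if line.strip()]

print(vectors)
```

Calling strip with no arguments also removes spaces and tabs at both ends; use rstrip('\n') if only the newline should go.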

>>> list(zip('0010000110', '1111010111'))
[('0', '1'), ('0', '1'), ('1', '1'), ('0', '1'), ('0', '0'), ('0', '1'), ('0', '0'), ('1', '1'), ('1', '1'), ('0', '1')]
>>>

If the sequences are in a container, you need to unpack it for use with zip:

>>> a
['0010000110', '1111010111']
>>> list(zip(*a))
[('0', '1'), ('0', '1'), ('1', '1'), ('0', '1'), ('0', '0'), ('0', '1'), ('0', '0'), ('1', '1'), ('1', '1'), ('0', '1')]
>>>

Once the arguments are grouped together, it is easy to do stuff with them: you can pass them to a function or, in your case, just compare them:

>>> [x == y for x,y in zip(*a)]
[False, False, True, False, True, False, True, True, True, False]
>>>
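The comparison above marks every position where the two characters match, including matching zeros. The question only wants positions where both are 1, so here is a small sketch of that variant. The names `both_ones`, `count` and `ratio` are made up for this example.

```python
a = ['0010000110', '1111010111']

# Chained comparison: True only where both characters are '1'.
both_ones = [x == y == '1' for x, y in zip(*a)]

# Count the True entries and divide by the string length.
count = both_ones.count(True)
ratio = count / len(a[0])

print(both_ones)
print(count, ratio)
```

This assumes both strings have the same length; zip silently stops at the shorter one otherwise.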

sum consumes an iterator/iterable and counts all the True values, because True has a value of one and False has a value of zero.

>>> sum(x == y for x,y in zip(*a))
5
>>>

As an aside: you can assign the result of zip to a name and then use it. It can make things easier to read:

>>> my_groups = zip(*a)
>>> my_groups
<zip object at 0x000000000308A9C8>
>>> sum(x == y for x,y in my_groups)
5
>>>

If you have a number of things and want to compare each one with the others, itertools makes it easy to get the combinations/permutations.

>>> import itertools
>>> data
['0010000110', '1111010111', '0000110111']
>>> for permutation in itertools.permutations(data, 2):
    print(permutation)


('0010000110', '1111010111')
('0010000110', '0000110111')
('1111010111', '0010000110')
('1111010111', '0000110111')
('0000110111', '0010000110')
('0000110111', '1111010111')
>>>

Using itertools, zip and sum, you can compose something that suits your needs:

>>> for combination in itertools.combinations_with_replacement(data, 2):
    print(combination, sum(x == y for x,y in zip(*combination)))


('0010000110', '0010000110') 10
('0010000110', '1111010111') 5
('0010000110', '0000110111') 6
('1111010111', '1111010111') 10
('1111010111', '0000110111') 5
('0000110111', '0000110111') 10
>>> 

>>> for a,b in itertools.combinations_with_replacement(data, 2):
    total = sum(x == y for x,y in zip(a, b))
    ratio = total / len(a)
    print(a, b, total, ratio)


0010000110 0010000110 10 1.0
0010000110 1111010111 5 0.5
0010000110 0000110111 6 0.6
1111010111 1111010111 10 1.0
1111010111 0000110111 5 0.5
0000110111 0000110111 10 1.0
>>>

I like to use format strings to format my print statements:

>>> s = 'combination: {} {}\ttotal: {}\tratio: {}'
>>> for a,b in itertools.combinations_with_replacement(data, 2):
    total = sum(x == y for x,y in zip(a, b))
    ratio = total / len(a)
    print(s.format(a, b, total, ratio))


combination: 0010000110 0010000110    total: 10    ratio: 1.0
combination: 0010000110 1111010111    total: 5    ratio: 0.5
combination: 0010000110 0000110111    total: 6    ratio: 0.6
combination: 1111010111 1111010111    total: 10    ratio: 1.0
combination: 1111010111 0000110111    total: 5    ratio: 0.5
combination: 0000110111 0000110111    total: 10    ratio: 1.0
>>>

I have been using list comprehensions and generator expressions, which are a concise way to write for loops. Many people prefer them once they get used to them (as long as they are not too complicated). Take this loop:

>>> data = [1,2,3]
>>> for x in data:
    print(x+2)

3
4
5

This can be written in shorthand form as a list comprehension, for example [x+2 for x in data], which evaluates to [3, 4, 5].
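To tie the pieces of this answer back to the question, here is one possible sketch that uses a generator expression inside sum and a list comprehension over itertools.product, which yields every ordered pair including a string paired with itself (nine pairs for three strings, as in the question's expected output). The helper name `shared_ones_ratio` is hypothetical, and the rule "both characters are '1'" is taken from the question rather than from the examples above, which count all matching positions.

```python
import itertools

data = ['0010000110', '1111010111', '0000110111']


def shared_ones_ratio(a, b):
    """Fraction of positions where both strings have '1' (hypothetical helper)."""
    # Generator expression: sum adds 1 for every True it receives.
    return sum(x == y == '1' for x, y in zip(a, b)) / len(a)


# List comprehension over all ordered pairs, including (a, a).
results = [(a, b, shared_ones_ratio(a, b))
           for a, b in itertools.product(data, repeat=2)]

for a, b, ratio in results:
    print(a, b, ratio)
```

Under this rule a string compared with itself gives the fraction of its own ones rather than 1. The question's expected output shows 1 for identical strings, which would need a special case.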

Answer 2

Before doing the comparison, check whether the two strings are equal. Here is a basic solution to your problem that gives the expected output:

f = open("Vector.txt", 'r')

l1 = [s.strip('\n') for s in f]
l2 = [s for s in l1]

f.close()

for a in l1:
    for b in l2:
        result = 0
        if (a == b):
            result = 1
        else:
            counter = 0
            for i in range(len(a)):
                if (int(a[i]) == int(b[i]) == 1):
                    counter += 1
            result = counter / len(a)
        print(a, b, result)

This works with Python 3 and gives the following result:

0010000110 0010000110 1
0010000110 1111010111 0.3
0010000110 0000110111 0.2
1111010111 0010000110 0.3
1111010111 1111010111 1
1111010111 0000110111 0.4
0000110111 0010000110 0.2
0000110111 1111010111 0.4
0000110111 0000110111 1

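Edit: You are not obliged to use two lists. You can use just the l1 list and iterate over it twice. If you want to work with indices, you can loop over index ranges instead of over the items themselves: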

for a in range(0, len(l)):
   for b in range(0, len(l)):

If you want to access the individual characters of a string by index, you can do the following:

for i in range(len(l[a])):
    if (int(l[a][i]) == int(l[b][i]) == 1):
        counter += 1

The final statement would then be:

print((a + 1), (b + 1), result)
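Putting the index-based fragments above together might look like the sketch below. The list `l` is filled with the sample strings here instead of being read from the file, and `rows` is a name added only so the results can be inspected afterwards.

```python
# Hypothetical in-memory list standing in for the lines read from the file.
l = ['0010000110', '1111010111', '0000110111']

rows = []
for a in range(len(l)):
    for b in range(len(l)):
        counter = 0
        for i in range(len(l[a])):
            if int(l[a][i]) == int(l[b][i]) == 1:
                counter += 1
        # Identical strings are reported as 1, as in the solution above.
        result = 1 if l[a] == l[b] else counter / len(l[a])
        rows.append((a + 1, b + 1, result))
        # Print 1-based positions of the two strings instead of the strings.
        print(a + 1, b + 1, result)
```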

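To get rid of the annoying string handling, you can visit this page

Edit:

To address the efficiency concern raised in the comments, here is a solution that uses threads and has lower complexity (each pair of strings is compared only once), whereas what we had before was purely quadratic. This solution assumes that all the strings in a file have the same length and that files are not compared with each other. If that is not the case, I am sure you will be able to work out a solution from this basic approach.

Each comparison is then stored in a file named sourcefile_compared.txt, with the values on each line separated by commas. Because we work with files and start several threads, the algorithm makes heavy use of exceptions. Since I don't know your server, I suggest you try this on your own machine and set the file paths yourself.

If you want something close to linear complexity, you will have to make trade-offs, because you actually want to compare every string with every other one.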

import os
import threading


class ListComparator(threading.Thread):

    def __init__(self, file):

        threading.Thread.__init__(self)
        self.file = file
        self._stopevent = threading.Event()

    def run(self):
        name, extension = os.path.splitext(self.file)

        if (extension == '.txt'):

            print('comparing strings in file ' + name)

            try :
                f = open(self.file, 'r')

                l = [s.strip('\n') for s in f]

                f.close()

            except OSError:
                print('unable to open file ' + self.file)
                l = None

            if (l != None):

                try :

                    target = open(name + '_compared.txt', 'w')

                except Exception as e:
                    print(e)
                    target = None

                if (target != None):
                    for i in range(0, len(l) - 1):
                        for j in range(i + 1, len(l)):
                            result = 0
                            counter = 0

                            for k in range(len(l[i])):
                                if (int(l[i][k]) == int(l[j][k]) == 1):
                                    counter += 1

                            result = counter / len(l[i])
                            s = l[i] + ', ' + l[j] + ', ' + str(result) + '\n'

                            target.write(s)

                    target.close()

                    print(name + ' compared')
                else:
                    print(name + ' not compared')

    def stop(self):
        self._stopevent.set()


current_dir = os.getcwd()

for subdir, dirs, files in os.walk(current_dir):

    for file in files:

        try :
            comp = ListComparator(file)
            comp.start()

        except Exception as e:
            print(e)

Here is the console output:

comparing strings in file v
comparing strings in file Vector
Vector compared
v compared

Here is the data written to vector_compared.txt:

0010000110, 1111010111, 0.3
0010000110, 0000110111, 0.2
1111010111, 0000110111, 0.4

Comparing lists in Python
