Question

Let's say I have data in this format (assume tab-separated):

1   10,11,15
2   12
3   12,11
4   10,11

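How can I iterate over the list and count the most popular pairs of objects in the second column? Assume the second column can contain an unlimited number of items.

The ideal output would be something like: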

pairs count
10,11 (2)
10,15 (1)
11,15 (1)
11,12 (1)

Answer 1

Both of these assume that you can get your input into a list of lists.

If you have Python 2.7, use itertools together with Counter:

>>> from collections import Counter
>>> from itertools import combinations
>>> l = [[10, 11, 15], [12], [12, 11], [10, 11]]
>>> c = Counter(x for sub in l for x in combinations(sub, 2))
>>> for k, v in c.iteritems():
...   print k, v
...
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1
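
On Python 3 the same idea might look like the sketch below. It is not the answerer's code: `iteritems()` and the `print` statement no longer exist there, and sorting each row first is an addition made here so that `(12, 11)` and `(11, 12)` count as the same pair, which matches the `11,12` shown in the question's desired output. The names `rows` and `pair_counts` are illustrative.

```python
from collections import Counter
from itertools import combinations

rows = [[10, 11, 15], [12], [12, 11], [10, 11]]

# Sort each row so that a pair is always stored smallest-first.
pair_counts = Counter(
    pair for row in rows for pair in combinations(sorted(row), 2)
)

# Print in the "a,b (count)" layout used in the question.
for (a, b), n in sorted(pair_counts.items()):
    print(f"{a},{b} ({n})")
```

If the order within a pair matters for your data, drop the `sorted(row)` call.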

If you have Python < 2.6, you can use defaultdict combined with itertools (I'm sure one of the gurus will provide a cleaner solution):

In [1]: from collections import defaultdict

In [2]: from itertools import combinations

In [3]: l = [[10, 11, 15], [12], [12, 11], [10, 11]]

In [4]: counts = defaultdict(int)

In [5]: for x in l:
   ...:     for item in combinations(x, 2):
   ...:         counts[item] += 1
   ...:
   ...:

In [6]: for k, v in counts.iteritems():
   ...:     print k, v
   ...:
   ...:
(10, 15) 1
(11, 15) 1
(10, 11) 2
(12, 11) 1

Answer 2

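In [6]: from collections import Counter
   ...: from itertools import chain, combinations

In [7]: with open("data1.txt") as f:  # assumes each line holds only the comma-separated values
   ...:     lis = [map(int, x.split(",")) for x in f]
   ...: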

In [8]: Counter(chain(*[combinations(x,2) for x in lis]))
Out[8]: Counter({(10, 11): 2, (10, 15): 1, (11, 15): 1, (12, 11): 1})
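
The question's data also has an ID column before the tab, so each line needs to be split on the tab before the commas. A minimal Python 3 sketch of that step, assuming every line looks like `id<TAB>a,b,c`; the helper `parse_line` and the in-memory `lines` list (standing in for the file) are hypothetical names, not part of the answer above:

```python
from collections import Counter
from itertools import chain, combinations


def parse_line(line):
    """Return the integers from the second (comma-separated) column."""
    # partition() splits at the first tab; the first column is discarded.
    _, _, items = line.rstrip("\n").partition("\t")
    return [int(x) for x in items.split(",") if x.strip()]


# Stand-in for iterating over an open file (assumed sample data).
lines = ["1\t10,11,15\n", "2\t12\n", "3\t12,11\n", "4\t10,11\n"]

rows = [parse_line(line) for line in lines]

# chain.from_iterable avoids unpacking the whole list with *.
counts = Counter(chain.from_iterable(combinations(row, 2) for row in rows))
```

With a real file, `lines` could be replaced by the file object inside a `with open(...)` block.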

Answer 3

You can use combinations and Counter.

from itertools import combinations
import collections

newinput = []

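# Keep only the second column (the text after the tab).
# oldinput is assumed to be an iterable of the raw input lines.
for line in oldinput:
    newinput.append(line.partition("\t")[2])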

# set up the counter
c = collections.Counter()

for line in newinput:
    # Split by comma
    a = line.split(',')
    # make into integers from string
    a = map(int, a)
    # add to counter
    c.update(combinations(a, 2))

You then end up with a Counter containing all of your counts: `(10, 15): 1`, and so on.
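
Since the question asks for the most popular pairs, `Counter.most_common` can rank the result. A short sketch, with the counts for the sample data written out by hand so it runs on its own:

```python
from collections import Counter

# Counts for the sample data, written out by hand for illustration.
c = Counter({(10, 11): 2, (10, 15): 1, (11, 15): 1, (12, 11): 1})

# most_common(1) returns a list holding the single highest-count entry.
top_pair, top_count = c.most_common(1)[0]
print(top_pair, top_count)  # (10, 11) 2
```

Calling `c.most_common()` with no argument returns every pair ordered from most to least frequent.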

How to find pairs using Python?
