Question

I currently have a DataFrame with a column like this:

     bigrams                     other1     other2
[(me, you), (stack, overflow)] .................
[(me, you)]                    .................

I am trying to put my top ten bigrams into a list so that I can use it for comparisons. I have tried copying and pasting my top 10 bigrams into a list like this:

list = ['(me, you)',  .....]

This does not work. Does anyone have any suggestions? Thanks.

Answer 1

You can use itertools.chain (to flatten the list of lists in the "bigrams" column) and then pd.value_counts.

import pandas as pd

df = pd.DataFrame({'bigrams': [['(a, b)', '(c, d)'], ['(a, b)'], ['(a, b)', '(e, f)']]})
df
            bigrams
0  [(a, b), (c, d)]
1          [(a, b)]
2  [(a, b), (e, f)]

pd.__version__
# '0.24.1'

from itertools import chain

n = 2 # Find the top N
pd.value_counts(list(chain.from_iterable(df['bigrams']))).index[:n].tolist()
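# ['(a, b)', '(e, f)']
# Note: '(c, d)' and '(e, f)' are tied at one occurrence each, so which of
# them comes second may differ between pandas versions.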
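
As a minimal sketch of the same idea: recent pandas releases deprecate the top-level pd.value_counts in favour of the Series method, so the flattened list can be wrapped in a Series first. The sample data below is hypothetical and assumes each cell holds a list of tuples, as the question's column appears to.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: each cell holds a list of bigram tuples.
df = pd.DataFrame({
    "bigrams": [
        [("me", "you"), ("stack", "overflow")],
        [("me", "you")],
        [("me", "you"), ("stack", "overflow"), ("a", "b")],
    ]
})

n = 2  # number of top bigrams to keep

# Flatten the column's lists into one list of bigrams.
flat = list(chain.from_iterable(df["bigrams"]))

# Count occurrences; value_counts sorts by count, descending.
counts = pd.Series(flat).value_counts()

# Keep the n most frequent bigrams as a plain Python list.
top_n = counts.index[:n].tolist()
print(top_n)
```

For the top ten, set n = 10.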

Answer 2

Let's use Counter:

from collections import Counter

list(dict(Counter(df.bigrams.sum()).most_common(10)).keys())

As noted below, replace sum with itertools.chain:

from itertools import chain
l=list(chain.from_iterable(df['bigrams']))
list(dict(Counter(l).most_common(10)).keys())
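
A possible reason the pasted list in the question fails is that the column holds tuples while the list holds strings; this is an assumption, since the question does not show the column's element type. The sketch below uses a hypothetical plain list `rows` as a stand-in for df['bigrams'] and compares against tuples rather than strings.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for df['bigrams']: one list of bigram tuples per row.
rows = [
    [("me", "you"), ("stack", "overflow")],
    [("me", "you")],
    [("me", "you"), ("stack", "overflow"), ("a", "b")],
]

# Counter accepts any iterable, so the flattened rows can be passed directly.
counter = Counter(chain.from_iterable(rows))

# most_common returns (bigram, count) pairs; keep only the bigrams.
top = [bigram for bigram, _ in counter.most_common(2)]
print(top)

# A tuple matches an entry in the list; its string form does not.
is_tuple_match = ("me", "you") in top
is_string_match = "(me, you)" in top
print(is_tuple_match, is_string_match)
```

With a real DataFrame, passing df['bigrams'] in place of `rows` should behave the same way, and most_common(10) gives the top ten.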
