Question

    x    animal
0   5    [dog, cat]
1   6    [dog]
2   8    [elephant]

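I have a DataFrame like this. How can I find the most common animal across the whole animal column?

The value_counts() method treats each list as a single element, so I can't use it here.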

Answer 1

Something along these lines?

import pandas as pd
from collections import Counter

df = pd.DataFrame({'x' : [5,6,8], 'animal' : [['dog', 'cat'], ['elephant'], ['dog']]})

# Concatenate the per-row lists into one flat list
x = sum(df.animal, [])
#x
#Out[15]: ['dog', 'cat', 'elephant', 'dog']

# Count occurrences and take the single most frequent entry
c = Counter(x)
c.most_common(1)
#Out[17]: [('dog', 2)]
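
A possible variation on the same idea, offered as a sketch rather than as part of the original answer: sum(..., []) builds a new list at every step, which can get slow on large columns, so itertools.chain.from_iterable may be a better fit there. The DataFrame below is rebuilt from the question's example, and the variable names are only illustrative.

```python
from collections import Counter
from itertools import chain

import pandas as pd

# Sample data rebuilt from the question (assumed, for illustration only)
df = pd.DataFrame({'x': [5, 6, 8],
                   'animal': [['dog', 'cat'], ['dog'], ['elephant']]})

# Iterate over every animal in every row's list without building
# intermediate lists, then count occurrences
counts = Counter(chain.from_iterable(df['animal']))

# most_common(1) returns a one-element list of (value, count) pairs
most_common_animal, frequency = counts.most_common(1)[0]
print(most_common_animal, frequency)
```

If several animals tie for first place, most_common(1) still returns only one of them, so check counts directly when ties matter.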

Answer 2

Maybe take a step back and redefine your data structure? pandas is a much better fit when your DataFrame is "flat".

Instead of:

    x    animal
0   5    [dog, cat]
1   6    [dog]
2   8    [elephant]

use:

    x    animal
0   5    dog
1   5    cat
2   6    dog
3   8    elephant

Now you can easily compute len(df[df['animal'] == 'dog']), along with many other pandas operations!

To flatten your DataFrame, see this answer: Flatten a column with value of type list while duplicating the other column's value accordingly in Pandas
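
As a rough sketch of the flattening step, assuming a pandas version that provides DataFrame.explode (0.25 or later), which may differ from the approach in the linked answer. The sample data is rebuilt from the question, and names such as flat and top_animal are only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed, for illustration only)
df = pd.DataFrame({'x': [5, 6, 8],
                   'animal': [['dog', 'cat'], ['dog'], ['elephant']]})

# explode() emits one row per list element and repeats the other columns;
# reset_index gives the flattened frame a fresh 0..n-1 index
flat = df.explode('animal').reset_index(drop=True)

# With a flat column, the usual pandas tools work directly
dog_rows = len(flat[flat['animal'] == 'dog'])
animal_counts = flat['animal'].value_counts()
top_animal = animal_counts.idxmax()

print(flat)
print(dog_rows, top_animal)
```

Once the column is flat, value_counts() counts individual animals instead of whole lists, which is what the question was missing.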

Pandas: find the most common value in a column of lists
