Question

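I was wondering if anyone could help me with a small problem. I have a large dataset with a great many rows, and I want to create a smaller DataFrame that pulls just 2 columns from it, plus, in this case, the number of times each name appears, as "Occurrence".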

This is the code I am using:

df1 = (Dec16.groupby(["BNF Chapter", "Name"]).size().reset_index(name="Occurrence"))
df1

It produces this:

BNF Chapter  Name                                         Occurrence
1            Aluminium hydroxide                          2
1            Aluminium hydroxide + Magnesium trisilicate  2
1            Alverine                                     702
.......
21           Polihexanide                                 2
21           Potassium hydroxide                          32
21           Sesame oil                                   22
21           Sodium chloride                              222

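What I would like to get is the ten most frequently occurring names in a given chapter, because the dataset is so large.

For example, a DataFrame that pulls only the ten most common names in chapter 1.

How would I go about this?

Many thanks!!!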

Answer 1

Let's work through a small example using a random number generator.

import pandas as pd
import numpy as np
# create 1000 rows of random integers (0-9) in three columns
df = pd.DataFrame(np.random.randint(0, 10, (1000, 3)), columns=["A", "B", "C"])
# count how many rows fall into each (A, B) combination
result = df.groupby(["A", "B"]).count()
# printing result shows something like this: column C holds the number of rows per (A, B) pair
#       C
# A B    
# 0 0  11
#   1  10
#   2   5
#   3  12
#   4   9
#   5  12
#   6   7
#   7   8
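
The example above counts rows per group but stops before the top-ten step the question asks about. Here is a minimal sketch of that step, assuming the column names from the question ("BNF Chapter", "Name", "Occurrence"). The small `dec16` frame and the `top_names` helper are hypothetical stand-ins for illustration, not the asker's real data or an existing function. It filters the counts to one chapter and keeps the largest values with `DataFrame.nlargest`.

```python
import pandas as pd

# Hypothetical stand-in for the asker's Dec16 data; the rows are made up
# for illustration and only reuse names shown in the question.
dec16 = pd.DataFrame({
    "BNF Chapter": [1, 1, 1, 1, 1, 1, 21, 21, 21],
    "Name": [
        "Alverine", "Alverine", "Alverine",
        "Aluminium hydroxide", "Aluminium hydroxide",
        "Aluminium hydroxide + Magnesium trisilicate",
        "Sodium chloride", "Sodium chloride", "Sesame oil",
    ],
})

# Same counting step as in the question: one row per (chapter, name) pair.
counts = (
    dec16.groupby(["BNF Chapter", "Name"])
    .size()
    .reset_index(name="Occurrence")
)


def top_names(counts, chapter, n=10):
    """Return the n most frequent names within a single chapter (hypothetical helper)."""
    in_chapter = counts[counts["BNF Chapter"] == chapter]
    return in_chapter.nlargest(n, "Occurrence")


# n=2 only because the toy data is tiny; use n=10 on the real dataset.
top_chapter_1 = top_names(counts, chapter=1, n=2)
print(top_chapter_1)

# Variant: the top n names for every chapter at once.
top_per_chapter = (
    counts.sort_values(["BNF Chapter", "Occurrence"], ascending=[True, False])
    .groupby("BNF Chapter")
    .head(2)
)
print(top_per_chapter)
```

On the real data, `top_names(df1, chapter=1)` would return the ten most common names in chapter 1, provided `df1` is the frame built in the question and the chapter values are stored as integers.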
