Question

Here is my DataFrame:

import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M",  "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1","1.2", "1.1","1.3","1.5","1.3","1.1","1.1","1.1", "1.1", "1.3"]
})

To analyze this data, I am using groupby:

data = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts()

This is what I get:

Gender  Work-code  Accident-type-code
F       N1         1.1                   1
        N3         1.1                   3
                   1.3                   2
                   1.2                   1
M       N1         1.1                   2
                   1.3                   1
        X15        1.5                   1

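All I need is the first row of each inner group, i.e. the most frequent accident type for each (Gender, Work-code) combination, like this: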

Gender  Work-code  Accident-type-code
F       N1         1.1                   1
        N3         1.1                   3
M       N1         1.1                   2
        X15        1.5                   1
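A minimal sketch of one way to get exactly this output without resetting the index. It assumes pandas' default `sort=True` for `SeriesGroupBy.value_counts`, which orders counts from highest to lowest inside each group (as the output above shows), so the first row of every (Gender, Work-code) group is the most frequent code. The variable name `top` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Counts per (Gender, Work-code), sorted in descending order inside each group
data = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts()

# Group on the first two index levels and keep the first (= most frequent) row of each
top = data.groupby(level=[0, 1]).head(1)
print(top)
```

If two codes tie for the highest count in a group, which one `value_counts` lists first is not something this sketch controls.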

In fact, I'm doing this because I want a bivariate frequency distribution, but I don't know of any function or library in Python that does this.
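For the two-way frequency table itself, pandas provides `pd.crosstab`, which counts occurrences for every combination of row and column values. The sketch below assumes the variables of interest are the (Gender, Work-code) pair on the rows and `Accident-type-code` on the columns; the question does not say which pair is meant, so adjust the arguments as needed. `idxmax(axis=1)` then picks the most frequent code per row.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Contingency table: one row per observed (Gender, Work-code) pair,
# one column per accident type code, cells = number of occurrences
table = pd.crosstab([df["Gender"], df["Work-code"]], df["Accident-type-code"])
print(table)

# Most frequent accident type per row (idxmax returns the first column on ties)
most_frequent = table.idxmax(axis=1)
print(most_frequent)
```

Passing `normalize="index"` to `pd.crosstab` turns the counts into row-wise proportions if relative frequencies are wanted instead.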

Answer 1

You need to make a few changes to the groupby step.

data = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts().reset_index(name="counts")

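# keep the first (most frequent) row of every (Gender, Work-code) group
data.groupby(["Gender", "Work-code"]).head(1)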

Now you have an ordinary flat table, from which you can easily pick out the rows you need, for example with a loop.
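The loop mentioned here could look like the following sketch. It relies on `value_counts` having already sorted each (Gender, Work-code) group by descending count, so the first row seen for a pair is the one to keep; `seen`, `keep` and `top` are illustrative names, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Flat table with columns Gender, Work-code, Accident-type-code, counts
data = (
    df.groupby(["Gender", "Work-code"])["Accident-type-code"]
    .value_counts()
    .reset_index(name="counts")
)

# Walk the table once and remember the index label of the first row
# encountered for each (Gender, Work-code) pair
seen = set()
keep = []
for idx, gender, work_code in zip(data.index, data["Gender"], data["Work-code"]):
    if (gender, work_code) not in seen:
        seen.add((gender, work_code))
        keep.append(idx)

top = data.loc[keep]
print(top)
```

The same rows can be obtained without an explicit loop via `data.drop_duplicates(subset=["Gender", "Work-code"])`, which keeps the first occurrence of each pair by default.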

Answer 2

OK, you can try this. First, run the groupby and call reset_index:

data_raw = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts().reset_index(name="counts")

Then:

data_raw.groupby(["Gender", "Work-code"], as_index=True).first()

My output:

[Screenshot of the output, not reproduced here]
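Since the output of this answer was only posted as an image, here is a sketch that rebuilds it. One caveat worth knowing: `GroupBy.first()` takes the first non-null value of each column separately, so it matches "the first row" only when there are no missing values (true for this data). The small `demo` frame is a made-up example to show where the two can differ.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

data_raw = (
    df.groupby(["Gender", "Work-code"])["Accident-type-code"]
    .value_counts()
    .reset_index(name="counts")
)

# This answer's approach: first() per (Gender, Work-code) group
top = data_raw.groupby(["Gender", "Work-code"], as_index=True).first()
print(top)

# Hypothetical demo of the caveat: with a NaN in the first row,
# first() fills column x from a later row, while head(1) keeps row 0 intact
demo = pd.DataFrame({"g": ["a", "a"], "x": [None, 1.0], "y": [10, 20]})
print(demo.groupby("g").first())  # x = 1.0 (from the 2nd row), y = 10 (from the 1st)
print(demo.groupby("g").head(1))  # x = NaN, y = 10
```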

Getting the first row in each hierarchical pandas Series
