This is my DataFrame:
import pandas as pd
df = pd.DataFrame({
"Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
"Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
"Accident-type-code": ["1.1","1.2", "1.1","1.3","1.5","1.3","1.1","1.1","1.1", "1.1", "1.3"]
})
To analyze this data, I am using groupby:
data = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts()
This is what I get:
Gender  Work-code  Accident-type-code
F       N1         1.1                   1
        N3         1.1                   3
                   1.3                   2
                   1.2                   1
M       N1         1.1                   2
                   1.3                   1
        X15        1.5                   1
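What I need is only the first row of each inner group, that is, the most frequent Accident-type-code within each Gender/Work-code group. For example: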
Gender  Work-code  Accident-type-code
F       N1         1.1                   1
        N3         1.1                   3
M       N1         1.1                   2
        X15        1.5                   1
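In fact, I am doing this because I want to build a bivariate frequency distribution, but I don't know of any Python function or library that can do it.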
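For the bivariate frequency distribution itself, pandas has `pd.crosstab`, which counts how often each pair of values occurs. The following is only a sketch on the question's data; the variable names `table`, `table_by_gender` and `most_frequent` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Two-way frequency table: rows are Work-code, columns are Accident-type-code.
table = pd.crosstab(df["Work-code"], df["Accident-type-code"])
print(table)

# Same idea with (Gender, Work-code) pairs as rows.
table_by_gender = pd.crosstab([df["Gender"], df["Work-code"]], df["Accident-type-code"])

# Column label with the highest count in each row,
# i.e. the most frequent accident type per (Gender, Work-code) pair.
most_frequent = table_by_gender.idxmax(axis=1)
print(most_frequent)
```

If two accident types tie within a pair, `idxmax` returns the first one in column order, so ties may need separate handling.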
Answer 0 (score: 0)
You need to make a small change to the groupby part.
data = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts().reset_index(name="counts")
data.head(1)
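Now you have a plain, flat table (data.head(1) shows only its first row), and you can easily find the rows you want using a loop.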
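The answer does not show the loop, so here is one way it might look. This is a sketch under my own assumptions: the names `counts`, `rows` and `top` are hypothetical, and it picks the row with the largest count in each group via `idxmax` rather than depending on row order.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Flat table with one row per (Gender, Work-code, Accident-type-code) combination.
counts = (
    df.groupby(["Gender", "Work-code"])["Accident-type-code"]
    .value_counts()
    .reset_index(name="counts")
)

# Loop over the (Gender, Work-code) groups and keep the row with the highest count.
rows = []
for (gender, work_code), group in counts.groupby(["Gender", "Work-code"]):
    rows.append(group.loc[group["counts"].idxmax()])

top = pd.DataFrame(rows).reset_index(drop=True)
print(top)
```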
Answer 1 (score: 0)
OK, so you can try this. First, groupby followed by reset_index:
data_raw = df.groupby(["Gender", "Work-code"])["Accident-type-code"].value_counts().reset_index(name="counts")
Then:
data_raw.groupby(['Gender','Work-code'],as_index=True).first()
My output:
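Run on the question's data, the two steps should leave one row per (Gender, Work-code) pair. A self-contained sketch follows; it assumes `value_counts` keeps its default descending sort within each group, which is what lets `first()` pick the most frequent code. The name `top` is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "F", "M", "M", "M", "F", "F", "F", "F", "F", "F"],
    "Work-code": ["N1", "N3", "N1", "N1", "X15", "N3", "N3", "N3", "N3", "N1", "N3"],
    "Accident-type-code": ["1.1", "1.2", "1.1", "1.3", "1.5", "1.3", "1.1", "1.1", "1.1", "1.1", "1.3"],
})

# Step 1: count accident types per (Gender, Work-code) and flatten the result.
data_raw = (
    df.groupby(["Gender", "Work-code"])["Accident-type-code"]
    .value_counts()
    .reset_index(name="counts")
)

# Step 2: keep the first row of each (Gender, Work-code) group.
# Assumption: rows are already ordered by descending count within each group.
top = data_raw.groupby(["Gender", "Work-code"], as_index=True).first()
print(top)
```

If the ordering cannot be relied on, sorting `data_raw` by `counts` in descending order before the second groupby makes the intent explicit.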