My data looks like the table below:
Type  Size   Color  Color2
cat   small  white  white
cat   small  white  white
cat   large  brown  #N/A
cat   large  black  #N/A
dog   large  white  white
dog   small  black  black
cat   small  white  white
dog   small  brown  brown
dog   small  brown  brown
dog   small  brown  brown
cat   large  brown  #N/A
cat   large  brown  #N/A
dog   large  #N/A   brown
dog   large  white  white
dog   large  black  black
cat   large  white  #N/A
dog   large  brown  brown
cat   small  white  white
cat   small  white  white
dog   large  brown  brown
dog   large  white  white
dog   large  #N/A   brown
dog   small  black  black
cat   small  white  white
dog   small  white  white
dog   small  white  white
cat   small  white  white
dog   small  black  black
dog   small  black  black
dog   large  brown  brown
dog   large  brown  brown
cat   large  black  #N/A
cat   small  white  white
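The goal is to fill the missing values in Color and Color2 with the mode (most frequent value) of the corresponding column, computed within each Type and Size group.
The snippet below works well for the Color column, ignoring the missing values in that column: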
df.groupby(['Type','Size'])['Color'].transform(lambda x: x.mode()[0])
However, my real data behaves like the Color2 column: every Color2 value for the cat/large group is missing. So when I apply the snippet below, I get an index-out-of-range error.
df.groupby(['Type','Size'])['Color2'].transform(lambda x: x.mode()[0])
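If a group contains only missing values, I want to get NaN / #N/A back; if the group contains any non-missing values, I want the mode of those values, with the missing ones ignored.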
Answer 0: (score: 1)
Use .get(0, 'NaN/#N/A') in the command instead of just [0]. If the key is not found, it falls back to the default value 'NaN/#N/A':
df['new_color'] = df.groupby(['Type','Size'])['Color2'] \
.transform(lambda x: x.mode().get(0,'NaN/#N/A'))
Out[1246]:
Type Size Color Color2 new_color
0 cat small white white white
1 cat small white white white
2 cat large brown NaN NaN/#N/A
3 cat large black NaN NaN/#N/A
4 dog large white white brown
5 dog small black black black
6 cat small white white white
7 dog small brown brown black
8 dog small brown brown black
9 dog small brown brown black
10 cat large brown NaN NaN/#N/A
11 cat large brown NaN NaN/#N/A
12 dog large NaN brown brown
13 dog large white white brown
14 dog large black black brown
15 cat large white NaN NaN/#N/A
16 dog large brown brown brown
17 cat small white white white
18 cat small white white white
19 dog large brown brown brown
20 dog large white white brown
21 dog large NaN brown brown
22 dog small black black black
23 cat small white white white
24 dog small white white black
25 dog small white white black
26 cat small white white white
27 dog small black black black
28 dog small black black black
29 dog large brown brown brown
30 dog large brown brown brown
31 cat large black NaN NaN/#N/A
32 cat small white white white
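A minimal sketch of a variation on the above, assuming you would rather keep a real missing value (np.nan) than the string 'NaN/#N/A', and that you only want to fill cells that are actually missing. The small DataFrame and the names group_mode, group_modes and Color2_filled are made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: cat/large and dog/small have no Color2 values at all.
df = pd.DataFrame({
    "Type":   ["cat", "cat", "cat", "dog", "dog", "dog", "dog"],
    "Size":   ["small", "large", "large", "large", "large", "large", "small"],
    "Color2": ["white", np.nan, np.nan, "brown", "brown", "white", np.nan],
})

def group_mode(s):
    # mode() drops NaN by default, so an all-missing group yields an
    # empty Series; .get(0, np.nan) then falls back to a real NaN.
    return s.mode().get(0, np.nan)

# One mode per row, broadcast within each (Type, Size) group.
group_modes = df.groupby(["Type", "Size"])["Color2"].transform(group_mode)

# Fill only the missing cells; existing values are left untouched.
df["Color2_filled"] = df["Color2"].fillna(group_modes)
print(df)
```

Because the fallback is np.nan rather than a string, isna() and any later fillna() still treat the all-missing groups as missing.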
Answer 1: (score: 0)
Use value_counts:
df.fillna(df.groupby(['Type','Size']).transform(lambda x : x.value_counts(dropna=False).index[0]),inplace=True)
Alternatively, from pandas 0.24 onward you can also pass dropna=False to mode:
df.groupby(['Type','Size'])['Color2'].transform(lambda x: x.mode(dropna=False)[0])
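One thing worth checking before relying on dropna=False: it counts NaN as a candidate value, so a group in which NaN is the most frequent entry gets NaN back even though it contains usable values. A small sketch with made-up Series (mixed and all_missing are illustrative names, not data from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical groups: one where NaN outnumbers the real value,
# and one with nothing but missing values.
mixed = pd.Series(["brown", np.nan, np.nan], dtype=object)
all_missing = pd.Series([np.nan, np.nan], dtype=object)

default_mode = mixed.mode()                # NaN ignored: only "brown" remains
nan_aware_mode = mixed.mode(dropna=False)  # NaN counted: NaN is the mode
top_count = mixed.value_counts(dropna=False).index[0]  # NaN as well

empty_mode = all_missing.mode()            # empty, so [0] would fail here
nan_mode = all_missing.mode(dropna=False)  # a single NaN, so [0] works

print(default_mode.tolist(), nan_aware_mode.tolist(), top_count)
print(len(empty_mode), nan_mode.tolist())
```

If such groups can occur in your data, the default mode() combined with a fallback for empty results may be closer to what the question asks for.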