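Given the list below, I'd like to fill in the "Color Guess" column with the mode of the "Color" column, conditional on "Type" and "Size", ignoring NULL, #N/A, and so on.
For example: what is the most common color for small cats, what is the most common color for medium dogs, and so on.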
Type  Size    Color  Color Guess
Cat   small   brown
Dog   small   black
Dog   large   black
Cat   medium  white
Cat   medium  #N/A
Dog   large   brown
Cat   large   white
Cat   large   #N/A
Dog   large   brown
Dog   medium  #N/A
Cat   small   #N/A
Dog   small   white
Dog   small   black
Dog   small   brown
Dog   medium  white
Dog   medium  #N/A
Cat   large   brown
Dog   small   white
Dog   large   #N/A
Answer 0 (score: 5)
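As BarMar noted in the comments, we can use pd.Series.mode, as in the linked answer. The only trick here is that we have to use groupby.transform, because we want the result to come back in the same shape as the dataframe, with one value per original row rather than one per group: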
df['Color Guess'] = df.groupby(['Type', 'Size'])['Color'].transform(lambda x: pd.Series.mode(x)[0])
    Type    Size  Color  Color Guess
0    Cat   small  brown        brown
1    Dog   small  black        black
2    Dog   large  black        brown
3    Cat  medium  white        white
4    Cat  medium    NaN        white
5    Dog   large  brown        brown
6    Cat   large  white        brown
7    Cat   large    NaN        brown
8    Dog   large  brown        brown
9    Dog  medium    NaN        white
10   Cat   small    NaN        brown
11   Dog   small  white        black
12   Dog   small  black        black
13   Dog   small  brown        black
14   Dog  medium  white        white
15   Dog  medium    NaN        white
16   Cat   large  brown        brown
17   Dog   small  white        black
18   Dog   large    NaN        brown
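
For anyone who wants to reproduce this end to end, here is a minimal, self-contained sketch. It makes a few assumptions that are not in the question: the data is typed in by hand, the "#N/A" entries arrive as literal strings and are masked to NaN first, and the helper name first_mode is purely illustrative. The helper also returns NaN for a group with no usable color, a case where indexing the mode with [0] would otherwise fail on an empty result. Ties (for example small dogs, with two black and two white) resolve to the alphabetically first value, since mode returns its results sorted.

```python
import numpy as np
import pandas as pd

# Data from the question, typed in by hand; "#N/A" marks a missing color.
rows = [
    ("Cat", "small", "brown"), ("Dog", "small", "black"),
    ("Dog", "large", "black"), ("Cat", "medium", "white"),
    ("Cat", "medium", "#N/A"), ("Dog", "large", "brown"),
    ("Cat", "large", "white"), ("Cat", "large", "#N/A"),
    ("Dog", "large", "brown"), ("Dog", "medium", "#N/A"),
    ("Cat", "small", "#N/A"), ("Dog", "small", "white"),
    ("Dog", "small", "black"), ("Dog", "small", "brown"),
    ("Dog", "medium", "white"), ("Dog", "medium", "#N/A"),
    ("Cat", "large", "brown"), ("Dog", "small", "white"),
    ("Dog", "large", "#N/A"),
]
df = pd.DataFrame(rows, columns=["Type", "Size", "Color"])

# Assumption: "#N/A" is a literal string here, so turn it into a real NaN.
df["Color"] = df["Color"].mask(df["Color"] == "#N/A")


def first_mode(values):
    """Most frequent non-missing value in the group, or NaN if there is none."""
    modes = values.mode()  # drops NaN by default; tied values come back sorted
    return modes.iloc[0] if not modes.empty else np.nan


# transform broadcasts each group's single value back onto that group's rows.
df["Color Guess"] = df.groupby(["Type", "Size"])["Color"].transform(first_mode)
print(df)
```

If the data is loaded with pd.read_csv instead, the masking step is likely unnecessary, because "#N/A" is among the strings read_csv treats as missing by default.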