A snapshot of the DataFrame is shown below:
idf=pd.DataFrame({'p1': {549: 'Staffordshire_bullterrier', 1374: 'kelpie', 641: 'Samoyed'},
'p1_conf': {549: 0.6892590000000001, 1374: 0.519047, 641: 0.362596},
'p1_dog': {549: True, 1374: True, 641: True},
'p2': {549: 'Norwegian_elkhound', 1374: 'German_shepherd', 641: 'Eskimo_dog'},
'p2_conf': {549: 0.026121, 1374: 0.296069, 641: 0.245395},
'p2_dog': {549: True, 1374: True, 641: True},
'p3': {549: 'American_Staffordshire_terrier',
1374: 'dingo',
641: 'Siberian_husky'},
'p3_conf': {549: 0.0230747, 1374: 0.0610053, 641: 0.108232},
'p3_dog': {549: True, 1374: False, 641: True},
'breed': {549: 'Staffordshire_bullterrier', 1374: 'kelpie', 641: 'Samoyed'}})
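My goal is to return the most confident prediction that is actually a dog breed. For example: if p1_dog is True, p1 should be returned. If it is not, and the second most confident prediction has p2_dog set to True, p2 should be returned, and so on. Of course, I could write it like this: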
idf['breed']=idf.query("p1_dog==1").p1
idf['breed']=idf['breed'].fillna(idf.query("p1_dog==0 and p2_dog==1").p2)
idf['breed']=idf['breed'].fillna(idf.query("p1_dog==0 and p2_dog==0 and p3_dog==1").p3)
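The expected result is the last column, "breed", and my code above produces it. But I think it is repetitive and not DRY. What if I had hundreds of predictions? What is the best solution? Thank you!
Answer 0 (score: 1)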
This looks to me like a perfect case for np.select:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{"p1":['Staffordshire_bullterrier', 'Samoyed', 'kelpie','dingo'],
"p1_dog":[True, False, False, True],
"p2": ['Norwegian_elkhound', 'Eskimo_dog', 'German_shepherd', 'kelpie'],
"p2_dog":[False, True, False, True],
"p3":['American_Staffordshire_terrier', 'Siberian_husky', 'dingo','Samoyed'],
"p3_dog":[False, True, True, True]
})
condlist = [df["p1_dog"]==1,
((df["p1_dog"]==0) & (df["p2_dog"]==1)),
((df["p1_dog"]==0) & (df["p2_dog"]==0) & (df["p3_dog"]==1))]
choicelist = [df["p1"], df["p2"], df["p3"]]
df["breed"] = np.select(condlist, choicelist)
There is also a more general solution, which is especially useful when you have many columns to compare. It makes use of np.argmax and this solution.
We first find the position of the first True in each row:
sel = df[["p1_dog", "p2_dog", "p3_dog"]].values.argmax(1)
Then we extract the matrix of breed names:
mat = df[["p1", "p2", "p3"]].values
Finally, we define breed using your logic, by picking for each row the entry of mat at the position given by sel.
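The last step of the argmax approach can be sketched as below. This is only an illustration under some assumptions: the column names are generated from a hypothetical n_preds count, the sample frame reuses three rows from the answer plus one made-up row with no dog prediction, and rows where every flag is False are left as missing, because argmax returns 0 for an all-False row and would otherwise pick p1 by mistake.

```python
import numpy as np
import pandas as pd

# Sample data: first three rows follow the answer above; the last row is a
# made-up example in which none of the predictions is a dog.
df = pd.DataFrame({
    "p1": ["Staffordshire_bullterrier", "Samoyed", "kelpie", "paper_towel"],
    "p1_dog": [True, False, False, False],
    "p2": ["Norwegian_elkhound", "Eskimo_dog", "German_shepherd", "bath_towel"],
    "p2_dog": [False, True, False, False],
    "p3": ["American_Staffordshire_terrier", "Siberian_husky", "dingo", "seat_belt"],
    "p3_dog": [False, True, True, False],
})

n_preds = 3  # hypothetical: number of prediction columns (p1 .. pN)
breed_cols = [f"p{i}" for i in range(1, n_preds + 1)]
flag_cols = [f"{c}_dog" for c in breed_cols]

flags = df[flag_cols].to_numpy(dtype=bool)
sel = flags.argmax(axis=1)            # position of the first True per row
mat = df[breed_cols].to_numpy()       # matrix of breed names
picked = mat[np.arange(len(df)), sel] # one entry per row, chosen by sel

# argmax yields 0 when a row has no True at all, so mask those rows out.
df["breed"] = pd.Series(picked, index=df.index).where(flags.any(axis=1))
```

Because the column lists are built from n_preds, the same code should carry over to many prediction columns without writing one condition per column.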
Answer 1 (score: 0)
For each row in the DataFrame, find the name of the column with the maximum value:
x = idf[['p1_conf','p2_conf','p3_conf']].idxmax(axis=1)
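From x, get the position of the column immediately before the named column (each breed column sits just before its confidence column):
breed_col = [idf.columns.get_loc(x.iloc[i])-1 for i in range(len(idf))]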
Extract the value of the cell at row i and column breed_col[i]:
breed2 = []
for i in range(len(idf)):
    breed2.append(idf.iloc[i,breed_col[i]])
breed2_df = pd.DataFrame(breed2, columns = ['breed2'])
Reset the index so that the two DataFrames can be concatenated:
idf.reset_index(drop=True, inplace=True)
pd.concat([idf, breed2_df], axis=1)
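For comparison, the same steps might be written without the explicit loop. This is a rough sketch, not the answer's own code: it assumes each confidence column is named like its breed column plus a "_conf" suffix, and it looks the breed column up by name instead of by "position minus one". Like the steps above, it selects purely by confidence and does not consult the p*_dog flags.

```python
import numpy as np
import pandas as pd

# Subset of the question's snapshot (the p*_dog columns are not used here).
idf = pd.DataFrame({
    "p1": ["Staffordshire_bullterrier", "kelpie", "Samoyed"],
    "p1_conf": [0.689259, 0.519047, 0.362596],
    "p2": ["Norwegian_elkhound", "German_shepherd", "Eskimo_dog"],
    "p2_conf": [0.026121, 0.296069, 0.245395],
    "p3": ["American_Staffordshire_terrier", "dingo", "Siberian_husky"],
    "p3_conf": [0.0230747, 0.0610053, 0.108232],
}, index=[549, 1374, 641])

conf_cols = ["p1_conf", "p2_conf", "p3_conf"]

# Name of the highest-confidence column per row, e.g. "p1_conf".
best_conf = idf[conf_cols].idxmax(axis=1)

# Assumed naming convention: strip "_conf" to get the breed column name.
best_breed_col = best_conf.str.replace("_conf", "", regex=False)

# Integer position of each row's chosen breed column.
col_pos = idf.columns.get_indexer(best_breed_col.to_numpy())

# Pick one cell per row: row i, column col_pos[i].
idf["breed2"] = idf.to_numpy()[np.arange(len(idf)), col_pos]
```

Since the result is assigned back as a column, the original index (549, 1374, 641) is kept and no reset_index or concat is needed.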