How do I get values, and the number of rows with a specific value, derived from multiple conditions on a pandas DataFrame?

Time: 2018-10-31 13:54:23

Tags: python pandas dataframe

I have a pandas DataFrame:

Id  drove   swimmed walked  winPerc
0   247.3   1050    782.4   1
1   37.65   1072    119.6   0.04
2   93.73   1404    3248    1
3   95.88   1069    21.49   0.1146
4   0       1034    640.8   0
5   128.1   1000    1016    0.9368

average  100.4433333  1104.833333  971.3816667
min      0            1000         21.49
max      247.3        1404         3248
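To make the question reproducible, here is a minimal sketch that rebuilds the sample DataFrame and its average/min/max rows. It assumes `Id` is an ordinary column next to pandas' default integer index; the table does not say which.

```python
import pandas as pd

# Sample data from the question; Id is kept as a regular column (an assumption).
df = pd.DataFrame({
    "Id": [0, 1, 2, 3, 4, 5],
    "drove": [247.3, 37.65, 93.73, 95.88, 0.0, 128.1],
    "swimmed": [1050, 1072, 1404, 1069, 1034, 1000],
    "walked": [782.4, 119.6, 3248, 21.49, 640.8, 1016],
    "winPerc": [1, 0.04, 1, 0.1146, 0, 0.9368],
})

# Reproduce the "average / min / max" rows for the three distance columns.
summary = df[["drove", "swimmed", "walked"]].agg(["mean", "min", "max"])
print(summary)
```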

winPerc = 1 means the player finished first; likewise, winPerc = 0 means the player finished last.

print("The person who ends up winning the match usually drives {:.2f} , swims {:.2f} meters, has a walked {} meters".format(df.set_index('drove')['winPerc'].idxmax(),df.set_index('swimmed')['winPerc'].idxmax(),df.set_index('walked')['winPerc'].idxmax()))

For this, I get:

IndexError: tuple index out of range

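What I want: as you can see in the DataFrame above, the rows with Id 0 and 2 have winPerc = 1, so I should get a response like:

The person who ends up winning the match usually drives 170.52 , swims 1227 meters, has a walked 2015.2 meters

And if more than one record has winPerc = 1, the values should be computed from all of them accordingly.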

There may also be players who did not drive (drove = 0) but still won the match (winPerc = 1). To count them, I tried:

print("{} number of confident Players won without driving".format(len(df['drove'].min()['winPerc'].idxmax())))

For this, I get this error:

IndexError: invalid index to scalar variable.

If no row has a column value exactly equal to min(), max(), or mean(), then in that special case I should take the row whose value is closest.
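One common way to handle the "closest row" case is to take the absolute difference between the column and the target statistic and pick the row with the smallest difference via `idxmin()`. This is a minimal sketch on the sample data; the helper name `closest_row` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "drove": [247.3, 37.65, 93.73, 95.88, 0.0, 128.1],
    "winPerc": [1, 0.04, 1, 0.1146, 0, 0.9368],
})

def closest_row(frame: pd.DataFrame, column: str, target: float) -> pd.Series:
    """Return the first row whose `column` value is nearest to `target` (hypothetical helper)."""
    # idxmin gives the index label of the smallest absolute distance;
    # an exact match has distance 0, so it is always preferred.
    label = (frame[column] - target).abs().idxmin()
    return frame.loc[label]

# No row has drove exactly equal to the mean (about 100.44), so the nearest one is returned.
row = closest_row(df, "drove", df["drove"].mean())
print(row)
```

The same call works for `min()` and `max()` targets; when an exact match exists, it is returned unchanged.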

Thanks in advance; let me know if I need to explain more. :)

1 answer

Answer 0 (score: 0)

I copied your first print statement without any changes, and it works fine for me:

The person who ends up winning the match usually drives 247.30 , swims 1050.00 meters, has a walked 782.4 meters
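The statement runs, but note why it prints 247.30 rather than 170.52: `df.set_index('drove')['winPerc'].idxmax()` returns the `drove` label of the *first* row with the highest `winPerc`, not an average over all winners. A minimal check on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "drove": [247.3, 37.65, 93.73, 95.88, 0.0, 128.1],
    "winPerc": [1, 0.04, 1, 0.1146, 0, 0.9368],
})

# drove becomes the index; idxmax returns the index label of the first maximum.
# Rows Id 0 and Id 2 both have winPerc == 1, but only Id 0 is used.
first_winner_drove = df.set_index("drove")["winPerc"].idxmax()
print(first_winner_drove)
```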

When you use .format() and get IndexError: tuple index out of range, it means the format string has more placeholders than the arguments you passed.
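A minimal reproduction, assuming nothing beyond the standard `str.format`: a template with two placeholders but only one positional argument raises `IndexError`. The exact wording of the message varies across Python versions.

```python
# Two placeholders, one argument: str.format has no value for the second {}.
template = "drives {:.2f}, swims {:.2f} meters"

try:
    template.format(247.3)
except IndexError as exc:
    error = exc  # message wording depends on the Python version
else:
    error = None

# Supplying one argument per placeholder works.
fixed = template.format(247.3, 1050)
print(fixed)  # drives 247.30, swims 1050.00 meters
```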


For the second question, you need to filter the DataFrame. This can be done in several ways; using boolean masks is a common one.

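>>> drove_is_0 = df["drove"] == df["drove"].min()
>>> is_winner = df["winPerc"] == df["winPerc"].max()

Use .max() here rather than .idxmax(): idxmax() returns the index label of the maximum (0 in this data), not the maximum value, so comparing winPerc against it would select the players with winPerc = 0 instead of the winners.

Then apply the filters to your DataFrame:

>>> filtered = df[drove_is_0 & is_winner]

Finally, print:

>>> print("{} number of confident Players won without driving".format(len(filtered)))
0 number of confident Players won without driving

In the sample data, the only player with drove = 0 has winPerc = 0, so the count is 0.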

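The OP has clarified that the first question is not about the IndexError being raised, but about filtering: they want to filter df on the column winPerc for the value 1, and then compute the mean of the other columns. For consistency, I'll use boolean masks as shown above: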

>>> is_winner = df["winPerc"] == 1

>>> mean_driven_winner = df[is_winner]["drove"].mean()
>>> mean_swimmed_winner = df[is_winner]["swimmed"].mean()
>>> mean_walked_winner = df[is_winner]["walked"].mean()

>>> print("The person who ends up winning the match usually drives {:.2f} , swims {:.2f} meters, has a walked {} meters".format(
...     mean_driven_winner, mean_swimmed_winner, mean_walked_winner)
... )

The person who ends up winning the match usually drives 170.52 , swims 1227.00 meters, has a walked 2015.2 meters
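A more compact variant of the same boolean-mask idea: `df.loc[mask, columns].mean()` averages several columns over the winning rows in one call, so however many rows have winPerc == 1, all of them are included. This is a sketch on the sample data; the column list and the slightly tidied message text are choices made for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "drove": [247.3, 37.65, 93.73, 95.88, 0.0, 128.1],
    "swimmed": [1050, 1072, 1404, 1069, 1034, 1000],
    "walked": [782.4, 119.6, 3248, 21.49, 640.8, 1016],
    "winPerc": [1, 0.04, 1, 0.1146, 0, 0.9368],
})

is_winner = df["winPerc"] == 1
# One mean per column, computed over the winning rows only.
winner_means = df.loc[is_winner, ["drove", "swimmed", "walked"]].mean()

print(
    "The person who ends up winning the match usually drives {:.2f}, "
    "swims {:.2f} meters, has walked {:.2f} meters".format(
        winner_means["drove"], winner_means["swimmed"], winner_means["walked"]
    )
)
```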