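I have a pandas DataFrame that I plan to group by 'name', 'driverRef', 'tyre', keeping only the groups that have similar values in one column.
Within such a group, all rows have the same value in that column.
Similar means the values fall within a range whose spread is at most 3. For example, if the unique numbers in the column are 5, 10, 12, 13, only the group 10, 12, 13 should be kept.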
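To make the definition of "similar" concrete, here is a minimal sketch of one possible reading of it: take the sorted unique values and look for the longest run whose maximum and minimum differ by at most 3. The function name `largest_similar_run` and the `tolerance` parameter are made up for illustration and are not part of pandas or of the original question.

```python
def largest_similar_run(values, tolerance=3):
    """Hypothetical helper: longest run of sorted unique values with max - min <= tolerance."""
    uniq = sorted(set(values))
    best = []
    start = 0
    for end in range(len(uniq)):
        # Move the left edge forward until the window fits inside the tolerance.
        while uniq[end] - uniq[start] > tolerance:
            start += 1
        window = uniq[start:end + 1]
        # Keep the first window found with the most values (ties go to smaller values).
        if len(window) > len(best):
            best = window
    return best


print(largest_similar_run([5, 10, 12, 13]))  # [10, 12, 13]
```

On ties this sketch keeps the run with the smaller values; the question does not say which one should win, so that choice is an assumption.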
Expected output:
                     name  driverRef  stint        tyre  lap  stint length
0   Australian Grand Prix        ham    1.0  Super soft    1             5
1   Australian Grand Prix     vettel    1.0  Super soft    2            10
2   Australian Grand Prix     bottas    1.0  Super soft    3            10
3   Australian Grand Prix     alonso    2.0  Super soft   20            13
4   Australian Grand Prix     alonso    2.0  Super soft   21            13
5   Australian Grand Prix     alonso    2.0  Super soft   22            13
6      Bahrain Grand Prix        ham    1.0  Super soft    1             5
7      Bahrain Grand Prix     vettel    1.0  Super soft    2             6
8      Bahrain Grand Prix     bottas    1.0  Super soft    3             6
9      Bahrain Grand Prix     alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix     alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix     alonso    2.0  Super soft   22            13
Answer 0 (score: 0)
I believe you need:
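# Most frequent 'stint length' within each (name, tyre) group, broadcast back to every row
s = df.groupby(['name', 'tyre'])['stint length'].transform(lambda x: x.mode().iat[0])
# Alternative:
# s = df.groupby(['name', 'tyre'])['stint length'].transform(lambda x: x.value_counts().index[0])
# Keep only the rows whose value equals their group's most frequent value
df = df[df['stint length'] == s]
print(df)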
                     name  driverRef  stint        tyre  lap  stint length
3   Australian Grand Prix     alonso    2.0  Super soft   20            13
4   Australian Grand Prix     alonso    2.0  Super soft   21            13
5   Australian Grand Prix     alonso    2.0  Super soft   22            13
9      Bahrain Grand Prix     alonso    2.0  Super soft   20            13
10     Bahrain Grand Prix     alonso    2.0  Super soft   21            13
11     Bahrain Grand Prix     alonso    2.0  Super soft   22            13
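For anyone who wants to try this without the original data, here is a self-contained sketch that rebuilds the twelve rows from the question's table by hand and applies the same mode-based filter. The variable names `races`, `most_common` and `filtered` are only illustrative. Note that this keeps rows equal to the single most frequent value per group; it does not apply the "within 3" tolerance described in the question.

```python
import pandas as pd

# Sample data typed in from the question's table (an assumption about the real frame).
races = ["Australian Grand Prix"] * 6 + ["Bahrain Grand Prix"] * 6
df = pd.DataFrame({
    "name": races,
    "driverRef": ["ham", "vettel", "bottas", "alonso", "alonso", "alonso"] * 2,
    "stint": [1.0, 1.0, 1.0, 2.0, 2.0, 2.0] * 2,
    "tyre": ["Super soft"] * 12,
    "lap": [1, 2, 3, 20, 21, 22] * 2,
    "stint length": [5, 10, 10, 13, 13, 13, 5, 6, 6, 13, 13, 13],
})

# Most frequent stint length per (name, tyre) group, aligned to the original rows.
most_common = df.groupby(["name", "tyre"])["stint length"].transform(
    lambda x: x.mode().iat[0]
)

# Keep only the rows that match their group's most frequent value.
filtered = df[df["stint length"] == most_common]
print(filtered.index.tolist())  # [3, 4, 5, 9, 10, 11]
```

`Series.mode()` returns its results in sorted order, so if two values are equally frequent in a group, `.iat[0]` picks the smaller one.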