Pandas: find max value, when and if conditions

Time: 2016-04-22 11:36:08

Tags: python pandas

I have a DataFrame, df:

id           volume  saturation   time_delay_normalised  speed        BPR_free_speed  BPR_speed    Volume  time_normalised
27WESTBOUND  580     0.351515152  57                     6.54248366   17.88           15.91366177  580     1.59375
27WESTBOUND  588     0.356363636  100                    5.107142857  17.88           15.86519847  588     2.041666667
27WESTBOUND  475     0.287878788  64                     6.25625      17.88           16.51161331  475     0.666666667
27EASTBOUND  401     0.243030303  59                     6.458064516  17.88           16.88283672  401     1.0914583333
27EASTBOUND  438     0.265454545  46                     7.049295775  17.88           16.70300418  438     1.479166667
27EASTBOUND  467     0.283030303  58                     6.5          17.88           16.55392848  467     0.9604166667

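I want to create a new column, free_capacity, and set it to the maximum value of Volume per id, taking only the rows where time_normalised is less than or equal to 1.1.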

Without considering the time_normalised condition, I can do this:

df['free_capacity'] = df.groupby('id')["Volume"].transform('max')

How do I add the time_normalised <= 1.1 condition?

Edit

@jezrael suggested the following:

df.loc[df['time_normalised'] <= 1.1, 'free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

which gives:

id           volume  saturation  time_delay_normalised     speed  \
27WESTBOUND     580    0.351515                     57  6.542484
27WESTBOUND     588    0.356364                    100  5.107143
27WESTBOUND     475    0.287879                     64  6.256250
27EASTBOUND     401    0.243030                     59  6.458065
27EASTBOUND     438    0.265455                     46  7.049296
27EASTBOUND     467    0.283030                     58  6.500000

BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity
         17.88  15.913662     580         1.593750            NaN
         17.88  15.865198     588         2.041667            NaN
         17.88  16.511613     475         0.666667          475.0
         17.88  16.882837     401         1.091458          467.0
         17.88  16.703004     438         1.479167            NaN
         17.88  16.553928     467         0.960417          467.0

However, I still want the free_capacity value to be assigned to every row of the same id.

So I tried:

df['free_capacity'] = df.loc[df['time_normalised'] <= 1.1].groupby('id')["Volume"].transform('max')

However, this still produces NaN values. The time_normalised <= 1.1 condition should only be used to find the value, not to restrict which rows it is applied to.

Desired result (other columns omitted):

id           Volume  time_normalised  free_capacity
27WESTBOUND     580         1.593750            475
27WESTBOUND     588         2.041667            475
27WESTBOUND     475         0.666667            475
27EASTBOUND     401         1.091458            467
27EASTBOUND     438         1.479167            467
27EASTBOUND     467         0.960417            467
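The NaN values in the attempts above appear to come from index alignment: transform only returns values for the rows that passed the filter, and assigning that shorter Series back leaves every other row empty. A minimal sketch of the effect, assuming a three-column subset of the question's data (the name `partial` is introduced here for illustration only):

```python
import pandas as pd

# Three-column subset of the question's data; the other columns do not matter here.
df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

mask = df["time_normalised"] <= 1.1

# transform() returns one value per row of the *filtered* frame,
# so its index only contains the rows that passed the mask.
partial = df.loc[mask].groupby("id")["Volume"].transform("max")
print(partial.index.tolist())

# Assignment aligns on the index; rows missing from `partial` become NaN.
df["free_capacity"] = partial
print(df)
```

The group maximum is computed correctly, but it never reaches the rows that were filtered out, which is why the condition has to be applied to the values rather than to the rows.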

3 answers:

Answer 0 (score: 4)

You can use where to filter by the condition, then groupby on the Series df['id'] with transform:

df['free_capacity'] = df['Volume'].where(df['time_normalised'] <= 1.1).groupby(df['id']).transform('max')
print(df)

            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484
1  27WESTBOUND     588    0.356364                    100  5.107143
2  27WESTBOUND     475    0.287879                     64  6.256250
3  27EASTBOUND     401    0.243030                     59  6.458065
4  27EASTBOUND     438    0.265455                     46  7.049296
5  27EASTBOUND     467    0.283030                     58  6.500000

   BPR_free_speed  BPR_speed  Volume  time_normalised  free_capacity
0           17.88  15.913662     580         1.593750          475.0
1           17.88  15.865198     588         2.041667          475.0
2           17.88  16.511613     475         0.666667          475.0
3           17.88  16.882837     401         1.091458          467.0
4           17.88  16.703004     438         1.479167          467.0
5           17.88  16.553928     467         0.960417          467.0

It is the same if you first create a new column, Volume1, from your condition with where:

df['Volume1'] = df['Volume'].where(df['time_normalised'] <= 1.1)
print(df)

            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484
1  27WESTBOUND     588    0.356364                    100  5.107143
2  27WESTBOUND     475    0.287879                     64  6.256250
3  27EASTBOUND     401    0.243030                     59  6.458065
4  27EASTBOUND     438    0.265455                     46  7.049296
5  27EASTBOUND     467    0.283030                     58  6.500000

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1
0           17.88  15.913662     580         1.593750      NaN
1           17.88  15.865198     588         2.041667      NaN
2           17.88  16.511613     475         0.666667    475.0
3           17.88  16.882837     401         1.091458    401.0
4           17.88  16.703004     438         1.479167      NaN
5           17.88  16.553928     467         0.960417    467.0

and then use groupby on the new column Volume1 with transform:

df['free_capacity'] = df.groupby('id')["Volume1"].transform('max')
print(df)

            id  volume  saturation  time_delay_normalised     speed  \
0  27WESTBOUND     580    0.351515                     57  6.542484
1  27WESTBOUND     588    0.356364                    100  5.107143
2  27WESTBOUND     475    0.287879                     64  6.256250
3  27EASTBOUND     401    0.243030                     59  6.458065
4  27EASTBOUND     438    0.265455                     46  7.049296
5  27EASTBOUND     467    0.283030                     58  6.500000

   BPR_free_speed  BPR_speed  Volume  time_normalised  Volume1  free_capacity
0           17.88  15.913662     580         1.593750      NaN          475.0
1           17.88  15.865198     588         2.041667      NaN          475.0
2           17.88  16.511613     475         0.666667    475.0          475.0
3           17.88  16.882837     401         1.091458    401.0          467.0
4           17.88  16.703004     438         1.479167      NaN          467.0
5           17.88  16.553928     467         0.960417    467.0          467.0
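For anyone who wants to run the where/groupby/transform approach end to end, here is a minimal self-contained sketch. It assumes a three-column subset of the question's data plus one extra row with the made-up id "HYPOTHETICAL", which is not in the original data and is only there to show what happens when no row of a group meets the condition:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3 + ["HYPOTHETICAL"],
    "Volume": [580, 588, 475, 401, 438, 467, 500],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667, 1.5],
})

# Keep Volume where the condition holds, NaN elsewhere (same length as df).
masked = df["Volume"].where(df["time_normalised"] <= 1.1)

# Group the masked values by id; max skips NaN, and transform
# broadcasts the per-group result back to every row of that group.
df["free_capacity"] = masked.groupby(df["id"]).transform("max")
print(df)
```

Because the masked Series keeps the full index, every row receives its group's value. A group with no qualifying rows, like the hypothetical one here, ends up with NaN.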


Answer 1 (score: 1)

There can be several answers; you could also do it this way:

df.set_index('id', inplace=True)
df['free_capacity'] = df.groupby(level=0).apply(lambda x: x.loc[x['time_normalised']<=1.1]['volume'].max())

This gives the following:

             volume  saturation  time_delay_normalised     speed  \
id
27WESTBOUND     580    0.351515                     57  6.542484
27WESTBOUND     588    0.356364                    100  5.107143
27WESTBOUND     475    0.287879                     64  6.256250
27EASTBOUND     401    0.243030                     59  6.458065
27EASTBOUND     438    0.265455                     46  7.049296
27EASTBOUND     467    0.283030                     58  6.500000

             BPR_free_speed  BPR_speed  Volume  time_normalised    wrong_x    free_capacity
id
27WESTBOUND           17.88  15.913662     580         1.593750  588  475
27WESTBOUND           17.88  15.865198     588         2.041667  588  475
27WESTBOUND           17.88  16.511613     475         0.666667  588  475
27EASTBOUND           17.88  16.882837     401         1.091458  467  467
27EASTBOUND           17.88  16.703004     438         1.479167  467  467
27EASTBOUND           17.88  16.553928     467         0.960417  467  467

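If needed, you can reset the index with df.reset_index(inplace=True). The wrong_x column is the incorrect result, computed without the condition:

df['wrong_x'] = df.groupby(level=0)['volume'].max()

which is what you originally tried.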
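The idea in this answer, computing one conditional maximum per id and then spreading it over the rows, can also be written without changing the index. A minimal sketch, assuming a three-column subset of the question's data and using Series.map, which is a swapped-in technique and not what the answer's author wrote (the name `per_id_max` is introduced here for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["27WESTBOUND"] * 3 + ["27EASTBOUND"] * 3,
    "Volume": [580, 588, 475, 401, 438, 467],
    "time_normalised": [1.59375, 2.041666667, 0.666666667,
                        1.0914583333, 1.479166667, 0.9604166667],
})

# One value per id: the max Volume among rows meeting the condition.
per_id_max = (
    df.loc[df["time_normalised"] <= 1.1]
      .groupby("id")["Volume"]
      .max()
)

# Look up each row's id in that small Series to fill the new column.
df["free_capacity"] = df["id"].map(per_id_max)
print(df)
```

An id with no qualifying rows would be missing from `per_id_max`, so map would leave NaN for its rows.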

Answer 2 (score: 1)

Also consider groupby().apply():

def maxtime(row):
    row['free_capacity'] = row[row['time_normalised'] <= 1.1]['Volume'].max()
    return row

df = df.groupby('id').apply(maxtime)