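Replace values within groups based on multiple conditions

Time: 2018-05-18 19:50:29

Tags: python pandas dataframe

My question is related to this one, but I still don't see how to apply its answer to my problem. I have a DataFrame like this: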

import pandas as pd

df = pd.DataFrame({
    'date': ['2001-01-01', '2001-02-01', '2001-03-01', '2001-04-01', '2001-02-01', '2001-03-01', '2001-04-01'],
    'cohort': ['2001-01-01', '2001-01-01', '2001-01-01', '2001-01-01', '2001-02-01', '2001-02-01', '2001-02-01'],
    'val': [100, 101, 102, 101, 200, 201, 201]
})

df
    date        cohort      val
0   2001-01-01  2001-01-01  100
1   2001-02-01  2001-01-01  101
2   2001-03-01  2001-01-01  102
3   2001-04-01  2001-01-01  101
4   2001-02-01  2001-02-01  200
5   2001-03-01  2001-02-01  201
6   2001-04-01  2001-02-01  201

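Grouping by cohort, I want to replace val with the group's maximum val, but only for observations whose date is earlier than the date associated with that maximum val. Rows 0, 1, and 4 would therefore change, giving: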

df #This is what I want my final df to look like 
    date        cohort      val
0   2001-01-01  2001-01-01  102
1   2001-02-01  2001-01-01  102
2   2001-03-01  2001-01-01  102
3   2001-04-01  2001-01-01  101
4   2001-02-01  2001-02-01  201
5   2001-03-01  2001-02-01  201
6   2001-04-01  2001-02-01  201

How can I do this without a lot of loops?

1 Answer:

Answer 0 (score: 1)

  1. Determine the maximum of val per cohort group
  2. Determine the date associated with the maximum val
  3. Use np.where to perform a vectorized comparison and replacement

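    import numpy as np

    g = df.groupby('cohort').val
    v = g.transform('max')  # per-cohort maximum of val
    # date of each cohort's first maximum, repeated for every row of that cohort
    max_date = df.loc[g.transform('idxmax'), 'date'].to_numpy()
    df['val'] = np.where(df.date <= max_date, v, df.val)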
    

    df
        date        cohort      val
    0   2001-01-01  2001-01-01  102
    1   2001-02-01  2001-01-01  102
    2   2001-03-01  2001-01-01  102
    3   2001-04-01  2001-01-01  101
    4   2001-02-01  2001-02-01  201
    5   2001-03-01  2001-02-01  201
    6   2001-04-01  2001-02-01  201
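
For reference, here is a self-contained sketch of the same idea wrapped in a helper, so the cutoff date is looked up separately for every cohort. The function name `fill_up_to_group_max` and its parameters are hypothetical, not part of the original answer. It assumes that `date` holds ISO-formatted strings (or datetimes) so that `<=` orders them chronologically, and that when a cohort's maximum occurs more than once, the first occurrence in row order is the one that sets the cutoff.

```python
import numpy as np
import pandas as pd


def fill_up_to_group_max(df, group="cohort", date="date", val="val"):
    """Return a copy of df in which, within each group, rows dated on or
    before the group's first maximum of `val` take that maximum.

    Hypothetical helper for illustration; assumes `date` sorts chronologically.
    """
    out = df.copy()
    g = out.groupby(group)[val]
    # Per-group maximum, broadcast back to every row of the group.
    group_max = g.transform("max")
    # Row label of each group's first maximum, broadcast the same way.
    max_label = g.transform("idxmax")
    # Date found at that row, lined up positionally with the rows of `out`.
    max_date = out.loc[max_label, date].to_numpy()
    # Replace only where the row's date is not later than the cutoff date.
    out[val] = np.where(out[date] <= max_date, group_max, out[val])
    return out


df = pd.DataFrame({
    "date": ["2001-01-01", "2001-02-01", "2001-03-01", "2001-04-01",
             "2001-02-01", "2001-03-01", "2001-04-01"],
    "cohort": ["2001-01-01", "2001-01-01", "2001-01-01", "2001-01-01",
               "2001-02-01", "2001-02-01", "2001-02-01"],
    "val": [100, 101, 102, 101, 200, 201, 201],
})

result = fill_up_to_group_max(df)
print(result["val"].tolist())  # [102, 102, 102, 101, 201, 201, 201]
```

Returning a copy rather than assigning in place is a design choice of this sketch; assign the result back to `df` if in-place behaviour is wanted.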