New column with conditions on other columns

Time: 2018-04-16 14:48:18

Tags: python pandas

import pandas as pd

df = pd.DataFrame({"A": ["2002-01-12","2002-01-12","2002-01-12","2002-01-13","2002-01-13","2002-01-13","2002-01-16","2002-01-16","2002-01-16"], "B": ["12:00:00", "13:00:00", "14:00:00","11:00:00", "12:00:00", "13:00:00", "10:00:00", "11:00:00", "12:00:00"], "C": [3, 19, 15, 6, 1, 5, 3, 12, 8]})

           A         B   C
0 2002-01-12  12:00:00   3
1 2002-01-12  13:00:00  19
2 2002-01-12  14:00:00  15
3 2002-01-13  11:00:00   6
4 2002-01-13  12:00:00   1
5 2002-01-13  13:00:00   5
6 2002-01-16  10:00:00   3
7 2002-01-16  11:00:00  12
8 2002-01-16  12:00:00   8

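I want to create new columns df['D'] and df['E'] for each group of A, with the following conditions:

  • df['D']: takes the value of C from the previous day where B == 12:00:00 (respecting the A groups).
  • df['E']: takes the mean of C from the previous day (respecting the A groups).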

The output should be:

           A         B   C    D     E
0 2002-01-12  12:00:00   3    0     0
1 2002-01-12  13:00:00  19    0     0
2 2002-01-12  14:00:00  15    0     0
3 2002-01-13  11:00:00   6    3  12.3
4 2002-01-13  12:00:00   1    3  12.3
5 2002-01-13  13:00:00   5    3  12.3
6 2002-01-16  10:00:00   3    1   4.0
7 2002-01-16  11:00:00  12    1   4.0
8 2002-01-16  12:00:00   8    1   4.0

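2 answers:

Answer 0 (score: 3)

You can create a helper Series with one value per day, use shift to move each value to the following day, map the result onto the new columns, and finally replace NaN with fillna: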

a = df[df['B'].eq('12:00:00')].set_index('A')['C'].shift(1)
b = df.groupby('A')['C'].mean().shift(1)

df['D'] = df['A'].map(a)
df['E'] = df['A'].map(b)
df[['D','E']] = df[['D','E']].fillna(0)
print(df)
           A         B   C    D          E
0 2002-01-12  12:00:00   3  0.0   0.000000
1 2002-01-12  13:00:00  19  0.0   0.000000
2 2002-01-12  14:00:00  15  0.0   0.000000
3 2002-01-13  11:00:00   6  3.0  12.333333
4 2002-01-13  12:00:00   1  3.0  12.333333
5 2002-01-13  13:00:00   5  3.0  12.333333
6 2002-01-16  10:00:00   3  1.0   4.000000
7 2002-01-16  11:00:00  12  1.0   4.000000
8 2002-01-16  12:00:00   8  1.0   4.000000
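
To make the intermediate steps easier to follow, here is a minimal sketch that builds the same helper Series one step at a time. The names `noon`, `daily_mean`, `d` and `e` are illustrative and not part of the answer. It assumes the rows are already sorted by date and that each date has exactly one 12:00:00 row.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2002-01-12"] * 3 + ["2002-01-13"] * 3 + ["2002-01-16"] * 3,
    "B": ["12:00:00", "13:00:00", "14:00:00",
          "11:00:00", "12:00:00", "13:00:00",
          "10:00:00", "11:00:00", "12:00:00"],
    "C": [3, 19, 15, 6, 1, 5, 3, 12, 8],
})

# One entry per date: the value of C at 12:00:00, indexed by the date in A.
noon = df[df["B"].eq("12:00:00")].set_index("A")["C"]

# One entry per date: the mean of C over that date.
daily_mean = df.groupby("A")["C"].mean()

# shift(1) moves every value down one position, so each date now holds
# the value of the date before it in the data (NaN for the first date).
prev_noon = noon.shift(1)
prev_mean = daily_mean.shift(1)

# map looks up each row's date in the helper Series; fillna(0) handles
# the first date, which has no predecessor.
d = df["A"].map(prev_noon).fillna(0)
e = df["A"].map(prev_mean).fillna(0)

print(prev_noon)
print(prev_mean)
```

Because shift works by position, "previous day" here means the previous date present in the data, which is why 2002-01-16 picks up the values from 2002-01-13. If a date had no 12:00:00 row it would be missing from `noon`, and the shifted values would then be off by one date, so that assumption is worth checking on real data.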

Answer 1 (score: 0)

I wrote a more brute-force version, but it works:

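df['A'] = pd.to_datetime(df['A'])

# For each row, average C over the rows dated exactly one calendar day earlier
# (restricted to B == '12:00:00' for D); dates with no match give NaN, then 0.
df['D'] = df['A'].apply(lambda x: df[(df['A']==(x + pd.DateOffset(-1)))&(df['B']=='12:00:00')]['C'].mean()).fillna(0)
df['E'] = df['A'].apply(lambda x: df[df['A']==(x + pd.DateOffset(-1))]['C'].mean()).fillna(0)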
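
One thing to watch with a calendar-day lookup: the sample data has no rows for 2002-01-15, so the rows dated 2002-01-16 get 0 in both columns, whereas the expected output in the question uses the values from 2002-01-13. The following is a minimal sketch of a variant that keeps the datetime conversion but uses the previous date present in the data. The names `daily_mean` and `noon` are illustrative, and taking the first 12:00:00 value per date is an assumption about how duplicates should be handled.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["2002-01-12"] * 3 + ["2002-01-13"] * 3 + ["2002-01-16"] * 3,
    "B": ["12:00:00", "13:00:00", "14:00:00",
          "11:00:00", "12:00:00", "13:00:00",
          "10:00:00", "11:00:00", "12:00:00"],
    "C": [3, 19, 15, 6, 1, 5, 3, 12, 8],
})
df["A"] = pd.to_datetime(df["A"])

# Mean of C per date; groupby sorts the dates in ascending order by default.
daily_mean = df.groupby("A")["C"].mean()

# Value of C at 12:00:00 per date (first one if there are several).
noon = df[df["B"] == "12:00:00"].groupby("A")["C"].first()

# Align noon to every date in the data, so a date without a 12:00:00 row
# stays NaN instead of silently shifting later values by one date.
noon = noon.reindex(daily_mean.index)

# shift(1) gives each date the value of the date before it in the data.
df["D"] = df["A"].map(noon.shift(1)).fillna(0)
df["E"] = df["A"].map(daily_mean.shift(1)).fillna(0)

print(df)
```

This also avoids filtering the whole frame once per row, which the apply-based version does.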