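Question:
How can I find the first time each day that num_2 > num_1? In other words, group by day and flag the first row of each day where num_2 is higher than num_1, as in the example below.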
Data:
import pandas as pd

df = pd.DataFrame({
    'num_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'num_2': [1, 2, 10, 5, 5, 6, 7, 8, 100, 101, 102, 15],
    'dates': pd.date_range('1/1/2011', periods=12, freq='8h')})
df
dates num_1 num_2
0 2011-01-01 00:00:00 1 1
1 2011-01-01 08:00:00 2 2
2 2011-01-01 16:00:00 3 10
3 2011-01-02 00:00:00 4 5
4 2011-01-02 08:00:00 5 5
5 2011-01-02 16:00:00 6 6
6 2011-01-03 00:00:00 7 7
7 2011-01-03 08:00:00 8 8
8 2011-01-03 16:00:00 9 100
9 2011-01-04 00:00:00 10 101
10 2011-01-04 08:00:00 11 102
11 2011-01-04 16:00:00 12 15
I have highlighted the times this condition is True for this data.
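Because num_2 > num_1 is a plain row-wise comparison, the rows where it holds can be listed directly. A minimal sketch that rebuilds the sample frame above and prints those row labels:

```python
import pandas as pd

df = pd.DataFrame({
    'num_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'num_2': [1, 2, 10, 5, 5, 6, 7, 8, 100, 101, 102, 15],
    'dates': pd.date_range('1/1/2011', periods=12, freq='8h'),
})

# Row-wise condition: True wherever num_2 is strictly greater than num_1
cond = df['num_2'] > df['num_1']
print(df.index[cond].tolist())  # [2, 3, 8, 9, 10, 11]
```

Only the first of these per day (rows 2, 3, 8 and 9) should end up flagged.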
Desired output:
a new column that shows 1 when the condition is True and 0 when it is False.
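A vectorized alternative that avoids apply (not taken from the answers below; a sketch assuming that "day" means the calendar day obtained with dt.normalize()): count the True values cumulatively within each day, and keep only the rows where the condition holds and the running count is exactly 1.

```python
import pandas as pd

df = pd.DataFrame({
    'num_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'num_2': [1, 2, 10, 5, 5, 6, 7, 8, 100, 101, 102, 15],
    'dates': pd.date_range('1/1/2011', periods=12, freq='8h'),
})

cond = df['num_2'] > df['num_1']
day = df['dates'].dt.normalize()  # midnight timestamp of each row's calendar day

# Running count of True values within each day; the first True has count 1
running = cond.astype(int).groupby(day).cumsum()
first_true = cond & (running == 1)

df['result'] = first_true.astype(int)
print(df['result'].tolist())  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```

The `cond &` part matters: after the first match, later False rows keep a running count of 1 and would otherwise be flagged too.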
Answer 0 (score: 2)
Solution:
In [85]: df['result'] = \
...: df.dates.isin(
...: df.groupby(pd.Grouper(key='dates', freq='D'), as_index=False)
...: .apply(lambda x: x.loc[x.num_2 > x.num_1].head(1))['dates']).astype(int)
...:
In [86]: df
Out[86]:
dates num_1 num_2 result
0 2011-01-01 00:00:00 1 1 0
1 2011-01-01 08:00:00 2 2 0
2 2011-01-01 16:00:00 3 10 1
3 2011-01-02 00:00:00 4 5 1
4 2011-01-02 08:00:00 5 5 0
5 2011-01-02 16:00:00 6 6 0
6 2011-01-03 00:00:00 7 7 0
7 2011-01-03 08:00:00 8 8 0
8 2011-01-03 16:00:00 9 100 1
9 2011-01-04 00:00:00 10 101 1
10 2011-01-04 08:00:00 11 102 0
11 2011-01-04 16:00:00 12 15 0
Explanation, step by step:
In [80]: df.groupby(pd.Grouper(key='dates', freq='D'), as_index=False) \
...: .apply(lambda x: x.loc[x.num_2 > x.num_1].head(1))
...:
Out[80]:
dates num_1 num_2 result
0 2 2011-01-01 16:00:00 3 10 1
1 3 2011-01-02 00:00:00 4 5 1
2 8 2011-01-03 16:00:00 9 100 1
3 9 2011-01-04 00:00:00 10 101 1
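The same first-match-per-day rows can also be obtained without apply: filter on the condition first, then take groupby(...).head(1), which keeps each group's first row together with its original index label. A sketch, assuming dt.normalize() as the day key in place of pd.Grouper(freq='D'):

```python
import pandas as pd

df = pd.DataFrame({
    'num_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'num_2': [1, 2, 10, 5, 5, 6, 7, 8, 100, 101, 102, 15],
    'dates': pd.date_range('1/1/2011', periods=12, freq='8h'),
})

cond = df['num_2'] > df['num_1']
matches = df[cond]

# head(1) keeps the first row of each day's group, in original order and with original labels
first_per_day = matches.groupby(matches['dates'].dt.normalize()).head(1)
print(first_per_day.index.tolist())  # [2, 3, 8, 9]
```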
In [81]: df.groupby(pd.Grouper(key='dates', freq='D'), as_index=False) \
...: .apply(lambda x: x.loc[x.num_2 > x.num_1].head(1))['dates']
...:
Out[81]:
0 2 2011-01-01 16:00:00
1 3 2011-01-02 00:00:00
2 8 2011-01-03 16:00:00
3 9 2011-01-04 00:00:00
Name: dates, dtype: datetime64[ns]
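Keying the match on the timestamp means the isin in the next step flags every row that shares a matching timestamp. That is harmless here because each timestamp is unique, but with duplicated timestamps, matching on index labels is stricter. A sketch with a hypothetical two-row frame in which both rows share one timestamp and only row 0 satisfies the condition:

```python
import pandas as pd

# Hypothetical data: both rows have the same timestamp
dup = pd.DataFrame({
    'num_1': [1, 5],
    'num_2': [9, 1],
    'dates': pd.to_datetime(['2011-01-01 08:00', '2011-01-01 08:00']),
})

cond = dup['num_2'] > dup['num_1']
first = dup[cond].groupby(dup.loc[cond, 'dates'].dt.normalize()).head(1)

# Matching on dates also flags row 1, because it shares row 0's timestamp
by_date = dup['dates'].isin(first['dates']).astype(int)
# Matching on index labels flags only the row that was actually selected
by_index = pd.Series(dup.index.isin(first.index).astype(int), index=dup.index)

print(by_date.tolist())   # [1, 1]
print(by_index.tolist())  # [1, 0]
```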
In [82]: df.dates.isin(
...: df.groupby(pd.Grouper(key='dates', freq='D'), as_index=False)
...: .apply(lambda x: x.loc[x.num_2 > x.num_1].head(1))['dates'])
...:
Out[82]:
0 False
1 False
2 True
3 True
4 False
5 False
6 False
7 False
8 True
9 True
10 False
11 False
Name: dates, dtype: bool
In [83]: df.dates.isin(
...: df.groupby(pd.Grouper(key='dates', freq='D'), as_index=False)
...: .apply(lambda x: x.loc[x.num_2 > x.num_1].head(1))['dates']).astype(int)
...:
Out[83]:
0 0
1 0
2 1
3 1
4 0
5 0
6 0
7 0
8 1
9 1
10 0
11 0
Name: dates, dtype: int32
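The dtype in Out[83] depends on the platform. astype(int) uses NumPy's default integer, which older NumPy releases make 32-bit on Windows (hence int32 above), while most Linux and macOS builds give int64. If a fixed width matters, naming the dtype explicitly avoids the difference. A minimal sketch, with the mask values copied from Out[82]:

```python
import pandas as pd

# Boolean mask with the same values as Out[82]
mask = pd.Series([False, False, True, True, False, False,
                  False, False, True, True, False, False], name='dates')

# An explicit width gives the same dtype on every platform
result = mask.astype('int64')
print(result.dtype)     # int64
print(result.tolist())  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```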
Answer 1 (score: 2)
You can apply a lambda that evaluates the condition and use idxmax to return the index label where it is first True, then assign 1 to those rows:
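idxmax works here because, on a boolean Series, True is the maximum value and idxmax returns the first label that holds it. A small sketch with made-up index labels shows this, and also the edge case: when no value is True, every value ties and idxmax returns the first label anyway.

```python
import pandas as pd

# On a boolean Series, idxmax returns the label of the first True
s = pd.Series([False, False, True, True], index=[10, 11, 12, 13])
print(s.idxmax())  # 12

# Edge case: with no True at all, the first label is returned
none_true = pd.Series([False, False], index=[20, 21])
print(none_true.idxmax())  # 20
```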
In [36]:
# assign default value, this sets the dtype to int so we don't have to convert and fillna after the following line
df['result'] = 0
df.loc[df.groupby(df['dates'].dt.date).apply(lambda x: (x['num_2'] > x['num_1']).idxmax()),'result'] = 1
df
Out[36]:
dates num_1 num_2 result
0 2011-01-01 00:00:00 1 1 0
1 2011-01-01 08:00:00 2 2 0
2 2011-01-01 16:00:00 3 10 1
3 2011-01-02 00:00:00 4 5 1
4 2011-01-02 08:00:00 5 5 0
5 2011-01-02 16:00:00 6 6 0
6 2011-01-03 00:00:00 7 7 0
7 2011-01-03 08:00:00 8 8 0
8 2011-01-03 16:00:00 9 100 1
9 2011-01-04 00:00:00 10 101 1
10 2011-01-04 08:00:00 11 102 0
11 2011-01-04 16:00:00 12 15 0
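If some days never satisfy the condition, idxmax on that day's all-False group returns the day's first label, so the assignment above would flag a row that does not match. Every day in the sample data has a match, so the output above is unaffected. A minimal sketch of a guarded variant, assuming the same column names; flag_first_true is a hypothetical helper, and skipping groups via any() is an addition, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame({
    'num_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'num_2': [1, 2, 10, 5, 5, 6, 7, 8, 100, 101, 102, 15],
    'dates': pd.date_range('1/1/2011', periods=12, freq='8h'),
})

def flag_first_true(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: 1 on the first row per day where num_2 > num_1, else 0."""
    cond = frame['num_2'] > frame['num_1']
    day = frame['dates'].dt.date
    # Skip days with no True; idxmax would otherwise return the day's first label
    first_idx = [grp.idxmax() for _, grp in cond.groupby(day) if grp.any()]
    result = pd.Series(0, index=frame.index, name='result')
    if first_idx:
        result.loc[first_idx] = 1
    return result

df['result'] = flag_first_true(df)
print(df['result'].tolist())  # [0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
```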