I have a large DataFrame with the following datetime column, Date:
... Date A B
190 2019-09-13 21:50:00 1 2
191 2019-09-13 21:55:00 3 2
192 2019-09-13 22:00:00 1 2
193 2019-09-13 22:05:00 3 2
194 2019-09-13 22:10:00 1 2
195 2019-09-16 06:00:00 1 2
196 2019-09-16 06:05:00 1 2
197 2019-09-16 06:10:00 4 2
198 2019-09-16 06:15:00 1 2
199 2019-09-16 06:20:00 4 2
200 2019-09-16 06:25:00 1 2
.....
Name: Date, dtype: datetime64[ns]
Now I need to check whether A is greater than or equal to B, but only once per day. How can I fill the list with only the first hit of each day?
count = []
for i in df.index:
    # collects every row where A >= B, not just the first one per day
    if df.loc[i, 'A'] >= df.loc[i, 'B']:
        count.append('A is larger than B' + f" on {df.loc[i, 'Date']}")
For this sample, my desired output is:
A is larger than B on 2019-09-13 21:55:00
A is larger than B on 2019-09-16 06:10:00
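If you want to keep the explicit loop from the question, one option is to remember which calendar days have already produced a hit and skip the remaining rows of that day. This is a minimal sketch that assumes the rows are already sorted by Date; the sample frame is rebuilt from the rows shown in the table above, and the set name seen_days is only an illustrative choice.

```python
import pandas as pd

# Sample rows rebuilt from the question's table (the real frame is larger).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-09-13 21:50:00", "2019-09-13 21:55:00", "2019-09-13 22:00:00",
        "2019-09-13 22:05:00", "2019-09-13 22:10:00", "2019-09-16 06:00:00",
        "2019-09-16 06:05:00", "2019-09-16 06:10:00", "2019-09-16 06:15:00",
        "2019-09-16 06:20:00", "2019-09-16 06:25:00",
    ]),
    "A": [1, 3, 1, 3, 1, 1, 1, 4, 1, 4, 1],
    "B": [2] * 11,
}, index=range(190, 201))

count = []
seen_days = set()  # calendar days that already produced a hit
for i in df.index:
    day = df.at[i, "Date"].date()
    if day in seen_days:
        continue  # a hit for this day was already recorded
    if df.at[i, "A"] >= df.at[i, "B"]:
        count.append("A is larger than B" + f" on {df.at[i, 'Date']}")
        seen_days.add(day)

for line in count:
    print(line)
```

This stays O(n) like the original loop, but on a large frame the vectorized answers below will usually be much faster.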
Answer 0 (score: 1)
You can first filter the rows with boolean indexing, using Series.ge (greater than or equal, >=), then group by day with Series.dt.date and take the first value per group with GroupBy.first:
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
m = df['A'].ge(df['B'])
df1 = df[m].groupby(df['Date'].dt.date).first()
print (df1)
                          Date  A  B
Date
2019-09-13 2019-09-13 21:55:00  3  2
2019-09-16 2019-09-16 06:10:00  4  2
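Below is a self-contained version of the groupby approach, run on a frame rebuilt from the sample rows in the question (an assumption; the real data is larger). It also shows GroupBy.head(1) as a variant: GroupBy.first takes the first non-null value of each column separately, so with missing values the result could combine values from different rows, whereas head(1) returns the first complete matching row of each day and keeps its original index label. The names hits, days and messages are illustrative.

```python
import pandas as pd

# Sample rows rebuilt from the question's table (the real frame is larger).
df = pd.DataFrame({
    "Date": [
        "2019-09-13 21:50:00", "2019-09-13 21:55:00", "2019-09-13 22:00:00",
        "2019-09-13 22:05:00", "2019-09-13 22:10:00", "2019-09-16 06:00:00",
        "2019-09-16 06:05:00", "2019-09-16 06:10:00", "2019-09-16 06:15:00",
        "2019-09-16 06:20:00", "2019-09-16 06:25:00",
    ],
    "A": [1, 3, 1, 3, 1, 1, 1, 4, 1, 4, 1],
    "B": [2] * 11,
}, index=range(190, 201))

df["Date"] = pd.to_datetime(df["Date"])
hits = df[df["A"].ge(df["B"])]   # rows where A >= B
days = hits["Date"].dt.date      # calendar day of each hit

# First non-null value of each column per day (as in the answer).
df1 = hits.groupby(days).first()

# First complete row per day, keeping the original index labels.
df2 = hits.groupby(days).head(1)

messages = [f"A is larger than B on {d}" for d in df1["Date"]]
print(df2)
print(*messages, sep="\n")
```

On this sample both variants pick the same rows, because there are no missing values.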
Or create a helper column holding the date, then use DataFrame.drop_duplicates:
import pandas as pd
df['Date'] = pd.to_datetime(df['Date'])
df['d'] = df['Date'].dt.date
m = df['A'].ge(df['B'])
df1 = df[m].drop_duplicates('d')
print (df1)
                   Date  A  B           d
191 2019-09-13 21:55:00  3  2  2019-09-13
197 2019-09-16 06:10:00  4  2  2019-09-16
for d in df1.Date:
    print('A is larger than B' + f" on {d}")
A is larger than B on 2019-09-13 21:55:00
A is larger than B on 2019-09-16 06:10:00
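To end up with the list the question asked for instead of only printing, a sketch along these lines should work, again using the sample rows from the question as an assumption. drop_duplicates keeps the first occurrence by default (keep='first'), so sorting by Date beforehand guards against frames that are not in time order; the list name count simply mirrors the question.

```python
import pandas as pd

# Sample rows rebuilt from the question's table (the real frame is larger).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-09-13 21:50:00", "2019-09-13 21:55:00", "2019-09-13 22:00:00",
        "2019-09-13 22:05:00", "2019-09-13 22:10:00", "2019-09-16 06:00:00",
        "2019-09-16 06:05:00", "2019-09-16 06:10:00", "2019-09-16 06:15:00",
        "2019-09-16 06:20:00", "2019-09-16 06:25:00",
    ]),
    "A": [1, 3, 1, 3, 1, 1, 1, 4, 1, 4, 1],
    "B": [2] * 11,
}, index=range(190, 201))

df["d"] = df["Date"].dt.date     # helper column: calendar day
m = df["A"].ge(df["B"])          # True where A >= B

# Sort by time, then keep the first matching row of each day.
df1 = df[m].sort_values("Date").drop_duplicates("d")

# Build the list from the question rather than printing inside a loop.
count = [f"A is larger than B on {d}" for d in df1["Date"]]
print(*count, sep="\n")
```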