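I want to loop over a pandas Series of dtype datetime64[ns] and compare it against a scalar taken from another Series, which is also datetime64[ns].
The DataFrame: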
ds.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 174764 entries, 0 to 185622
Data columns (total 2 columns):
t1 174764 non-null datetime64[ns]
t2 174764 non-null datetime64[ns]
The loop:
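import numpy as np
import pandas as pd

# Pre-allocate the result column.
ds['t3'] = np.zeros(ds.shape[0])
t3_pos = ds.columns.get_loc('t3')
for i in range(ds.shape[0]):
    r_i = ds['t1'].iat[i]
    # Count rows whose t1 is later than r_i and whose t2 is at or before r_i.
    # Writing through ds.iat avoids chained assignment on ds['t3'].
    ds.iat[i, t3_pos] = ds[(ds.t1.gt(r_i)) & (ds.t2.le(r_i))]['t1'].count()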
This currently takes about 8 minutes. I would like to cut that at least in half.
Answer 0 (score: 0)
Convert the columns to NumPy arrays and count the True values with np.sum:
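t1 = ds['t1'].values
t2 = ds['t2'].values
# For each value a in t1, count the rows where t1 > a and t2 <= a
# (<= matches the .le used in the question's loop).
ds['t3'] = [np.sum((t1 > a) & (t2 <= a)) for a in t1]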
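As a quick sanity check, the sketch below compares the NumPy approach with the row-by-row loop from the question on a tiny made-up frame. The sample dates and the helper names `count_loop` and `count_numpy` are assumptions for illustration only; both helpers use `<=` for `t2`, matching the `.le` in the question.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the real frame has about 175k rows.
ds = pd.DataFrame({
    "t1": pd.to_datetime(["2020-01-05", "2020-01-02", "2020-01-09", "2020-01-04"]),
    "t2": pd.to_datetime(["2020-01-01", "2020-01-03", "2020-01-02", "2020-01-04"]),
})

def count_loop(df):
    """Reference: the row-by-row pandas loop from the question."""
    out = np.zeros(len(df), dtype=np.int64)
    for i in range(len(df)):
        r_i = df["t1"].iat[i]
        out[i] = ((df["t1"] > r_i) & (df["t2"] <= r_i)).sum()
    return out

def count_numpy(df):
    """Same counts, computed on the raw NumPy arrays."""
    t1 = df["t1"].to_numpy()
    t2 = df["t2"].to_numpy()
    return np.array([np.sum((t1 > a) & (t2 <= a)) for a in t1])

loop_counts = count_loop(ds)
numpy_counts = count_numpy(ds)
```

If the two arrays agree on a sample like this, timing both functions on a slice of the real data is a reasonable next step before switching over.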
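Another idea, which needs more memory because broadcasting builds N x N boolean arrays (roughly 30 GB each for the 174,764 rows here):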
# Column vector of reference values, so the comparisons broadcast over every pair.
t11 = t1[:, None]
ds['t3'] = np.sum((t1 > t11) & (t2 <= t11), axis=1)
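If the full N x N comparison does not fit in memory, one possible middle ground is to broadcast in blocks. The sketch below is not part of the original answer; the function name `count_chunked`, the `chunk_size` parameter, and the sample dates are assumptions chosen for illustration.

```python
import numpy as np

def count_chunked(t1, t2, chunk_size=1024):
    """Count, for each a in t1, the rows with t1 > a and t2 <= a.

    Handles `chunk_size` reference values at a time, so the temporary
    boolean arrays have shape (chunk_size, len(t1)) instead of (N, N).
    """
    n = len(t1)
    out = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk_size):
        # Column vector holding this block of reference values.
        ref = t1[start:start + chunk_size, None]
        out[start:start + chunk_size] = np.sum((t1 > ref) & (t2 <= ref), axis=1)
    return out

# Made-up sample data for illustration only.
t1 = np.array(["2020-01-05", "2020-01-02", "2020-01-09", "2020-01-04"],
              dtype="datetime64[ns]")
t2 = np.array(["2020-01-01", "2020-01-03", "2020-01-02", "2020-01-04"],
              dtype="datetime64[ns]")
counts = count_chunked(t1, t2, chunk_size=2)
```

On the real frame this would be called as `ds['t3'] = count_chunked(ds['t1'].values, ds['t2'].values)`. A larger `chunk_size` means fewer Python-level iterations but bigger temporaries, so the value is worth tuning against available memory.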