I have been trying to find the most efficient way to do this. Suppose I have a DataFrame df1 that looks like:
   time_start    time_end
0  1548102229  1548102232
1  1548102239  1548102242
2  1548102249  1548102252
3  1548102259  1548102262
and another DataFrame df2 that looks like:
    timestamp state
0  1548102231     A
1  1548102241     A
2  1548102248     B
3  1548102251     B
Is there a way to add "state" to df1, given the condition that df2['timestamp'] lies between df1['time_start'] and df1['time_end'], giving:
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   N/A
3  1548102259  1548102262     B
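For anyone who wants to try this out, the two sample frames can be rebuilt roughly as follows. This is only a sketch that copies the values from the tables in the question; the dict-based construction is my own choice, not something the question prescribes.

```python
import pandas as pd

# Sample data copied from the question's tables
df1 = pd.DataFrame({
    "time_start": [1548102229, 1548102239, 1548102249, 1548102259],
    "time_end":   [1548102232, 1548102242, 1548102252, 1548102262],
})
df2 = pd.DataFrame({
    "timestamp": [1548102231, 1548102241, 1548102248, 1548102251],
    "state":     ["A", "A", "B", "B"],
})

print(df1)
print(df2)
```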
Answer 0 (score: 3)
Use IntervalIndex with get_indexer, then look up the states and assign them:
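import pandas as pd
idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
# Position of the interval containing each df2 timestamp, or -1 if there is none
indexmatch = idx.get_indexer(df2['timestamp'])
# reindex instead of .loc: -1 is not a label in df2, so it yields NaN rather than a KeyError
df1['New'] = df2['state'].reindex(indexmatch).to_numpy()
df1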
   time_start    time_end  New
0  1548102229  1548102232    A
1  1548102239  1548102242    A
2  1548102249  1548102252  NaN
3  1548102259  1548102262    B
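It may help to look at what get_indexer actually returns before using it. The sketch below rebuilds the sample frames from the question and prints the positions; the variable name pos is just for illustration. Note that the array has one entry per row of df2, not per row of df1: each entry is the position of the df1 interval containing that timestamp, with -1 meaning no interval contains it.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time_start": [1548102229, 1548102239, 1548102249, 1548102259],
    "time_end":   [1548102232, 1548102242, 1548102252, 1548102262],
})
df2 = pd.DataFrame({
    "timestamp": [1548102231, 1548102241, 1548102248, 1548102251],
    "state":     ["A", "A", "B", "B"],
})

# Closed intervals [time_start, time_end], one per df1 row
idx = pd.IntervalIndex.from_arrays(df1["time_start"], df1["time_end"], closed="both")

# One position per df2 row; -1 marks a timestamp outside every interval
pos = idx.get_indexer(df2["timestamp"])
print(pos.tolist())  # [0, 1, -1, 2]
```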
Update
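idx = pd.IntervalIndex.from_arrays(df1['time_start'], df1['time_end'], closed='both')
indexmatch = idx.get_indexer(df2['timestamp'])
matched = indexmatch >= 0  # keep only timestamps that fall inside some interval
dfcopy = df1.copy()
# One df1 row per matched timestamp, so an interval can appear several times
df1 = df1.iloc[indexmatch[matched]].reset_index(drop=True)
df1['New'] = df2.loc[matched, 'state'].to_numpy()
# Concatenate the states per interval, then restore intervals that had no match
df1.groupby(['time_start', 'time_end'], as_index=False)['New'].sum().combine_first(dfcopy)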
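As I read it, the point of the update is to cope with several timestamps landing in the same interval. A more direct way to express that idea might be to group the matched states by interval position and align the result back to df1. This is a sketch under my own assumptions: one output value per df1 row, states of the same interval joined into one string, and the names matched, per_interval and result are purely illustrative. On the sample data this reading gives A, A, B, NaN, because 1548102251 lies inside the third interval. That differs from the output table above, so take it as an illustration of the grouping step rather than a reproduction of that output.

```python
import pandas as pd

df1 = pd.DataFrame({
    "time_start": [1548102229, 1548102239, 1548102249, 1548102259],
    "time_end":   [1548102232, 1548102242, 1548102252, 1548102262],
})
df2 = pd.DataFrame({
    "timestamp": [1548102231, 1548102241, 1548102248, 1548102251],
    "state":     ["A", "A", "B", "B"],
})

idx = pd.IntervalIndex.from_arrays(df1["time_start"], df1["time_end"], closed="both")
pos = idx.get_indexer(df2["timestamp"])
matched = pos >= 0  # df2 rows whose timestamp lies inside some interval

# Join the states of all timestamps that land in the same interval position
per_interval = (
    df2.loc[matched, "state"]
    .groupby(pos[matched])
    .agg(lambda s: "".join(s))
)

# Align to df1's row positions; intervals without any timestamp become NaN
result = df1.assign(state=per_interval.reindex(range(len(df1))).to_numpy())
print(result)
```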
Answer 1 (score: 0)
Use the outer methods of the np.less_equal and np.greater_equal ufuncs:
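import numpy as np
ts = df2['timestamp'].to_numpy()  # ufunc.outer is not implemented for pandas Series
# c[i, j] is True when timestamp i lies within [time_start[j], time_end[j]]
c = np.less_equal.outer(ts, df1['time_end'].to_numpy()) & \
    np.greater_equal.outer(ts, df1['time_start'].to_numpy())
df1['state'] = df2['state'].to_numpy()[c.argmax(1)]
Then correct the rows where every comparison is False:
df1.loc[~c.any(1), 'state'] = np.nan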
   time_start    time_end state
0  1548102229  1548102232     A
1  1548102239  1548102242     A
2  1548102249  1548102252   NaN
3  1548102259  1548102262     B
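The same boolean matrix can also be built with plain broadcasting, which makes it easier to see why the correction step is needed. This is a sketch on the sample data; the names ts, start, end and inside are mine. argmax returns 0 for a row that is entirely False, so the third timestamp would silently be mapped to position 0 unless the any() check masks it out.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "time_start": [1548102229, 1548102239, 1548102249, 1548102259],
    "time_end":   [1548102232, 1548102242, 1548102252, 1548102262],
})
df2 = pd.DataFrame({
    "timestamp": [1548102231, 1548102241, 1548102248, 1548102251],
    "state":     ["A", "A", "B", "B"],
})

ts = df2["timestamp"].to_numpy()[:, None]   # column vector, shape (4, 1)
start = df1["time_start"].to_numpy()        # shape (4,)
end = df1["time_end"].to_numpy()            # shape (4,)

# inside[i, j] is True when timestamp i lies within interval j (bounds inclusive)
inside = (ts >= start) & (ts <= end)

print(inside.any(axis=1).tolist())     # [True, True, False, True]
print(inside.argmax(axis=1).tolist())  # [0, 1, 0, 2]
```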