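I have two dataframes, and I want to create a filtered version of df_1 based on the timestamp index of df_2, as follows. For each index value of df_2, I want to get all rows of df_1 whose index is within a timedelta of 1 day of that df_2 index value.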
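Example: for the df_2 index value 10/15/2017, I want the new df_outcome to include all df_1 rows between 10/14/2017 and 10/16/2017, which would return the rows 10/14/2017 f and 10/15/2017 g. Any duplicates resulting from the queries would be removed.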
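Any help is appreciated, thanks.
Edit:
I edited the question to change the index to timestamps, to reflect the actual problem. I apologize for any confusion; I did not expect this to make a difference. The timestamps are not evenly spaced.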
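To make the requirement concrete, here is a minimal sketch of the desired filtering written as a plain loop over df_2's index. The sample dates, the "Other" column, and the name delta are made up for illustration (the original frames are not shown); only df_1, df_2, df_outcome and the "Values" column come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the real frames from the question are not shown.
df_1 = pd.DataFrame(
    {"Values": list("abcdefgh")},
    index=pd.to_datetime([
        "2017-10-01", "2017-10-04", "2017-10-05", "2017-10-07",
        "2017-10-10", "2017-10-14", "2017-10-15", "2017-10-20",
    ]),
)
df_2 = pd.DataFrame(
    {"Other": [1, 2]},
    index=pd.to_datetime(["2017-10-05", "2017-10-15"]),
)

delta = pd.Timedelta("1 day")

# Start with "keep nothing", then switch on every df_1 row that falls
# inside [ts - delta, ts + delta] for at least one df_2 timestamp.
mask = np.zeros(len(df_1), dtype=bool)
for ts in df_2.index:
    mask |= (df_1.index >= ts - delta) & (df_1.index <= ts + delta)

# Each row is selected at most once, so no duplicates can appear.
df_outcome = df_1[mask]
print(df_outcome)
```

With this sample data the result holds the rows b, c, f and g. Because the windows are OR-ed into a single boolean mask, overlapping windows never produce duplicate rows, and the timestamps do not need to be evenly spaced.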
Answer 0 (score: 0)
New answer, based on the original answer and also using sets to identify the valid indices:
import pandas as pd

# convert strings to timestamps if not done already
df_1.index = pd.to_datetime(df_1.index)
df_2.index = pd.to_datetime(df_2.index)

# helper function to extract whole days since the Unix epoch
def extract_days_since_epoch(timeseries):
    # pd.datetime is no longer available in current pandas; use pd.Timestamp
    epoch_start = pd.Timestamp(1970, 1, 1)
    return (timeseries - epoch_start).days

# get indices as days
index_1_days = extract_days_since_epoch(df_1.index)
index_2_days = extract_days_since_epoch(df_2.index)

# number of calendar days to include on either side of each df_2 date
threshold = 1
ranges = [range(x - threshold, x + threshold + 1) for x in index_2_days]
allowed_indices = {value for sub_range in ranges
                   for value in sub_range}

# get intersection of allowed and present indices
valid_indices = allowed_indices.intersection(index_1_days)

# use assign, query and drop to filter matches
df_1.assign(days=index_1_days)\
    .query("days in @valid_indices")\
    .drop(["days"], axis=1)
           Values
Index
2017-10-04      b
2017-10-05      c
2017-10-07      d
2017-10-14      f
2017-10-15      g
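You can use the set operations of a pandas Index for this purpose. First, create a set of allowed indices using list and set comprehensions. Second, take the intersection of the allowed and the existing indices. Finally, use the valid indices to reindex the target dataframe: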
# define threshold range to include values from df_2
threshold = 10

# create set of allowed indices via set comprehension
ranges = [range(x - threshold, x + threshold + 1) for x in df_2.index]
allowed_indices = {value for sub_range in ranges for value in sub_range}

# get intersection of allowed and present indices
# (the set is passed as a sorted list so pandas receives an ordered sequence)
valid_indices = df_1.index.intersection(sorted(allowed_indices))

# use reindex with valid indices
df_result = df_1.reindex(valid_indices)
print(df_result)
   Values
10      a
20      b
30      c
40      d
70      g
80      h
Answer 1 (score: 0)
The following indexer should do the job for datetimes:
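import pandas as pd

threshold = pd.Timedelta('1 hour')
# True for every df1 timestamp that lies within threshold of some df2 timestamp
indexer = pd.Series(df1.index, index=df1.index).apply(
    lambda x: min(abs(x - df2.index)) < threshold
)
df1.loc[indexer]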
Note: it does not scale well. If len(df1) * len(df2) ~ 10^6
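If the pairwise comparison becomes too slow, one possible alternative (not part of the original answer) is to sort df2's timestamps once and look up the nearest neighbour of each df1 timestamp with numpy.searchsorted. This is a sketch under assumptions: filter_near is a hypothetical helper name, df2 is assumed to be non-empty, and the strict < comparison mirrors the indexer above.

```python
import numpy as np
import pandas as pd


def filter_near(df1, df2, threshold):
    """Keep rows of df1 whose timestamp lies strictly within `threshold`
    of at least one timestamp in df2's index (assumes df2 is non-empty)."""
    targets = np.sort(df2.index.values)   # sorted datetime64 array
    stamps = df1.index.values
    # position at which each df1 timestamp would be inserted into targets
    pos = np.searchsorted(targets, stamps)
    last = len(targets) - 1
    left = targets[np.clip(pos - 1, 0, last)]   # neighbour before (clipped at the ends)
    right = targets[np.clip(pos, 0, last)]      # neighbour at or after (clipped at the ends)
    # distance to the closer of the two neighbours
    nearest = np.minimum(np.abs(stamps - left), np.abs(stamps - right))
    return df1[nearest < threshold.to_timedelta64()]


# Hypothetical sample data, for illustration only.
df1 = pd.DataFrame(
    {"Values": list("abcde")},
    index=pd.to_datetime(
        ["2017-10-01", "2017-10-04", "2017-10-14", "2017-10-15", "2017-10-20"]
    ),
)
df2 = pd.DataFrame(index=pd.to_datetime(["2017-10-15"]))

result = filter_near(df1, df2, pd.Timedelta("2 days"))
print(result)
```

The work is dominated by one sort of df2's index and one binary search per df1 row, rather than comparing every df1 timestamp against every df2 timestamp. Replace < with <= if rows exactly one threshold away should also be kept.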