I have two dataframes. Dataframe A is:
[distance] [measure]
17442.77000 32.792658
17442.95100 32.792658
17517.49200 37.648482
17518.29600 37.648482
17565.77600 38.287118
17565.88800 38.287118
17596.93700 41.203340
17597.29700 41.203340
17602.16400 41.477979
17602.83900 41.612774
17618.16400 42.479890
17618.71100 42.681591
and dataframe B is:
[mileage] [Driver]
17442.8 name1
17517.5 name2
17565.8 name3
17597.2 name4
17602.5 name5
17618.4 name6
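For each [mileage] row in dataframe B, I want to find the two rows in dataframe A whose [distance] values bracket it (the closest value at or below mileage_value and the closest value at or above it), so that I end up with triples like this:
17442.77000 32.792658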
17442.8 name1
17442.95100 32.792658
17517.49200 37.648482
17517.5 name2
17518.29600 37.648482
. .
. .
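so that I can apply the following function with a rolling window of size 3:
def f(x):
    # linear interpolation of the measure at the middle row from the rows on either side
    return df.iloc[0,1]+(df.iloc[2,1]-df.iloc[0,1])*((df.iloc[1,0]-df.iloc[0,0])/(df.iloc[2,0]-df.iloc[0,0]))
a = df.rolling(window=3, min_periods=1).apply(f)[::3].reset_index(drop=True)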
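Read term by term, f is a linear interpolation: it estimates [measure] at the middle row's distance from the rows on either side. A minimal sketch of that formula on a single triple, assuming the three rows are passed in as plain numbers (the name interpolate_middle is hypothetical and does not appear in the original code):

```python
def interpolate_middle(x0, y0, x1, x2, y2):
    """Linearly interpolate y at x1 from the points (x0, y0) and (x2, y2)."""
    return y0 + (y2 - y0) * ((x1 - x0) / (x2 - x0))


# Triple around mileage 17602.5 (name5), taken from the tables above.
estimate = interpolate_middle(17602.164, 41.477979, 17602.5, 17602.839, 41.612774)
print(estimate)
```

The estimate lies between the two neighbouring measures because 17602.5 lies between the two neighbouring distances.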
Up to now I have been concatenating the two dataframes and sorting the values to produce the triples above, but this causes problems when two values from df A[distance] fall within the distance range of B. Any hints or suggestions are much appreciated!
Answer 0 (score: 1)
I think you can use merge_asof with the direction parameter, together with drop_duplicates, as follows:
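import pandas as pd

# df_a and df_b are dataframes A and B, each sorted by its key column.
# backward: each A row is matched to the last mileage <= distance; keeping the
# first A row per match yields the row just after each mileage value.
df_before = pd.merge_asof(df_a, df_b,
                          left_on='distance',
                          right_on='mileage',
                          direction='backward')\
              .drop_duplicates(['mileage','Driver'], keep='first')[['distance','measure']]
# forward: each A row is matched to the first mileage >= distance; keeping the
# last A row per match yields the row just before each mileage value.
df_after = pd.merge_asof(df_a, df_b,
                         left_on='distance',
                         right_on='mileage', direction='forward')\
             .drop_duplicates(['mileage', 'Driver'], keep='last')[['distance','measure']]
# Rename B's columns so its rows can be stacked with the rows taken from A.
df_middle = df_b.rename(columns={'Driver':'measure','mileage':'distance'})
pd.concat([df_before, df_middle, df_after]).sort_values('distance').drop_duplicates()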
Output:
distance measure
0 17442.770 32.7927
0 17442.800 name1
1 17442.951 32.7927
2 17517.492 37.6485
1 17517.500 name2
3 17518.296 37.6485
4 17565.776 38.2871
2 17565.800 name3
5 17565.888 38.2871
6 17596.937 41.2033
3 17597.200 name4
7 17597.297 41.2033
8 17602.164 41.478
4 17602.500 name5
9 17602.839 41.6128
10 17618.164 42.4799
5 17618.400 name6
11 17618.711 42.6816
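As a variation on the approach above (not the answerer's code), the merge can also be run with B on the left. Every mileage row then picks up its own lower and upper neighbour from A, even if two mileage values fall between the same pair of distance rows. This is a sketch that assumes both frames are sorted by their key columns; the helper name bracket_and_interpolate and the output column names d_lo, m_lo, d_hi, m_hi and measure_est are hypothetical:

```python
import pandas as pd

df_a = pd.DataFrame({
    "distance": [17442.770, 17442.951, 17517.492, 17518.296, 17565.776, 17565.888,
                 17596.937, 17597.297, 17602.164, 17602.839, 17618.164, 17618.711],
    "measure": [32.792658, 32.792658, 37.648482, 37.648482, 38.287118, 38.287118,
                41.203340, 41.203340, 41.477979, 41.612774, 42.479890, 42.681591],
})
df_b = pd.DataFrame({
    "mileage": [17442.8, 17517.5, 17565.8, 17597.2, 17602.5, 17618.4],
    "Driver": ["name1", "name2", "name3", "name4", "name5", "name6"],
})


def bracket_and_interpolate(b, a):
    """For each row of b, attach the nearest rows of a below and above its mileage."""
    # Nearest A row with distance <= mileage.
    lower = pd.merge_asof(b, a, left_on="mileage", right_on="distance",
                          direction="backward")
    # Nearest A row with distance >= mileage.
    upper = pd.merge_asof(b, a, left_on="mileage", right_on="distance",
                          direction="forward")
    out = b.reset_index(drop=True).copy()
    out["d_lo"] = lower["distance"].to_numpy()
    out["m_lo"] = lower["measure"].to_numpy()
    out["d_hi"] = upper["distance"].to_numpy()
    out["m_hi"] = upper["measure"].to_numpy()
    # Linear interpolation weight; an exact match gives a zero span, so use 0 there.
    span = out["d_hi"] - out["d_lo"]
    weight = ((out["mileage"] - out["d_lo"]) / span).where(span != 0, 0.0)
    out["measure_est"] = out["m_lo"] + (out["m_hi"] - out["m_lo"]) * weight
    return out


result = bracket_and_interpolate(df_b, df_a)
print(result[["mileage", "Driver", "d_lo", "d_hi", "measure_est"]])
```

Mileage values that lie before the first or after the last distance have no neighbour on one side, so their estimate comes out as NaN rather than being extrapolated.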