Python Pandas:比较一个列中类似值的两个csv(dataframe)的行,并返回相似的行(列)的内容

时间:2018-04-20 11:21:34

标签: python pandas csv numpy dataframe

我正在使用两个csv文件并作为数据框导入。让我们说df1和df2如图所示,df1和df2具有不同的长度。 df1有50000行,df2有20000行。

DF1

DF2

我想比较(遍历行)'时间' df2与df1,找到时间上的差异并返回对应于相似行的所有列的值。

例如,66(df1中的'时间')最接近67(df2中的'时间')所以我想将内容返回到df1(15& #39; vel'和25' yaw')到df2并保存为新的csv

2 个答案:

答案 0 :(得分:0)

创建笛卡尔积,然后进行过滤 -

df1 = pd.DataFrame({'time': [11, 22, 33,44, 55, 66,77,88,99], 'vel':[10, 11,12,13,14,15,16,17,18],
                   'yaw' : [20, 21, 22,23,24,25,26,27,28]})
df2 = pd.DataFrame({'time' : [67, 75, 87, 99]})

df1['key'] = 1
df2['key'] = 1
df1.rename(index=str, columns ={'time' : 'time_x'}, inplace=True)

df = df2.merge(df1, on='key', how ='left').reset_index()
df['diff'] = df.apply(lambda x: abs(x['time']  - x['time_x']), axis=1)
df.sort_values(by=['time', 'diff'], inplace=True)

df=df.groupby(['time']).first().reset_index()[['time', 'vel', 'yaw']]

答案 1 :(得分:0)

可以在 iterrows()功能的帮助下完成。

以下是代码:

首次创建表格

value=[(0,11,10,20),(1,22,11,21),(2,33,12,22),(3,44,13,23),(4,55,14,24), 
(5,66,15,25),(6,77,16,26),(7,88,17,27),(8,99,18,28)]
header=["index","time","vel","yaw"]
df1 = pd.DataFrame.from_records(value, columns=header)

df1

第二个表创建

value=[(0,67,"nan","nan"),(1,75,"nan" ,"nan" ),(2,87,"nan" ,"nan" ) 
(3,99,"nan" ,"nan" )]
header=["index","time","vel","yaw"]
df2 = pd.DataFrame.from_records(value, columns=header)

df2

将结果存储在df2中:

for index, row in df2.iterrows():
    min=10000000
    for indexer, rows in df1.iterrows():
        if abs(row['time']-rows['time'])<min:
            min = abs(row['time']-rows['time'])
            #storing the position 
            pos = indexer
    df2.loc[index,'vel'] = df1['vel'][pos]
    df2.loc[index,'yaw'] = df1['yaw'][pos]

result