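The following code works as intended. It builds a DataFrame by comparing the rows of two other DataFrames and keeping only the pairs that match.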
import pandas as pd
df1 = pd.DataFrame({'i': [40, 40, 40, 41, 41],
                    'j': [140, 140, 140, 141, 142],
                    'k': [140, 141, 142, 141, 142],
                    'avg': [33, 31, 30, 33, 29]})
df2 = pd.DataFrame({'i': [10, 40, 10, 41, 21, 11],
                    'j': [110, 140, 110, 141, 122, 111],
                    'k': [110, 141, 142, 141, 122, 111],
                    'avg': [31, 30, 40, 31, 25, 29]})
offset = abs(df1.loc[0,'i'] - df2.loc[0,'i'])
# Collect matches in a list; DataFrame.append was removed in pandas 2.0.
rows = []
for row1 in df1.itertuples():
    for row2 in df2.itertuples():
        # A pair matches when i, j and k each differ by exactly the offset.
        if (abs(row1.i - row2.i) == offset
                and abs(row1.j - row2.j) == offset
                and abs(row1.k - row2.k) == offset):
            rows.append({
                "i_ref": row1.i,
                "j_ref": row1.j,
                "k_ref": row1.k,
                "i_sim": row2.i,
                "j_sim": row2.j,
                "k_sim": row2.k,
                "avg_ref": row1.avg,
                "avg_sim": row2.avg,
                "deviation [%]": abs(row1.avg - row2.avg) * 100 / row1.avg,
            })
df3 = pd.DataFrame(rows, columns=["i_ref", "j_ref", "k_ref", "i_sim", "j_sim", "k_sim", "avg_ref", "avg_sim", "deviation [%]"])
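However, in my real case df1 and df2 each have 1 million entries (read from CSV files), and the computation takes forever.
My question: is there a more efficient way to get the same result?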
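One direction that might be worth trying, offered only as a sketch: since `abs(a - b) == offset` means `b` is either `a - offset` or `a + offset`, the nested loop can probably be replaced by a handful of key-based merges, one per sign combination of the three columns. The helper name `match_with_offset` and its parameters `ref` and `sim` are hypothetical, chosen here for illustration, and this has not been timed on million-row inputs.

```python
from itertools import product

import pandas as pd


def match_with_offset(ref, sim, offset):
    """Pair rows of ref and sim whose i, j and k each differ by exactly offset.

    Hypothetical helper: the name and signature are illustrative only.
    """
    keys = ["i", "j", "k"]
    # abs(a - b) == offset means b == a - offset or b == a + offset,
    # so each key has two candidate shifts (only one when offset is 0).
    signs = (0,) if offset == 0 else (-1, 1)
    pieces = []
    for shifts in product(signs, repeat=len(keys)):
        shifted = sim.copy()
        for key, sign in zip(keys, shifts):
            shifted[key + "_sim"] = shifted[key]         # keep the original value
            shifted[key] = shifted[key] + sign * offset  # shifted join key
        shifted = shifted.rename(columns={"avg": "avg_sim"})
        # Join on the three keys instead of comparing every pair of rows.
        pieces.append(ref.merge(shifted, on=keys, how="inner"))
    out = pd.concat(pieces, ignore_index=True)
    out = out.rename(columns={"i": "i_ref", "j": "j_ref", "k": "k_ref", "avg": "avg_ref"})
    out["deviation [%]"] = (out["avg_ref"] - out["avg_sim"]).abs() * 100 / out["avg_ref"]
    return out[["i_ref", "j_ref", "k_ref", "i_sim", "j_sim", "k_sim",
                "avg_ref", "avg_sim", "deviation [%]"]]


# Same sample data as in the question.
df1 = pd.DataFrame({"i": [40, 40, 40, 41, 41],
                    "j": [140, 140, 140, 141, 142],
                    "k": [140, 141, 142, 141, 142],
                    "avg": [33, 31, 30, 33, 29]})
df2 = pd.DataFrame({"i": [10, 40, 10, 41, 21, 11],
                    "j": [110, 140, 110, 141, 122, 111],
                    "k": [110, 141, 142, 141, 122, 111],
                    "avg": [31, 30, 40, 31, 25, 29]})
offset = abs(df1.loc[0, "i"] - df2.loc[0, "i"])
result = match_with_offset(df1, df2, offset)
```

The rows may come out in a different order than the loop produces, so a final `sort_values` on the `*_ref` columns may be needed if order matters. Each merge looks rows up by key, so the work should scale with the size of the inputs plus the number of matches rather than with the product of the two row counts.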