Creating a DataFrame from elements of two other DataFrames

Time: 2020-04-22 18:49:46

Tags: python pandas

The following code works. It builds a DataFrame by comparing the elements of two other DataFrames and keeping only the row pairs that match.

import pandas as pd

df1 = pd.DataFrame({'i': [40, 40, 40, 41, 41],
                    'j': [140, 140, 140, 141, 142],
                    'k' : [140, 141, 142, 141, 142],
                    'avg': [33, 31, 30, 33, 29]})

df2 = pd.DataFrame({'i': [10, 40, 10, 41, 21, 11],
                    'j': [110, 140, 110, 141, 122, 111],
                    'k' : [110, 141, 142, 141, 122, 111],
                    'avg': [31, 30, 40, 31, 25, 29]})

offset = abs(df1.loc[0,'i'] - df2.loc[0,'i'])

columns = ["i_ref", "j_ref", "k_ref", "i_sim", "j_sim", "k_sim", "avg_ref", "avg_sim", "deviation [%]"]
rows = []  # collect matches in a list; DataFrame.append was removed in pandas 2.0
for row1 in df1.itertuples():
    for row2 in df2.itertuples():
        # keep the pair only when i, j and k all differ by exactly `offset`
        if abs(row1.i - row2.i) == offset and abs(row1.j - row2.j) == offset and abs(row1.k - row2.k) == offset:
            rows.append({
                "i_ref": row1.i,
                "j_ref": row1.j,
                "k_ref": row1.k,
                "i_sim": row2.i,
                "j_sim": row2.j,
                "k_sim": row2.k,
                "avg_ref": row1.avg,
                "avg_sim": row2.avg,
                "deviation [%]": abs(row1.avg - row2.avg) * 100 / row1.avg,
            })

df3 = pd.DataFrame(rows, columns=columns)

However, in my real case df1 and df2 each have 1 million entries (read from CSV files), and the computation takes forever.

My question: is there a more efficient way to get the same result?
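One possible direction, offered only as a sketch: the condition `abs(a - b) == offset` means `b` is either `a - offset` or `a + offset`, so the nested loop could be replaced by up to eight key-based merges, one per sign combination across `i`, `j` and `k`. The helper name `match_by_offset` and the `ref_pos` / `sim_pos` bookkeeping columns are made up for this illustration, and the sketch assumes integer key columns and an `avg` column in both frames. It has not been timed on million-row inputs.

```python
from itertools import product

import pandas as pd


def match_by_offset(ref, sim, offset, keys=("i", "j", "k")):
    """Pair rows of `ref` and `sim` whose key columns all differ by exactly `offset`."""
    keys = list(keys)
    sim_keys = [k + "_sim" for k in keys]

    # Suffix the columns and remember row positions to restore loop order later.
    left = ref.add_suffix("_ref")
    left["ref_pos"] = range(len(left))
    right = sim.add_suffix("_sim")
    right["sim_pos"] = range(len(right))

    # One shift tuple per sign combination; the set collapses them when offset == 0.
    shift_sets = sorted(set(product((-offset, offset), repeat=len(keys))))

    parts = []
    for shifts in shift_sets:
        shifted = left.copy()
        for key, shift in zip(keys, shifts):
            # The sim key this ref row would need for an exact match.
            shifted[key + "_sim"] = shifted[key + "_ref"] + shift
        parts.append(shifted.merge(right, on=sim_keys, how="inner"))

    out = pd.concat(parts, ignore_index=True)
    out = out.sort_values(["ref_pos", "sim_pos"]).reset_index(drop=True)
    out["deviation [%]"] = (out["avg_ref"] - out["avg_sim"]).abs() * 100 / out["avg_ref"]
    ordered = [k + "_ref" for k in keys] + sim_keys + ["avg_ref", "avg_sim", "deviation [%]"]
    return out[ordered]


df1 = pd.DataFrame({'i': [40, 40, 40, 41, 41],
                    'j': [140, 140, 140, 141, 142],
                    'k': [140, 141, 142, 141, 142],
                    'avg': [33, 31, 30, 33, 29]})

df2 = pd.DataFrame({'i': [10, 40, 10, 41, 21, 11],
                    'j': [110, 140, 110, 141, 122, 111],
                    'k': [110, 141, 142, 141, 122, 111],
                    'avg': [31, 30, 40, 31, 25, 29]})

offset = abs(df1.loc[0, 'i'] - df2.loc[0, 'i'])
df3 = match_by_offset(df1, df2, offset)
print(df3)
```

Each merge is a hash join on the three key columns, so the work should grow roughly with the number of rows plus the number of matches rather than with the product of the two frame sizes. Sorting by the two position columns is only there to reproduce the row order of the nested loop and can be dropped if order does not matter.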

0 answers:

No answers