Speeding up a loop over DataFrames

Date: 2018-05-29 14:50:57

Tags: python performance pandas dataframe

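I wrote the code shown below. There are two pandas DataFrames: df contains the columns timestamp_milli and pressure, and df2 contains the columns timestamp_milli and acceleration_z. The DataFrames have about 100 and 3900 rows. For every row of df, the code searches df2 for the rows whose timestamp lies within a fixed range of that row's timestamp (±5 in the code) and keeps the one with the smallest time difference.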

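Unfortunately, the code is very slow. In addition, the line df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"] produces the following message:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead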

How can I speed up the code and get rid of the warning?

acceleration = []
pressure = []

for index, row in df.iterrows():
    mask = (df2["timestamp_milli"] >= (row["timestamp_milli"] - 5)) & (df2["timestamp_milli"] <= (row["timestamp_milli"] + 5))
    df_temp = df2[mask]

    # Select closest point
    if len(df_temp) > 0:
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"] - row["timestamp_milli"]
        df_temp["timestamp_milli"] = df_temp["timestamp_milli"].abs()

        df_temp = df_temp.loc[df_temp["timestamp_milli"] == df_temp["timestamp_milli"].min()]

        for index2, row2 in df_temp.iterrows():
            pressure.append(row["pressure"])
            acc = row2["acceleration_z"]
            acceleration.append(acc)

1 answer:

Answer 0 (score: 1)

I ran into a similar problem; using itertuples instead of iterrows reduced the run time significantly. See "why iterrows have issues". Hope this helps.
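A minimal sketch of what that switch could look like for the code in the question. The function names match_with_itertuples and match_with_merge_asof and the sample data are made up for illustration. The first function keeps the question's logic but loops with itertuples and never assigns into a slice of df2, so the SettingWithCopyWarning should not appear. The second function is a different technique that the answer above does not mention: pandas.merge_asof with direction="nearest", which avoids the Python loop entirely. It assumes both timestamp columns have the same integer dtype, and it returns at most one match per df row, whereas the loop keeps every df2 row that ties for the smallest difference.

```python
import pandas as pd


def match_with_itertuples(df, df2, window=5):
    """Loop version: same logic as the question, but with itertuples."""
    pressure, acceleration = [], []
    for row in df.itertuples(index=False):
        # Absolute time difference between this df row and every df2 row.
        # This is a new Series, so nothing is written into a slice of df2.
        diff = (df2["timestamp_milli"] - row.timestamp_milli).abs()
        best = diff.min()
        if pd.isna(best) or best > window:
            continue  # no df2 row within the window
        # Keep every df2 row that ties for the smallest difference.
        for row2 in df2[diff == best].itertuples(index=False):
            pressure.append(row.pressure)
            acceleration.append(row2.acceleration_z)
    return pressure, acceleration


def match_with_merge_asof(df, df2, window=5):
    """Vectorized alternative (not from the answer): nearest-key join."""
    # merge_asof requires both frames to be sorted by the join key.
    left = df[["timestamp_milli", "pressure"]].sort_values("timestamp_milli")
    right = df2[["timestamp_milli", "acceleration_z"]].sort_values("timestamp_milli")
    merged = pd.merge_asof(
        left, right, on="timestamp_milli",
        tolerance=window, direction="nearest",
    )
    # Rows of df without a partner inside the window get NaN; drop them.
    # (This would also drop matches whose acceleration_z is itself NaN.)
    merged = merged.dropna(subset=["acceleration_z"])
    return merged["pressure"].tolist(), merged["acceleration_z"].tolist()


# Illustrative data only: the rows at 100 and 200 have a partner within
# the window, the row at 300 does not (its nearest neighbour is 10 away).
df = pd.DataFrame({"timestamp_milli": [100, 200, 300],
                   "pressure": [1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"timestamp_milli": [97, 104, 150, 200, 310],
                    "acceleration_z": [0.1, 0.2, 0.3, 0.4, 0.5]})

loop_result = match_with_itertuples(df, df2)
asof_result = match_with_merge_asof(df, df2)
```

If you need to modify a filtered frame inside the loop, as the original code does, taking an explicit copy with df_temp = df2[mask].copy() is the usual way to silence the warning. Note that the merge_asof version returns its results ordered by timestamp rather than in the original row order of df.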