Question

My data is: df1, a DataFrame prepared from logs.

In[1]: import pandas as pd
In[2]: df1 = pd.DataFrame([[1, 'confirmed', '01/01/2017 14:05:00'], [1, 'picked', '01/01/2017 14:10:00']], columns = ['ID', 'log', 'time'])
In[3]: print(df1)

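I iterate over it to find the rows where the log is 'picked' and take the corresponding time; for each such row, I also take the time from the row immediately before it.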

df2 - a new, empty DataFrame with the same index as df1.

I have a loop that looks like this:

for row in df1.index:
    if df1['log'][row] == 'picked':
        df2['time1'][row] = df1['time'][row]
        if df1['ID'][row] == df1['ID'][row-1]:
            df2['time2'][row] = df1['time'][row-1]

It fills the 'time1' and 'time2' columns in the new df so that I can compute the time span between them. That span is the time spent in the queue.

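The loop produces the correct output, but it takes a very long time (df1 has 700k rows, and more than half of them have 'picked' in the 'log' column).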

I would really appreciate any advice on reducing the loop's running time and improving its structure.

Answer 1

If df1 and df2 have the same index, you probably don't need a for loop at all. First, try this:

# this will replace the values in df2['time1'] with the values of df1['time'] where df1['log'] == 'picked'
df2.loc[df1['log'] == 'picked', 'time1'] = df1.loc[df1['log'] == 'picked', 'time']

Also, it would help if you could provide some sample data, as requested in the comment above.
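
The snippet above only covers 'time1'. The 'time2' part of the loop can probably be vectorized in the same spirit with `shift()`. What follows is a minimal sketch, not a tested drop-in: it assumes df1 is ordered so that "the previous row" is simply the row above, that the timestamps are day/month/year strings (the sample is ambiguous on this), and the third sample row and the `queue_time` column are made up for illustration.

```python
import pandas as pd

# Sample data in the shape of the question; the third row is a made-up
# example of a 'picked' row whose previous row has a different ID.
df1 = pd.DataFrame(
    [[1, 'confirmed', '01/01/2017 14:05:00'],
     [1, 'picked', '01/01/2017 14:10:00'],
     [2, 'picked', '01/01/2017 15:00:00']],
    columns=['ID', 'log', 'time'],
)
# Assumption: day/month/year ordering.
df1['time'] = pd.to_datetime(df1['time'], format='%d/%m/%Y %H:%M:%S')

# New empty frame sharing df1's index, as described in the question.
df2 = pd.DataFrame(index=df1.index)

picked = df1['log'] == 'picked'
# True where the row above has the same ID (always False for the first row).
same_id = df1['ID'] == df1['ID'].shift()

# time1: this row's time, only for 'picked' rows (NaT elsewhere).
df2['time1'] = df1['time'].where(picked)
# time2: the previous row's time, only for 'picked' rows with a matching ID.
df2['time2'] = df1['time'].shift().where(picked & same_id)

# Hypothetical extra column: time spent in the queue.
df2['queue_time'] = df2['time1'] - df2['time2']
```

Unlike the loop, this does not index `row-1` for the first row, so a 'picked' log in the very first row just gets NaT in 'time2'.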
