Improving performance: if and for loops

Time: 2018-07-06 15:30:32

Tags: python-3.x pandas

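My DataFrame has about 12 million rows, and I need to speed up a for loop over it, but I don't know how. I'm using Python with pandas; the code works correctly, but it is very slow.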

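For each timestamp (that is, for each row), I need the columns num_pr and num_pu to hold the column sums of TempTable, which is updated according to the following conditions.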

import numpy as np
import pandas as pd

# df is the existing DataFrame with the columns account, action, status, timestamp.
# TempTable holds one row per account; pr / pu are 1 while that flag is set.
TempTable = pd.DataFrame({'account': np.arange(1, 2), 'pr': 0, 'pu': 0})
TempTable = TempTable.set_index('account')

df['num_pr'] = 0
df['num_pu'] = 0
for row in range(0, 10000):
    account, action, status = df.account[row], df.action[row], df.status[row]
    # 'SA' and 'I' set the flag named by status to 1; 'SO' resets it to 0.
    if action in ('SA', 'I') and status == 'PR':
        TempTable.loc[account, 'pr'] = 1
    elif action in ('SA', 'I') and status == 'PU':
        TempTable.loc[account, 'pu'] = 1
    elif action == 'SO' and status == 'PR':
        TempTable.loc[account, 'pr'] = 0
    elif action == 'SO' and status == 'PU':
        TempTable.loc[account, 'pu'] = 0
    # Running totals: number of accounts that currently have each flag set.
    df.loc[row, 'num_pr'] = TempTable.loc[:, 'pr'].sum()
    df.loc[row, 'num_pu'] = TempTable.loc[:, 'pu'].sum()
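
Most of the time in a loop like this probably goes into the per-row `.loc` lookups and the two `.sum()` calls over TempTable. A minimal sketch of one way to keep the same row-by-row logic while avoiding them is to track the active accounts in plain Python sets and write the counts into NumPy arrays. The function name `running_counts_loop` and the three-row `demo` frame are made up for illustration, and the sketch assumes the rows are already in time order and that every row should be processed (not just the first 10000).

```python
import numpy as np
import pandas as pd


def running_counts_loop(df):
    """Row-by-row counts of accounts with the PR / PU flag currently set."""
    pr_on, pu_on = set(), set()  # accounts whose flag is currently 1
    num_pr = np.empty(len(df), dtype=np.int64)
    num_pu = np.empty(len(df), dtype=np.int64)
    rows = zip(df["account"].to_numpy(),
               df["action"].to_numpy(),
               df["status"].to_numpy())
    for i, (account, action, status) in enumerate(rows):
        # Pick the set that this row's status refers to, if any.
        flags = pr_on if status == "PR" else pu_on if status == "PU" else None
        if flags is not None:
            if action in ("SA", "I"):
                flags.add(account)       # same as setting the flag to 1
            elif action == "SO":
                flags.discard(account)   # same as resetting the flag to 0
        num_pr[i] = len(pr_on)
        num_pu[i] = len(pu_on)
    return num_pr, num_pu


# Hypothetical three-row example, not taken from the real data.
demo = pd.DataFrame({
    "account": [1, 2, 1],
    "action": ["SA", "I", "SO"],
    "status": ["PU", "PU", "PU"],
})
demo["num_pr"], demo["num_pu"] = running_counts_loop(demo)
print(demo)
```

The set sizes play the role of the TempTable column sums, so each row costs a constant amount of work instead of a scan over all accounts.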


    account action  timestamp               status  num_pr      num_pu
0   1111111 SA      2018-06-28 02:00:01.024 PU      0           1
1   2222222 I       2018-06-28 02:00:02.032 PU      0           2
2   1111111 I       2018-06-28 02:00:03.382 PU      0           2
3   3333333 SO      2018-06-28 02:00:04.395 PR      0           2
4   1111111 I       2018-06-28 02:00:05.401 PU      0           2
5   1111111 I       2018-06-28 02:00:05.407 PU      0           2
6   2222222 I       2018-06-28 02:00:06.409 PU      0           2
7   3333333 SA      2018-06-28 02:00:06.413 PR      1           2
8   1111111 SO      2018-06-28 02:00:07.414 PU      1           1
9   3333333 SO      2018-06-28 02:00:07.467 PR      0           1
10  1111111 SA      2018-06-28 02:00:08.414 PR      1           1
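
The running totals in this table can also be computed without an explicit Python loop. The following is a sketch of one possible vectorized approach, not a benchmarked solution: mark each row as setting a flag to 1, resetting it to 0, or leaving it alone, forward-fill that state within each account, and take the cumulative sum of the per-row changes. The helper name `running_flag_count` is hypothetical, and the sketch assumes the rows are already sorted by timestamp and that all rows are processed.

```python
import numpy as np
import pandas as pd


def running_flag_count(df, flag):
    """Running number of accounts whose `flag` ('PR' or 'PU') is set."""
    is_flag = df["status"].eq(flag)
    turn_on = is_flag & df["action"].isin(["SA", "I"])
    turn_off = is_flag & df["action"].eq("SO")
    # 1.0 = set, 0.0 = reset, NaN = this row does not touch the flag.
    new_state = pd.Series(
        np.select([turn_on, turn_off], [1.0, 0.0], default=np.nan),
        index=df.index,
    )
    # Flag value of each row's account after that row has been applied.
    state = new_state.groupby(df["account"]).ffill().fillna(0.0)
    # Flag value of the same account just before that row.
    prev = state.groupby(df["account"]).shift().fillna(0.0)
    # Each row changes the global count by +1, 0 or -1.
    return (state - prev).cumsum().astype("int64")


# The sample rows from the table above.
df = pd.DataFrame({
    "account": [1111111, 2222222, 1111111, 3333333, 1111111, 1111111,
                2222222, 3333333, 1111111, 3333333, 1111111],
    "action": ["SA", "I", "I", "SO", "I", "I", "I", "SA", "SO", "SO", "SA"],
    "timestamp": pd.to_datetime([
        "2018-06-28 02:00:01.024", "2018-06-28 02:00:02.032",
        "2018-06-28 02:00:03.382", "2018-06-28 02:00:04.395",
        "2018-06-28 02:00:05.401", "2018-06-28 02:00:05.407",
        "2018-06-28 02:00:06.409", "2018-06-28 02:00:06.413",
        "2018-06-28 02:00:07.414", "2018-06-28 02:00:07.467",
        "2018-06-28 02:00:08.414",
    ]),
    "status": ["PU", "PU", "PU", "PR", "PU", "PU", "PU", "PR", "PU", "PR", "PR"],
})
df["num_pr"] = running_flag_count(df, "PR")
df["num_pu"] = running_flag_count(df, "PU")

# num_pr / num_pu values copied from the table above, for comparison.
expected_pr = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1]
expected_pu = [1, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1]
print(df["num_pr"].tolist() == expected_pr, df["num_pu"].tolist() == expected_pu)
```

Because an account contributes to the count only when its own flag actually changes, repeated 'I' rows for an account that is already set add nothing, which matches the behavior of assigning 1 to the same TempTable cell again.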

0 answers:

No answers