Tracking value changes between rows in Python (a specific change)

Time: 2018-08-10 02:02:32

Tags: python pandas dataframe spyder

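I want to track changes in a column's value from one row to the next. I have a dataset consisting of vehicle, timestamp, and mode (one of four modes: 0, 2, 4, 8). For example: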

vehicle, timestamp, mode
x,1970-01-19 01:24:59.973, 0
x,1970-01-19 01:25:59.973, 2
x,1970-01-19 01:26:59.973, 2
x,1970-01-19 01:27:59.973, 0
x,1970-01-19 01:28:59.973, 2
x,1970-01-19 01:29:59.973, 0
x,1970-01-19 01:30:59.973, 0
x,1970-01-19 01:31:59.973, 2
x,1970-01-19 01:32:59.973, 0

I want to track changes in the mode, specifically when the mode changes from 2 to 0, as shown below:

vehicle, timestamp, mode, changes
x,1970-01-19 01:24:59.973, 0, NaN
x,1970-01-19 01:25:59.973, 2, NaN
x,1970-01-19 01:26:59.973, 2, NaN
x,1970-01-19 01:27:59.973, 0, 1
x,1970-01-19 01:28:59.973, 2, NaN
x,1970-01-19 01:29:59.973, 0, 1
x,1970-01-19 01:30:59.973, 0, NaN
x,1970-01-19 01:31:59.973, 2, NaN
x,1970-01-19 01:32:59.973, 0, 1

Please advise!

2 answers:

Answer 0 (score: 1)

No for loop or list comprehension is needed. Use diff.

Input:

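from io import StringIO
import pandas as pd

# Build the sample DataFrame from an inline CSV string; the replace() call
# at the end strips the space after each comma so the values parse cleanly.
df = pd.read_table(StringIO("""vehicle, timestamp, mode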
x,1970-01-19 01:24:59.973, 0
x,1970-01-19 01:25:59.973, 2
x,1970-01-19 01:26:59.973, 2
x,1970-01-19 01:27:59.973, 0
x,1970-01-19 01:28:59.973, 2
x,1970-01-19 01:29:59.973, 0
x,1970-01-19 01:30:59.973, 0
x,1970-01-19 01:31:59.973, 2
x,1970-01-19 01:32:59.973, 0""".replace(', ', ',')), sep=',', engine='python')

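Add a new column "changes" and fill it with 1 where the diff is -2 and the current mode is 0 (the second condition excludes a change from 4 to 2, which also has a diff of -2):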

df.loc[(df['mode'].diff() == -2) & (df['mode'] == 0), 'changes'] = 1

Output:

  vehicle                timestamp  mode  changes
0       x  1970-01-19 01:24:59.973     0      NaN
1       x  1970-01-19 01:25:59.973     2      NaN
2       x  1970-01-19 01:26:59.973     2      NaN
3       x  1970-01-19 01:27:59.973     0      1.0
4       x  1970-01-19 01:28:59.973     2      NaN
5       x  1970-01-19 01:29:59.973     0      1.0
6       x  1970-01-19 01:30:59.973     0      NaN
7       x  1970-01-19 01:31:59.973     2      NaN
8       x  1970-01-19 01:32:59.973     0      1.0
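
The dataset in the question holds more than one vehicle, and a plain diff would also compare the last row of one vehicle with the first row of the next. A minimal sketch of one way to avoid that is below. It assumes the rows are already sorted by timestamp within each vehicle, and the second vehicle "y" and its values are made up purely for illustration:

```python
import numpy as np
import pandas as pd

# Illustrative data only: vehicle "y" is a hypothetical second vehicle.
df = pd.DataFrame({
    "vehicle": ["x"] * 5 + ["y"] * 4,
    "mode": [0, 2, 2, 0, 2, 0, 0, 2, 0],
})

# Previous mode within the same vehicle; the first row of each vehicle gets NaN.
prev_mode = df.groupby("vehicle")["mode"].shift()

# 1 where the mode went from 2 to 0 for the same vehicle, NaN otherwise.
df["changes"] = np.where((prev_mode == 2) & (df["mode"] == 0), 1.0, np.nan)
print(df)
```

Comparing the shifted column against 2 directly states the "from 2 to 0" condition without relying on the size of the difference. Row 5 (the first row of "y") stays NaN even though the row before it has mode 2, because that row belongs to a different vehicle.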

Answer 1 (score: 0)

This should work:

import pandas as pd
import numpy as np

columns = ["vehicle", 'timestamp', 'mode']

rows = [["x","1970-01-19 01:24:59.973", 0],
        ["x","1970-01-19 01:25:59.973", 2],
        ["x","1970-01-19 01:26:59.973", 2],
        ["x","1970-01-19 01:27:59.973", 0],
        ["x","1970-01-19 01:28:59.973", 2],
        ["x","1970-01-19 01:29:59.973", 0],
        ["x","1970-01-19 01:30:59.973", 0],
        ["x","1970-01-19 01:31:59.973", 2],
        ["x","1970-01-19 01:32:59.973", 0]]

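df = pd.DataFrame(rows, columns=columns)
# Pair each mode with the one that follows it. The first row has no
# predecessor, so its entry is NaN; a 2 followed by a 0 is marked with 1.
df['changes'] = [np.nan] + [1 if prev == 2 and cur == 0 else np.nan for prev, cur in zip(df['mode'], df['mode'][1:])]
print(df)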

This outputs:

  vehicle                timestamp  mode  changes
0       x  1970-01-19 01:24:59.973     0      NaN
1       x  1970-01-19 01:25:59.973     2      NaN
2       x  1970-01-19 01:26:59.973     2      NaN
3       x  1970-01-19 01:27:59.973     0      1.0
4       x  1970-01-19 01:28:59.973     2      NaN
5       x  1970-01-19 01:29:59.973     0      1.0
6       x  1970-01-19 01:30:59.973     0      NaN
7       x  1970-01-19 01:31:59.973     2      NaN
8       x  1970-01-19 01:32:59.973     0      1.0

as required.