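I want to track changes in a column's value from one row to the next. I have a dataset consisting of vehicle, timestamp, and mode (one of four modes: 0, 2, 4, 8), for example: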
vehicle, timestamp, mode
x,1970-01-19 01:24:59.973, 0
x,1970-01-19 01:25:59.973, 2
x,1970-01-19 01:26:59.973, 2
x,1970-01-19 01:27:59.973, 0
x,1970-01-19 01:28:59.973, 2
x,1970-01-19 01:29:59.973, 0
x,1970-01-19 01:30:59.973, 0
x,1970-01-19 01:31:59.973, 2
x,1970-01-19 01:32:59.973, 0
I want to track the mode changes, specifically when the mode goes from 2 to 0, as shown below:
vehicle, timestamp, mode, changes
x,1970-01-19 01:24:59.973, 0, NaN
x,1970-01-19 01:25:59.973, 2, NaN
x,1970-01-19 01:26:59.973, 2, NaN
x,1970-01-19 01:27:59.973, 0, 1
x,1970-01-19 01:28:59.973, 2, NaN
x,1970-01-19 01:29:59.973, 0, 1
x,1970-01-19 01:30:59.973, 0, NaN
x,1970-01-19 01:31:59.973, 2, NaN
x,1970-01-19 01:32:59.973, 0, 1
Please advise!
Answer 0 (score: 1)
No for loop or list comprehension is needed. Use pandas' diff instead.
Input:
from io import StringIO
import pandas as pd
import numpy as np
df = pd.read_table(StringIO("""vehicle, timestamp, mode
x,1970-01-19 01:24:59.973, 0
x,1970-01-19 01:25:59.973, 2
x,1970-01-19 01:26:59.973, 2
x,1970-01-19 01:27:59.973, 0
x,1970-01-19 01:28:59.973, 2
x,1970-01-19 01:29:59.973, 0
x,1970-01-19 01:30:59.973, 0
x,1970-01-19 01:31:59.973, 2
x,1970-01-19 01:32:59.973, 0""".replace(', ', ',')), sep=',', engine='python')
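Add a new column 'changes' and fill it with 1 where the diff is -2 and the current mode is 0 (the second condition excludes other drops of 2, such as 4 to 2):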
df.loc[(df['mode'].diff() == -2) & (df['mode'] == 0), 'changes'] = 1
Output:
  vehicle                timestamp  mode  changes
0       x  1970-01-19 01:24:59.973     0      NaN
1       x  1970-01-19 01:25:59.973     2      NaN
2       x  1970-01-19 01:26:59.973     2      NaN
3       x  1970-01-19 01:27:59.973     0      1.0
4       x  1970-01-19 01:28:59.973     2      NaN
5       x  1970-01-19 01:29:59.973     0      1.0
6       x  1970-01-19 01:30:59.973     0      NaN
7       x  1970-01-19 01:31:59.973     2      NaN
8       x  1970-01-19 01:32:59.973     0      1.0
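If the real data contains more than one vehicle, diff on the whole column would also compare the last row of one vehicle with the first row of the next. A minimal sketch of one way around that, assuming rows should be compared only within the same vehicle and in timestamp order (the second vehicle "y" and its rows are made up for illustration), is to take the previous mode per group with shift:

```python
import numpy as np
import pandas as pd

# Illustrative data: vehicle "x" ends in mode 2 and vehicle "y" starts in mode 0,
# so a naive comparison across the vehicle boundary would flag a false 2 -> 0 change.
df = pd.DataFrame({
    "vehicle": ["x", "x", "x", "x", "y", "y", "y"],
    "timestamp": pd.to_datetime([
        "1970-01-19 01:24:59.973",
        "1970-01-19 01:25:59.973",
        "1970-01-19 01:26:59.973",
        "1970-01-19 01:27:59.973",
        "1970-01-19 01:24:59.973",
        "1970-01-19 01:25:59.973",
        "1970-01-19 01:26:59.973",
    ]),
    "mode": [0, 2, 0, 2, 0, 2, 0],
})

# Make sure each vehicle's rows are in chronological order.
df = df.sort_values(["vehicle", "timestamp"])

# Previous mode within the same vehicle; the first row of each vehicle gets NaN.
prev_mode = df.groupby("vehicle")["mode"].shift()

# 1.0 where the mode went from 2 to 0, NaN everywhere else.
df["changes"] = np.where((prev_mode == 2) & (df["mode"] == 0), 1.0, np.nan)
print(df)
```

Comparing the shifted value to 2 directly also avoids relying on the difference being -2, which may be easier to read when several modes are in play.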
Answer 1 (score: 0)
This should work:
import pandas as pd
import numpy as np
columns = ["vehicle", 'timestamp', 'mode']
rows = [["x","1970-01-19 01:24:59.973", 0],
["x","1970-01-19 01:25:59.973", 2],
["x","1970-01-19 01:26:59.973", 2],
["x","1970-01-19 01:27:59.973", 0],
["x","1970-01-19 01:28:59.973", 2],
["x","1970-01-19 01:29:59.973", 0],
["x","1970-01-19 01:30:59.973", 0],
["x","1970-01-19 01:31:59.973", 2],
["x","1970-01-19 01:32:59.973", 0]]
df = pd.DataFrame(rows, columns=columns)
# Pair each mode with the one that follows it; the first row has no predecessor, so it gets NaN.
df['changes'] = [np.nan] + [1 if prev == 2 and cur == 0 else np.nan for prev, cur in zip(df['mode'], df['mode'][1:])]
print(df)
This outputs:
  vehicle                timestamp  mode  changes
0       x  1970-01-19 01:24:59.973     0      NaN
1       x  1970-01-19 01:25:59.973     2      NaN
2       x  1970-01-19 01:26:59.973     2      NaN
3       x  1970-01-19 01:27:59.973     0      1.0
4       x  1970-01-19 01:28:59.973     2      NaN
5       x  1970-01-19 01:29:59.973     0      1.0
6       x  1970-01-19 01:30:59.973     0      NaN
7       x  1970-01-19 01:31:59.973     2      NaN
8       x  1970-01-19 01:32:59.973     0      1.0
as required.
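Since the question mentions four modes (0, 2, 4, 8), the same pairwise check might be wanted for other transitions too. Here is a rough sketch of a small helper for that; the name flag_transition and its parameters src and dst are hypothetical, not part of pandas, and it assumes a single vehicle whose rows are already in time order:

```python
import pandas as pd


def flag_transition(modes: pd.Series, src: int, dst: int) -> pd.Series:
    """Return 1.0 where a row equals dst and the previous row equals src, else NaN.

    Hypothetical helper for illustration only.
    """
    hit = (modes.shift() == src) & (modes == dst)
    # Keep 1.0 where the transition happened and blank out everything else as NaN.
    return hit.astype(float).where(hit)


# Same mode sequence as in the question.
modes = pd.Series([0, 2, 2, 0, 2, 0, 0, 2, 0], name="mode")

changes = flag_transition(modes, src=2, dst=0)
print(changes.tolist())

# NaN values are skipped by sum(), so this counts the 2 -> 0 transitions.
print(int(changes.sum()))
```

Calling flag_transition(modes, src=0, dst=2) would flag the opposite direction in the same way.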