I have a CSV file that looks like this:
Timestamp Surface_Data
8737.37 Maze_A
8737.42 Maze_A
8740.40 Phone_Surface
8743.23 Desktop_Surface
8765.26 Phone_Surface
8765.29 Maze_A
8765.30 Phone_Surface
8765.56 Maze_B
8766.16 Maze_B
8783.74 Maze_A
8793.20 Maze_A
8840.12 Phone_Surface
8840.40 Phone_Surface
8841.40 Maze_B
I want to add a column that counts the changes from Maze_A to Maze_B or from Maze_B to Maze_A. It should look like this:
Timestamp Surface_Data Maze_Count
8737.37 Maze_A 1
8737.42 Maze_A
8740.40 Phone_Surface
8743.23 Desktop_Surface
8765.26 Phone_Surface
8765.29 Maze_A
8765.30 Phone_Surface
8765.56 Maze_B 2
8766.16 Maze_B
8783.74 Maze_A 3
8793.20 Maze_A
8840.12 Phone_Surface
8840.40 Phone_Surface
8841.40 Maze_B 4
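I tried using cumsum() on the rows where the value in the "Surface_Data" column changes, but that counts every change, including changes to other values that I don't want. I would like the counter to increase only when the value switches between Maze_A and Maze_B, ignoring any other surfaces in between.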
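For reference, the over-counting described above can be reproduced with a minimal sketch like the following. It assumes the CSV has already been loaded into a DataFrame named `df` (built inline here so the snippet is self-contained), and the name `naive` is only illustrative.

```python
import pandas as pd

# Sample data from the question, built inline instead of read from the CSV
df = pd.DataFrame({
    "Timestamp": [8737.37, 8737.42, 8740.40, 8743.23, 8765.26, 8765.29, 8765.30,
                  8765.56, 8766.16, 8783.74, 8793.20, 8840.12, 8840.40, 8841.40],
    "Surface_Data": ["Maze_A", "Maze_A", "Phone_Surface", "Desktop_Surface",
                     "Phone_Surface", "Maze_A", "Phone_Surface", "Maze_B",
                     "Maze_B", "Maze_A", "Maze_A", "Phone_Surface",
                     "Phone_Surface", "Maze_B"],
})

s = df["Surface_Data"]
# Naive approach: count every row whose value differs from the previous row
naive = s.ne(s.shift()).cumsum()
print(naive.iloc[-1])  # 10 changes are counted, while only 4 are wanted
```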
Answer 0: (score: 2)
Use shift, where, and cumsum:
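s = df.Surface_Data
# Keep only Maze_A/Maze_B, then forward-fill so other surfaces inherit the last maze seen
c = s.where(s.str.match('^Maze_[AB]$')).ffill()
# True wherever the forward-filled maze label differs from the previous row
d = c.ne(c.shift())
# Running count, shown only on the rows where a change occurs
df.assign(Maze_Count=d.cumsum().where(d, ''))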
Timestamp Surface_Data Maze_Count
0 8737.37 Maze_A 1
1 8737.42 Maze_A
2 8740.40 Phone_Surface
3 8743.23 Desktop_Surface
4 8765.26 Phone_Surface
5 8765.29 Maze_A
6 8765.30 Phone_Surface
7 8765.56 Maze_B 2
8 8766.16 Maze_B
9 8783.74 Maze_A 3
10 8793.20 Maze_A
11 8840.12 Phone_Surface
12 8840.40 Phone_Surface
13 8841.40 Maze_B 4
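The approach in this answer can be wrapped in a small helper, sketched below. The function name `add_maze_count`, its parameters, and the extra `notna()` guard are assumptions added for illustration, not part of the original answer. The guard covers the case where the data starts with a non-maze surface: the forward-filled label is then missing on the first rows, and a missing value compares as "not equal" to the previous one, which would otherwise be counted as a change. `isin` is also swapped in for the regular expression.

```python
import pandas as pd

def add_maze_count(df, column="Surface_Data", labels=("Maze_A", "Maze_B")):
    """Hypothetical helper: number each switch between the given maze labels."""
    s = df[column]
    # Blank out everything that is not a maze label, then carry the last maze forward
    filled = s.where(s.isin(labels)).ffill()
    # A change is a row whose filled label differs from the previous row's label;
    # rows before the first maze label (still missing) are excluded
    changed = filled.ne(filled.shift()) & filled.notna()
    # Show the running count only on the rows where a change happens
    return df.assign(Maze_Count=changed.cumsum().where(changed, ""))

# Small example that starts with a non-maze surface
example = pd.DataFrame({
    "Surface_Data": ["Phone_Surface", "Maze_A", "Phone_Surface", "Maze_A", "Maze_B"],
})
result = add_maze_count(example)
print(result)
```

Using the empty string as the filler reproduces the layout requested in the question, at the cost of turning the column into object dtype.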
Answer 1: (score: 1)
One attempt:
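import numpy as np
# Boolean mask of the rows whose surface is a maze
c = df['Surface_Data'].str.contains('Maze')
# Among the maze rows only: flag changes, blank out non-changes, then count
maze = df.loc[c, 'Surface_Data']
df['Maze_Count'] = maze.ne(maze.shift()).astype(int).replace(0, np.nan).cumsum()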
Timestamp Surface_Data Maze_Count
0 8737.37 Maze_A 1.0
1 8737.42 Maze_A NaN
2 8740.40 Phone_Surface NaN
3 8743.23 Desktop_Surface NaN
4 8765.26 Phone_Surface NaN
5 8765.29 Maze_A NaN
6 8765.30 Phone_Surface NaN
7 8765.56 Maze_B 2.0
8 8766.16 Maze_B NaN
9 8783.74 Maze_A 3.0
10 8793.20 Maze_A NaN
11 8840.12 Phone_Surface NaN
12 8840.40 Phone_Surface NaN
13 8841.40 Maze_B 4.0
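Answer 2: (score: 1)
You can also filter the dataframe down to the "Maze_A" and "Maze_B" rows, use shift and cumsum to find the changes, use drop_duplicates to keep only the first row of each run, and finally use assign to write the result back to the dataframe, relying on index alignment: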
x = df.loc[df['Surface_Data'].isin(['Maze_A','Maze_B']), 'Surface_Data']
df.assign(Maze_count=(x != x.shift()).cumsum().drop_duplicates())
Output:
Timestamp Surface_Data Maze_count
0 8737.37 Maze_A 1.0
1 8737.42 Maze_A NaN
2 8740.40 Phone_Surface NaN
3 8743.23 Desktop_Surface NaN
4 8765.26 Phone_Surface NaN
5 8765.29 Maze_A NaN
6 8765.30 Phone_Surface NaN
7 8765.56 Maze_B 2.0
8 8766.16 Maze_B NaN
9 8783.74 Maze_A 3.0
10 8793.20 Maze_A NaN
11 8840.12 Phone_Surface NaN
12 8840.40 Phone_Surface NaN
13 8841.40 Maze_B 4.0
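The counts above come out as floats (1.0, 2.0, ...) because the rows without a count hold NaN, which forces a float column. If integer counts are preferred, one possible variation, sketched here under the assumption that pandas' nullable "Int64" dtype is acceptable, is to cast the counts before assigning them. The names `counts` and `out` are illustrative only.

```python
import pandas as pd

# Surface_Data column from the question, built inline for a self-contained example
df = pd.DataFrame({
    "Surface_Data": ["Maze_A", "Maze_A", "Phone_Surface", "Desktop_Surface",
                     "Phone_Surface", "Maze_A", "Phone_Surface", "Maze_B",
                     "Maze_B", "Maze_A", "Maze_A", "Phone_Surface",
                     "Phone_Surface", "Maze_B"],
})

# Keep only the maze rows; their original index labels are preserved
x = df.loc[df["Surface_Data"].isin(["Maze_A", "Maze_B"]), "Surface_Data"]
# Number each run of identical maze labels and keep the first row of each run
counts = x.ne(x.shift()).cumsum().drop_duplicates()
# Nullable integers: rows without a count become <NA> instead of forcing floats
out = df.assign(Maze_Count=counts.astype("Int64"))
print(out)
```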