My pandas DataFrame currently has the following structure:
{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653]
}
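I want to add a column containing the relative time within each machining process, so that it resets whenever the 'machining' column changes its value. The desired structure is therefore: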
{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653],
'timestamp_machining': [1,2,3,1,2,3,1,2,3]
}
I am struggling to do this in a clean way. Any help would be appreciated, including solutions that do not use pandas.
Answer 0 (score: 1)
Subtract the first timestamp of each group, which GroupBy.transform broadcasts back to every row of that group:
# sort first if the rows are not already ordered by group and time
df = df.sort_values(['machining', 'timestamp'])
print(df.groupby('machining')['timestamp'].transform('first'))
0 1560770645
1 1560770645
2 1560770645
3 1560770648
4 1560770648
5 1560770648
6 1560770651
7 1560770651
8 1560770651
Name: timestamp, dtype: int64
df['new'] = df['timestamp'].sub(df.groupby('machining')['timestamp'].transform('first')) + 1
print (df)
   Temperature  machining   timestamp  timestamp_machining  new
0            1          1  1560770645                    1    1
1            2          1  1560770646                    2    2
2            3          1  1560770647                    3    3
3            4          2  1560770648                    1    1
4            5          2  1560770649                    2    2
5            6          2  1560770650                    3    3
6            7          3  1560770651                    1    1
7            8          3  1560770652                    2    2
8            9          3  1560770653                    3    3
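For reference, here is a self-contained sketch of the same idea that can be run as is. It rebuilds the sample frame from the question and writes the result to timestamp_machining. One assumption that differs from the snippet above: it uses transform('min') instead of 'first', so the rows do not need to be sorted beforehand.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Temperature': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'machining': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'timestamp': [1560770645 + i for i in range(9)],
})

# Earliest timestamp of each machining group, repeated on every row of the group.
# 'min' is used here (an assumption of this sketch) so row order does not matter.
start = df.groupby('machining')['timestamp'].transform('min')

# Seconds elapsed since the group started, shifted so each group begins at 1.
df['timestamp_machining'] = df['timestamp'] - start + 1

print(df['timestamp_machining'].tolist())
# [1, 2, 3, 1, 2, 3, 1, 2, 3]
```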
If you only need a counter, GroupBy.cumcount is your friend:
df['new'] = df.groupby('machining').cumcount() + 1
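The two approaches give the same result on the sample data only because the timestamps are exactly one second apart. A small sketch with made-up, irregularly spaced timestamps (the values and the column names elapsed and counter are hypothetical, chosen for illustration) shows where they diverge:

```python
import pandas as pd

# Hypothetical data with uneven gaps between samples.
df = pd.DataFrame({
    'machining': [1, 1, 1, 2, 2],
    'timestamp': [100, 101, 105, 200, 210],
})

groups = df.groupby('machining')

# Elapsed time since the first row of each group (plus 1, as above).
df['elapsed'] = df['timestamp'] - groups['timestamp'].transform('first') + 1

# Plain row counter within each group, independent of the timestamp values.
df['counter'] = groups.cumcount() + 1

print(df[['elapsed', 'counter']].values.tolist())
# [[1, 1], [2, 2], [6, 3], [1, 1], [11, 2]]
```

Use the subtraction when the new column should reflect real elapsed time, and cumcount when it should simply number the rows of each machining process.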