I have the following DataFrame:
Time Work
2018-12-01 10:00:00 Off
2018-12-01 10:00:02 On
2018-12-01 10:00:05 On
2018-12-01 10:00:06 On
2018-12-01 10:00:07 On
2018-12-01 10:00:09 Off
2018-12-01 10:00:11 Off
2018-12-01 10:00:14 On
2018-12-01 10:00:16 On
2018-12-01 10:00:18 On
2018-12-01 10:00:20 Off
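For reference, here is a minimal sketch for rebuilding this sample frame. It assumes `Time` should be parsed as datetimes, since the post does not show the dtypes:

```python
import pandas as pd

# Rebuild the sample data from the question; column names match the post.
# Parsing "Time" with pd.to_datetime is an assumption about the intended dtype.
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2018-12-01 10:00:00", "2018-12-01 10:00:02", "2018-12-01 10:00:05",
        "2018-12-01 10:00:06", "2018-12-01 10:00:07", "2018-12-01 10:00:09",
        "2018-12-01 10:00:11", "2018-12-01 10:00:14", "2018-12-01 10:00:16",
        "2018-12-01 10:00:18", "2018-12-01 10:00:20",
    ]),
    "Work": ["Off", "On", "On", "On", "On", "Off",
             "Off", "On", "On", "On", "Off"],
})
```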
I want to create a new column holding the time elapsed since the device started working.
Time Work Elapsed Time
2018-12-01 10:00:00 Off 0
2018-12-01 10:00:02 On 2
2018-12-01 10:00:05 On 5
2018-12-01 10:00:06 On 6
2018-12-01 10:00:07 On 7
2018-12-01 10:00:09 Off 0
2018-12-01 10:00:11 Off 0
2018-12-01 10:00:14 On 3
2018-12-01 10:00:16 On 5
2018-12-01 10:00:18 On 7
2018-12-01 10:00:20 Off 0
How can I do this?
Answer 0 (score: 14)
You can use groupby:
# df['Time'] = pd.to_datetime(df['Time'], errors='coerce') # Uncomment if needed.
sec = df['Time'].dt.second
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0
1 2018-12-01 10:00:02 On 2
2 2018-12-01 10:00:05 On 5
3 2018-12-01 10:00:06 On 6
4 2018-12-01 10:00:07 On 7
5 2018-12-01 10:00:09 Off 0
6 2018-12-01 10:00:11 Off 0
7 2018-12-01 10:00:14 On 3
8 2018-12-01 10:00:16 On 5
9 2018-12-01 10:00:18 On 7
10 2018-12-01 10:00:20 Off 0
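The idea is to extract the seconds component and subtract from it the seconds value of the first row in each group, which is the "Off" row that precedes a run of "On" rows. This is done with transform and first.
cumsum is used to identify the groups: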
df.Work.eq('Off').cumsum()
0 1
1 1
2 1
3 1
4 1
5 2
6 3
7 3
8 3
9 3
10 4
Name: Work, dtype: int64
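To make the grouping trick concrete, here is a small sketch on a shortened, made-up series (the names `work`, `groups`, and `start` are illustrative only, not from the answer):

```python
import pandas as pd

# Shortened, hypothetical version of the data: states and seconds values.
work = pd.Series(["Off", "On", "On", "Off", "Off", "On"], name="Work")
sec = pd.Series([0, 2, 5, 9, 11, 14])

# Every "Off" row increments the running count, so each "Off" row opens
# a new group that also contains the "On" rows that follow it.
groups = work.eq("Off").cumsum()

# transform('first') repeats each group's first value on every row of that group.
start = sec.groupby(groups).transform("first")

# Subtracting the group's starting value gives the elapsed seconds.
elapsed = sec - start
```

Here `groups` comes out as 1, 1, 1, 2, 3, 3, so consecutive "Off" rows each form their own single-row group and get an elapsed time of 0.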
If your device can stay in the "On" state across minute boundaries, initialise sec as:
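import numpy as np
# Seconds since the epoch, kept as a Series so that .groupby is available
# (assumes datetime64[ns] values, i.e. nanoseconds since the epoch).
sec = df['Time'].astype(np.int64) // 1e9
df['Elapsed Time'] = (
    sec - sec.groupby(df.Work.eq('Off').cumsum()).transform('first'))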
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
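A short sketch of why the seconds-of-minute version is not enough once a run crosses a minute boundary. The data here is made up, and subtracting full timestamps is shown as one possible alternative rather than the answer's own code:

```python
import pandas as pd

# Hypothetical data: the "On" run crosses from 10:00 into 10:01.
df2 = pd.DataFrame({
    "Time": pd.to_datetime(["2018-12-01 10:00:58",
                            "2018-12-01 10:00:59",
                            "2018-12-01 10:01:03"]),
    "Work": ["Off", "On", "On"],
})
groups = df2["Work"].eq("Off").cumsum()

# Seconds-of-minute only: the value wraps around at the minute boundary.
sec = df2["Time"].dt.second
wrapped = sec - sec.groupby(groups).transform("first")

# Subtracting full timestamps avoids the wrap-around.
start = df2["Time"].groupby(groups).transform("first")
full = (df2["Time"] - start).dt.total_seconds()
```

With this input, `wrapped` ends in -55 for the last row, while `full` gives the expected 5.0 seconds.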
Answer 1 (score: 8)
IIUC, use first and transform:
(df.Time-df.Time.groupby(df.Work.eq('Off').cumsum()).transform('first')).dt.seconds
Out[1090]:
0 0
1 2
2 5
3 6
4 7
5 0
6 0
7 3
8 5
9 7
10 0
Name: Time, dtype: int64
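One caveat worth keeping in mind: `.dt.seconds` returns only the seconds component of each timedelta (0 to 86399), so it would wrap for gaps of a day or more, whereas `.dt.total_seconds()` returns the full duration. A small sketch with made-up values:

```python
import pandas as pd

# Two hypothetical durations: 7 seconds, and 1 day plus 7 seconds.
delta = pd.Series([pd.Timedelta(seconds=7),
                   pd.Timedelta(days=1, seconds=7)])

# Seconds component only; the whole day is dropped from the second value.
component = delta.dt.seconds

# Full duration in seconds, returned as floats.
total = delta.dt.total_seconds()
```

For the sample data in the question every gap is a few seconds, so both accessors give the same numbers.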
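Answer 2 (score: 7)
You can use two groupbys. The first computes the time difference between consecutive rows within each group. The second then takes the cumulative sum of those differences within each group.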
s = (df.Work=='Off').cumsum()
df['Elapsed Time'] = df.groupby(s).Time.diff().dt.total_seconds().fillna(0).groupby(s).cumsum()
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
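To see what each of the two groupbys contributes, here is the same chain split into steps on a shortened, made-up frame (`df3` and `step` are illustrative names):

```python
import pandas as pd

# Shortened, hypothetical version of the question's data.
df3 = pd.DataFrame({
    "Time": pd.to_datetime(["2018-12-01 10:00:00", "2018-12-01 10:00:02",
                            "2018-12-01 10:00:05", "2018-12-01 10:00:09",
                            "2018-12-01 10:00:14"]),
    "Work": ["Off", "On", "On", "Off", "On"],
})
s = (df3.Work == "Off").cumsum()

# First groupby: gap to the previous row within the same group.
# The first row of each group has no predecessor (NaT), hence fillna(0).
step = df3.groupby(s).Time.diff().dt.total_seconds().fillna(0)

# Second groupby: running total of those gaps, restarting in every group.
elapsed = step.groupby(s).cumsum()
```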
Answer 3 (score: 4)
Using groupby, you can do the following:
df['Elapsed Time'] = (df.groupby(df.Work.eq('Off').cumsum()).Time
                        .transform(lambda x: x.diff()
                                              .dt.total_seconds()
                                              .cumsum())
                        .fillna(0))
>>> df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0.0
1 2018-12-01 10:00:02 On 2.0
2 2018-12-01 10:00:05 On 5.0
3 2018-12-01 10:00:06 On 6.0
4 2018-12-01 10:00:07 On 7.0
5 2018-12-01 10:00:09 Off 0.0
6 2018-12-01 10:00:11 Off 0.0
7 2018-12-01 10:00:14 On 3.0
8 2018-12-01 10:00:16 On 5.0
9 2018-12-01 10:00:18 On 7.0
10 2018-12-01 10:00:20 Off 0.0
Answer 4 (score: 4)
NumPy slicing approach
u, f, i = np.unique(df.Work.eq('Off').values.cumsum(), True, True)
t = df.Time.values
df['Elapsed Time'] = t - t[f[i]]
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 00:00:00
1 2018-12-01 10:00:02 On 00:00:02
2 2018-12-01 10:00:05 On 00:00:05
3 2018-12-01 10:00:06 On 00:00:06
4 2018-12-01 10:00:07 On 00:00:07
5 2018-12-01 10:00:09 Off 00:00:00
6 2018-12-01 10:00:11 Off 00:00:00
7 2018-12-01 10:00:14 On 00:00:03
8 2018-12-01 10:00:16 On 00:00:05
9 2018-12-01 10:00:18 On 00:00:07
10 2018-12-01 10:00:20 Off 00:00:00
We can get the result as an integer number of seconds with:
df['Elapsed Time'] = (t - t[f[i]]).astype('timedelta64[s]').astype(int)
df
Time Work Elapsed Time
0 2018-12-01 10:00:00 Off 0
1 2018-12-01 10:00:02 On 2
2 2018-12-01 10:00:05 On 5
3 2018-12-01 10:00:06 On 6
4 2018-12-01 10:00:07 On 7
5 2018-12-01 10:00:09 Off 0
6 2018-12-01 10:00:11 Off 0
7 2018-12-01 10:00:14 On 3
8 2018-12-01 10:00:16 On 5
9 2018-12-01 10:00:18 On 7
10 2018-12-01 10:00:20 Off 0
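For readers unfamiliar with the two extra outputs of `np.unique`, here is a sketch on a shortened, made-up array. The positional `True, True` above corresponds to `return_index` and `return_inverse`, spelled out here as keywords:

```python
import numpy as np

# Hypothetical group labels (what eq('Off').cumsum() would produce) and times.
labels = np.array([1, 1, 1, 2, 3, 3])
t = np.array(["2018-12-01T10:00:00", "2018-12-01T10:00:02",
              "2018-12-01T10:00:05", "2018-12-01T10:00:09",
              "2018-12-01T10:00:11", "2018-12-01T10:00:14"],
             dtype="datetime64[s]")

# f: index of the first occurrence of each unique label.
# i: for every row, the position of its label within u.
u, f, i = np.unique(labels, return_index=True, return_inverse=True)

# f[i] maps every row to the index of the first row of its group,
# so t[f[i]] is each row's group start time.
elapsed = (t - t[f[i]]).astype("timedelta64[s]").astype(int)
```

The fancy index `f[i]` plays the same role as `transform('first')` in the groupby answers, but stays entirely in NumPy.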