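I have to analyze some logs and do some calculations based on them, and I am stuck on one thing. Here I have tried to recreate my problem in a simplified form. Suppose I have the following log in the file "stackoverflow.txt":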
23:58:03.458
23:58:13.446
23:58:23.447
23:58:33.440
23:58:43.440
23:58:53.440
23:59:03.434
23:59:13.435
23:59:23.428
23:59:33.428
23:59:43.429
23:59:53.435
00:00:03.429
00:00:13.423
00:00:23.417
00:00:33.411
00:00:43.418
00:00:53.411
00:01:03.405
00:01:13.406
00:01:23.400
00:01:33.406
00:01:43.400
00:01:53.411
00:02:03.400
00:02:13.406
00:02:23.394
00:02:33.400
00:02:43.394
I use the following Python program to convert these times to milliseconds.
import pandas as pd
df = pd.read_csv("stackoverflow.txt", header=None)
# Split Time String into Hour Minutes Seconds and Milliseconds
new_df = df[0].str.split(":", n=-1, expand=True)
df['Hours'] = new_df[0]
df['Minutes'] = new_df[1]
# Split Seconds.Milliseconds information into Seconds and Milliseconds separately
new_df = new_df[2].str.split(".", n=-1, expand=True)
df['Seconds'] = new_df[0]
df['Milliseconds'] = new_df[1]
# The generated columns are strings; convert them to integers
# df['Hours'] = df['Hours'].apply(lambda x: int(x,10))
# Another way of doing it; both take about the same amount of time (checked using %time)
df['Hours'] = pd.to_numeric(df['Hours'], errors='coerce')
df['Minutes'] = pd.to_numeric(df['Minutes'], errors='coerce')
df['Seconds'] = pd.to_numeric(df['Seconds'], errors='coerce')
df['Milliseconds'] = pd.to_numeric(df['Milliseconds'], errors='coerce')
# Calculate Total Time
df['Total Time(ms)'] = df['Hours']*3600000 + df['Minutes']*60000 + df['Seconds']*1000 + df['Milliseconds']
df
The output is as follows:
0 Hours Minutes Seconds Milliseconds Total Time(ms)
0 23:58:03.458 23 58 3 458 86283458
1 23:58:13.446 23 58 13 446 86293446
2 23:58:23.447 23 58 23 447 86303447
3 23:58:33.440 23 58 33 440 86313440
4 23:58:43.440 23 58 43 440 86323440
5 23:58:53.440 23 58 53 440 86333440
6 23:59:03.434 23 59 3 434 86343434
7 23:59:13.435 23 59 13 435 86353435
8 23:59:23.428 23 59 23 428 86363428
9 23:59:33.428 23 59 33 428 86373428
10 23:59:43.429 23 59 43 429 86383429
11 23:59:53.435 23 59 53 435 86393435
12 00:00:03.429 0 0 3 429 3429
13 00:00:13.423 0 0 13 423 13423
14 00:00:23.417 0 0 23 417 23417
15 00:00:33.411 0 0 33 411 33411
16 00:00:43.418 0 0 43 418 43418
17 00:00:53.411 0 0 53 411 53411
18 00:01:03.405 0 1 3 405 63405
19 00:01:13.406 0 1 13 406 73406
20 00:01:23.400 0 1 23 400 83400
21 00:01:33.406 0 1 33 406 93406
22 00:01:43.400 0 1 43 400 103400
23 00:01:53.411 0 1 53 411 113411
24 00:02:03.400 0 2 3 400 123400
25 00:02:13.406 0 2 13 406 133406
26 00:02:23.394 0 2 23 394 143394
27 00:02:33.400 0 2 33 400 153400
28 00:02:43.394 0 2 43 394 163394
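However, I want 24 hours to be added every time the day rolls over from 23:59 to 00:00, so that the total time keeps increasing across midnight. I cannot figure out how to do this. Can someone help me achieve this?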
Answer 0 (score: 1)
I suggest using Timedeltas:
df = pd.read_csv("stackoverflow.txt", header=None)
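First convert the column with to_timedelta, then take the difference between consecutive rows with diff and compare it with Timedelta(0). Every row from the first negative difference onwards gets pd.Timedelta(24, 'h') added (written as pd.Timedelta(1, 'days') in the code):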
td = pd.to_timedelta(df[0])
df['new'] = td.mask(td.diff().lt(pd.Timedelta(0)).cumsum().gt(0), td + pd.Timedelta(1, 'days'))
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)
print (df)
0 new newint
0 23:58:03.458 0 days 23:58:03.458000 86283458
1 23:58:13.446 0 days 23:58:13.446000 86293446
2 23:58:23.447 0 days 23:58:23.447000 86303447
3 23:58:33.440 0 days 23:58:33.440000 86313440
4 23:58:43.440 0 days 23:58:43.440000 86323440
5 23:58:53.440 0 days 23:58:53.440000 86333440
6 23:59:03.434 0 days 23:59:03.434000 86343434
7 23:59:13.435 0 days 23:59:13.435000 86353435
8 23:59:23.428 0 days 23:59:23.428000 86363428
9 23:59:33.428 0 days 23:59:33.428000 86373428
10 23:59:43.429 0 days 23:59:43.429000 86383429
11 23:59:53.435 0 days 23:59:53.435000 86393435
12 00:00:03.429 1 days 00:00:03.429000 86403429
13 00:00:13.423 1 days 00:00:13.423000 86413423
14 00:00:23.417 1 days 00:00:23.417000 86423417
15 00:00:33.411 1 days 00:00:33.411000 86433411
16 00:00:43.418 1 days 00:00:43.418000 86443418
17 00:00:53.411 1 days 00:00:53.411000 86453411
18 00:01:03.405 1 days 00:01:03.405000 86463405
19 00:01:13.406 1 days 00:01:13.406000 86473406
20 00:01:23.400 1 days 00:01:23.400000 86483400
21 00:01:33.406 1 days 00:01:33.406000 86493406
22 00:01:43.400 1 days 00:01:43.400000 86503400
23 00:01:53.411 1 days 00:01:53.411000 86513411
24 00:02:03.400 1 days 00:02:03.400000 86523400
25 00:02:13.406 1 days 00:02:13.406000 86533406
26 00:02:23.394 1 days 00:02:23.394000 86543394
27 00:02:33.400 1 days 00:02:33.400000 86553400
28 00:02:43.394 1 days 00:02:43.394000 86563394
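Solution if the data spans multiple days: add 1 day after the first change, 2 days after the next, and so on.
Take the differences, flag the negative ones, take the cumulative sum, convert the result to day timedeltas, and add it to the original data: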
print (df)
0
0 23:59:23.428
1 23:59:33.428
2 23:59:43.429
3 23:59:53.435
4 00:00:03.429
5 00:00:13.423
6 00:00:23.417
7 00:00:33.411
8 23:59:23.428
9 23:59:33.428
10 23:59:43.429
11 23:59:53.435
12 00:00:03.429
13 00:00:13.423
14 00:00:23.417
15 00:00:33.411
td = pd.to_timedelta(df[0])
days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')
df['new'] = td + days
df['newint'] = (df['new'].dt.total_seconds() * 1000).astype(int)
print (df)
0 new newint
0 23:59:23.428 0 days 23:59:23.428000 86363428
1 23:59:33.428 0 days 23:59:33.428000 86373428
2 23:59:43.429 0 days 23:59:43.429000 86383429
3 23:59:53.435 0 days 23:59:53.435000 86393435
4 00:00:03.429 1 days 00:00:03.429000 86403429
5 00:00:13.423 1 days 00:00:13.423000 86413423
6 00:00:23.417 1 days 00:00:23.417000 86423417
7 00:00:33.411 1 days 00:00:33.411000 86433411
8 23:59:23.428 1 days 23:59:23.428000 172763428
9 23:59:33.428 1 days 23:59:33.428000 172773428
10 23:59:43.429 1 days 23:59:43.429000 172783429
11 23:59:53.435 1 days 23:59:53.435000 172793435
12 00:00:03.429 2 days 00:00:03.429000 172803429
13 00:00:13.423 2 days 00:00:13.423000 172813423
14 00:00:23.417 2 days 00:00:23.417000 172823417
15 00:00:33.411 2 days 00:00:33.411000 172833411
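The multi-day approach above can be wrapped in a small reusable function. This is only a sketch: the name unwrap_midnight and the inline sample data are assumptions made for illustration, not part of the original answer. It also gets the millisecond count by integer floor division with a 1 ms Timedelta instead of total_seconds() * 1000, because the latter goes through floating point and astype(int) truncates rather than rounds.

```python
import pandas as pd


def unwrap_midnight(times: pd.Series) -> pd.Series:
    """Return timedeltas with one extra day added after each backwards jump."""
    td = pd.to_timedelta(times)
    # A negative difference means the clock wrapped past midnight.
    rollovers = td.diff().lt(pd.Timedelta(0)).cumsum()
    return td + pd.to_timedelta(rollovers, unit="D")


# Hypothetical sample covering two midnight rollovers.
sample = pd.Series(["23:59:53.435", "00:00:03.429", "23:59:23.428", "00:00:13.423"])
unwrapped = unwrap_midnight(sample)

# Whole milliseconds via integer floor division, with no float step involved.
millis = unwrapped // pd.Timedelta(1, "ms")
print(millis.tolist())
```

With the real file, the same function could be called as unwrap_midnight(df[0]).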
EDIT:
Explanation of the days calculation:
First get the differences with diff:
print (td.diff())
0 NaT
1 00:00:10
2 00:00:10.001000
3 00:00:10.006000
4 -1 days +00:00:09.994000
5 00:00:09.994000
6 00:00:09.994000
7 00:00:09.994000
8 23:58:50.017000
9 00:00:10
10 00:00:10.001000
11 00:00:10.006000
12 -1 days +00:00:09.994000
13 00:00:09.994000
14 00:00:09.994000
15 00:00:09.994000
Name: 0, dtype: timedelta64[ns]
Then compare with lt (<) to find the negative timedeltas:
print (td.diff().lt(pd.Timedelta(0)))
0 False
1 False
2 False
3 False
4 True
5 False
6 False
7 False
8 False
9 False
10 False
11 False
12 True
13 False
14 False
15 False
Name: 0, dtype: bool
Get the cumulative sum with cumsum:
print (td.diff().lt(pd.Timedelta(0)).cumsum())
0 0
1 0
2 0
3 0
4 1
5 1
6 1
7 1
8 1
9 1
10 1
11 1
12 2
13 2
14 2
15 2
Name: 0, dtype: int32
Finally, convert to day timedeltas:
days = pd.to_timedelta(td.diff().lt(pd.Timedelta(0)).cumsum(), unit='d')
print (days)
0 0 days
1 0 days
2 0 days
3 0 days
4 1 days
5 1 days
6 1 days
7 1 days
8 1 days
9 1 days
10 1 days
11 1 days
12 2 days
13 2 days
14 2 days
15 2 days
Name: 0, dtype: timedelta64[ns]
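The diff / lt / cumsum chain explained above amounts to a running counter that goes up by one each time a timestamp is smaller than the one before it. A plain-Python sketch of the same logic may make that easier to see. The function names to_ms and unwrap are hypothetical, and the parser assumes every stamp has the exact form HH:MM:SS.mmm with three millisecond digits.

```python
DAY_MS = 24 * 60 * 60 * 1000


def to_ms(stamp: str) -> int:
    """Convert 'HH:MM:SS.mmm' to milliseconds since midnight."""
    hours, minutes, rest = stamp.split(":")
    seconds, millis = rest.split(".")
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def unwrap(stamps):
    """Add one day (in ms) for every backwards jump seen so far."""
    result = []
    days = 0          # plays the role of cumsum()
    previous = None   # the first row has no predecessor, like NaT from diff()
    for stamp in stamps:
        current = to_ms(stamp)
        if previous is not None and current < previous:  # diff() < 0
            days += 1
        result.append(current + days * DAY_MS)
        previous = current
    return result


print(unwrap(["23:59:53.435", "00:00:03.429", "00:00:13.423"]))
```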
EDIT:
The same idea can be used in your solution:
...
df['Total Time(ms)'] = (df['Hours']*3600000 + df['Minutes']*60000 +
                        df['Seconds']*1000 + df['Milliseconds'])
s = df['Total Time(ms)'].diff().lt(0).cumsum() * 24 * 60 * 60 * 1000
df['newint'] = s + df['Total Time(ms)']
print (df)
0 Hours Minutes Seconds Milliseconds Total Time(ms) \
0 23:59:23.428 23 59 23 428 86363428
1 23:59:33.428 23 59 33 428 86373428
2 23:59:43.429 23 59 43 429 86383429
3 23:59:53.435 23 59 53 435 86393435
4 00:00:03.429 0 0 3 429 3429
5 00:00:13.423 0 0 13 423 13423
6 00:00:23.417 0 0 23 417 23417
7 00:00:33.411 0 0 33 411 33411
8 23:59:23.428 23 59 23 428 86363428
9 23:59:33.428 23 59 33 428 86373428
10 23:59:43.429 23 59 43 429 86383429
11 23:59:53.435 23 59 53 435 86393435
12 00:00:03.429 0 0 3 429 3429
13 00:00:13.423 0 0 13 423 13423
14 00:00:23.417 0 0 23 417 23417
15 00:00:33.411 0 0 33 411 33411
newint
0 86363428
1 86373428
2 86383429
3 86393435
4 86403429
5 86413423
6 86423417
7 86433411
8 172763428
9 172773428
10 172783429
11 172793435
12 172803429
13 172813423
14 172823417
15 172833411
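Both variants rely on the log being in chronological order with less than 24 hours between consecutive entries, since a gap of a full day or more cannot be detected from the time of day alone. A cheap sanity check is to confirm that the unwrapped values never decrease. The sketch below uses a few hypothetical millisecond values rather than the real file.

```python
import pandas as pd

DAY_MS = 24 * 60 * 60 * 1000

# Hypothetical 'Total Time(ms)' values spanning two midnight rollovers.
total = pd.Series([86393435, 3429, 13423, 86363428, 3429])

# Same idea as above: count the backwards jumps and add that many days.
unwrapped = total + total.diff().lt(0).cumsum() * DAY_MS

# After unwrapping, time should only move forward.
assert unwrapped.is_monotonic_increasing
print(unwrapped.tolist())
```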