History_id Device_id Status Start_date End_date
0 40162 AUH888 1 2018-10-22 08:33:22 2018-10-22 08:34:26
1 40163 AUH888 0 2018-10-22 08:34:26 2018-10-22 10:15:00
2 40167 AUH888 3 2018-10-22 10:15:00 2018-10-23 12:40:01
3 40224 AUH888 0 2018-10-23 12:40:01 2018-10-23 13:00:00
4 40227 AUH888 3 2018-10-23 13:00:00 2018-10-25 07:43:30
5 40296 AUH888 0 2018-10-25 07:43:30 2018-10-25 08:00:00
6 40298 AUH888 3 2018-10-25 08:00:00 2018-10-25 08:28:38
7 40301 AUH888 0 2018-10-25 08:28:38 2018-11-05 12:15:00
8 40965 AUH888 3 2018-11-05 12:15:00 2018-11-07 08:06:58
9 41085 AUH888 0 2018-11-07 08:06:58 2018-11-12 07:15:00
10 41256 AUH888 3 2018-11-12 07:15:00 2018-11-12 07:19:29
11 41257 AUH888 0 2018-11-12 07:19:29 2018-11-15 10:45:00
12 41412 AUH888 3 2018-11-15 10:45:00 2018-11-17 09:38:42
13 41469 AUH888 0 2018-11-17 09:38:42 2018-11-19 10:15:00
14 41555 AUH888 3 2018-11-19 10:15:00 2018-11-20 05:21:19
15 41581 AUH888 0 2018-11-20 05:21:19 2018-11-20 05:45:00
16 41582 AUH888 3 2018-11-20 05:45:00 2018-11-20 10:32:37
17 41594 AUH888 0 2018-11-20 10:32:37 2018-11-27 00:45:00
18 41856 AUH888 3 2018-11-27 00:45:00 2018-11-27 02:57:24
19 41858 AUH888 0 2018-11-27 02:57:24 2018-11-27 08:15:00
20 41877 AUH888 3 2018-11-27 08:15:00 2018-11-27 08:16:16
21 41878 AUH888 0 2018-11-27 08:16:16 2018-11-27 15:00:00
22 41900 AUH888 3 2018-11-27 15:00:00 2018-11-27 17:55:37
23 41902 AUH888 0 2018-11-27 17:55:37 2018-12-07 12:15:00
24 42301 AUH888 3 2018-12-07 12:15:00 2018-12-07 12:21:48
25 42302 AUH888 0 2018-12-07 12:21:48 2018-12-12 07:30:00
26 42518 AUH888 3 2018-12-12 07:30:00 2018-12-12 11:42:39
27 42542 AUH888 0 2018-12-12 11:42:39 2018-12-27 10:00:00
28 43319 AUH888 3 2018-12-27 10:00:00 2018-12-27 10:06:39
29 43320 AUH888 0 2018-12-27 10:06:39 2018-12-30 07:30:00
30 43437 AUH888 3 2018-12-30 07:30:00 2018-12-30 07:42:18
31 43438 AUH888 0 2018-12-30 07:42:18 2018-12-30 10:00:00
32 43445 AUH888 3 2018-12-30 10:00:00 2018-12-30 14:09:08
33 43447 AUH888 0 2018-12-30 14:09:08 2019-01-03 12:15:00
34 43566 AUH888 3 2019-01-03 12:15:00 2019-01-03 14:57:34
35 43572 AUH888 0 2019-01-03 14:57:34 2019-01-06 06:45:00
36 43656 AUH888 3 2019-01-06 06:45:00 2019-01-06 12:09:59
37 43677 AUH888 0 2019-01-06 12:09:59 2019-01-09 08:45:00
38 43835 AUH888 3 2019-01-09 08:45:00 2019-01-09 09:11:15
39 43837 AUH888 0 2019-01-09 09:11:15 2019-02-09 15:00:00
40 44866 AUH888 3 2019-02-09 15:00:00 2019-02-09 15:25:45
41 44867 AUH888 0 2019-02-09 15:25:45 2019-02-11 08:00:00
42 44956 AUH888 3 2019-02-11 08:00:00 2019-02-12 16:20:42
43 45139 AUH888 0 2019-02-12 16:20:42 2019-02-12 16:45:06
44 45142 AUH888 3 2019-02-12 16:45:06 2019-02-12 17:08:52
45 45146 AUH888 0 2019-02-12 17:08:52 2019-02-12 17:30:00
46 45154 AUH888 3 2019-02-12 17:30:00 2019-02-12 18:32:14
47 45177 AUH888 0 2019-02-12 18:32:14 2019-02-12 18:45:00
48 45179 AUH888 3 2019-02-12 18:45:00 2019-02-12 19:36:39
49 45186 AUH888 0 2019-02-12 19:36:39 2019-02-12 20:00:00
50 40905 SHJ656 3 2018-11-04 14:00:00 2018-11-04 14:38:06
51 40906 SHJ656 0 2018-11-04 14:38:06 2018-11-04 15:00:00
52 40908 SHJ656 3 2018-11-04 15:00:00 2018-11-04 15:14:46
53 40909 SHJ656 0 2018-11-04 15:14:46 2018-11-04 16:15:00
54 40911 SHJ656 3 2018-11-04 16:15:00 2018-11-04 17:14:25
55 40913 SHJ656 0 2018-11-04 17:14:25 2018-11-04 17:45:00
56 40914 SHJ656 3 2018-11-04 17:45:00 2018-11-04 18:08:18
57 40915 SHJ656 0 2018-11-04 18:08:18 2018-11-04 18:30:00
58 40916 SHJ656 3 2018-11-04 18:30:00 2018-11-04 19:30:23
59 40920 SHJ656 0 2018-11-04 19:30:23 2018-11-04 19:45:00
60 40921 SHJ656 3 2018-11-04 19:45:00 2018-11-04 19:48:24
61 40922 SHJ656 0 2018-11-04 19:48:24 2018-11-04 20:00:00
62 40923 SHJ656 3 2018-11-04 20:00:00 2018-11-04 20:10:30
63 40924 SHJ656 0 2018-11-04 20:10:30 2018-11-04 21:00:00
64 40926 SHJ656 3 2018-11-04 21:00:00 2018-11-04 21:48:59
65 40928 SHJ656 0 2018-11-04 21:48:59 2018-11-04 22:00:00
66 40929 SHJ656 3 2018-11-04 22:00:00 2018-11-04 22:19:47
67 40930 SHJ656 0 2018-11-04 22:19:47 2018-11-04 22:30:00
68 40931 SHJ656 3 2018-11-04 22:30:00 2018-11-04 22:49:15
69 40932 SHJ656 0 2018-11-04 22:49:15 2018-11-05 04:15:00
70 40935 SHJ656 3 2018-11-05 04:15:00 2018-11-05 04:16:08
71 40936 SHJ656 0 2018-11-05 04:16:08 2018-11-05 04:30:00
72 40937 SHJ656 3 2018-11-05 04:30:00 2018-11-05 04:32:56
73 40938 SHJ656 0 2018-11-05 04:32:56 2018-11-05 05:30:00
74 40940 SHJ656 3 2018-11-05 05:30:00 2018-11-05 05:37:06
75 40941 SHJ656 0 2018-11-05 05:37:06 2018-11-05 06:15:00
76 40942 SHJ656 3 2018-11-05 06:15:00 2018-11-05 07:37:07
77 40943 SHJ656 0 2018-11-05 07:37:07 2018-11-05 08:00:00
78 40944 SHJ656 3 2018-11-05 08:00:00 2018-11-05 08:56:24
79 40945 SHJ656 0 2018-11-05 08:56:24 2018-11-05 09:15:00
80 40948 SHJ656 3 2018-11-05 09:15:00 2018-11-05 10:50:37
81 40950 SHJ656 0 2018-11-05 10:50:37 2018-11-05 11:15:00
82 40955 SHJ656 3 2018-11-05 11:15:00 2018-11-05 17:13:33
83 40973 SHJ656 0 2018-11-05 17:13:33 2018-11-05 17:45:00
84 40974 SHJ656 3 2018-11-05 17:45:00 2018-11-05 18:01:47
85 40975 SHJ656 0 2018-11-05 18:01:47 2018-11-05 18:15:00
86 40976 SHJ656 3 2018-11-05 18:15:00 2018-11-05 18:17:46
87 40977 SHJ656 0 2018-11-05 18:17:46 2018-11-05 18:30:00
88 40978 SHJ656 3 2018-11-05 18:30:00 2018-11-05 18:51:29
89 40979 SHJ656 0 2018-11-05 18:51:29 2018-11-05 19:30:00
90 40980 SHJ656 3 2018-11-05 19:30:00 2018-11-05 19:31:58
91 40981 SHJ656 0 2018-11-05 19:31:58 2018-11-05 20:00:00
92 40982 SHJ656 3 2018-11-05 20:00:00 2018-11-05 20:00:19
93 40983 SHJ656 0 2018-11-05 20:00:19 2018-11-05 20:15:00
94 40984 SHJ656 3 2018-11-05 20:15:00 2018-11-05 20:24:21
95 40985 SHJ656 0 2018-11-05 20:24:21 2018-11-06 02:30:00
96 40990 SHJ656 3 2018-11-06 02:30:00 2018-11-06 02:38:25
97 40991 SHJ656 0 2018-11-06 02:38:25 2018-11-06 03:15:00
98 40992 SHJ656 3 2018-11-06 03:15:00 2018-11-06 03:15:12
99 40993 SHJ656 0 2018-11-06 03:15:12 2018-11-06 03:45:00
device_id year month day dow uptimeSec downtimeSec
AUH888 2018 10 22 Monday 36836 49564
SHJ656 2018 10 24 Wednesday 44979 41421
AUH888 2018 10 25 Thursday 56872 29528
SHJ656 2018 10 29 Monday 38070 48330
import pandas as pd
from datetime import timedelta

# historicaldata: the status-history table shown above, with Start_date and
# End_date already parsed as datetimes

cleandataHeader = ['device_id', 'year', 'month', 'day', 'dow', 'uptimeSec', 'downtimeSec']

def fragmentCollect(daystart, dayend, device):
    # rows lying completely inside the [daystart, dayend) window
    maskBigFrag = ((historicaldata['Device_id'] == device) & ((daystart < historicaldata['Start_date']) & (dayend > historicaldata['End_date'])))
    BigFragdf = historicaldata.loc[maskBigFrag]
    BigFragdf['fragment'] = (BigFragdf['End_date'] - BigFragdf['Start_date']).dt.total_seconds()
    # rows that completely cover the window
    maskSmallFrag = ((historicaldata['Device_id'] == device) & ((daystart > historicaldata['Start_date']) & (dayend < historicaldata['End_date'])))
    SmallFragdf = historicaldata.loc[maskSmallFrag]
    SmallFragdf['fragment'] = (dayend - daystart).total_seconds()
    SmallFragdf['Start_date'] = daystart.strftime('%Y-%m-%d 00:00:00')
    SmallFragdf['End_date'] = dayend.strftime('%Y-%m-%d 00:00:00')
    # rows that start before the window and end inside it
    maskHeadFrag = ((historicaldata['Device_id'] == device) & ((daystart >= historicaldata['Start_date']) & (daystart < historicaldata['End_date']) & (dayend > historicaldata['End_date'])))
    HeadFragdf = historicaldata.loc[maskHeadFrag]
    HeadFragdf['fragment'] = (HeadFragdf['End_date'] - daystart).dt.total_seconds()
    HeadFragdf['Start_date'] = daystart.strftime('%Y-%m-%d 00:00:00')
    # rows that start inside the window and end after it
    maskTailFrag = ((historicaldata['Device_id'] == device) & ((daystart < historicaldata['Start_date']) & (dayend <= historicaldata['End_date']) & (dayend > historicaldata['Start_date'])))
    TailFragdf = historicaldata.loc[maskTailFrag]
    TailFragdf['fragment'] = (dayend - TailFragdf['Start_date']).dt.total_seconds()
    TailFragdf['End_date'] = dayend.strftime('%Y-%m-%d 00:00:00')
    frames = [BigFragdf, SmallFragdf, HeadFragdf, TailFragdf]
    result = pd.concat(frames)
    result = result.drop_duplicates()
    return result
def rowClean(row):
    row['device_id'] = row.name[1]
    row['year'] = row.name[0].year
    row['month'] = row.name[0].month
    row['day'] = row.name[0].day
    row['dow'] = row.name[0].day_name()
    result = fragmentCollect(row.name[0], row.name[0] + timedelta(days=1), row.name[1])
    result = result.to_dict('records')
    uptime = 0
    downtime = 0
    for frag in result:
        if frag['Status'] == 0:  # status 0 = online
            uptime += frag['fragment']
        else:
            downtime += frag['fragment']
    row['uptimeSec'] = uptime
    row['downtimeSec'] = downtime
    return row
def buildTheCleanData(start, end):
    datelist = [start + timedelta(days=x) for x in range((end - start).days + 1)]
    iterables = [datelist, ['AUH888', 'SHJ656']]
    Index = pd.MultiIndex.from_product(iterables, names=['date', 'Device_id'])
    s = pd.DataFrame(columns=cleandataHeader, index=Index)
    s = s.apply(rowClean, axis=1)
    return s
Answer 0 (score: 0)
What can be done here is to first split the rows that span different days, so that every row belongs to a single day. The rows to split are those whose Start_date and End_date fall on different dates. The first (original) row then ends at the end of its first day, and one extra row is prepared for every further day in the [Start_date, End_date) range.
Once that is done, it is easy to add the date as well as the uptime and downtime to each row. Summing the durations per Device_id and day then yields the expected result.
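For instance, a single row that crosses midnight, such as the Status-3 row from 2018-10-22 10:15:00 to 2018-10-23 12:40:01, would end up as one piece per calendar day. A minimal stand-alone sketch of just that splitting idea (this is only an illustration, not the code of this answer):

import pandas as pd

start = pd.Timestamp('2018-10-22 10:15:00')   # one Status-3 row from the question
end = pd.Timestamp('2018-10-23 12:40:01')

pieces = []
while start.normalize() < end.normalize():          # still full days before the end day
    day_end = start.normalize() + pd.offsets.Day()  # the midnight following `start`
    pieces.append((start, day_end, (day_end - start).total_seconds()))
    start = day_end
pieces.append((start, end, (end - start).total_seconds()))
# two pieces: 49500 s on 2018-10-22 and 45601 s on 2018-10-23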
The code could be:
import numpy as np
import pandas as pd

# df is the history table from the question, with Start_date / End_date
# already parsed as datetimes (e.g. via pd.to_datetime)

# list of days to split
new_days = df[df.Start_date.dt.date != df.End_date.dt.date].copy()
# initialize Start_date to the end of the initial days for the new days to add
new_days['Start_date'] = (new_days.Start_date + pd.offsets.Day()).dt.floor('D')
# in original dataframe, End_date is at end of day of Start_date
df.loc[new_days.index, 'End_date'] = new_days['Start_date']
# number of full days to add
days_to_add = (new_days.End_date - new_days.Start_date).dt.floor('D').dt.days
# build a list of those days
splitted = []
for i, row in new_days.iterrows():
    r = row.copy()
    for _ in range(days_to_add.loc[i]):
        r['End_date'] = r['Start_date'] + pd.offsets.Day()
        splitted.append(r.copy())
        r['Start_date'] = r['End_date']
# last to add will have Start_date at the beginning of day of End_date
new_days['Start_date'] = new_days['End_date'].dt.floor('D')
# add those new days to the original dataframe (in an intermediate dataframe)
interm = pd.concat([df, new_days[new_days.Start_date != new_days.End_date],
                    pd.DataFrame(splitted)]).rename_axis('ix').reset_index()
# interm = interm.sort_values(['ix', 'Start_date']).set_index('ix')
# interm.rename_axis('', inplace=True)
# add required columns to the intermediate dataframe
interm["year"] = interm.Start_date.dt.year
interm["month"] = interm.Start_date.dt.month
interm["day"] = interm.Start_date.dt.day
interm["dow"] = interm.Start_date.dt.strftime('%A')
interm['uptimeSec'] = np.where(interm.Status == 0,
                               (interm.End_date - interm.Start_date).dt.seconds
                               + 86400 * (interm.End_date - interm.Start_date).dt.days,
                               0)
interm['downtimeSec'] = np.where(interm.Status != 0,
                                 (interm.End_date - interm.Start_date).dt.seconds
                                 + 86400 * (interm.End_date - interm.Start_date).dt.days,
                                 0)
# sum the durations
reshaped = interm[['Device_id', 'year', 'month', 'day', 'dow', 'uptimeSec',
                   'downtimeSec']].groupby(['Device_id', 'year', 'month',
                                            'day', 'dow']).sum().reset_index()
It gives the expected result:
Device_id year month day dow uptimeSec downtimeSec
AUH888 2018 10 22 Monday 6034 49564
AUH888 2018 10 23 Tuesday 1199 85201
AUH888 2018 10 24 Wednesday 0 86400
AUH888 2018 10 25 Thursday 56872 29528
AUH888 2018 10 26 Friday 86400 0
AUH888 2018 10 27 Saturday 86400 0
AUH888 2018 10 28 Sunday 86400 0
AUH888 2018 10 29 Monday 86400 0
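As a quick sanity check of the reshaped frame above (my addition, not part of the original answer): since the status intervals do not overlap, uptime and downtime of a day can never add up to more than 86400 seconds; only the first and last recorded day of a device may fall short of a full day.

# Sanity check: per device and day, uptime + downtime must not exceed one day;
# partially covered boundary days (e.g. 2018-10-22) may be lower.
total = reshaped['uptimeSec'] + reshaped['downtimeSec']
assert (total <= 86400).all()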
Answer 1 (score: 0)
I stumbled over the same problem, except that I had to compute a similar result for 15-minute windows instead of whole days. I tried to apply my solution to your problem.
1) As Serge Ballesta mentioned, in your case it is important to split the rows into fixed time units. My approach is to convert the start and end times into a start time plus a duration.
I tried this code with a snippet of your data, which I appended at the end of my post.
import numpy as np
import pandas as pd

df = pd.DataFrame(data, columns=cols)  # data and cols as defined at end of my post
# Convert relevant entries to datetime format
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])
# Set start_date as index
df.set_index('Start_date', inplace=True)
# Add row with duration, based on start and end-time
df['duration'] = pd.to_timedelta(df['End_date'] - df.index)
Next, a new DataFrame is created with one row (bin) per day (you can set the bin size as appropriate).
# Online is where status is 0, offline is where status is {1,2,3}
# Step-by-step explanation:
# df[df['Status'] == '0'] -> filters only for entries with status 'online'
# df[df['Status'] == '0']['duration'] -> of the filtered entries, select the duration column
# df[df['Status'] == '0']['duration'].resample('1d') -> resample to daily bins
# df[df['Status'] == '0']['duration'].resample('1d').sum() -> sum 'duration' per bin
# (df[df['Status'] == '0']['duration'].resample('1d').sum().astype(np.int64)/1e9) -> convert the summed duration to seconds (conversion from nanoseconds!)
online = (df[df['Status'] == '0']['duration'].resample('1d').sum().astype(np.int64)/1e9)
offline = (df[(df['Status'] == '1') | (df['Status'] == '2') | (df['Status'] == '3')]['duration'].resample('1d').sum().astype(np.int64)/1e9)
index = online.index
# The new dataframe is put together
new_sort = pd.DataFrame(index=index)
new_sort['online'] = online
new_sort['offline']= offline
new_sort.fillna(0, inplace=True)
Since a duration can be longer than one day, a single duration entry may have to be spread over more than one day. I fixed this with a function that caps each entry at one day and carries any "excess" time over to the following day.
def split_duration(df, col, dt):
    # Cap each entry at dt seconds and carry any excess over to the next row (day).
    diff = 0
    for e, row in df.iterrows():
        if diff > 0:
            row[col] += diff
            diff = 0
        if row[col] > dt:
            diff = row[col] - dt
            row[col] = dt
        # write the value back explicitly: the rows yielded by iterrows() are
        # copies, so assigning to `row` alone would not change the DataFrame
        df.at[e, col] = row[col]
    return df
dt = 60*60*24 # time unit is in seconds, dt is seconds per day
df = split_duration(new_sort, 'online', dt)
df = split_duration(new_sort, 'offline', dt)
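For intuition, here is a toy run of the carry-over (my addition; the numbers are made up and pandas plus the split_duration function above are assumed): an entry larger than one day hands its excess on to the next row.

# Toy illustration of the carry-over with made-up numbers (not the real data).
toy = pd.DataFrame({'online': [90000.0, 1000.0, 500.0]},
                   index=pd.date_range('2018-10-22', periods=3, freq='D'))
split_duration(toy, 'online', 60 * 60 * 24)
print(toy['online'].tolist())   # [86400.0, 4600.0, 500.0]: 3600 s moved to the second day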
The resulting DataFrame new_sort gives you all the information you need.
new_sort
online offline
Start_date
2018-10-22 6034.0 86400.0
2018-10-23 1199.0 86400.0
2018-10-24 0.0 76175.0
2018-10-25 86400.0 1718.0
2018-10-26 86400.0 0.0
2018-10-27 86400.0 0.0
2018-10-28 86400.0 0.0
2018-10-29 86400.0 0.0
2018-10-30 86400.0 0.0
This code is probably still not optimal, and improvements are welcome. Below is just the DataFrame that I used together with this code.
data = np.array([[40162, 'AUH888', 1, '2018-10-22 08:33:22', '2018-10-22 08:34:26'],
[40163, 'AUH888', 0, '2018-10-22 08:34:26', '2018-10-22 10:15:00'],
[40167, 'AUH888', 3, '2018-10-22 10:15:00', '2018-10-23 12:40:01'],
[40224, 'AUH888', 0, '2018-10-23 12:40:01', ' 2018-10-23 13:00:00'],
[40227, 'AUH888', 3, '2018-10-23 13:00:00', ' 2018-10-25 07:43:30'],
[40296, 'AUH888', 0, '2018-10-25 07:43:30', ' 2018-10-25 08:00:00'],
[40298, 'AUH888', 3, '2018-10-25 08:00:00', ' 2018-10-25 08:28:38'],
[40301, 'AUH888', 0, '2018-10-25 08:28:38', ' 2018-11-05 12:15:00'],
[40965, 'AUH888', 3, '2018-11-05 12:15:00', ' 2018-11-07 08:06:58'],
[41085, 'AUH888', 0, '2018-11-07 08:06:58', ' 2018-11-12 07:15:00'],
[41256, 'AUH888', 3, '2018-11-12 07:15:00', ' 2018-11-12 07:19:29'],
[41257, 'AUH888', 0, '2018-11-12 07:19:29', ' 2018-11-15 10:45:00'],
[41412, 'AUH888', 3, '2018-11-15 10:45:00', ' 2018-11-17 09:38:42'],
[41469, 'AUH888', 0, '2018-11-17 09:38:42', ' 2018-11-19 10:15:00']])
cols = ['History_id', 'Device_id','Status', 'Start_date', 'End_date']
df = pd.DataFrame(data, columns=cols)
Answer 2 (score: 0)
My take:
import pandas as pd
from datetime import timedelta
def days_distance(start, end):
    A = start.replace(hour=0, minute=0, second=0, microsecond=0)
    B = end.replace(hour=0, minute=0, second=0, microsecond=0)
    return (B - A).days

def duration(sdate, start=True):
    d_start = sdate.replace(hour=0, minute=0, second=0, microsecond=0)
    dur = (sdate - d_start).seconds
    return 86400 - dur if start else dur

def data_arr(tm, vl):
    return tm.year, tm.month, tm.day, tm.strftime("%A"), vl, 86400 - vl

def durs(sdate, edate):
    res = []
    if sdate.strftime('%Y-%m-%d') == edate.strftime('%Y-%m-%d'):
        res.append(data_arr(sdate, (edate - sdate).seconds))
    else:
        res.append(data_arr(sdate, duration(sdate)))
        res.append(data_arr(edate, duration(edate, start=False)))
        ddist = days_distance(sdate, edate)
        for i in range(1, ddist):
            res.append(data_arr(sdate + timedelta(days=i), 86400))
    return res

def expand(data_series):
    durations = durs(data_series.Start_date, data_series.End_date)
    tdf = pd.DataFrame(
        durations,
        columns=['year', 'month', 'day', 'dow', 'uptimeSec', 'downtimeSec'])
    tdf.insert(
        loc=0, column='Device_id', value=[data_series.Device_id]*len(tdf))
    return tdf
# Load source csv
df = pd.read_csv('data_files/device_statuses.csv')
df.Start_date = pd.to_datetime(df.Start_date)
df.End_date = pd.to_datetime(df.End_date)
# Keep uptime data only:
df = df[df.Status == 0]
print(df.head(), '\n\n----------\n')
# Get desired result
dfs = df.apply(lambda row: expand(row), axis=1)
result = pd.concat(dfs.to_list(), axis=0, ignore_index=True)
result.sort_values(by=['year', 'month', 'day', 'dow'])
print(result.head())
Output:
History_id Device_id Status Start_date End_date
1 40163 AUH888 0 2018-10-22 08:34:26 2018-10-22 10:15:00
3 40224 AUH888 0 2018-10-23 12:40:01 2018-10-23 13:00:00
5 40296 AUH888 0 2018-10-25 07:43:30 2018-10-25 08:00:00
7 40301 AUH888 0 2018-10-25 08:28:38 2018-11-05 12:15:00
9 41085 AUH888 0 2018-11-07 08:06:58 2018-11-12 07:15:00
----------
Device_id year month day dow uptimeSec downtimeSec
0 AUH888 2018 10 22 Monday 6034 80366
1 AUH888 2018 10 23 Tuesday 1199 85201
2 AUH888 2018 10 25 Thursday 990 85410
3 AUH888 2018 10 25 Thursday 55882 30518
4 AUH888 2018 11 5 Monday 44100 42300
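One possible final step (my addition, not part of the answer above) would be to group the expanded fragments per device and calendar day and, assuming every day is fully covered by status rows, derive the downtime as the remainder of the day:

# Possible aggregation of the expanded fragments (my addition, assuming full
# day coverage; the first and last day of the history would need special care).
daily = (result.groupby(['Device_id', 'year', 'month', 'day', 'dow'],
                        as_index=False)['uptimeSec'].sum())
daily['downtimeSec'] = 86400 - daily['uptimeSec']
print(daily.head())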
csv:
History_id,Device_id,Status,Start_date,End_date
40162,AUH888,1,2018-10-22 08:33:22,2018-10-22 08:34:26
40163,AUH888,0,2018-10-22 08:34:26,2018-10-22 10:15:00
40167,AUH888,3,2018-10-22 10:15:00,2018-10-23 12:40:01
40224,AUH888,0,2018-10-23 12:40:01,2018-10-23 13:00:00
40227,AUH888,3,2018-10-23 13:00:00,2018-10-25 07:43:30
40296,AUH888,0,2018-10-25 07:43:30,2018-10-25 08:00:00
40298,AUH888,3,2018-10-25 08:00:00,2018-10-25 08:28:38
40301,AUH888,0,2018-10-25 08:28:38,2018-11-05 12:15:00
40965,AUH888,3,2018-11-05 12:15:00,2018-11-07 08:06:58
41085,AUH888,0,2018-11-07 08:06:58,2018-11-12 07:15:00
41256,AUH888,3,2018-11-12 07:15:00,2018-11-12 07:19:29
41257,AUH888,0,2018-11-12 07:19:29,2018-11-15 10:45:00
41412,AUH888,3,2018-11-15 10:45:00,2018-11-17 09:38:42
41469,AUH888,0,2018-11-17 09:38:42,2018-11-19 10:15:00
41555,AUH888,3,2018-11-19 10:15:00,2018-11-20 05:21:19
...