Reshape a pandas dataframe (start date, end date) into daily and hourly columns

Time: 2019-06-06 09:03:33

Tags: python pandas dataframe time-series data-analysis

  • My dataset is shown below; it contains the devices' status information (Online, Offline, ...):
  • The dataframe below is named historicaldata (just an example):
   History_id Device_id  Status          Start_date            End_date
0      40162     AUH888       1 2018-10-22 08:33:22 2018-10-22 08:34:26
1      40163     AUH888       0 2018-10-22 08:34:26 2018-10-22 10:15:00
2      40167     AUH888       3 2018-10-22 10:15:00 2018-10-23 12:40:01
3      40224     AUH888       0 2018-10-23 12:40:01 2018-10-23 13:00:00
4      40227     AUH888       3 2018-10-23 13:00:00 2018-10-25 07:43:30
5      40296     AUH888       0 2018-10-25 07:43:30 2018-10-25 08:00:00
6      40298     AUH888       3 2018-10-25 08:00:00 2018-10-25 08:28:38
7      40301     AUH888       0 2018-10-25 08:28:38 2018-11-05 12:15:00
8      40965     AUH888       3 2018-11-05 12:15:00 2018-11-07 08:06:58
9      41085     AUH888       0 2018-11-07 08:06:58 2018-11-12 07:15:00
10     41256     AUH888       3 2018-11-12 07:15:00 2018-11-12 07:19:29
11     41257     AUH888       0 2018-11-12 07:19:29 2018-11-15 10:45:00
12     41412     AUH888       3 2018-11-15 10:45:00 2018-11-17 09:38:42
13     41469     AUH888       0 2018-11-17 09:38:42 2018-11-19 10:15:00
14     41555     AUH888       3 2018-11-19 10:15:00 2018-11-20 05:21:19
15     41581     AUH888       0 2018-11-20 05:21:19 2018-11-20 05:45:00
16     41582     AUH888       3 2018-11-20 05:45:00 2018-11-20 10:32:37
17     41594     AUH888       0 2018-11-20 10:32:37 2018-11-27 00:45:00
18     41856     AUH888       3 2018-11-27 00:45:00 2018-11-27 02:57:24
19     41858     AUH888       0 2018-11-27 02:57:24 2018-11-27 08:15:00
20     41877     AUH888       3 2018-11-27 08:15:00 2018-11-27 08:16:16
21     41878     AUH888       0 2018-11-27 08:16:16 2018-11-27 15:00:00
22     41900     AUH888       3 2018-11-27 15:00:00 2018-11-27 17:55:37
23     41902     AUH888       0 2018-11-27 17:55:37 2018-12-07 12:15:00
24     42301     AUH888       3 2018-12-07 12:15:00 2018-12-07 12:21:48
25     42302     AUH888       0 2018-12-07 12:21:48 2018-12-12 07:30:00
26     42518     AUH888       3 2018-12-12 07:30:00 2018-12-12 11:42:39
27     42542     AUH888       0 2018-12-12 11:42:39 2018-12-27 10:00:00
28     43319     AUH888       3 2018-12-27 10:00:00 2018-12-27 10:06:39
29     43320     AUH888       0 2018-12-27 10:06:39 2018-12-30 07:30:00
30     43437     AUH888       3 2018-12-30 07:30:00 2018-12-30 07:42:18
31     43438     AUH888       0 2018-12-30 07:42:18 2018-12-30 10:00:00
32     43445     AUH888       3 2018-12-30 10:00:00 2018-12-30 14:09:08
33     43447     AUH888       0 2018-12-30 14:09:08 2019-01-03 12:15:00
34     43566     AUH888       3 2019-01-03 12:15:00 2019-01-03 14:57:34
35     43572     AUH888       0 2019-01-03 14:57:34 2019-01-06 06:45:00
36     43656     AUH888       3 2019-01-06 06:45:00 2019-01-06 12:09:59
37     43677     AUH888       0 2019-01-06 12:09:59 2019-01-09 08:45:00
38     43835     AUH888       3 2019-01-09 08:45:00 2019-01-09 09:11:15
39     43837     AUH888       0 2019-01-09 09:11:15 2019-02-09 15:00:00
40     44866     AUH888       3 2019-02-09 15:00:00 2019-02-09 15:25:45
41     44867     AUH888       0 2019-02-09 15:25:45 2019-02-11 08:00:00
42     44956     AUH888       3 2019-02-11 08:00:00 2019-02-12 16:20:42
43     45139     AUH888       0 2019-02-12 16:20:42 2019-02-12 16:45:06
44     45142     AUH888       3 2019-02-12 16:45:06 2019-02-12 17:08:52
45     45146     AUH888       0 2019-02-12 17:08:52 2019-02-12 17:30:00
46     45154     AUH888       3 2019-02-12 17:30:00 2019-02-12 18:32:14
47     45177     AUH888       0 2019-02-12 18:32:14 2019-02-12 18:45:00
48     45179     AUH888       3 2019-02-12 18:45:00 2019-02-12 19:36:39
49     45186     AUH888       0 2019-02-12 19:36:39 2019-02-12 20:00:00
50     40905     SHJ656       3 2018-11-04 14:00:00 2018-11-04 14:38:06
51     40906     SHJ656       0 2018-11-04 14:38:06 2018-11-04 15:00:00
52     40908     SHJ656       3 2018-11-04 15:00:00 2018-11-04 15:14:46
53     40909     SHJ656       0 2018-11-04 15:14:46 2018-11-04 16:15:00
54     40911     SHJ656       3 2018-11-04 16:15:00 2018-11-04 17:14:25
55     40913     SHJ656       0 2018-11-04 17:14:25 2018-11-04 17:45:00
56     40914     SHJ656       3 2018-11-04 17:45:00 2018-11-04 18:08:18
57     40915     SHJ656       0 2018-11-04 18:08:18 2018-11-04 18:30:00
58     40916     SHJ656       3 2018-11-04 18:30:00 2018-11-04 19:30:23
59     40920     SHJ656       0 2018-11-04 19:30:23 2018-11-04 19:45:00
60     40921     SHJ656       3 2018-11-04 19:45:00 2018-11-04 19:48:24
61     40922     SHJ656       0 2018-11-04 19:48:24 2018-11-04 20:00:00
62     40923     SHJ656       3 2018-11-04 20:00:00 2018-11-04 20:10:30
63     40924     SHJ656       0 2018-11-04 20:10:30 2018-11-04 21:00:00
64     40926     SHJ656       3 2018-11-04 21:00:00 2018-11-04 21:48:59
65     40928     SHJ656       0 2018-11-04 21:48:59 2018-11-04 22:00:00
66     40929     SHJ656       3 2018-11-04 22:00:00 2018-11-04 22:19:47
67     40930     SHJ656       0 2018-11-04 22:19:47 2018-11-04 22:30:00
68     40931     SHJ656       3 2018-11-04 22:30:00 2018-11-04 22:49:15
69     40932     SHJ656       0 2018-11-04 22:49:15 2018-11-05 04:15:00
70     40935     SHJ656       3 2018-11-05 04:15:00 2018-11-05 04:16:08
71     40936     SHJ656       0 2018-11-05 04:16:08 2018-11-05 04:30:00
72     40937     SHJ656       3 2018-11-05 04:30:00 2018-11-05 04:32:56
73     40938     SHJ656       0 2018-11-05 04:32:56 2018-11-05 05:30:00
74     40940     SHJ656       3 2018-11-05 05:30:00 2018-11-05 05:37:06
75     40941     SHJ656       0 2018-11-05 05:37:06 2018-11-05 06:15:00
76     40942     SHJ656       3 2018-11-05 06:15:00 2018-11-05 07:37:07
77     40943     SHJ656       0 2018-11-05 07:37:07 2018-11-05 08:00:00
78     40944     SHJ656       3 2018-11-05 08:00:00 2018-11-05 08:56:24
79     40945     SHJ656       0 2018-11-05 08:56:24 2018-11-05 09:15:00
80     40948     SHJ656       3 2018-11-05 09:15:00 2018-11-05 10:50:37
81     40950     SHJ656       0 2018-11-05 10:50:37 2018-11-05 11:15:00
82     40955     SHJ656       3 2018-11-05 11:15:00 2018-11-05 17:13:33
83     40973     SHJ656       0 2018-11-05 17:13:33 2018-11-05 17:45:00
84     40974     SHJ656       3 2018-11-05 17:45:00 2018-11-05 18:01:47
85     40975     SHJ656       0 2018-11-05 18:01:47 2018-11-05 18:15:00
86     40976     SHJ656       3 2018-11-05 18:15:00 2018-11-05 18:17:46
87     40977     SHJ656       0 2018-11-05 18:17:46 2018-11-05 18:30:00
88     40978     SHJ656       3 2018-11-05 18:30:00 2018-11-05 18:51:29
89     40979     SHJ656       0 2018-11-05 18:51:29 2018-11-05 19:30:00
90     40980     SHJ656       3 2018-11-05 19:30:00 2018-11-05 19:31:58
91     40981     SHJ656       0 2018-11-05 19:31:58 2018-11-05 20:00:00
92     40982     SHJ656       3 2018-11-05 20:00:00 2018-11-05 20:00:19
93     40983     SHJ656       0 2018-11-05 20:00:19 2018-11-05 20:15:00
94     40984     SHJ656       3 2018-11-05 20:15:00 2018-11-05 20:24:21
95     40985     SHJ656       0 2018-11-05 20:24:21 2018-11-06 02:30:00
96     40990     SHJ656       3 2018-11-06 02:30:00 2018-11-06 02:38:25
97     40991     SHJ656       0 2018-11-06 02:38:25 2018-11-06 03:15:00
98     40992     SHJ656       3 2018-11-06 03:15:00 2018-11-06 03:15:12
99     40993     SHJ656       0 2018-11-06 03:15:12 2018-11-06 03:45:00
  • The Status column holds four states (Online, Offline, Failed, Communication Lost).
  • Each device streams status data together with the start and end date of each status.
  • A status can last seconds, minutes, days or even months, which makes it hard to visualize the data per day or per hour.
  • I need to analyse the statuses to show the availability of each status per day and per hour, in order to detect fluctuations (if any).
  • My goal is to reshape the dataframe into a new daily dataframe like this:
device_id   year    month   day dow   uptimeSec downtimeSec
AUH888          2018    10  22  Monday    36836         49564
SHJ656          2018    10  24  Wednesday 44979         41421
AUH888          2018    10  25  Thursday  56872         29528
SHJ656          2018    10  29  Monday    38070         48330
  • uptime >> when the status is Online
  • downtime >> when the status is (Offline, Failed, Communication Lost)
  • I am using the code below, but it is a bit slow.
import pandas as pd
from datetime import timedelta

cleandataHeader = ['device_id', 'year', 'month', 'day', 'dow', 'uptimeSec', 'downtimeSec']

def fragmentCollect(daystart, dayend, device):
    maskBigFrag = ((historicaldata['Device_id'] == device) & ((daystart < historicaldata['Start_date']) & (dayend > historicaldata['End_date'])))
    BigFragdf = historicaldata.loc[maskBigFrag]
    BigFragdf['fragment'] = (BigFragdf['End_date'] - BigFragdf['Start_date']).dt.total_seconds()

    maskSmallFrag = ((historicaldata['Device_id'] == device) & ((daystart > historicaldata['Start_date']) & (dayend < historicaldata['End_date'])))
    SmallFragdf = historicaldata.loc[maskSmallFrag]
    SmallFragdf['fragment'] = (dayend - daystart).total_seconds()
    SmallFragdf['Start_date'] = daystart.strftime('%Y-%m-%d 00:00:00')
    SmallFragdf['End_date'] = dayend.strftime('%Y-%m-%d 00:00:00')

    maskHeadFrag = ((historicaldata['Device_id'] == device) & ((daystart >= historicaldata['Start_date'] ) & (daystart < historicaldata['End_date'] ) & (dayend > historicaldata['End_date'] )))
    HeadFragdf = historicaldata.loc[maskHeadFrag]
    HeadFragdf['fragment'] = (HeadFragdf['End_date'] - daystart).dt.total_seconds()
    HeadFragdf['Start_date'] = daystart.strftime('%Y-%m-%d 00:00:00')

    maskTailFrag = ((historicaldata['Device_id'] == device) & ((daystart < historicaldata['Start_date'] ) & (dayend <= historicaldata['End_date'] ) & (dayend > historicaldata['Start_date'] )))
    TailFragdf = historicaldata.loc[maskTailFrag]
    TailFragdf['fragment'] = (dayend - TailFragdf['Start_date']).dt.total_seconds()
    TailFragdf['End_date'] = dayend.strftime('%Y-%m-%d 00:00:00')

    frames = [BigFragdf, SmallFragdf, HeadFragdf, TailFragdf]
    result = pd.concat(frames)
    result = result.drop_duplicates()
    return result

def rowClean(row):
    row['device_id'] = row.name[1]
    row['year'] = row.name[0].year
    row['month'] = row.name[0].month
    row['day'] = row.name[0].day
    row['dow'] = row.name[0].day_name()
    result = fragmentCollect(row.name[0], row.name[0] + timedelta(days=1), row.name[1])
    result = result.to_dict('records')
    uptime = 0
    downtime = 0
    for frag in result:
        if frag['Status'] == 0:  # assuming Status 0 means Online, as in the answers below
            uptime += frag['fragment']
        else:
            downtime += frag['fragment']
    row['uptimeSec'] = uptime 
    row['downtimeSec'] = downtime
    return row

def buildTheCleanData(start, end):
    datelist = [start + timedelta(days=x) for x in range((end-start).days + 1)]
    iterables = [datelist, ['AUH888', 'SHJ656']]
    idx = pd.MultiIndex.from_product(iterables, names=['date', 'Device_id'])
    s = pd.DataFrame(columns=cleandataHeader, index=idx)
    s = s.apply(rowClean, axis=1)
    return s
  • The fragmentCollect function is my algorithm for collecting the status fragments for each passed start, end and device.
  • The rowClean function is applied to every row of the new multi-index dataframe to fill in the information (uptimeSec, downtimeSec).
  • The buildTheCleanData function builds the new, clean dataframe of availability per day.
  • If I want the tidy dataframe per hour, I can apply the same concept (see the sketch after this list).
  • As you can see, the code above is quite slow.
  • I would like to know whether pandas has built-in functionality that can handle a case like this faster.
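For the hourly version mentioned in the list above, a minimal (and not particularly fast) sketch of the same clipping idea could look like the following. It is only an illustration, not the code I am using: it assumes the parsed historicaldata frame shown at the top, that Status == 0 means Online (as the answers below also assume), and the helper name per_hour_availability is made up for this sketch.

import pandas as pd

# Sketch only: clip every (Start_date, End_date) interval to hourly bins and
# sum the overlapping seconds per device, hour and status.
def per_hour_availability(historicaldata):
    rows = []
    for _, r in historicaldata.iterrows():
        # hourly bin edges covering the whole interval
        edges = pd.date_range(r['Start_date'].floor('H'),
                              r['End_date'].ceil('H'), freq='H')
        for left, right in zip(edges[:-1], edges[1:]):
            overlap = (min(r['End_date'], right)
                       - max(r['Start_date'], left)).total_seconds()
            if overlap > 0:
                rows.append({'Device_id': r['Device_id'], 'hour': left,
                             'Status': r['Status'], 'seconds': overlap})
    out = pd.DataFrame(rows)
    # assumption: Status 0 = Online, any other status counts as downtime
    out['uptimeSec'] = out['seconds'].where(out['Status'] == 0, 0)
    out['downtimeSec'] = out['seconds'].where(out['Status'] != 0, 0)
    return (out.groupby(['Device_id', 'hour'])[['uptimeSec', 'downtimeSec']]
               .sum().reset_index())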

3 Answers:

Answer 0: (score: 0)

What can be done here is to first split the rows that span several days, so that every row belongs to a single day. The rows to split are those where the date (day) of Start_date and End_date differ. The first row then ends at the end of its first day, and extra rows are prepared for every further day in the [Start_date, End_date) range.

Once this is done, it is easy to add the date as well as the uptime and downtime to each row. Summing the durations for each (Device_id, day) pair then gives the expected result.

The code could be:

import numpy as np
import pandas as pd

# list of days to split
new_days = df[df.Start_date.dt.date != df.End_date.dt.date].copy()

# initialize Start_date to the end of the initial days for the new days to add
new_days['Start_date'] = (new_days.Start_date + pd.offsets.Day()).dt.floor('D')

# in original dataframe, End_date is at end of day of Start_date
df.loc[new_days.index, 'End_date'] = new_days['Start_date']

# number of full days to add
days_to_add = (new_days.End_date - new_days.Start_date).dt.floor('D').dt.days

# build a list of those days
splitted = []
for i, row in new_days.iterrows():
    r = row.copy()
    for _ in range(days_to_add.loc[i]):
        r['End_date'] = r['Start_date'] + pd.offsets.Day()
        splitted.append(r.copy())
        r['Start_date'] = r['End_date']

# last to add will have Start_date at the beginning of day of End_date
new_days['Start_date'] = new_days['End_date'].dt.floor('D')

# add those new days to the original dataframe (in an intermediate dataframe)    
interm = pd.concat([df, new_days[new_days.Start_date!=new_days.End_date],
                   pd.DataFrame(splitted)]).rename_axis('ix').reset_index()
# interm = interm.sort_values(['ix', 'Start_date']).set_index('ix')
# interm.rename_axis('', inplace=True)

# add required columns to the intermediate dataframe
interm["year"] = interm.Start_date.dt.year
interm["month"] = interm.Start_date.dt.month
interm["day"] = interm.Start_date.dt.day
interm["dow"] = interm.Start_date.dt.strftime('%A')
interm['uptimeSec'] = np.where(interm.Status == 0,
                              (interm.End_date - interm.Start_date).dt.seconds
                              + 86400 * (interm.End_date - interm.Start_date
                                         ).dt.days,
                              0)
interm['downtimeSec'] = np.where(interm.Status != 0,
                              (interm.End_date - interm.Start_date).dt.seconds
                              + 86400 * (interm.End_date - interm.Start_date
                                         ).dt.days,
                              0)

# sum the durations
reshaped = interm[['Device_id', 'year', 'month', 'day', 'dow', 'uptimeSec',
                   'downtimeSec']].groupby(['Device_id', 'year', 'month',
                                            'day', 'dow']).sum().reset_index()

This gives the expected result:

Device_id  year  month  day        dow  uptimeSec  downtimeSec
  AUH888  2018     10   22     Monday       6034        49564
  AUH888  2018     10   23    Tuesday       1199        85201
  AUH888  2018     10   24  Wednesday          0        86400
  AUH888  2018     10   25   Thursday      56872        29528
  AUH888  2018     10   26     Friday      86400            0
  AUH888  2018     10   27   Saturday      86400            0
  AUH888  2018     10   28     Sunday      86400            0
  AUH888  2018     10   29     Monday      86400            0

Answer 1: (score: 0)

I stumbled over the same problem; I had to compute similar results for 15-minute bins instead of 1-day bins. I tried to apply my solution to your problem.

1) As Serge Ballesta mentioned, it is important in your case to split the rows into fixed time units. My approach is to convert the start and end times into a start time and a duration.

I tried this with the following code; a snippet of your data is added at the end of my post.

import numpy as np
import pandas as pd

df = pd.DataFrame(data, columns=cols)  # data and cols as defined at the end of my post

# Convert relevant entries to datetime format
df['Start_date'] = pd.to_datetime(df['Start_date'])
df['End_date'] = pd.to_datetime(df['End_date'])

# Set start_date as index
df.set_index('Start_date', inplace=True)

# Add a duration column, based on start and end time
df['duration'] = pd.to_timedelta(df['End_date'] - df.index)

I then create a new dataframe per day (you can set the time bin as appropriate):

# Online is where Status is 0, offline is where Status is {1,2,3}
# Step-by-step explanation:
# df[df['Status'] == '0'] -> keeps only the entries with status 'online'
# df[df['Status'] == '0']['duration'] -> of the filtered entries, select the duration column
# df[df['Status'] == '0']['duration'].resample('1d') -> resample into daily bins
# df[df['Status'] == '0']['duration'].resample('1d').sum() -> sum 'duration' per bin
# (df[df['Status'] == '0']['duration'].resample('1d').sum().astype(np.int64)/1e9) -> convert the summed duration to seconds (conversion from nanoseconds!)
online = (df[df['Status'] == '0']['duration'].resample('1d').sum().astype(np.int64)/1e9)
offline = (df[(df['Status'] == '1') | (df['Status'] == '2') | (df['Status'] == '3')]['duration'].resample('1d').sum().astype(np.int64)/1e9)
index = online.index

# The new dataframe is put together
new_sort = pd.DataFrame(index=index)
new_sort['online'] = online
new_sort['offline']= offline
new_sort.fillna(0, inplace=True)

Because a single status can last longer than one day, a day's summed duration can exceed 24 hours. I fixed this with a function that caps the value and carries any "excess" time over to the next day.

def split_duration(df, col, dt):
    # cap each bin at dt seconds and carry any "excess" time over to the next bin
    diff = 0
    for e, row in df.iterrows():
        val = row[col] + diff
        diff = 0
        if val > dt:
            diff = val - dt
            val = dt
        df.at[e, col] = val  # iterrows yields copies, so write the value back explicitly
    return df

dt = 60*60*24 # time unit is in seconds, dt is seconds per day
new_sort = split_duration(new_sort, 'online', dt)
new_sort = split_duration(new_sort, 'offline', dt)

The resulting dataframe new_sort gives you all the information you need.

new_sort
            online  offline
Start_date      
2018-10-22  6034.0  86400.0
2018-10-23  1199.0  86400.0
2018-10-24  0.0     76175.0
2018-10-25  86400.0  1718.0
2018-10-26  86400.0     0.0
2018-10-27  86400.0     0.0
2018-10-28  86400.0     0.0
2018-10-29  86400.0     0.0
2018-10-30  86400.0     0.0
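For the finer bins mentioned at the top (15 minutes in my case, or hourly), presumably only the resample frequency and the cap passed to split_duration have to change. A hedged sketch of the hourly variant, assuming df is still the Start_date-indexed frame with the duration column built above (the names online_h, offline_h and hourly are mine, not part of the code above):

# Sketch only: hourly bins instead of daily ('15min' and a cap of 900 would give 15-minute bins)
online_h = (df[df['Status'] == '0']['duration'].resample('1H').sum().astype(np.int64) / 1e9)
offline_h = (df[df['Status'] != '0']['duration'].resample('1H').sum().astype(np.int64) / 1e9)
hourly = pd.DataFrame({'online': online_h, 'offline': offline_h}).fillna(0)
hourly = split_duration(hourly, 'online', 60 * 60)   # cap each hour at 3600 seconds
hourly = split_duration(hourly, 'offline', 60 * 60)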

This code is probably still not optimal; improvements are welcome. Below is just the dataframe I used with this code.

data = np.array([[40162,     'AUH888',       1, '2018-10-22 08:33:22', '2018-10-22 08:34:26'], 
 [40163,     'AUH888',       0, '2018-10-22 08:34:26', '2018-10-22 10:15:00'], 
 [40167,     'AUH888',       3, '2018-10-22 10:15:00', '2018-10-23 12:40:01'], 
 [40224,     'AUH888',       0, '2018-10-23 12:40:01', ' 2018-10-23 13:00:00'], 
 [40227,     'AUH888',       3, '2018-10-23 13:00:00', ' 2018-10-25 07:43:30'], 
 [40296,     'AUH888',       0, '2018-10-25 07:43:30', ' 2018-10-25 08:00:00'], 
 [40298,     'AUH888',       3, '2018-10-25 08:00:00', ' 2018-10-25 08:28:38'], 
 [40301,     'AUH888',       0, '2018-10-25 08:28:38', ' 2018-11-05 12:15:00'], 
 [40965,     'AUH888',       3, '2018-11-05 12:15:00', ' 2018-11-07 08:06:58'], 
 [41085,     'AUH888',       0, '2018-11-07 08:06:58', ' 2018-11-12 07:15:00'], 
 [41256,     'AUH888',       3, '2018-11-12 07:15:00', ' 2018-11-12 07:19:29'], 
 [41257,     'AUH888',       0, '2018-11-12 07:19:29', ' 2018-11-15 10:45:00'], 
 [41412,     'AUH888',       3, '2018-11-15 10:45:00', ' 2018-11-17 09:38:42'], 
 [41469,     'AUH888',       0, '2018-11-17 09:38:42', ' 2018-11-19 10:15:00']])
cols = ['History_id', 'Device_id','Status', 'Start_date', 'End_date']
df = pd.DataFrame(data, columns=cols)

Answer 2: (score: 0)

My take:

import pandas as pd
from datetime import timedelta


def days_distance(start, end):
    A = start.replace(hour=0, minute=0, second=0, microsecond=0)
    B = end.replace(hour=0, minute=0, second=0, microsecond=0)
    return (B - A).days


def duration(sdate, start=True):
    d_start = sdate.replace(hour=0, minute=0, second=0, microsecond=0)
    dur = (sdate - d_start).seconds
    return 86400 - dur if start else dur


def data_arr(tm, vl):
    return tm.year, tm.month, tm.day, tm.strftime("%A"), vl, 86400 - vl


def durs(sdate, edate):
    res = []
    if sdate.strftime('%Y-%m-%d') == edate.strftime('%Y-%m-%d'):
        res.append(data_arr(sdate, (edate - sdate).seconds))
    else:
        res.append(data_arr(sdate, duration(sdate)))
        res.append(data_arr(edate, duration(edate, start=False)))
    ddist = days_distance(sdate, edate)
    for i in range(1, ddist):
        res.append(data_arr(sdate + timedelta(days=i), 86400))
    return res


def expand(data_series):
    durations = durs(data_series.Start_date, data_series.End_date)
    tdf = pd.DataFrame(
        durations,
        columns=['year', 'month', 'day', 'dow', 'uptimeSec', 'downtimeSec'])
    tdf.insert(
        loc=0, column='Device_id', value=[data_series.Device_id]*len(tdf))
    return tdf


# Load source csv
df = pd.read_csv('data_files/device_statuses.csv')
df.Start_date = pd.to_datetime(df.Start_date)
df.End_date = pd.to_datetime(df.End_date)

# Keep uptime data only:
df = df[df.Status == 0]
print(df.head(), '\n\n----------\n')

# Get desired result
dfs = df.apply(lambda row: expand(row), axis=1)
result = pd.concat(dfs.to_list(), axis=0, ignore_index=True)
result = result.sort_values(by=['year', 'month', 'day', 'dow'])
print(result.head())

Output:

   History_id Device_id  Status          Start_date            End_date
1       40163    AUH888       0 2018-10-22 08:34:26 2018-10-22 10:15:00
3       40224    AUH888       0 2018-10-23 12:40:01 2018-10-23 13:00:00
5       40296    AUH888       0 2018-10-25 07:43:30 2018-10-25 08:00:00
7       40301    AUH888       0 2018-10-25 08:28:38 2018-11-05 12:15:00
9       41085    AUH888       0 2018-11-07 08:06:58 2018-11-12 07:15:00 

----------

  Device_id  year  month  day       dow  uptimeSec  downtimeSec
0    AUH888  2018     10   22    Monday       6034        80366
1    AUH888  2018     10   23   Tuesday       1199        85201
2    AUH888  2018     10   25  Thursday        990        85410
3    AUH888  2018     10   25  Thursday      55882        30518
4    AUH888  2018     11    5    Monday      44100        42300

csv:

History_id,Device_id,Status,Start_date,End_date
40162,AUH888,1,2018-10-22 08:33:22,2018-10-22 08:34:26
40163,AUH888,0,2018-10-22 08:34:26,2018-10-22 10:15:00
40167,AUH888,3,2018-10-22 10:15:00,2018-10-23 12:40:01
40224,AUH888,0,2018-10-23 12:40:01,2018-10-23 13:00:00
40227,AUH888,3,2018-10-23 13:00:00,2018-10-25 07:43:30
40296,AUH888,0,2018-10-25 07:43:30,2018-10-25 08:00:00
40298,AUH888,3,2018-10-25 08:00:00,2018-10-25 08:28:38
40301,AUH888,0,2018-10-25 08:28:38,2018-11-05 12:15:00
40965,AUH888,3,2018-11-05 12:15:00,2018-11-07 08:06:58
41085,AUH888,0,2018-11-07 08:06:58,2018-11-12 07:15:00
41256,AUH888,3,2018-11-12 07:15:00,2018-11-12 07:19:29
41257,AUH888,0,2018-11-12 07:19:29,2018-11-15 10:45:00
41412,AUH888,3,2018-11-15 10:45:00,2018-11-17 09:38:42
41469,AUH888,0,2018-11-17 09:38:42,2018-11-19 10:15:00
41555,AUH888,3,2018-11-19 10:15:00,2018-11-20 05:21:19
...