How to check whether any string is missing in pandas

Asked: 2018-10-18 12:08:27

Tags: python pandas

I have the following dataframe in pandas

 Date           half_hourly_bucket       Value
 2018-01-01     00:00:01 - 00:30:00      123
 2018-01-01     00:30:01 - 01:00:00      12
 2018-01-01     01:00:01 - 01:30:00      122
 2018-01-01     02:00:01 - 02:30:00      111
 2018-01-01     03:00:01 - 03:30:00      122
 2018-01-01     04:00:01 - 04:30:00      111

My desired dataframe is

 Date           half_hourly_bucket       Value
 2018-01-01     00:00:01 - 00:30:00      123
 2018-01-01     00:30:01 - 01:00:00      12
 2018-01-01     01:00:01 - 01:30:00      122
 2018-01-01     01:30:01 - 02:00:00      0
 2018-01-01     02:00:01 - 02:30:00      111
 2018-01-01     02:30:01 - 03:00:00      0
 2018-01-01     03:00:01 - 03:30:00      122
 2018-01-01     03:30:01 - 04:00:00      0
 2018-01-01     04:00:01 - 04:30:00      111
 2018-01-01     04:30:01 - 05:00:00      0
 2018-01-01     05:00:01 - 05:30:00      0
 2018-01-01     05:30:01 - 06:00:00      0
 2018-01-01     06:00:01 - 06:30:00      0
 2018-01-01     06:30:01 - 07:00:00      0
 2018-01-01     07:00:01 - 07:30:00      0
 2018-01-01     07:30:01 - 08:00:00      0
 2018-01-01     08:00:01 - 08:30:00      0
 2018-01-01     08:30:01 - 09:00:00      0
 2018-01-01     09:00:01 - 09:30:00      0
 2018-01-01     09:30:01 - 10:00:00      0
 2018-01-01     10:00:01 - 10:30:00      0
 2018-01-01     10:30:01 - 11:00:00      0
 2018-01-01     11:00:01 - 11:30:00      0
 2018-01-01     11:30:01 - 12:00:00      0
 2018-01-01     12:00:01 - 12:30:00      0
 2018-01-01     12:30:01 - 13:00:00      0
 2018-01-01     13:00:01 - 13:30:00      0
 2018-01-01     13:30:01 - 14:00:00      0
 2018-01-01     14:00:01 - 14:30:00      0
 2018-01-01     14:30:01 - 15:00:00      0
 2018-01-01     15:00:01 - 15:30:00      0
 2018-01-01     15:30:01 - 16:00:00      0
 2018-01-01     16:00:01 - 16:30:00      0
 2018-01-01     16:30:01 - 17:00:00      0
 2018-01-01     17:00:01 - 17:30:00      0
 2018-01-01     17:30:01 - 18:00:00      0
 2018-01-01     18:00:01 - 18:30:00      0
 2018-01-01     18:30:01 - 19:00:00      0
 2018-01-01     19:00:01 - 19:30:00      0
 2018-01-01     19:30:01 - 20:00:00      0
 2018-01-01     20:00:01 - 20:30:00      0
 2018-01-01     20:30:01 - 21:00:00      0
 2018-01-01     21:00:01 - 21:30:00      0
 2018-01-01     21:30:01 - 22:00:00      0
 2018-01-01     22:00:01 - 22:30:00      0
 2018-01-01     22:30:01 - 23:00:00      0
 2018-01-01     23:00:01 - 23:30:00      0
 2018-01-01     23:30:01 - 00:00:00      0

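For each Date, I want to check whether any half-hourly bucket is missing (there are 48 buckets per day in total). If a bucket is missing, it must be added in chronological order with a Value of 0.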

How can I do this in pandas?

1 answer:

Answer 0 (score: 4)

The solution splits half_hourly_bucket into 2 new columns, processes them, and joins them back together:

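import pandas as pd

#create DatetimeIndex (convert first in case Date is stored as strings)
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date')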

#split to new columns
df[['one','two']] = df['half_hourly_bucket'].str.split(' - ', expand=True)

#add first column to DatetimeIndex
df.index += pd.to_timedelta(df['one'])

#add missing values to DatetimeIndex
one_sec = pd.Timedelta(1, unit='s')
one_day = pd.Timedelta(1, unit='d')
df = df.reindex(pd.date_range(df.index.min().floor('D') + one_sec,
                              df.index.max().floor('D') + one_day - one_sec, freq='30min'))

#recreate column two
df['two'] = df.index + pd.Timedelta(30*60 - 1, unit='s')
#join together
df['half_hourly_bucket'] = (df.index.strftime('%H:%M:%S') + ' - ' +
                            df['two'].dt.strftime('%H:%M:%S'))

#replace missing values
df['Value'] = df['Value'].fillna(0)

df = df.rename_axis('Date').reset_index()

#filter only necessary columns
df = df[['Date','half_hourly_bucket','Value']]

print(df)

                  Date   half_hourly_bucket  Value
0  2018-01-01 00:00:01  00:00:01 - 00:30:00  123.0
1  2018-01-01 00:30:01  00:30:01 - 01:00:00   12.0
2  2018-01-01 01:00:01  01:00:01 - 01:30:00  122.0
3  2018-01-01 01:30:01  01:30:01 - 02:00:00    0.0
4  2018-01-01 02:00:01  02:00:01 - 02:30:00  111.0
5  2018-01-01 02:30:01  02:30:01 - 03:00:00    0.0
6  2018-01-01 03:00:01  03:00:01 - 03:30:00  122.0
7  2018-01-01 03:30:01  03:30:01 - 04:00:00    0.0
8  2018-01-01 04:00:01  04:00:01 - 04:30:00  111.0
9  2018-01-01 04:30:01  04:30:01 - 05:00:00    0.0
10 2018-01-01 05:00:01  05:00:01 - 05:30:00    0.0
11 2018-01-01 05:30:01  05:30:01 - 06:00:00    0.0
12 2018-01-01 06:00:01  06:00:01 - 06:30:00    0.0
13 2018-01-01 06:30:01  06:30:01 - 07:00:00    0.0
14 2018-01-01 07:00:01  07:00:01 - 07:30:00    0.0
15 2018-01-01 07:30:01  07:30:01 - 08:00:00    0.0
16 2018-01-01 08:00:01  08:00:01 - 08:30:00    0.0
17 2018-01-01 08:30:01  08:30:01 - 09:00:00    0.0
18 2018-01-01 09:00:01  09:00:01 - 09:30:00    0.0
19 2018-01-01 09:30:01  09:30:01 - 10:00:00    0.0
20 2018-01-01 10:00:01  10:00:01 - 10:30:00    0.0
21 2018-01-01 10:30:01  10:30:01 - 11:00:00    0.0
22 2018-01-01 11:00:01  11:00:01 - 11:30:00    0.0
23 2018-01-01 11:30:01  11:30:01 - 12:00:00    0.0
24 2018-01-01 12:00:01  12:00:01 - 12:30:00    0.0
25 2018-01-01 12:30:01  12:30:01 - 13:00:00    0.0
26 2018-01-01 13:00:01  13:00:01 - 13:30:00    0.0
27 2018-01-01 13:30:01  13:30:01 - 14:00:00    0.0
28 2018-01-01 14:00:01  14:00:01 - 14:30:00    0.0
29 2018-01-01 14:30:01  14:30:01 - 15:00:00    0.0
30 2018-01-01 15:00:01  15:00:01 - 15:30:00    0.0
31 2018-01-01 15:30:01  15:30:01 - 16:00:00    0.0
32 2018-01-01 16:00:01  16:00:01 - 16:30:00    0.0
33 2018-01-01 16:30:01  16:30:01 - 17:00:00    0.0
34 2018-01-01 17:00:01  17:00:01 - 17:30:00    0.0
35 2018-01-01 17:30:01  17:30:01 - 18:00:00    0.0
36 2018-01-01 18:00:01  18:00:01 - 18:30:00    0.0
37 2018-01-01 18:30:01  18:30:01 - 19:00:00    0.0
38 2018-01-01 19:00:01  19:00:01 - 19:30:00    0.0
39 2018-01-01 19:30:01  19:30:01 - 20:00:00    0.0
40 2018-01-01 20:00:01  20:00:01 - 20:30:00    0.0
41 2018-01-01 20:30:01  20:30:01 - 21:00:00    0.0
42 2018-01-01 21:00:01  21:00:01 - 21:30:00    0.0
43 2018-01-01 21:30:01  21:30:01 - 22:00:00    0.0
44 2018-01-01 22:00:01  22:00:01 - 22:30:00    0.0
45 2018-01-01 22:30:01  22:30:01 - 23:00:00    0.0
46 2018-01-01 23:00:01  23:00:01 - 23:30:00    0.0
47 2018-01-01 23:30:01  23:30:01 - 00:00:00    0.0
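
As an alternative, here is a minimal sketch that keeps Date as a plain date and works on the bucket labels directly instead of folding the start time into the index. It is not the method used above: it builds the 48 expected labels, forms every (Date, bucket) pair with `pd.MultiIndex.from_product`, and reindexes against that grid. The helper name `all_bucket_labels` and the variables `full_index`, `missing` and `filled` are hypothetical, and the sketch assumes every existing label is spelled exactly as in the question and that each (Date, bucket) pair occurs at most once.

```python
import pandas as pd

# Sample data copied from the question (only the buckets that are present).
df = pd.DataFrame({
    "Date": ["2018-01-01"] * 6,
    "half_hourly_bucket": [
        "00:00:01 - 00:30:00",
        "00:30:01 - 01:00:00",
        "01:00:01 - 01:30:00",
        "02:00:01 - 02:30:00",
        "03:00:01 - 03:30:00",
        "04:00:01 - 04:30:00",
    ],
    "Value": [123, 12, 122, 111, 122, 111],
})


def all_bucket_labels():
    """Return the 48 half-hourly labels of one day in chronological order.

    Hypothetical helper: the base date is arbitrary, only the time part is used.
    """
    base = pd.Timestamp("2000-01-01")
    starts = pd.date_range(base + pd.Timedelta(seconds=1), periods=48, freq="30min")
    ends = starts + pd.Timedelta(minutes=30) - pd.Timedelta(seconds=1)
    return [
        s + " - " + e
        for s, e in zip(starts.strftime("%H:%M:%S"), ends.strftime("%H:%M:%S"))
    ]


# Every (Date, bucket) pair that should exist.
full_index = pd.MultiIndex.from_product(
    [df["Date"].unique(), all_bucket_labels()],
    names=["Date", "half_hourly_bucket"],
)

indexed = df.set_index(["Date", "half_hourly_bucket"])

# Pairs that are expected but absent from the data.
missing = full_index.difference(indexed.index)

# Add the absent pairs with Value 0, keeping chronological order.
filled = indexed.reindex(full_index, fill_value=0).reset_index()

print(len(missing), "buckets were missing")
print(filled.head(8))
```

Because `fill_value=0` is passed to `reindex`, no NaN is introduced and Value keeps its integer dtype, unlike the `fillna(0)` route, which yields floats. With several dates in the frame, the same grid covers each date in turn.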