I pulled some data from the internet, basically a full year of hourly data in 2 columns:
france.GetData(base_scenario, utils.enumerate_periods(start,end,'H','CET'))
Output:
2015-12-31 23:00:00+00:00 23.86
2016-01-01 00:00:00+00:00 22.39
2016-01-01 01:00:00+00:00 20.59
2016-01-01 02:00:00+00:00 16.81
2016-01-01 03:00:00+00:00 17.41
2016-01-01 04:00:00+00:00 17.02
2016-01-01 05:00:00+00:00 15.86...
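I basically want to add two flag columns, 'peak' hours and 'off-peak' hours. If the time of day is between 08:00 and 18:00, there should be a 1 in the peak column; outside those hours there should be a 1 in the off-peak column.
Could someone explain how to do this?
Many thanks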
Answer 0 (score: 2)
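I think you can use to_datetime if the index is not a DatetimeIndex, then use between_time to fill the column peak and test it with notnull: a NaN gives False, any value gives True. Then cast the booleans to int (False -> 0 and True -> 1) with astype, and finally derive the column peak-off from the column peak (thanks Quickbeam2k1):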
import pandas as pd
df = pd.DataFrame({'col': {'2016-01-01 01:00:00+00:00': 20.59, '2016-01-01 07:00:00+00:00': 15.86, '2016-01-01 10:00:00+00:00': 15.86, '2016-01-01 09:00:00+00:00': 15.86, '2016-01-01 02:00:00+00:00': 16.81, '2016-01-01 03:00:00+00:00': 17.41, '2016-01-01 05:00:00+00:00': 15.86, '2016-01-01 04:00:00+00:00': 17.02, '2016-01-01 08:00:00+00:00': 15.86, '2015-12-31 23:00:00+00:00': 23.86, '2016-01-01 18:00:00+00:00': 15.86, '2016-01-01 06:00:00+00:00': 15.86, '2016-01-01 00:00:00+00:00': 22.39}})
# dicts keep insertion order, so sort the index to get chronological rows
df = df.sort_index()
print (df)
col
2015-12-31 23:00:00+00:00 23.86
2016-01-01 00:00:00+00:00 22.39
2016-01-01 01:00:00+00:00 20.59
2016-01-01 02:00:00+00:00 16.81
2016-01-01 03:00:00+00:00 17.41
2016-01-01 04:00:00+00:00 17.02
2016-01-01 05:00:00+00:00 15.86
2016-01-01 06:00:00+00:00 15.86
2016-01-01 07:00:00+00:00 15.86
2016-01-01 08:00:00+00:00 15.86
2016-01-01 09:00:00+00:00 15.86
2016-01-01 10:00:00+00:00 15.86
2016-01-01 18:00:00+00:00 15.86
print (df.index)
Index(['2015-12-31 23:00:00+00:00', '2016-01-01 00:00:00+00:00',
'2016-01-01 01:00:00+00:00', '2016-01-01 02:00:00+00:00',
'2016-01-01 03:00:00+00:00', '2016-01-01 04:00:00+00:00',
'2016-01-01 05:00:00+00:00', '2016-01-01 06:00:00+00:00',
'2016-01-01 07:00:00+00:00', '2016-01-01 08:00:00+00:00',
'2016-01-01 09:00:00+00:00', '2016-01-01 10:00:00+00:00',
'2016-01-01 18:00:00+00:00'],
dtype='object')
df.index = pd.to_datetime(df.index)
print (df.index)
DatetimeIndex(['2015-12-31 23:00:00', '2016-01-01 00:00:00',
'2016-01-01 01:00:00', '2016-01-01 02:00:00',
'2016-01-01 03:00:00', '2016-01-01 04:00:00',
'2016-01-01 05:00:00', '2016-01-01 06:00:00',
'2016-01-01 07:00:00', '2016-01-01 08:00:00',
'2016-01-01 09:00:00', '2016-01-01 10:00:00',
'2016-01-01 18:00:00'],
dtype='datetime64[ns]', freq=None)
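# rows outside 08:00-18:00 are absent from the result, so they become NaN after index alignment
df['peak'] = df['col'].between_time('08:00', '18:00')
# NaN -> False -> 0, any value -> True -> 1
df['peak'] = df['peak'].notnull().astype(int)
# peak-off is the complement of peak
df['peak-off'] = 1 - df['peak']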
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
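The between_time steps above can also be wrapped in a small helper. This is only a minimal sketch, assuming the frame already has a DatetimeIndex; the function name add_peak_flags and the one-day sample data are made up for illustration and are not part of the original data.

```python
import pandas as pd

def add_peak_flags(df, start="08:00", end="18:00"):
    """Return a copy of df with 0/1 'peak' and 'peak-off' columns.

    Hypothetical helper; df must have a DatetimeIndex.
    """
    out = df.copy()
    # Index labels whose time of day lies in [start, end], both ends inclusive.
    peak_index = out.between_time(start, end).index
    mask = out.index.isin(peak_index)
    out["peak"] = mask.astype(int)
    out["peak-off"] = (~mask).astype(int)
    return out

# Illustrative hourly data for a single day (values are made up).
idx = pd.date_range("2016-01-01", periods=24, freq="h")
sample = pd.DataFrame({"col": range(24)}, index=idx)
flagged = add_peak_flags(sample)
```

Working on a copy leaves the input frame untouched, and building the mask with isin avoids the intermediate NaN column.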
Another solution is to build a boolean mask from the condition first and then cast it to int; to invert the mask, use ~:
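# pd.datetime was removed in pandas 2.0; use the standard library datetime instead
from datetime import datetime
h1 = datetime.strptime('08:00:00', '%H:%M:%S').time()
h2 = datetime.strptime('18:00:00', '%H:%M:%S').time()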
times = df.index.time
mask = (times >= h1) & (times <= h2)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
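One thing to watch: the timestamps in the question are stamped +00:00 (UTC), while the periods were requested with 'CET'. If the 08:00-18:00 window is meant in local time (that is an assumption, the question does not say), the index can be converted before the mask is built. A rough sketch with made-up hourly data:

```python
from datetime import time

import pandas as pd

# Illustrative UTC-stamped hourly data for one winter day (values are made up).
idx = pd.date_range("2016-01-01", periods=24, freq="h", tz="UTC")
df = pd.DataFrame({"col": range(24)}, index=idx)

# Assumption: peak hours are defined in CET local time, not in UTC.
local_times = df.index.tz_convert("CET").time

# Same mask technique as above, applied to the local time of day.
mask = (local_times >= time(8, 0)) & (local_times <= time(18, 0))
df["peak"] = mask.astype(int)
df["peak-off"] = (~mask).astype(int)
```

In January CET is one hour ahead of UTC, so the row stamped 07:00 UTC is flagged as peak and the row stamped 18:00 UTC is not.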
If the data is strictly hourly, the solution can be simpler - use DatetimeIndex.hour for the mask:
df.index = pd.to_datetime(df.index)
print (df.index)
h = df.index.hour
mask = (h >= 8) & (h <= 18)
df['peak'] = mask.astype(int)
df['peak-off'] = (~mask).astype(int)
print (df)
col peak peak-off
2015-12-31 23:00:00 23.86 0 1
2016-01-01 00:00:00 22.39 0 1
2016-01-01 01:00:00 20.59 0 1
2016-01-01 02:00:00 16.81 0 1
2016-01-01 03:00:00 17.41 0 1
2016-01-01 04:00:00 17.02 0 1
2016-01-01 05:00:00 15.86 0 1
2016-01-01 06:00:00 15.86 0 1
2016-01-01 07:00:00 15.86 0 1
2016-01-01 08:00:00 15.86 1 0
2016-01-01 09:00:00 15.86 1 0
2016-01-01 10:00:00 15.86 1 0
2016-01-01 18:00:00 15.86 1 0
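The hour-based mask only matches the time-based one when every timestamp falls exactly on the hour. A small sketch with made-up half-hourly timestamps shows where the two differ:

```python
from datetime import time

import pandas as pd

# Illustrative half-hourly timestamps around the end of the peak window.
idx = pd.to_datetime(["2016-01-01 17:30", "2016-01-01 18:00", "2016-01-01 18:30"])

# Hour-based mask: every timestamp whose hour is 18 counts as peak.
by_hour = (idx.hour >= 8) & (idx.hour <= 18)

# Time-based mask: only timestamps up to and including 18:00:00 count as peak.
by_time = (idx.time >= time(8, 0)) & (idx.time <= time(18, 0))

print(by_hour.tolist())  # [True, True, True]
print(by_time.tolist())  # [True, True, False]
```

So for sub-hourly data, the time-based mask or between_time is the safer choice if 18:00 is supposed to be the last peak timestamp.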