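I'm having trouble creating a DataFrame that holds time intervals of temperature measurements. Right now the DataFrame has time as its index and a single column of measurements. I would like to convert the time into 12-hour intervals, with the measurement being the mean of the values that fall inside each interval.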
measurement
time
2016-11-04 08:49:25 17.730000
2016-11-04 10:23:52 18.059999
2016-11-04 11:02:09 18.370001
2016-11-04 12:04:20 18.090000
2016-11-04 14:26:43 18.320000
So instead of every timestamp being tied to one measurement, I would like the mean of the values over each 12-hour window, like this:
measurement
time
2016-11-04 00:00:00 - 2016-11-04 12:00:00 17.730000
2016-11-04 12:00:00 - 2016-11-05 00:00:00 18.059999
2016-11-05 00:00:00 - 2016-11-05 12:00:00 18.370001
2016-11-05 12:00:00 - 2016-11-06 00:00:00 18.090000
2016-11-06 00:00:00 - 2016-11-06 12:00:00 18.320000
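Is there a simple way to do this with pandas?
Later I would like to convert the measurements into intervals so that the data becomes boolean, like this: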
17.0-18.0 18.0-19.0 19.0-20
time
2016-11-04 00:00:00 - 2016-11-04 12:00:00 1 0 0
2016-11-04 12:00:00 - 2016-11-05 00:00:00 0 1 0
2016-11-05 00:00:00 - 2016-11-05 12:00:00 0 1 0
2016-11-05 12:00:00 - 2016-11-06 00:00:00 0 1 0
2016-11-06 00:00:00 - 2016-11-06 12:00:00 0 1 0
Edit: I used the solution first posted by Coldspeed.
df = pd.DataFrame({'timestamp':time.values, 'readings':readings.values})
df = df.groupby(pd.Grouper(key='timestamp', freq='12H'))['readings'].mean()
v = pd.cut(df, bins=[17,18,19,20,21,22,23,24,25,26,27,28], labels=['17-18','18-19','19-20','20-21','21-22','22-23','23-24','24-25','25-26','26-27','27-28'])
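I know the bins and labels could have been generated with a for loop, but this was just a quick fix. The groupby call groups the 'timestamp' values at a 12-hour frequency and takes the mean of the readings in each window.
The cut function then sorts those means into their categories.
Result: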
17-18 18-19 19-20 20-21 21-22 22-23 23-24 24-25 \
timestamp
2016-11-04 00:00:00 0 1 0 0 0 0 0 0
2016-11-04 12:00:00 0 1 0 0 0 0 0 0
2016-11-05 00:00:00 0 0 0 0 0 0 0 0
2016-11-05 12:00:00 1 0 0 0 0 0 0 0
2016-11-06 00:00:00 1 0 0 0 0 0 0 0
2016-11-06 12:00:00 0 0 0 0 0 0 0 0
2016-11-07 00:00:00 0 1 0 0 0 0 0 0
2016-11-07 12:00:00 1 0 0 0 0 0 0 0
2016-11-08 00:00:00 0 0 0 0 0 0 0 0
2016-11-08 12:00:00 0 0 0 0 0 0 0 0
2016-11-09 00:00:00 1 0 0 0 0 0 0 0
2016-11-09 12:00:00 1 0 0 0 0 0 0 0
2016-11-10 00:00:00 0 1 0 0 0 0 0 0
2016-11-10 12:00:00 0 0 0 0 0 0 0 0
2016-11-11 00:00:00 0 0 0 0 0 0 0 0
2016-11-11 12:00:00 0 0 0 0 0 0 0 0
2016-11-12 00:00:00 0 0 0 0 0 0 0 0
2016-11-12 12:00:00 0 0 0 0 0 0 0 0
2016-11-13 00:00:00 0 0 0 0 0 0 0 0
2016-11-13 12:00:00 0 0 0 0 0 0 0 0
2016-11-14 00:00:00 0 0 0 0 0 0 0 0
2016-11-14 12:00:00 0 1 0 0 0 0 0 0
2016-11-15 00:00:00 0 0 0 1 0 0 0 0
2016-11-15 12:00:00 0 0 0 0 0 1 0 0
2016-11-16 00:00:00 0 0 0 0 0 0 1 0
2016-11-16 12:00:00 0 0 0 0 0 0 0 0
2016-11-17 00:00:00 0 0 0 0 0 0 0 0
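The bin edges and labels in the edit above do not have to be typed out by hand. A minimal sketch of generating them, assuming integer edges from 17 to 28; the `means` values here are made up purely to show the labels in use and are not taken from the real data:

```python
import pandas as pd

# Bin edges 17, 18, ..., 28 and the matching labels '17-18', ..., '27-28'.
edges = list(range(17, 29))
labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical 12-hour means, only to demonstrate the labels.
means = pd.Series([17.4, 18.05, 22.9])
v = pd.cut(means, bins=edges, labels=labels)
print(v.tolist())  # ['17-18', '18-19', '22-23']
```

pd.cut needs exactly one label fewer than it has edges, which pairing each edge with its successor guarantees.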
Answer 0 (score: 1)
Use groupby + pd.cut, followed by pd.get_dummies.
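A minimal sketch of how these calls can be chained, assuming the layout from the question (time as the index, a single measurement column). The rows are the question's sample values, rounded, and the bin edges and labels are illustrative choices rather than the answerer's exact code:

```python
import pandas as pd

# Sample rows from the question (values rounded), indexed by time.
df = pd.DataFrame(
    {"measurement": [17.73, 18.06, 18.37, 18.09, 18.32]},
    index=pd.to_datetime([
        "2016-11-04 08:49:25",
        "2016-11-04 10:23:52",
        "2016-11-04 11:02:09",
        "2016-11-04 12:04:20",
        "2016-11-04 14:26:43",
    ]),
)
df.index.name = "time"

# 1) Mean per 12-hour window; with no key, pd.Grouper groups on the index.
means = df.groupby(pd.Grouper(freq="12h"))["measurement"].mean()

# 2) Sort each mean into a labelled bin.
binned = pd.cut(means, bins=[17, 18, 19, 20], labels=["17-18", "18-19", "19-20"])

# 3) One indicator column per bin; dtype=int gives 0/1 instead of booleans.
dummies = pd.get_dummies(binned, dtype=int)
print(dummies)
```

Both windows here average just above 18, so both rows get a 1 in the 18-19 column. The lowercase '12h' alias is used because recent pandas versions deprecate the uppercase 'H'.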
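Answer 1 (score: 1)
IIUC, you want to resample into 12-hour chunks and then create dummy variables.
pd.cut is a perfectly acceptable way to cut the resulting data into bins.
However, I use np.searchsorted to get the job done.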
bins = np.array([17, 18, 19, 20])
labels = np.array(['<17', '17-18', '18-19', '19-20', '>20'])
resampled = df.resample('12H').measurement.mean()
pd.get_dummies(pd.Series(labels[bins.searchsorted(resampled.values)], resampled.index))
17-18 18-19 19-20 >20
2018-03-20 00:00:00 0 1 0 0
2018-03-20 12:00:00 1 0 0 0
2018-03-21 00:00:00 0 1 0 0
2018-03-21 12:00:00 0 0 0 1
2018-03-22 00:00:00 0 0 1 0
2018-03-22 12:00:00 0 0 0 1
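To see why indexing labels with the searchsorted result works, here is a small sketch on a few hand-picked values (the values are hypothetical and not part of the answer's data). With the default side='left', searchsorted returns for each value the number of bin edges strictly below it, which is also the position of the matching label:

```python
import numpy as np

bins = np.array([17, 18, 19, 20])
labels = np.array(['<17', '17-18', '18-19', '19-20', '>20'])

# Hypothetical values covering each case, including one exactly on an edge.
values = np.array([16.2, 17.5, 18.0, 19.99, 23.1])

positions = bins.searchsorted(values)
print(positions.tolist())          # [0, 1, 1, 3, 4]
print(labels[positions].tolist())  # ['<17', '17-18', '17-18', '19-20', '>20']
```

A value that sits exactly on an edge (18.0 here) lands in the lower bin, so the intervals are closed on the right, the same convention pd.cut uses by default. There must be one more label than there are edges so that values below the first edge and above the last edge each get a label.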
Setup
np.random.seed(int(np.pi * 1E6))
tidx = pd.date_range(pd.Timestamp('now'), freq='3H', periods=20)
df = pd.DataFrame(dict(measurement=np.random.rand(len(tidx)) * 6 + 17), tidx)
df
measurement
2018-03-20 06:58:30.484383 17.960744
2018-03-20 09:58:30.484383 18.572100
2018-03-20 12:58:30.484383 17.646766
2018-03-20 15:58:30.484383 19.025463
2018-03-20 18:58:30.484383 17.521399
2018-03-20 21:58:30.484383 17.318663
2018-03-21 00:58:30.484383 19.388553
2018-03-21 03:58:30.484383 19.520969
2018-03-21 06:58:30.484383 19.060640
2018-03-21 09:58:30.484383 17.106034
2018-03-21 12:58:30.484383 22.887546
2018-03-21 15:58:30.484383 18.437271
2018-03-21 18:58:30.484383 18.426362
2018-03-21 21:58:30.484383 20.558928
2018-03-22 00:58:30.484383 22.555121
2018-03-22 03:58:30.484383 17.139489
2018-03-22 06:58:30.484383 17.209499
2018-03-22 09:58:30.484383 19.466367
2018-03-22 12:58:30.484383 21.765692
2018-03-22 15:58:30.484383 19.680785
Answer 2 (score: 0)
You can use pd.cut() + pd.get_dummies():
df["measurement"] = pd.cut(df["measurement"], bins=[17.0,18.0,19.0,20.0])
dummies = pd.get_dummies(df["measurement"])
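A short sketch of what these two calls produce, using made-up values. When no labels are passed, pd.cut names each bin by its interval, and any value outside the outermost edges becomes NaN, which pd.get_dummies turns into a row of zeros:

```python
import pandas as pd

# Hypothetical measurements; 22.9 lies outside the bins on purpose.
s = pd.Series([17.73, 18.0, 19.2, 22.9], name="measurement")

binned = pd.cut(s, bins=[17.0, 18.0, 19.0, 20.0])
print(binned)  # intervals (17.0, 18.0], (18.0, 19.0], (19.0, 20.0]; NaN for 22.9

dummies = pd.get_dummies(binned, dtype=int)
print(dummies)
```

If all-zero rows are not wanted, the outer edges can be widened, for example by adding -inf and inf as the first and last bin edges.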
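Answer 3 (score: 0)
For your first question: you can use pandas.TimeGrouper to group every 12 hours (or any other frequency) and then take the mean of each group. Note that pandas.TimeGrouper was deprecated in pandas 0.21 and removed in 1.0; pd.Grouper(freq=...) is the replacement and is used here.
df.groupby(pd.Grouper(freq='12H')).mean()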
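For a frame indexed by time, grouping by frequency and resample give the same 12-hour means. The sketch below uses the question's sample rows (rounded) to compare the two, then relabels each window as "start - end" to match the index the question asks for. The relabelling step is an illustrative addition, not part of the answer:

```python
import pandas as pd

df = pd.DataFrame(
    {"measurement": [17.73, 18.06, 18.37, 18.09, 18.32]},
    index=pd.to_datetime([
        "2016-11-04 08:49:25",
        "2016-11-04 10:23:52",
        "2016-11-04 11:02:09",
        "2016-11-04 12:04:20",
        "2016-11-04 14:26:43",
    ]),
)

# Two spellings of the same operation; both label a window by its start time.
by_grouper = df.groupby(pd.Grouper(freq="12h")).mean()
by_resample = df.resample("12h").mean()

# Relabel each window as "start - end", as in the question's desired output.
width = pd.Timedelta(hours=12)
by_resample.index = [f"{start} - {start + width}" for start in by_resample.index]
print(by_resample)
```

The relabelled index holds plain strings, so it is best applied as a last display step, after any further time-based operations.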