Creating categorical data from 2 columns - Python Pandas

Time: 2018-03-20 13:35:16

Tags: python pandas boolean intervals

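I'm having trouble creating a DataFrame that holds time intervals of temperature measurements. Right now the DataFrame has time as its index and the measurements in another column. I'd like to convert the time into 12-hour intervals, with the measurement being the mean of the values in each timespan.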

                         measurement
time
2016-11-04 08:49:25    17.730000
2016-11-04 10:23:52    18.059999
2016-11-04 11:02:09    18.370001
2016-11-04 12:04:20    18.090000
2016-11-04 14:26:43    18.320000

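So instead of each timestamp being associated with a single measurement, I'd like to have the mean of the values over each 12-hour interval, like so: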

                                              measurement
time
2016-11-04 00:00:00 - 2016-11-04 12:00:00     17.730000
2016-11-04 12:00:00 - 2016-11-05 00:00:00     18.059999
2016-11-05 00:00:00 - 2016-11-05 12:00:00     18.370001
2016-11-05 12:00:00 - 2016-11-06 00:00:00     18.090000
2016-11-06 00:00:00 - 2016-11-06 12:00:00     18.320000

Is there a simple way to do this with pandas?

Later on I would also like to convert the measurements into intervals, so that the data becomes boolean, like so:

                                              17.0-18.0   18.0-19.0  19.0-20
time
2016-11-04 00:00:00 - 2016-11-04 12:00:00         1           0         0
2016-11-04 12:00:00 - 2016-11-05 00:00:00         0           1         0
2016-11-05 00:00:00 - 2016-11-05 12:00:00         0           1         0
2016-11-05 12:00:00 - 2016-11-06 00:00:00         0           1         0
2016-11-06 00:00:00 - 2016-11-06 12:00:00         0           1         0

Edit: I used the solution that Coldspeed posted first.

df = pd.DataFrame({'timestamp':time.values, 'readings':readings.values})
df = df.groupby(pd.Grouper(key='timestamp', freq='12H'))['readings'].mean()
v = pd.cut(df, bins=[17,18,19,20,21,22,23,24,25,26,27,28], labels=['17-18','18-19','19-20','20-21','21-22','22-23','23-24','24-25','25-26','26-27','27-28'])

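I know the bins and labels could have been generated with a for loop, but this is just a quick fix. The groupby function groups the values of 'timestamp' at a frequency of 12 hours and takes the mean of the readings in each timespan.

The cut function is then used to sort the mean values into their categories.

Result: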

                     17-18  18-19  19-20  20-21  21-22  22-23  23-24  24-25  \
timestamp
2016-11-04 00:00:00      0      1      0      0      0      0      0      0
2016-11-04 12:00:00      0      1      0      0      0      0      0      0
2016-11-05 00:00:00      0      0      0      0      0      0      0      0
2016-11-05 12:00:00      1      0      0      0      0      0      0      0
2016-11-06 00:00:00      1      0      0      0      0      0      0      0
2016-11-06 12:00:00      0      0      0      0      0      0      0      0
2016-11-07 00:00:00      0      1      0      0      0      0      0      0
2016-11-07 12:00:00      1      0      0      0      0      0      0      0
2016-11-08 00:00:00      0      0      0      0      0      0      0      0
2016-11-08 12:00:00      0      0      0      0      0      0      0      0
2016-11-09 00:00:00      1      0      0      0      0      0      0      0
2016-11-09 12:00:00      1      0      0      0      0      0      0      0
2016-11-10 00:00:00      0      1      0      0      0      0      0      0
2016-11-10 12:00:00      0      0      0      0      0      0      0      0
2016-11-11 00:00:00      0      0      0      0      0      0      0      0
2016-11-11 12:00:00      0      0      0      0      0      0      0      0
2016-11-12 00:00:00      0      0      0      0      0      0      0      0
2016-11-12 12:00:00      0      0      0      0      0      0      0      0
2016-11-13 00:00:00      0      0      0      0      0      0      0      0
2016-11-13 12:00:00      0      0      0      0      0      0      0      0
2016-11-14 00:00:00      0      0      0      0      0      0      0      0
2016-11-14 12:00:00      0      1      0      0      0      0      0      0
2016-11-15 00:00:00      0      0      0      1      0      0      0      0
2016-11-15 12:00:00      0      0      0      0      0      1      0      0
2016-11-16 00:00:00      0      0      0      0      0      0      1      0
2016-11-16 12:00:00      0      0      0      0      0      0      0      0
2016-11-17 00:00:00      0      0      0      0      0      0      0      0
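
As a follow-up to the remark about the for loop, here is a minimal sketch of how the bins and labels could be generated instead of typed out. The sample rows are the five measurements from the question, standing in for the real `time` and `readings` series, and the final `pd.get_dummies` call is an assumption about how the 0/1 table above was produced, since that step is not shown in the edit.

```python
import pandas as pd

# Sample data standing in for the asker's `time` and `readings` series.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2016-11-04 08:49:25",
        "2016-11-04 10:23:52",
        "2016-11-04 11:02:09",
        "2016-11-04 12:04:20",
        "2016-11-04 14:26:43",
    ]),
    "readings": [17.73, 18.06, 18.37, 18.09, 18.32],
})

# Bin edges 17..28 and one "lo-hi" label per pair of neighbouring edges.
bins = list(range(17, 29))
labels = [f"{lo}-{hi}" for lo, hi in zip(bins[:-1], bins[1:])]

# Mean reading per 12-hour window ("12h" is the lowercase spelling of "12H").
means = df.groupby(pd.Grouper(key="timestamp", freq="12h"))["readings"].mean()

# Assign each mean to its bin, then expand to one 0/1 column per label.
categories = pd.cut(means, bins=bins, labels=labels)
dummies = pd.get_dummies(categories).astype(int)
print(dummies)
```

Because `pd.cut` returns a categorical, `pd.get_dummies` emits a column for every label, including bins that no window falls into.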

4 Answers:

Answer 0: (score: 1)

Use pd.cut, then pd.get_dummies.

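Answer 1: (score: 1)

IIUC, you want to resample in 12-hour chunks and then make dummies. pd.cut is a perfectly acceptable way to cut the resulting data into bins. However, I use np.searchsorted to accomplish the task.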

bins = np.array([17, 18, 19, 20])
labels = np.array(['<17', '17-18', '18-19', '19-20', '>20'])
resampled = df.resample('12H').measurement.mean()
pd.get_dummies(pd.Series(labels[bins.searchsorted(resampled.values)], resampled.index))

                     17-18  18-19  19-20  >20
2018-03-20 00:00:00      0      1      0    0
2018-03-20 12:00:00      1      0      0    0
2018-03-21 00:00:00      0      1      0    0
2018-03-21 12:00:00      0      0      0    1
2018-03-22 00:00:00      0      0      1    0
2018-03-22 12:00:00      0      0      0    1

Setup

import numpy as np
import pandas as pd

np.random.seed(int(np.pi * 1E6))

tidx = pd.date_range(pd.Timestamp('now'), freq='3H', periods=20)
df = pd.DataFrame(dict(measurement=np.random.rand(len(tidx)) * 6 + 17), tidx)

df

                            measurement
2018-03-20 06:58:30.484383    17.960744
2018-03-20 09:58:30.484383    18.572100
2018-03-20 12:58:30.484383    17.646766
2018-03-20 15:58:30.484383    19.025463
2018-03-20 18:58:30.484383    17.521399
2018-03-20 21:58:30.484383    17.318663
2018-03-21 00:58:30.484383    19.388553
2018-03-21 03:58:30.484383    19.520969
2018-03-21 06:58:30.484383    19.060640
2018-03-21 09:58:30.484383    17.106034
2018-03-21 12:58:30.484383    22.887546
2018-03-21 15:58:30.484383    18.437271
2018-03-21 18:58:30.484383    18.426362
2018-03-21 21:58:30.484383    20.558928
2018-03-22 00:58:30.484383    22.555121
2018-03-22 03:58:30.484383    17.139489
2018-03-22 06:58:30.484383    17.209499
2018-03-22 09:58:30.484383    19.466367
2018-03-22 12:58:30.484383    21.765692
2018-03-22 15:58:30.484383    19.680785
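
For readers wondering how the searchsorted approach relates to pd.cut, the sketch below compares the two on a handful of made-up values (the `values` array is hypothetical, chosen to include exact bin edges). With the default `side='left'`, `searchsorted` places a value that sits exactly on an edge into the lower bin, which matches the right-closed intervals `pd.cut` uses by default, provided open-ended bins are added on both sides.

```python
import numpy as np
import pandas as pd

bins = np.array([17, 18, 19, 20])
label_list = ['<17', '17-18', '18-19', '19-20', '>20']
labels = np.array(label_list)

# Hypothetical values, several of them exactly on a bin edge.
values = np.array([16.5, 17.0, 17.5, 18.0, 19.2, 20.0, 20.5])

# side='left' (the default): index i satisfies bins[i-1] < x <= bins[i].
via_searchsorted = labels[bins.searchsorted(values)]

# Same classification with pd.cut, using open-ended outer bins.
edges = [-np.inf, 17, 18, 19, 20, np.inf]
via_cut = pd.cut(values, bins=edges, labels=label_list)

print(list(via_searchsorted))
print(list(via_cut))
```

Note that a value of exactly 17.0 lands in the '<17' bucket under both approaches, so that label really means "17 or below".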

Answer 2: (score: 0)

You can use pd.cut() + pd.get_dummies()

df["measurement"] = pd.cut(df["measurement"], bins=[17.0,18.0,19.0,20.0])
dummies = pd.get_dummies(df["measurement"])
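
One thing worth knowing with this combination: values outside the bin edges, and missing values, become NaN after pd.cut and then show up as all-zero rows in the dummies. The sketch below illustrates this on three hypothetical 12-hour means. It may explain the all-zero rows in the result table in the question, though that is only a guess without the underlying data.

```python
import numpy as np
import pandas as pd

# Hypothetical 12-hour means: one in range, one above the last bin, one missing.
means = pd.Series(
    [17.4, 23.1, np.nan],
    index=pd.date_range("2016-11-04", periods=3, freq="12h"),
)

# Values outside (17, 20] and NaN inputs are both binned as NaN.
binned = pd.cut(means, bins=[17.0, 18.0, 19.0, 20.0])

# NaN categories produce a row of zeros in the dummy columns.
dummies = pd.get_dummies(binned).astype(int)
row_sums = dummies.sum(axis=1)
print(dummies)
print(row_sums)
```

Passing `dummy_na=True` to `pd.get_dummies` adds an extra column that flags those rows, if they need to be told apart from real categories.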

Answer 3: (score: 0)

For your first question: you can use pandas.TimeGrouper to group every 12 hours (or any other frequency) and then take the mean of each group.

df.groupby([pd.TimeGrouper(freq='12H')]).mean()
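
Note that pd.TimeGrouper was deprecated in pandas 0.21 and removed in pandas 1.0, so the line above fails on current versions. A minimal sketch of the same grouping with pd.Grouper, and with resample as used in another answer, follows. The frame is built from the five sample rows in the question.

```python
import pandas as pd

# Frame indexed by time, built from the sample rows in the question.
df = pd.DataFrame(
    {"measurement": [17.73, 18.06, 18.37, 18.09, 18.32]},
    index=pd.to_datetime([
        "2016-11-04 08:49:25",
        "2016-11-04 10:23:52",
        "2016-11-04 11:02:09",
        "2016-11-04 12:04:20",
        "2016-11-04 14:26:43",
    ]),
)
df.index.name = "time"

# pd.Grouper groups on the DatetimeIndex when no key is given.
by_grouper = df.groupby(pd.Grouper(freq="12h")).mean()

# resample is the shorter spelling of the same 12-hour aggregation.
by_resample = df.resample("12h").mean()

print(by_grouper)
```

Both calls label each window by its start time, so the index holds 2016-11-04 00:00:00 and 2016-11-04 12:00:00 for this sample.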