I query a MongoDB database and get back a DataFrame with a values field, where each value is a dict keyed by 10-minute timestamps:
date values
2017-04-30 {u'00:00:00': u'13.12', u'00:10:00': .....u'23:50:00': u'12.34'}
2017-05-01 {u'00:00:00': u'15.56', u'00:10:00': .....u'23:50:00': u'15.78'}
2017-05-02 {u'00:00:00': u'11.34', u'00:10:00': .....u'23:50:00': u'13.21'}
2017-05-03 {u'00:00:00': u'17.07', u'00:10:00': .....u'23:50:00': u'14.22'}
2017-05-04 {u'00:00:00': u'18.03', u'00:10:00': .....u'23:50:00': u'18.09'}
I would like to convert this into a Series or a list of tuples that is easier to plot:
[(2017-04-30 00:00:00, 13.12)....... (2017-05-04 23:50:00, 18.09)]
My current approach involves some convoluted functions and a lot of data wrangling. Is there a simpler way to do this in Pandas?
Answer 0 (score: 2)
Option 1
A fairly compact comprehension
from pandas import to_datetime as todt
from pandas import to_timedelta as totd

[
    [
        (str(todt(r.date) + totd(k)), v) for k, v in r.values.items()
    ]
    for r in df.itertuples()
]
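Note that Option 1 returns one inner list per date, with timestamps as strings and values still as strings. If a single flat list of typed pairs is what you want for plotting, a minimal sketch might look like the following. The two-row `df` is hypothetical sample data in the same shape as the question's, and the name `pairs` is only illustrative.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-04-30", "2017-05-01"]),
    "values": [
        {"00:00:00": "13.12", "00:10:00": "12.50"},
        {"00:00:00": "15.56", "00:10:00": "15.10"},
    ],
})

# One flat list of (Timestamp, float) pairs, sorted by timestamp
pairs = sorted(
    (date + pd.to_timedelta(time), float(value))
    for date, day in zip(df["date"], df["values"])
    for time, value in day.items()
)
```

Sorting is included because dict key order is not guaranteed to be chronological, especially for data that came from Python 2 or from a database.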
Option 2
More pandas
A similar solution
import pandas as pd
from pandas import to_timedelta as totd

# Expand each dict into columns, then stack into long form: one row per (date, time)
d1 = df.set_index('date')['values'] \
    .apply(pd.Series).rename_axis('time', axis=1) \
    .stack().reset_index(name='val')

# Combine date and time of day, then collect (datetime, value) pairs per date
d1.assign(
    datetime=d1.date + totd(d1.time)
).groupby('date').apply(
    lambda x: list(zip(x.datetime, x.val))
)
date
2017-04-30 [(2017-04-30 00:00:00, 12.84), (2017-04-30 00:...
2017-05-01 [(2017-05-01 00:00:00, 14.32), (2017-05-01 00:...
2017-05-02 [(2017-05-02 00:00:00, 16.86), (2017-05-02 00:...
2017-05-03 [(2017-05-03 00:00:00, 10.82), (2017-05-03 00:...
2017-05-04 [(2017-05-04 00:00:00, 9.93), (2017-05-04 00:1...
dtype: object
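Since the question also mentions a Series, here is a rough sketch of a variant that ends in one float Series with a DatetimeIndex instead of per-date lists. It uses `melt` rather than `stack`, which is my substitution and not part of the answer above. The sample `df` and the names `wide`, `long_df` and `series` are hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-04-30", "2017-05-01"]),
    "values": [
        {"00:00:00": "13.12", "00:10:00": "12.50"},
        {"00:00:00": "15.56", "00:10:00": "15.10"},
    ],
})

# One column per time of day, one row per date
wide = pd.DataFrame(df["values"].tolist(), index=pd.Index(df["date"], name="date"))

# Long form: one row per (date, time) combination
long_df = wide.reset_index().melt(id_vars="date", var_name="time", value_name="val")

# Add the time of day to the date and use the result as the index
series = pd.Series(
    long_df["val"].astype(float).to_numpy(),
    index=pd.DatetimeIndex(long_df["date"] + pd.to_timedelta(long_df["time"]), name="datetime"),
    name="val",
).sort_index()
```

With matplotlib installed, `series.plot()` should then draw the readings against time directly.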
Setup
import numpy as np
import pandas as pd

# '10min' replaces the older '10T' alias, which newer pandas versions deprecate
tidx = pd.date_range('1970-01-01', periods=6 * 24, freq='10min')
dates = pd.date_range('2017-04-30', periods=5)
df = pd.DataFrame(dict(
    date=dates,
    values=[
        {d.strftime('%H:%M:%S'): str((np.random.rand(1) * 20).round(2)[0]) for d in tidx}
        for _ in range(5)
    ]
))
Answer 1 (score: 0)
Assuming df is your DataFrame
import pandas as pd

df = pd.DataFrame({'date': ['2017-04-30', '2017-05-01', '2017-05-02', '2017-05-03', '2017-05-04']})
# DataFrame.set_value was removed in pandas 1.0, so assign the column of dicts directly
df['values'] = [{u'00:00:00': u'13.12', u'23:50:00': u'12.34'} for _ in range(len(df))]
Then build the output with
[('{date} {time}'.format(date=date, time=time), p)
 for date, v in zip(df['date'], df['values'])
 for time, p in v.items()
]
Result
[('2017-04-30 00:00:00', '13.12'),
('2017-04-30 23:50:00', '12.34'),
('2017-05-01 00:00:00', '13.12'),
('2017-05-01 23:50:00', '12.34'),
('2017-05-02 00:00:00', '13.12'),
('2017-05-02 23:50:00', '12.34'),
('2017-05-03 00:00:00', '13.12'),
('2017-05-03 23:50:00', '12.34'),
('2017-05-04 00:00:00', '13.12'),
('2017-05-04 23:50:00', '12.34')]
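Both elements of each tuple in this result are strings, so most plotting tools would treat them as labels rather than as times and numbers. A small sketch of one way to parse them, assuming a list shaped like the result above (shortened here; `result`, `stamps`, `readings` and `series` are illustrative names):

```python
import pandas as pd

# Shortened copy of the string pairs shown in the result above
result = [
    ("2017-04-30 00:00:00", "13.12"),
    ("2017-04-30 23:50:00", "12.34"),
    ("2017-05-01 00:00:00", "13.12"),
]

# Split the pairs into two parallel sequences
stamps, readings = zip(*result)

# Parse the strings into timestamps and floats
series = pd.Series(
    pd.to_numeric(list(readings)),
    index=pd.to_datetime(list(stamps)),
)
```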