How do I unpack datetime values in a DataFrame?

Asked: 2017-05-02 20:37:07

Tags: python pandas dataframe

I query a MongoDB database and get back a DataFrame with a values column, where each entry is a dict keyed by 10-minute timestamps:

date        values
2017-04-30  {u'00:00:00': u'13.12', u'00:10:00': .....u'23:50:00': u'12.34'}
2017-05-01  {u'00:00:00': u'15.56', u'00:10:00': .....u'23:50:00': u'15.78'}
2017-05-02  {u'00:00:00': u'11.34', u'00:10:00': .....u'23:50:00': u'13.21'}
2017-05-03  {u'00:00:00': u'17.07', u'00:10:00': .....u'23:50:00': u'14.22'}
2017-05-04  {u'00:00:00': u'18.03', u'00:10:00': .....u'23:50:00': u'18.09'}

I would like to convert it into a Series or a list of tuples that is easier to plot:

[(2017-04-30 00:00:00, 13.12)....... (2017-05-04 23:50:00, 18.09)]

My current approach involves some convoluted functions and a lot of data wrangling. Is there a simpler way to do this in pandas?

2 answers:

Answer 0 (score: 2)

Option 1
A fairly compact comprehension

from pandas import to_datetime as todt
from pandas import to_timedelta as totd

[
    [
        (str(todt(r.date) + totd(k)), v) for k, v in r.values.items()
    ] 
    for r in df.itertuples()
]
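Note that the comprehension above returns one inner list per row, with the timestamps rendered as strings and the readings left as strings. If a single flat list of (Timestamp, float) pairs is wanted, as in the question, a minimal sketch could look like the following. The two-row sample frame and the name `pairs` are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-04-30', '2017-05-01']),
    'values': [
        {'00:00:00': '13.12', '00:10:00': '12.34'},
        {'00:00:00': '15.56', '00:10:00': '15.78'},
    ],
})

# One flat list: add each time-of-day offset to its row's date
# and convert the string reading to a float.
pairs = [
    (date + pd.to_timedelta(time), float(value))
    for date, day in zip(df['date'], df['values'])
    for time, value in sorted(day.items())  # 'HH:MM:SS' keys sort chronologically
]
```

Sorting the dict items keeps each day's readings in time order even if the dicts come back from the database unordered.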

Option 2
A more pandas-like solution

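import pandas as pd
from pandas import to_datetime as todt
from pandas import to_timedelta as totd

# Expand each dict into columns (one per time of day), then stack into long form.
# The axis must be passed by keyword in recent pandas versions.
d1 = df.set_index('date')['values'] \
       .apply(pd.Series).rename_axis('time', axis=1) \
       .stack().reset_index(name='val')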

d1.assign(
    datetime=d1.date + totd(d1.time)
).groupby('date').apply(
    lambda x: list(zip(x.datetime, x.val))
)


date
2017-04-30    [(2017-04-30 00:00:00, 12.84), (2017-04-30 00:...
2017-05-01    [(2017-05-01 00:00:00, 14.32), (2017-05-01 00:...
2017-05-02    [(2017-05-02 00:00:00, 16.86), (2017-05-02 00:...
2017-05-03    [(2017-05-03 00:00:00, 10.82), (2017-05-03 00:...
2017-05-04    [(2017-05-04 00:00:00, 9.93), (2017-05-04 00:1...
dtype: object
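Since the question mentions plotting, a single Series with a DatetimeIndex may be more convenient than a list of tuples per date. The following is a sketch under assumptions, not part of the original answer: the two-row sample frame and the names `wide`, `long`, `stamps`, and `series` are hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample in the same shape as the question's data
df = pd.DataFrame({
    'date': pd.to_datetime(['2017-04-30', '2017-05-01']),
    'values': [
        {'00:00:00': '13.12', '00:10:00': '12.34'},
        {'00:00:00': '15.56', '00:10:00': '15.78'},
    ],
})

# One row per date, one column per time of day
wide = pd.DataFrame(df['values'].tolist(),
                    index=pd.DatetimeIndex(df['date'], name='date'))

# Long form: a Series indexed by (date, time-of-day string)
long = wide.stack()

# Combine the two index levels into full timestamps
stamps = (long.index.get_level_values(0)
          + pd.to_timedelta(long.index.get_level_values(1)))

# Readings arrive as strings, so cast to float before building the Series
series = pd.Series(long.astype(float).to_numpy(), index=stamps, name='val').sort_index()
```

With a DatetimeIndex in place, `series.plot()` should put time on the x-axis directly, provided matplotlib is installed.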

Setup

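import numpy as np
import pandas as pd

# '10min' is the alias for 10-minute steps ('10T' is deprecated in recent pandas)
tidx = pd.date_range('1970-01-01', periods=6 * 24, freq='10min')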
dates = pd.date_range('2017-04-30', periods=5)
df = pd.DataFrame(dict(
        date=dates,
        values=[
            {d.strftime('%H:%M:%S'): str((np.random.rand(1) * 20).round(2)[0]) for d in tidx}
            for _ in range(5)
        ]
    ))

Answer 1 (score: 0)

Assuming df is your DataFrame

import pandas as pd

df = pd.DataFrame({'date': ['2017-04-30', '2017-05-01', '2017-05-02', '2017-05-03', '2017-05-04']})
# DataFrame.set_value was removed in pandas 1.0; assign the per-row dicts as a column instead
df['values'] = [{u'00:00:00': u'13.12', u'23:50:00': u'12.34'} for _ in range(len(df))]

Then build the output

[('{date} {time}'.format(date=date, time=time), p) for date, v in zip(df['date'], df['values']) for time, p in v.items() ]

Result

[('2017-04-30 00:00:00', '13.12'),
 ('2017-04-30 23:50:00', '12.34'),
 ('2017-05-01 00:00:00', '13.12'),
 ('2017-05-01 23:50:00', '12.34'),
 ('2017-05-02 00:00:00', '13.12'),
 ('2017-05-02 23:50:00', '12.34'),
 ('2017-05-03 00:00:00', '13.12'),
 ('2017-05-03 23:50:00', '12.34'),
 ('2017-05-04 00:00:00', '13.12'),
 ('2017-05-04 23:50:00', '12.34')]
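Both elements of each tuple in this result are still strings. If real timestamps and numeric readings are needed for plotting, one possible follow-up is sketched below. The shortened `result` list and the names `stamps`, `readings`, and `series` are assumptions for illustration.

```python
import pandas as pd

# Shortened, hypothetical copy of the string tuples shown above
result = [
    ('2017-04-30 00:00:00', '13.12'),
    ('2017-04-30 23:50:00', '12.34'),
    ('2017-05-01 00:00:00', '13.12'),
]

# Split the pairs into two parallel sequences
stamps, readings = zip(*result)

# Parse the strings into a DatetimeIndex and float values
series = pd.Series(pd.to_numeric(list(readings)),
                   index=pd.to_datetime(list(stamps)))
```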