I have time-indexed data:
from datetime import date
import pandas as pd
df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
df2 = df2.set_index('day')
df2
b
day
2012-01-01 0.22
2012-01-03 0.30
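What is the best way to extend this DataFrame so that it has one row for every day in January 2012 (say), with all columns (here only b) set to NaN where we have no data?
So the desired result is: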
b
day
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
...
2012-01-31 NaN
Many thanks!
Answer 0 (score: 22)
Use this:
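# pd.DatetimeIndex(start=..., end=...) was removed in pandas 1.0; use pd.date_range.
ix = pd.date_range(start=date(2012, 1, 1), end=date(2012, 1, 31), freq='D')
df2.index = pd.to_datetime(df2.index)  # convert the datetime.date labels to Timestamps
df2.reindex(ix)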
This gives:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
[...]
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
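For reference, the same idea can be wrapped in a small self-contained helper. This is only a sketch: fill_missing_days is a hypothetical name, not a pandas function, and it assumes the index labels can be parsed by pd.to_datetime.

```python
from datetime import date

import pandas as pd


def fill_missing_days(df, start, end):
    """Return a copy of df with one row per calendar day from start to end.

    fill_missing_days is a hypothetical helper name, not part of pandas.
    """
    out = df.copy()
    # Convert datetime.date labels to Timestamps so they match date_range.
    out.index = pd.to_datetime(out.index)
    full_range = pd.date_range(start=start, end=end, freq="D")
    # Days absent from the original index come back as NaN rows.
    return out.reindex(full_range)


df2 = pd.DataFrame(
    {"day": [date(2012, 1, 1), date(2012, 1, 3)], "b": [0.22, 0.3]}
).set_index("day")

january = fill_missing_days(df2, date(2012, 1, 1), date(2012, 1, 31))
print(january)
```

Converting the index first matters because labels stored as plain datetime.date objects may not match the Timestamps produced by pd.date_range, in which case every row would come back as NaN.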
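Answer 1 (score: 3)
You can convert the data to daily frequency with asfreq. If you do not specify a fill method (the method parameter of asfreq), missing values are filled with NaN, as you wanted.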
df3 = df2.asfreq('D')
df3
Out[16]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
To answer your second part, I can't think of a more elegant way right now:
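df3 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 4), date(2012, 1, 31)])})
df3.set_index('day',inplace=True)
merged = pd.concat([df2, df3])  # DataFrame.append was removed in pandas 2.0
merged.index = pd.to_datetime(merged.index)  # asfreq needs a DatetimeIndex
merged = merged.asfreq('D')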
merged
Out[46]:
b
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
2012-01-06 NaN
2012-01-07 NaN
2012-01-08 NaN
2012-01-09 NaN
2012-01-10 NaN
2012-01-11 NaN
2012-01-12 NaN
2012-01-13 NaN
2012-01-14 NaN
2012-01-15 NaN
2012-01-16 NaN
2012-01-17 NaN
2012-01-18 NaN
2012-01-19 NaN
2012-01-20 NaN
2012-01-21 NaN
2012-01-22 NaN
2012-01-23 NaN
2012-01-24 NaN
2012-01-25 NaN
2012-01-26 NaN
2012-01-27 NaN
2012-01-28 NaN
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
This builds a second time series; then we simply append it and call asfreq('D') as before.
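The reason the extra rows are needed is that asfreq only fills gaps between the first and last labels already present. A minimal sketch of that behaviour, assuming the index is already a DatetimeIndex (the variable names short, with_sentinel and full are illustrative only):

```python
import numpy as np
import pandas as pd

# Two observations with a DatetimeIndex, mirroring df2 from the question.
df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]),
)

# asfreq only fills gaps between the first and last existing labels.
short = df2.asfreq("D")

# Adding a NaN row at the desired end date stretches the range to all of January.
with_sentinel = df2.copy()
with_sentinel.loc[pd.Timestamp("2012-01-31"), "b"] = np.nan
full = with_sentinel.asfreq("D")

print(len(short), len(full))
```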
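Answer 2 (score: 2)
Here is another option:
First add a NaN record on the last day you want, then resample. The resampling then fills in the missing dates for you.
Starting frame: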
import pandas as pd
import numpy as np
from datetime import date
df2 = pd.DataFrame({ 'day': pd.Series([date(2012, 1, 1), date(2012, 1, 3)]), 'b' : pd.Series([0.22, 0.3]) })
df2= df2.set_index('day')
df2
Out:
b
day
2012-01-01 0.22
2012-01-03 0.30
Filled frame:
df2.loc[date(2012, 1, 31), 'b'] = np.nan  # set_value was removed in pandas 1.0
df2.index = pd.to_datetime(df2.index)  # asfreq needs a DatetimeIndex
df2.asfreq('D')
Out:
b
day
2012-01-01 0.22
2012-01-02 NaN
2012-01-03 0.30
2012-01-04 NaN
2012-01-05 NaN
2012-01-06 NaN
2012-01-07 NaN
2012-01-08 NaN
2012-01-09 NaN
2012-01-10 NaN
2012-01-11 NaN
2012-01-12 NaN
2012-01-13 NaN
2012-01-14 NaN
2012-01-15 NaN
2012-01-16 NaN
2012-01-17 NaN
2012-01-18 NaN
2012-01-19 NaN
2012-01-20 NaN
2012-01-21 NaN
2012-01-22 NaN
2012-01-23 NaN
2012-01-24 NaN
2012-01-25 NaN
2012-01-26 NaN
2012-01-27 NaN
2012-01-28 NaN
2012-01-29 NaN
2012-01-30 NaN
2012-01-31 NaN
Answer 3 (score: 1)
Mark's answer no longer seems to work in pandas 1.1.1.
However, using the same idea, the following works:
from datetime import datetime
import pandas as pd
# get start and desired end dates
first_date = df['date'].min()
today = datetime.today()
# set index
df.set_index('date', inplace=True)
# and here is where the magic happens
idx = pd.date_range(first_date, today, freq='D')
df = df.reindex(idx)
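Edit: I just found that this exact use case is covered in the documentation:
Answer 4 (score: 0)
This does not exactly answer the question, since here you know that the second index is every day in January. But suppose you have another DataFrame, df1, with a different index that may be disjoint from df2's and have an irregular frequency. Then you can do the following: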
ix = pd.DatetimeIndex(list(df2.index) + list(df1.index)).unique().sort_values()
df2.reindex(ix)
Converting the indexes to lists makes it natural to concatenate them into one longer list.
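As a possible alternative to going through lists, Index.union combines two indexes directly. The sketch below assumes both frames already have a DatetimeIndex; df1 and its column c are made-up data for illustration.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]),
)
# df1 is hypothetical data with an irregular, partly overlapping index.
df1 = pd.DataFrame(
    {"c": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2012-01-02", "2012-01-03", "2012-01-10"]),
)

# Index.union returns the combined labels without duplicates.
ix = df2.index.union(df1.index)
expanded = df2.reindex(ix)
print(expanded)
```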
Answer 5 (score: 0)
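import datetime

def extendframe(df, ndays):
    """
    (df, ndays) -> df that is padded by ndays in beginning and end
    """
    # Shift every index label back and forward by ndays.
    ixd = df.index - datetime.timedelta(ndays)
    ixu = df.index + datetime.timedelta(ndays)
    # Combine the original and shifted labels, then reindex onto them.
    ixx = df.index.union(ixd.union(ixu))
    df_ = df.reindex(ixx)
    return df_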
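Note that extendframe only adds the labels that lie exactly ndays before and after each existing label. If a row for every day in the padded range is wanted, a variant along these lines may help. It is a sketch under the assumption that df has a DatetimeIndex, and pad_frame is a hypothetical name.

```python
import pandas as pd


def pad_frame(df, ndays):
    """Reindex df to a daily range extended by ndays on both sides.

    pad_frame is a hypothetical name; it assumes df has a DatetimeIndex.
    """
    pad = pd.Timedelta(days=ndays)
    # Build a contiguous daily range from (first - ndays) to (last + ndays).
    full_range = pd.date_range(df.index.min() - pad, df.index.max() + pad, freq="D")
    return df.reindex(full_range)


df2 = pd.DataFrame(
    {"b": [0.22, 0.3]},
    index=pd.to_datetime(["2012-01-01", "2012-01-03"]),
)
padded = pad_frame(df2, 2)
print(padded)
```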