I have a DataFrame with a datetime index, and it seems I cannot add a column whose label happens to be a year.
import pandas as pd
from pandas import DataFrame, Series
df = DataFrame({'2013' : [1, 2, 3, 4]}, index=pd.date_range('2014-02-21', periods=4, freq='H'))
Now df contains the following:
                     2013
2014-02-21 00:00:00     1
2014-02-21 01:00:00     2
2014-02-21 02:00:00     3
2014-02-21 03:00:00     4
[4 rows x 1 columns]
Adding a column '2015' works as expected:
df['2015'] = -1 # or df.loc[:, '2015'] = -1
Now df contains:
                     2013  2015
2014-02-21 00:00:00     1    -1
2014-02-21 01:00:00     2    -1
2014-02-21 02:00:00     3    -1
2014-02-21 03:00:00     4    -1
[4 rows x 2 columns]
However, adding '2014' in the same way does not work, because:
df['2014'] # Returns the entire df, because df is sliced on year?
and
df.loc[:, '2014'] = -1 # Throws a KeyError.
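The ambiguity comes from the fact that a string such as '2014' can be read either as a column label or as a partial date that matches rows of the datetime index. A minimal sketch of the two axes, assuming the same small frame as above (the hourly frequency is written as a Timedelta here only so the snippet does not depend on a frequency alias; exactly how plain df['2014'] is interpreted appears to depend on the pandas version):

```python
import pandas as pd

# Same data as in the question; a Timedelta is used for the hourly step.
index = pd.date_range('2014-02-21', periods=4, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'2013': [1, 2, 3, 4]}, index=index)

# .loc with a single label addresses the row axis. On a DatetimeIndex a
# year-like string is matched against the timestamps, so this selects
# every row that falls in 2014.
rows_2014 = df.loc['2014']

# Naming the column axis explicitly removes the ambiguity for a label
# that already exists as a column.
col_2013 = df.loc[:, '2013']

print(rows_2014)
print(col_2013)
```

Using .loc with an explicit axis position makes it clear whether rows or columns are meant, which is why the row-slicing reading of '2014' is the surprising one here.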
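I don't think I want to use join or merge, because those return copies. I worry that adding many (i.e. > 1e5) columns to df, (re)assigning to df each time, would consume too much memory. Am I right?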
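On the many-columns concern, one pattern that may be worth comparing is to build all the new columns in a separate frame and attach them in a single step with pd.concat, rather than reassigning df once per column. This is only a sketch: the new_values mapping is hypothetical, and concat still allocates a new frame, so it reduces the number of reassignments rather than avoiding a copy altogether.

```python
import pandas as pd

# Same data as in the question; a Timedelta is used for the hourly step.
index = pd.date_range('2014-02-21', periods=4, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'2013': [1, 2, 3, 4]}, index=index)

# Hypothetical new columns: label -> fill value (an assumption for
# illustration; in practice these could be arrays or Series).
new_values = {'2014': 0, '2015': -1}

# Build every new column at once on the same index as df ...
new_cols = pd.DataFrame(new_values, index=df.index)

# ... and attach them in one concat along the column axis. Because the
# labels arrive as columns of new_cols, '2014' is never interpreted as
# a date string against the row index.
result = pd.concat([df, new_cols], axis=1)

print(result)
```

With 1e5 columns this is one combine step instead of 1e5 insertions, though whether it fits in memory still depends on the size of the data itself.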
Answer 0 (score: 1)
One way to work around this may be to add the column names to the DataFrame first:
>>> df = df.reindex_axis(df.columns.tolist() + ['2014', '2015'],
...                      axis=1, copy=False)
>>> df
                     2013  2014  2015
2014-02-21 00:00:00     1   NaN   NaN
2014-02-21 01:00:00     2   NaN   NaN
2014-02-21 02:00:00     3   NaN   NaN
2014-02-21 03:00:00     4   NaN   NaN
>>> df['2015'] = -1
>>> df['2014'] = 0
>>> df
                     2013  2014  2015
2014-02-21 00:00:00     1     0    -1
2014-02-21 01:00:00     2     0    -1
2014-02-21 02:00:00     3     0    -1
2014-02-21 03:00:00     4     0    -1
[4 rows x 3 columns]
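Note that reindex_axis was deprecated and later removed from pandas, so the call above may not run on a current installation. A minimal sketch of the same idea using DataFrame.reindex with the columns keyword, assuming a fresh frame that holds only the '2013' column (the Timedelta frequency is just a version-neutral way to write the hourly step):

```python
import pandas as pd

# Fresh frame with only the '2013' column, as in the answer's session.
index = pd.date_range('2014-02-21', periods=4, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({'2013': [1, 2, 3, 4]}, index=index)

# Reserve the new column labels first; the added columns start out as NaN.
df = df.reindex(columns=df.columns.tolist() + ['2014', '2015'])

# Now that '2014' exists as a column label, plain assignment targets
# the column rather than the rows of the datetime index.
df['2015'] = -1
df['2014'] = 0

print(df)
```

The reindex call still returns a new frame, so this addresses the label ambiguity rather than the memory concern raised in the question.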