Pandas (Python): indexing on a column conflicts with the row index

Time: 2014-02-24 11:28:25

Tags: python pandas

I have a dataframe with a datetime index, and it seems I cannot add a column whose name happens to be a year.

import pandas as pd
from pandas import DataFrame, Series

df = DataFrame({'2013' : [1, 2, 3, 4]}, index=pd.date_range('2014-02-21', periods=4, freq='H'))

Now df holds the following:

                     2013
2014-02-21 00:00:00     1
2014-02-21 01:00:00     2
2014-02-21 02:00:00     3
2014-02-21 03:00:00     4

[4 rows x 1 columns]

Adding a column '2015' works as expected:

df['2015'] = -1 # or df.loc[:, '2015'] = -1

Now df holds:

                     2013  2015
2014-02-21 00:00:00     1    -1
2014-02-21 01:00:00     2    -1
2014-02-21 02:00:00     3    -1
2014-02-21 03:00:00     4    -1

[4 rows x 2 columns]

However, adding '2014' the same way does not work, because:

df['2014'] # Returns the entire df, because df is sliced on year?

df.loc[:, '2014'] = -1 # Throws a KeyError.

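I don't think I want to use join or merge, because those return copies. I'm worried that adding many (i.e. > 1e+5) columns to df, (re)assigning to df each time, will consume too much memory. Am I right?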
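
The suspicion in the code comment above (that '2014' is treated as a year slice on the rows) can be checked in isolation. The sketch below is not part of the original question. It assumes a reasonably recent pandas, and the names rows_2014 and has_2014_column are made up for illustration. It uses .loc for the row lookup, because the behaviour of plain df['2014'] appears to have varied between pandas versions.

```python
import pandas as pd

# Same shape as the frame in the question; the index is spelled out
# explicitly so no frequency alias is needed.
index = pd.to_datetime([
    '2014-02-21 00:00', '2014-02-21 01:00',
    '2014-02-21 02:00', '2014-02-21 03:00',
])
df = pd.DataFrame({'2013': [1, 2, 3, 4]}, index=index)

# On a DatetimeIndex, a year string passed to .loc selects rows:
# every timestamp that falls in 2014 matches.
rows_2014 = df.loc['2014']

# Whether a column with that label exists is a separate question.
has_2014_column = '2014' in df.columns
```

Here all four rows fall in 2014, so the row lookup returns the whole frame even though no '2014' column exists, which matches what the question describes.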

1 answer:

Answer 0: (score: 1)

A workaround may be to first add the column names to the dataframe:

>>> df = df.reindex_axis(df.columns.tolist() + ['2014', '2015'],
                         axis=1, copy=False)
>>> df
                     2013  2014  2015
2014-02-21 00:00:00     1   NaN   NaN
2014-02-21 01:00:00     2   NaN   NaN
2014-02-21 02:00:00     3   NaN   NaN
2014-02-21 03:00:00     4   NaN   NaN

>>> df['2015'] = -1
>>> df['2014'] = 0
>>> df
                     2013  2014  2015
2014-02-21 00:00:00     1     0    -1
2014-02-21 01:00:00     2     0    -1
2014-02-21 02:00:00     3     0    -1
2014-02-21 03:00:00     4     0    -1

[4 rows x 3 columns]
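
As far as I know, reindex_axis was deprecated and later removed from pandas (in 1.0, if I recall correctly), so the snippet above may not run on a current install. A minimal sketch of the same idea with calls that exist today follows. It assumes the starting frame has only the '2013' column, as in the answer. DataFrame.insert is a swapped-in alternative, not something the answer itself proposes.

```python
import pandas as pd

# Rebuild the starting frame from the question (only the '2013' column).
index = pd.to_datetime([
    '2014-02-21 00:00', '2014-02-21 01:00',
    '2014-02-21 02:00', '2014-02-21 03:00',
])
df = pd.DataFrame({'2013': [1, 2, 3, 4]}, index=index)

# reindex(columns=...) creates labels that do not exist yet and
# fills them with NaN, like the reindex_axis call in the answer.
df = df.reindex(columns=df.columns.tolist() + ['2014'])

# '2014' is now an existing column, so this assignment sets the column.
df['2014'] = 0

# Alternative: insert(loc, column, value) adds a column at a position
# without going through df[...] label lookup at all.
df.insert(len(df.columns), '2015', -1)
```

After this, df has the columns '2013', '2014' and '2015', with 0 and -1 in the two new ones. Note that reindex returns a new frame, so it does not by itself address the memory concern raised in the question.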