Question

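In a Python pandas DataFrame, I would like to update the index value of a single row (preferably in place, since the DataFrame is very large).

The index is a DatetimeIndex, and the DataFrame may contain several columns.

For example: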

In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'DATA': [1, 2, 3]},
                           index=[pd.Timestamp(2011, 10, 1, 0, 0, 0),
                                  pd.Timestamp(2011, 10, 1, 2, 0, 0),
                                  pd.Timestamp(2011, 10, 1, 3, 0, 0)])
In [3]: df
Out[3]: 
                     DATA
2011-10-01 00:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

The desired output is:

                     DATA
2011-10-01 01:00:00     1   <---- Index changed !!!
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

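Is there a simple (and cheap) way to do this for a large DataFrame?

Assume the position of the sample is known (for example, it is the n-th row that needs to change)!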

Answer 1

If you already know which index entry to operate on, a fast approach is to look it up directly and then set its value with the help of Index.set_value:

df.index.set_value(df.index, df.index[0], pd.Timestamp(2011,10,1,1,0,0))
#                  <-index-> <-row num->  <---value to be inserted--->

This is an in-place operation, so you do not need to assign the result back to itself.
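Index.set_value has since been deprecated and is not available in recent pandas releases. As a minimal sketch of the same position-based update under that assumption, the index values can be copied into a NumPy array, patched, and assigned back. The names n, new_label and values are illustrative, not from the original answer. Only the index is copied, not the DataFrame's data.

```python
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

n = 0  # position of the row whose index label should change (assumed known)
new_label = pd.Timestamp("2011-10-01 01:00:00")

# Copy the index values into a writable NumPy array, patch one slot,
# then assign a fresh DatetimeIndex back to the frame.
values = df.index.to_numpy(copy=True)
values[n] = new_label.to_datetime64()
df.index = pd.DatetimeIndex(values)

print(df)
```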

Answer 2

One possible solution uses Series.replace, but the index first has to be converted with Index.to_series:

df.index = (df.index
              .to_series()
              .replace({pd.Timestamp('2011-10-01'): pd.Timestamp('2011-10-01 01:00:00')}))
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3
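A related way to replace a label by value, not part of the original answer, is DataFrame.rename with a mapping from the old label to the new one. The sketch below assumes the old label is known. Note that it relabels every row carrying that label, and the variable names old and new are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

old = pd.Timestamp("2011-10-01 00:00:00")
new = pd.Timestamp("2011-10-01 01:00:00")

# Map the old index label to the new one; labels not in the dict are kept.
df.rename(index={old: new}, inplace=True)

print(df)
```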

Another solution uses Index.where (new in 0.19.0):

df.index = df.index.where(df.index != pd.Timestamp('2011-10-01'),
                          [pd.Timestamp('2011-10-01 01:00:00')])

print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3
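Since the question states that the row position is known, the same Index.where idea can be keyed on position instead of on the old label. This is a sketch under that assumption. The names n, new_label and keep are hypothetical, and the mask is built with NumPy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"DATA": [1, 2, 3]},
    index=pd.DatetimeIndex(
        ["2011-10-01 00:00:00", "2011-10-01 02:00:00", "2011-10-01 03:00:00"]
    ),
)

n = 0  # position of the row to relabel (assumed known)
new_label = pd.Timestamp("2011-10-01 01:00:00")

# True keeps the existing label; False takes the replacement value.
keep = np.arange(len(df)) != n
df.index = df.index.where(keep, new_label)

print(df)
```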

A solution that adds a new row, removes the old one with drop, and finally calls sort_index:

df.loc[pd.Timestamp('2011-10-01 01:00:00')] = df.loc['2011-10-01 00:00:00', 'DATA']
df.drop(pd.Timestamp('2011-10-01 00:00:00'), inplace=True)
df.sort_index(inplace=True)
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

Another solution, if you need to replace by value rather than by position:

df.index.set_value(df.index, pd.Timestamp(2011,10,1,0,0,0), pd.Timestamp(2011,10,1,1,0,0))
print (df)
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

A final solution, taken from a comment, that converts the index to a NumPy array:

i = 0
df.index.values[i] = pd.Timestamp('2011-10-01 01:00:00')
print (df)          
                     DATA
2011-10-01 01:00:00     1
2011-10-01 02:00:00     2
2011-10-01 03:00:00     3

How do I update the DatetimeIndex value of a single row in a pandas DataFrame?
