The following code illustrates my problem:
In [2]: idx = pd.date_range('1/1/2011', periods=5)
In [3]: idx
Out[3]:
DatetimeIndex(['2011-01-01', '2011-01-02', '2011-01-03', '2011-01-04', '2011-01-05'],
dtype='datetime64[ns]', freq='D')
In [4]: midx = pd.MultiIndex.from_product([['100', '200'], idx])
In [5]: midx
Out[5]: MultiIndex(levels=[['100', '200'],
[2011-01-01 00:00:00, 2011-01-02 00:00:00, 2011-01-03 00:00:00, 2011-01-04 00:00:00, 2011-01-05 00:00:00]],
labels=[[0, 0, 0, 0, 0, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]])
In [6]: test_data = pd.DataFrame(
            2*[[1, 2], [NaN, 3], [4, NaN], [5, 6], [7, 8]],
            index=midx, columns=['quant1', 'quant2']
        )
In [7]: test_data
Out[7]:
                quant1  quant2
100 2011-01-01     1.0     2.0
    2011-01-02     NaN     3.0
    2011-01-03     4.0     NaN
    2011-01-04     5.0     6.0
    2011-01-05     7.0     8.0
200 2011-01-01     1.0     2.0
    2011-01-02     NaN     3.0
    2011-01-03     4.0     NaN
    2011-01-04     5.0     6.0
    2011-01-05     7.0     8.0
In [8]: new_data = pd.DataFrame([11, 12, 13, 14, 15], index=idx, columns=['quant1'])
In [9]: new_data
Out[9]:
            quant1
2011-01-01      11
2011-01-02      12
2011-01-03      13
2011-01-04      14
2011-01-05      15
In [10]: test_data.loc['100', 'quant1'] = new_data
In [11]: test_data
Out[11]:
                quant1  quant2
100 2011-01-01     NaN     2.0
    2011-01-02     NaN     3.0
    2011-01-03     NaN     NaN
    2011-01-04     NaN     6.0
    2011-01-05     NaN     8.0
200 2011-01-01     1.0     2.0
    2011-01-02     NaN     3.0
    2011-01-03     4.0     NaN
    2011-01-04     5.0     6.0
    2011-01-05     7.0     8.0
Why is the ['100', 'quant1'] slice filled with NaN instead of the numbers from new_data?
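I found that using
test_data.loc['100', 'quant1'] = new_data.values
works, but I would like to understand what makes pandas behave this way. The sub-slice has the same dimensions as the new data and even the same index, so although I suspect this has to do with indexing/alignment, I don't really understand how or why. My expectation was that the assignment would just work as long as the assigned data uses exactly the same index as the slice being assigned to.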
Answer 0 (score: 1)
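Because pandas aligns the index of the DataFrame receiving the data with the index of the object supplying the new data. When it does so, it cannot find the index entries it is looking for.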
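The rows targeted by test_data.loc['100', 'quant1'] have index entries such as ('100', '2011-01-01'), while new_data has index entries such as '2011-01-01'. Those are not the same, so nothing matches and the slice is filled with NaN.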
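To see the mismatch concretely, one can inspect the labels on each side. This is a minimal sketch that rebuilds only the two indexes from the question; the names target_key, source_key and inner are illustrative and do not appear in the original code.

```python
import pandas as pd

idx = pd.date_range('1/1/2011', periods=5)
midx = pd.MultiIndex.from_product([['100', '200'], idx])

# A row label as stored in test_data's index: a two-element tuple.
target_key = midx[0]
# A row label as stored in new_data's index: a single timestamp.
source_key = idx[0]

print(type(target_key).__name__, target_key)
print(type(source_key).__name__, source_key)

# Only the inner element of the tuple corresponds to new_data's label.
print(target_key[1] == source_key)

# Dropping the outer level leaves labels that line up with new_data's index.
inner = midx.droplevel(0)[:5]
print(inner.equals(idx))
```

The two-level labels only match new_data once the outer '100' level is either removed from one side or added to the other.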
Use the values attribute to skip pandas' alignment attempt:
test_data.loc['100', 'quant1'] = new_data.values
test_data
                quant1  quant2
100 2011-01-01    11.0     2.0
    2011-01-02    12.0     3.0
    2011-01-03    13.0     NaN
    2011-01-04    14.0     6.0
    2011-01-05    15.0     8.0
200 2011-01-01     1.0     2.0
    2011-01-02     NaN     3.0
    2011-01-03     4.0     NaN
    2011-01-04     5.0     6.0
    2011-01-05     7.0     8.0
Use pd.concat to add the index level:
test_data.loc['100', 'quant1'] = pd.concat({'100': new_data})
test_data
                quant1  quant2
100 2011-01-01    11.0     2.0
    2011-01-02    12.0     3.0
    2011-01-03    13.0     NaN
    2011-01-04    14.0     6.0
    2011-01-05    15.0     8.0
200 2011-01-01     1.0     2.0
    2011-01-02     NaN     3.0
    2011-01-03     4.0     NaN
    2011-01-04     5.0     6.0
    2011-01-05     7.0     8.0
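One practical difference between the two fixes is that .values assigns by position, whereas a value whose index matches is assigned by label. The sketch below is a variation on the pd.concat approach, not the answer's exact code: it assigns a Series rather than a DataFrame, selects the rows with a boolean mask, and deliberately reverses the new data to show that label alignment restores the right order. The names shuffled, aligned, rows and result are illustrative.

```python
import numpy as np
import pandas as pd

idx = pd.date_range('1/1/2011', periods=5)
midx = pd.MultiIndex.from_product([['100', '200'], idx])
test_data = pd.DataFrame(
    2 * [[1, 2], [np.nan, 3], [4, np.nan], [5, 6], [7, 8]],
    index=midx, columns=['quant1', 'quant2'],
)
new_data = pd.DataFrame([11, 12, 13, 14, 15], index=idx, columns=['quant1'])

# Reverse the rows so that a positional assignment would give the wrong order.
shuffled = new_data['quant1'].iloc[::-1]

# Prepend the '100' level so the labels become ('100', date) pairs.
aligned = pd.concat({'100': shuffled})

# Boolean mask selecting the rows whose outer level is '100'.
rows = test_data.index.get_level_values(0) == '100'

# The Series is matched to the target rows by label, not by position.
test_data.loc[rows, 'quant1'] = aligned

result = test_data.loc[rows, 'quant1'].tolist()
print(result)  # [11.0, 12.0, 13.0, 14.0, 15.0]
```

If the incoming data is already known to be in the same order as the target rows, the .values route is shorter; the aligned route is the safer choice when the order is not guaranteed.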