Question

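I have a DataFrame and I want to update a subset of its columns with a Series whose length equals the number of columns being updated: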

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(0,5,(6, 2)), columns=['col1','col2'])
>>> df

   col1  col2
0     1     0
1     2     4
2     4     4
3     4     0
4     0     0
5     3     1

>>> df.loc[:,['col1','col2']] = pd.Series([0,1])
...
ValueError: shape mismatch: value array of shape (6,) could not be broadcast to indexing result of shape (2,6)

That fails. However, I am able to do the same thing using a list:

>>> df.loc[:,['col1','col2']] = list(pd.Series([0,1]))
>>> df
   col1  col2
0     0     1
1     0     1
2     0     1
3     0     1
4     0     1
5     0     1

Can you help me understand why the update with a Series fails? Do I have to do some special reshaping?

Answer 1

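When you assign with a pandas object, pandas treats the assignment more "rigorously". A pandas-to-pandas assignment has to pass stricter protocols. Only when you turn the Series into a list (or, equivalently, pd.Series([0, 1]).values) does pandas give in and let you assign the way you would imagine.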
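As a minimal sketch of the workaround described above, the following strips the index from the Series before assigning. It uses `Series.to_numpy()`, the newer spelling of `.values`. The fixed starting data is made up for illustration so that the result is reproducible; it is not the random frame from the question.

```python
import pandas as pd

# Illustrative data (an assumption, not the random frame from the question).
df = pd.DataFrame({'col1': [1, 2, 4, 4, 0, 3], 'col2': [0, 4, 4, 0, 0, 1]})
s = pd.Series([0, 1])

# A bare NumPy array carries no index, so pandas has nothing to align on;
# the two values are broadcast across every row of the two columns.
df.loc[:, ['col1', 'col2']] = s.to_numpy()
print(df)
```

The shape (6,) in the question's error message is consistent with the Series having been aligned to the six row labels rather than to the two column labels, which is likely why the array and list forms behave differently.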

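That higher standard of assignment requires that the indices line up as well, so even if you have the right shape, it still will not work without the correct indices.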

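# Right shape, but the default column labels (0 and 1) do not match
# 'col1'/'col2', so alignment leaves NaN in the target columns.
df.loc[:, ['col1', 'col2']] = pd.DataFrame([[0, 1] for _ in range(6)])
df

# Same shape with matching column labels: the values are assigned as intended.
df.loc[:, ['col1', 'col2']] = pd.DataFrame([[0, 1] for _ in range(6)], columns=['col1', 'col2'])
df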
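The same idea applies to the row labels. The sketch below assumes a frame with a non-default index (the labels 'a' to 'f' and the starting values are made up for illustration) and builds the replacement so that both its index and its columns match the target.

```python
import pandas as pd

cols = ['col1', 'col2']

# Illustrative frame with string row labels (an assumption for this sketch).
df = pd.DataFrame(
    {'col1': [1, 2, 4, 4, 0, 3], 'col2': [0, 4, 4, 0, 0, 1]},
    index=list('abcdef'),
)

# Reuse the target's own row and column labels so that alignment
# finds a matching cell for every position being assigned.
replacement = pd.DataFrame([[0, 1]] * len(df), index=df.index, columns=cols)

df.loc[:, cols] = replacement
print(df)
```

Copying `df.index` and the column list from the target, rather than retyping them, keeps the replacement aligned even if the frame's labels change later.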
