Question

A really simple task in Pandas is throwing an error I don't understand. Take a simple dataset like this:

import pandas as pd
test=pd.DataFrame([[1,3],[1,6],[2,4],[3,9],[3,2]],columns=['a','b'])

I can do the following to count how many times each value appears in column 'a' of test:

test['count']=test.groupby('a').transform('count')

This produces:

>>> test
       a  b  count
    0  1  3      2
    1  1  6      2
    2  2  4      1
    3  3  9      2
    4  3  2      2

Perfect. But with my real data this does not work. Here is a small portion of my data that reproduces the problem:

newtest=pd.DataFrame([['010010201001000','001','0220','AL','0'],['010010201001001','001','0220','AL','0'],['010010201001002','001','0220','AL','0'],['010010201001003','001','0160','AL','0'],['010010201001004','001','0160','AL','0']],columns=['BlockID','CountyFP','District','state_x','HD'])
newtest['blocks']=newtest.groupby(['CountyFP','District','state_x']).transform('count')

Trying this gives me the following error:

ValueError: Wrong number of items passed 2, placement implies 1

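I really don't see what makes my 'real' example any different from the toy one. Googling this error turns up other cases of it, but it is still unclear to me why it is raised here.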

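Even more confusing, if I run only the right-hand side of the code above, it works fine: it produces newtest with the counts in every column. So it seems to be the assignment that causes the problem.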

Answer 1

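You did not select a column to perform the aggregation on, so it is performed on both of the remaining 2 columns. If you select one of those columns, you get the desired result: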

In [6]:
newtest['blocks'] = newtest.groupby(['CountyFP','District','state_x'])['BlockID'].transform('count')
newtest

Out[6]:
           BlockID CountyFP District state_x HD  blocks
0  010010201001000      001     0220      AL  0       3
1  010010201001001      001     0220      AL  0       3
2  010010201001002      001     0220      AL  0       3
3  010010201001003      001     0160      AL  0       2
4  010010201001004      001     0160      AL  0       2

Your attempt outputs:

In [9]:
newtest.groupby(['CountyFP','District','state_x']).transform('count')

Out[9]:
   BlockID  HD
0        3   3
1        3   3
2        3   3
3        2   2
4        2   2

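You can see that it produces 2 columns, because those are the columns left over after grouping. Assigning a 2-column DataFrame to the single column 'blocks' is what raises the error. Your small example worked because only one column ('b') remained after grouping on 'a'.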
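To see the difference for yourself, here is a minimal sketch that rebuilds the sample data from the question and compares the two calls. The variable names `keys`, `all_counts`, and `block_counts` are just illustrative and not part of the original code.

```python
import pandas as pd

# Sample data from the question.
newtest = pd.DataFrame(
    [
        ['010010201001000', '001', '0220', 'AL', '0'],
        ['010010201001001', '001', '0220', 'AL', '0'],
        ['010010201001002', '001', '0220', 'AL', '0'],
        ['010010201001003', '001', '0160', 'AL', '0'],
        ['010010201001004', '001', '0160', 'AL', '0'],
    ],
    columns=['BlockID', 'CountyFP', 'District', 'state_x', 'HD'],
)

keys = ['CountyFP', 'District', 'state_x']  # illustrative name

# No column selected: one result column per non-key column (a DataFrame).
all_counts = newtest.groupby(keys).transform('count')
print(type(all_counts).__name__, list(all_counts.columns))

# One column selected: the result is a Series, which fits a single new column.
block_counts = newtest.groupby(keys)['BlockID'].transform('count')
print(type(block_counts).__name__)

newtest['blocks'] = block_counts
print(newtest)
```

Checking the type and columns of the right-hand side before assigning is a quick way to catch this kind of mismatch.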
