Question

My DataFrame df has the following structure:

product_id  url                 type
0   2013367 7405e0c483323f78b   A
1   2013367 ea919d2276f60f31e   B
2   452998  117312244aa203a03   A
3   452998  1a6a41a6141235d68   B
4   2196333 cd66f91431fbae2d4   A

I am trying to use the pandas pivot function to reshape the DataFrame like this:

product_id   A                  B
2013367      7405e0c483323f78b  ea919d2276f60f31e   
452998       117312244aa203a03  1a6a41a6141235d68   
2196333      cd66f91431fbae2d4  NaN

I used df.pivot(index="product_id", columns="type", values='url'), following the documentation (https://pandas-docs.github.io/pandas-docs-travis/reshaping.html).
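For reference, here is a minimal sketch of that call on the sample data shown above (the frame is rebuilt by hand, with values copied from the table). In this sample every (product_id, type) pair occurs once, so pivot succeeds and fills the missing B for 2196333 with NaN. The rows may come back sorted by product_id rather than in input order.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "product_id": [2013367, 2013367, 452998, 452998, 2196333],
    "url": ["7405e0c483323f78b", "ea919d2276f60f31e",
            "117312244aa203a03", "1a6a41a6141235d68",
            "cd66f91431fbae2d4"],
    "type": ["A", "B", "A", "B", "A"],
})

# Each (product_id, type) pair appears exactly once, so each cell gets one url;
# pairs with no row (2196333, "B") become NaN.
wide = df.pivot(index="product_id", columns="type", values="url")
print(wide)
```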

However, I get the following error:

ValueError: Index contains duplicate entries, cannot reshape

I found a similar question here (How to pivot categorical variable in pandas?), where the solution involved converting a datetime format. However, I am not using dates as the index.

How can I fix this?

Answer 1

OK, I just found out that the problem is that some product_id values in my dataset are associated with type A more than once, like this:

product_id  url                 type
0   2013367 7405e0c483323f78b   A
1   2013367 ea919d2276f60f31e   B
2   452998  117312244aa203a03   A < ---- same id and type but different url
3   452998  1a6a41a6141235d68   A < ---- same id and type but different url
4   2196333 cd66f91431fbae2d4   A

So pandas does not know which of the two urls to put in that cell, which produces the error above.
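A minimal sketch that reproduces the error with the data from this answer and locates the offending rows first. It uses DataFrame.duplicated with keep=False, which flags every row of a duplicated (product_id, type) group rather than only the later copies; the frame is rebuilt by hand from the table above.

```python
import pandas as pd

# Data from the answer: product 452998 has two rows with type "A".
df = pd.DataFrame({
    "product_id": [2013367, 2013367, 452998, 452998, 2196333],
    "url": ["7405e0c483323f78b", "ea919d2276f60f31e",
            "117312244aa203a03", "1a6a41a6141235d68",
            "cd66f91431fbae2d4"],
    "type": ["A", "B", "A", "A", "A"],
})

# keep=False marks all members of each duplicated group.
dupes = df[df.duplicated(subset=["product_id", "type"], keep=False)]
print(dupes)

# pivot cannot fit two urls into the single (452998, "A") cell, so it raises.
error = None
try:
    df.pivot(index="product_id", columns="type", values="url")
except ValueError as exc:
    error = exc
    print("ValueError:", exc)
```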

The solution is to call drop_duplicates before pivot, for example: df.drop_duplicates(subset=["product_id","type"], inplace=True)
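A sketch of that fix without inplace, on the same duplicated data. drop_duplicates defaults to keep="first", so the second url for 452998/A is silently discarded. The second variant is an alternative that is not part of the answer: groupby(...).first() followed by unstack builds the same wide table while making the "first non-null url wins" rule explicit, and can be swapped for another aggregation if a different rule is wanted.

```python
import pandas as pd

df = pd.DataFrame({
    "product_id": [2013367, 2013367, 452998, 452998, 2196333],
    "url": ["7405e0c483323f78b", "ea919d2276f60f31e",
            "117312244aa203a03", "1a6a41a6141235d68",
            "cd66f91431fbae2d4"],
    "type": ["A", "B", "A", "A", "A"],
})

# Answer's approach: keep the first row per (product_id, type), then pivot.
deduped = df.drop_duplicates(subset=["product_id", "type"])
wide = deduped.pivot(index="product_id", columns="type", values="url")

# Alternative (not from the answer): aggregate duplicates explicitly.
# first() takes the first non-null url in each (product_id, type) group.
wide_alt = df.groupby(["product_id", "type"])["url"].first().unstack()

print(wide)
print(wide_alt)
```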
