Question

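I have a DataFrame of users who have purchased various items. I want to split each list of values into separate columns, with a binary flag indicating whether the user purchased that item.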

Input:

       A           B
0  James  [123, 456]
1   Mary       [123]
2   John  [456, 789]

Expected output:

       A           B  123  456  789
0  James  [123, 456]    1    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    1

What I have tried (step by step)

My first step is df['B'].explode().

Then I apply get_dummies() to the result, pd.get_dummies(df['B'].explode()):

   123  456  789
0    1    0    0
0    0    1    0
1    1    0    0
2    0    1    0
2    0    0    1

Then I join it back onto df by index, df.join(pd.get_dummies(df['B'].explode())):

       A           B  123  456  789
0  James  [123, 456]    1    0    0
0  James  [123, 456]    0    1    0
1   Mary       [123]    1    0    0
2   John  [456, 789]    0    1    0
2   John  [456, 789]    0    0    1

Question:

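Now I just need to group by the index and combine the rows. However, with thousands of rows and customers buying more than 100 different products, this join-then-group approach is inefficient. Is there a pandas-friendly or built-in way to do this?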

Answer 1

You can collapse pd.get_dummies(df['B'].explode()) back to one row per original index first, and then join it to df.
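A minimal sketch of that idea, using the sample data from the question. The exact aggregation call in the answer is not preserved, so the groupby(level=0).max() step below is an assumption about how to collapse the repeated index labels; the names dummies, flags, and result are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "A": ["James", "Mary", "John"],
    "B": [[123, 456], [123], [456, 789]],
})

# One indicator row per (user, item) pair; the original index is repeated.
# dtype=int gives 0/1 integers rather than booleans.
dummies = pd.get_dummies(df["B"].explode(), dtype=int)

# Assumed aggregation: collapse repeated index labels to one row per user.
# max() keeps the flag binary even if an item appears twice in one list.
flags = dummies.groupby(level=0).max()

# Join only after aggregating, so df is never duplicated row by row.
result = df.join(flags)
print(result)
```

Aggregating before the join keeps the wide join at one row per user, which avoids repeating columns A and B for every purchased item.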
