I have a DataFrame:
customer | Department
----------------------
A | Food
B | Home
A | Office
C | Home
A | Home
B | Office
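Both the customer and Department columns are of type String.
How can I turn each distinct department into its own column, like a one-hot vector, so that I get a new DataFrame like this: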
customer | Food | Home | Office
-----------------------------------
A        | 1    | 1    | 1
B        | 0    | 1    | 1
C        | 0    | 1    | 0
Here the Food, Home, and Office columns are of integer type, and customer is of type String.
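To pin down the target semantics, here is a minimal sketch in plain Scala collections (no Spark), using the sample rows above. The names `rows`, `departments`, and `oneHot` are illustrative only. Each customer gets a 1 for every department it appears with at least once, and a 0 otherwise.

```scala
// Input rows: (customer, department), same data as in the question
val rows = Seq(
  ("A", "Food"), ("B", "Home"), ("A", "Office"),
  ("C", "Home"), ("A", "Home"), ("B", "Office")
)

// Distinct departments, sorted, become the new column names
val departments: Seq[String] = rows.map(_._2).distinct.sorted

// For each customer: 1 if it has at least one row in that department, else 0
val oneHot: Map[String, Seq[Int]] =
  rows.groupBy(_._1).map { case (customer, rs) =>
    val seen = rs.map(_._2).toSet
    customer -> departments.map(d => if (seen(d)) 1 else 0)
  }

// Print the table in customer order
println(("customer" +: departments).mkString(" | "))
oneHot.toSeq.sortBy(_._1).foreach { case (c, v) =>
  println((c +: v.map(_.toString)).mkString(" | "))
}
```

Because the values come from a set-membership test, a customer that appears twice in the same department still gets 1, not 2.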
Answer 0 (score: 2)
You just need to group by customer, pivot on department, and aggregate with a count:
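import org.apache.spark.sql.functions.count
// Assumes an existing SparkSession named `spark` (as in spark-shell); enables .toDF on a Seq
import spark.implicits._

val df = Seq(
("A", "Food"),
("B", "Home"),
("A", "Office"),
("C", "Home"),
("A", "Home"),
("B", "Office")
).toDF("customer", "department")

// One row per customer, one column per distinct department holding the row count;
// (customer, department) pairs that never occur come back null, so fill them with 0
df.groupBy("customer").pivot("department").agg(count("department"))
.na.fill(0)
.show()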
Output:
+--------+----+----+------+
|customer|Food|Home|Office|
+--------+----+----+------+
|B |0 |1 |1 |
|C |0 |1 |0 |
|A |1 |1 |1 |
+--------+----+----+------+
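Two details of the answer above may matter for the question as asked: `count` produces LongType columns rather than integers, and it would yield 2 (not 1) if a customer appeared twice in the same department. A possible variant, sketched below under the assumption that the department values are known up front, aggregates `max(lit(1))` instead, which gives IntegerType 0/1 values. Passing the values list to `pivot` also lets Spark skip the extra job it otherwise runs to collect the distinct departments. The local SparkSession setup is only there to make the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, max}

// Local session for illustration only
val spark = SparkSession.builder().master("local[*]").appName("one-hot-pivot").getOrCreate()
import spark.implicits._

val df = Seq(
  ("A", "Food"), ("B", "Home"), ("A", "Office"),
  ("C", "Home"), ("A", "Home"), ("B", "Office")
).toDF("customer", "department")

// Assumption: the department values are known in advance
val departments = Seq("Food", "Home", "Office")

// max(lit(1)) is 1 (IntegerType) for every (customer, department) pair that occurs,
// however many times it occurs; missing pairs come back null and are filled with 0
val oneHot = df.groupBy("customer")
  .pivot("department", departments)
  .agg(max(lit(1)))
  .na.fill(0)
  .orderBy("customer")

oneHot.show()
```

If the departments are not known in advance, `pivot("department")` without the list still works; Spark then computes the distinct values itself.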