I'm not sure why first("traitvalue") works in the output DataFrame query below. What does first("traitvalue") mean here? Please advise.
Input DataFrame:
val df = sc.parallelize(List(("1","NA","action","Heavy", "NY"),("1","NA","comedy","light", "NY"),("1","NA","horror","light", "NY"),("1","NA","horror","light", "KY"),("2","NA","horror","light", "NY"))).toDF("ban","yr_mon","genre","traitvalue","state")
+---+------+------+----------+-----+
|ban|yr_mon| genre|traitvalue|state|
+---+------+------+----------+-----+
|  1|    NA|action|     Heavy|   NY|
|  1|    NA|comedy|     light|   NY|
|  1|    NA|horror|     light|   NY|
|  1|    NA|horror|     light|   KY|
|  2|    NA|horror|     light|   NY|
+---+------+------+----------+-----+
Output DataFrame:
df.groupBy($"ban",$"state").pivot("genre").agg(first("traitvalue")).show
+---+-----+------+------+------+
|ban|state|action|comedy|horror|
+---+-----+------+------+------+
|  2|   NY|  null|  null| light|
|  1|   NY| Heavy| light| light|
|  1|   KY|  null|  null| light|
+---+-----+------+------+------+
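Answer 0 (score: 0)
It is just a small trick. pivot has to be followed by an aggregation, and this example aggregates a categorical column rather than applying a numeric function. With categorical values, a single (ban, state, genre) cell could contain more than one entry, e.g. two trait values, so first("traitvalue") simply takes the first such entry. In this data every cell holds at most one value, so the issue does not arise. Hence this approach.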
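To make the "takes the first entry" behaviour concrete, here is a minimal sketch in plain Scala collections (no Spark) that imitates groupBy(ban, state).pivot(genre).agg(first(traitvalue)). The names Rec, PivotFirstSketch and pivotFirst are made up for illustration, and the last input row is a hypothetical duplicate that is not in the question's data. It only mimics the idea and is not how Spark implements pivot. Note also that in Spark the result of first depends on row order, which may not be deterministic after a shuffle.

```scala
// Plain-Scala sketch (no Spark); all names here are hypothetical.
final case class Rec(ban: String, genre: String, traitvalue: String, state: String)

object PivotFirstSketch {
  // For each (ban, state) group, map every genre to the first traitvalue
  // encountered in input order. A genre missing from the inner map
  // corresponds to a null cell in the Spark output.
  def pivotFirst(rows: List[Rec]): Map[(String, String), Map[String, String]] =
    rows.groupBy(r => (r.ban, r.state)).map { case (key, group) =>
      key -> group.groupBy(_.genre).map { case (genre, cell) =>
        genre -> cell.head.traitvalue // "first": keep one value per cell
      }
    }

  val rows: List[Rec] = List(
    Rec("1", "action", "Heavy", "NY"),
    Rec("1", "comedy", "light", "NY"),
    Rec("1", "horror", "light", "NY"),
    Rec("1", "horror", "light", "KY"),
    Rec("2", "horror", "light", "NY"),
    // Hypothetical extra row: a second value for the cell (1, NY, action).
    Rec("1", "action", "light", "NY")
  )

  val pivoted: Map[(String, String), Map[String, String]] = pivotFirst(rows)
}
```

With the extra row, the cell for ban 1, state NY, genre action has two candidates (Heavy and light), and the sketch keeps Heavy because it comes first in the list. If you need to see every value instead of just one, an aggregate such as collect_list would be the usual alternative in Spark.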