我目前正在使用:
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|id |sen |attributes |
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1 |Stanford is good college.|[[Stanford,ORGANIZATION,NNP], [is,O,VBZ], [good,O,JJ], [college,O,NN], [.,O,.], [Stanford,ORGANIZATION,NNP], [is,O,VBZ], [good,O,JJ], [college,O,NN], [.,O,.]]|
+---+-------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
I want to get above df from :
+----------+--------+--------------------+
|article_id| sen| attribute|
+----------+--------+--------------------+
| 1|example1|[Standford,Organi...|
| 1|example1| [is,O,VP]|
| 1|example1| [good,LOCATION,ADP]|
+----------+--------+--------------------+
使用:
df3.registerTempTable("d1")
val df4 = sqlContext.sql("select article_id,sen,collect(attribute) as attributes from d1 group by article_id,sen")
有没有办法我不必注册临时表,因为在保存数据帧时,它会给你很多垃圾!有点df3。选择"" ??