I'm new to Spark and trying to write grouped data to a txt file, but I'm getting the following error:
Error:(55, 31) value write is not a member of org.apache.spark.sql.RelationalGroupedDataset
The code snippet is:
val dfyearlyGamesSelect = dfFiltered.select($"release_year",$"title")
val dfyearlyGroup = dfyearlyGamesSelect.groupBy($"release_year")
val dfWrite = dfyearlyGroup.write
.format("com.databricks.spark.csv")
.option("header","true")
.save(outputPath)
Expected output: for each year, the title of the highest-scoring game (columns: release_year, title, score).
Sample data:
,score_phrase,title,url,platform,score,genre,editors_choice,release_year,release_month,release_day
0,Amazing,LittleBigPlanet PS Vita,/games/littlebigplanet-vita/vita-98907,PlayStation Vita,9.0,Platformer,Y,2012,9,12
1,Amazing,LittleBigPlanet PS Vita -- Marvel Super Hero Edition,/games/littlebigplanet-ps-vita-marvel-super-hero-edition/vita-20027059,PlayStation Vita,9.0,Platformer,Y,2012,9,12
2,Great,Splice: Tree of Life,/games/splice/ipad-141070,iPad,8.5,Puzzle,N,2012,9,12
3,Great,NHL 13,/games/nhl-13/xbox-360-128182,Xbox 360,8.5,Sports,N,2012,9,11
4,Great,NHL 13,/games/nhl-13/ps3-128181,PlayStation 3,8.5,Sports,N,2012,9,11
5,Good,Total War Battles: Shogun,/games/total-war-battles-shogun/mac-142565,Macintosh,7.0,Strategy,N,2012,9,11
6,Awful,Double Dragon: Neon,/games/double-dragon-neon/xbox-360-131320,Xbox 360,3.0,Fighting,N,2012,9,11
7,Amazing,Guild Wars 2,/games/guild-wars-2/pc-896298,PC,9.0,RPG,Y,2012,9,11
Answer (score: 0)
write is not defined on a RelationalGroupedDataset. You have to apply an aggregation function to get back a DataFrame, which can then be written to HDFS:
import org.apache.spark.sql.functions.first

val dfYearlyGroup = dfyearlyGamesSelect.groupBy($"release_year")
  .agg(first($"title") as "title")
Now dfYearlyGroup is a DataFrame that you can write to HDFS. Note that first($"title") picks an arbitrary title per year, not necessarily the highest-scoring one. Also, there is no need to store the write result in a variable, since save returns nothing:
dfYearlyGroup.write
.format("com.databricks.spark.csv")
.option("header","true")
.save(outputPath)
For your use case, you can use the window function rank or row_number, depending on whether you want multiple rows when games tie for the same score.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lit, rank}
df.select($"release_year", $"title", $"score").show(false)
+------------+----------------------------------------------------+-----+
|release_year|title |score|
+------------+----------------------------------------------------+-----+
|2012 |LittleBigPlanet PS Vita |9.0 |
|2012 |LittleBigPlanet PS Vita -- Marvel Super Hero Edition|9.0 |
|2012 |Splice: Tree of Life |8.5 |
|2012 |NHL 13 |8.5 |
|2012 |NHL 13 |8.5 |
|2012 |Total War Battles: Shogun |7.0 |
|2012 |Double Dragon: Neon |3.0 |
|2012 |Guild Wars 2 |9.0 |
+------------+----------------------------------------------------+-----+
val w = Window.partitionBy($"release_year").orderBy($"score".desc)
val dfYearlyMaxScore = df.withColumn("rank", rank over w)
.where($"rank" === lit(1) )
.select($"release_year", $"title", $"score")
dfYearlyMaxScore.show(false)
+------------+----------------------------------------------------+-----+
|release_year|title |score|
+------------+----------------------------------------------------+-----+
|2012 |LittleBigPlanet PS Vita |9.0 |
|2012 |LittleBigPlanet PS Vita -- Marvel Super Hero Edition|9.0 |
|2012 |Guild Wars 2 |9.0 |
+------------+----------------------------------------------------+-----+
Now you can write it with:
dfYearlyMaxScore.write
.format("com.databricks.spark.csv")
.option("header","true")
.save(outputPath)
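To make the rank vs. row_number distinction concrete, here is a minimal pure-Scala sketch (no Spark, hypothetical Game case class) of the same logic on the sample 2012 scores: a rank-style filter keeps every row tied at the top score, while a row_number-style filter keeps exactly one.

```scala
// Hypothetical in-memory model of the sample data; not Spark code.
case class Game(title: String, score: Double)

val games = List(
  Game("LittleBigPlanet PS Vita", 9.0),
  Game("LittleBigPlanet PS Vita -- Marvel Super Hero Edition", 9.0),
  Game("Splice: Tree of Life", 8.5),
  Game("Guild Wars 2", 9.0)
)

val top = games.map(_.score).max

// rank === 1: all rows sharing the highest score (ties kept)
val rankedFirst = games.filter(_.score == top)

// row_number === 1: a single row among the ties (arbitrary pick)
val rowNumberFirst = games.filter(_.score == top).take(1)

println(rankedFirst.size)    // 3
println(rowNumberFirst.size) // 1
```

In Spark, swapping rank for row_number in the window expression above gives you exactly one title per release_year instead of all tied titles.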