I am trying to compile the following code:
// Imports
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.{SparkConf, SparkContext}
...
// Initialization
val conf = new SparkConf().setAppName("spark-test").setMaster("local")
val sc = new SparkContext(conf)
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
import sparkSession.implicits._
...
val sqlContext = sparkSession
val dfPlayersT = sqlContext.createDataFrame(nPlayer,schemaN)
dfPlayersT.createOrReplaceTempView("tPlayers")
val dfPlayers = sqlContext.sql("""select age-min_age as exp,tPlayers.* from
  tPlayers join (select name,min(age) as min_age from tPlayers group by name)
  as t1 on tPlayers.name=t1.name order by tPlayers.name, exp""")
val pStats = dfPlayers.sort(dfPlayers("name"),dfPlayers("exp").asc)
.map(x=>(x.getString(1),(x.getDouble(50),x.getDouble(40),x.getInt(2),
x.getInt(3),Array(x.getDouble(31),x.getDouble(32),x.getDouble(33),
x.getDouble(34),x.getDouble(35),x.getDouble(36),x.getDouble(37),
x.getDouble(38),x.getDouble(39)),x.getInt(0))))
.groupByKey() // Error
But I get this error:
Error:(217, 57) overloaded method value groupByKey with alternatives:
  [K](func: org.apache.spark.api.java.function.MapFunction[(String, (Double, Double, Int, Int, Array[Double], Int)),K], encoder: org.apache.spark.sql.Encoder[K])org.apache.spark.sql.KeyValueGroupedDataset[K,(String, (Double, Double, Int, Int, Array[Double], Int))] <and>
  [K](func: ((String, (Double, Double, Int, Int, Array[Double], Int))) => K)(implicit evidence$4: org.apache.spark.sql.Encoder[K])org.apache.spark.sql.KeyValueGroupedDataset[K,(String, (Double, Double, Int, Int, Array[Double], Int))]
 cannot be applied to ()
x.getDouble(38),x.getDouble(39)),x.getInt(0)))).groupByKey()
This is my build.sbt file:
name := "ScalaHello"
version := "0.1"
scalaVersion := "2.12.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
libraryDependencies += "org.apache.spark" %% "spark-catalyst" % "2.4.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.2"
I think the problem is in the initialization of sparkSession, but I can't find the error.
Answer 0 (score: 0)
Shouldn't it be
.groupByKey(_._1)
or
.groupByKey(_._2._1)
or
.groupByKey(_._2._2)
...
or
.groupByKey(_._2._6)
?
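To make the suggestion concrete, here is a minimal sketch of `Dataset.groupByKey` called with a key function. Both overloads listed in the error message require a `func` argument, which is why the empty call `groupByKey()` matches neither. The object name `GroupByKeySketch`, the `sample` data, and the helper `rowsPerName` are hypothetical stand-ins for the question's much wider tuples; only the shape `(name, (...))` is taken from the original code.

```scala
import org.apache.spark.sql.SparkSession

object GroupByKeySketch {
  // Hypothetical, simplified stand-in for the question's rows: (name, (age, exp)).
  val sample: Seq[(String, (Int, Int))] = Seq(
    ("alice", (30, 0)),
    ("alice", (31, 1)),
    ("bob", (25, 0))
  )

  // Groups the tuples by their first element and counts the rows per key.
  def rowsPerName(spark: SparkSession): Map[String, Long] = {
    import spark.implicits._ // supplies the implicit Encoder[K] that groupByKey needs
    sample.toDS()
      .groupByKey(_._1) // key function: tuple => name
      .count()          // Dataset[(String, Long)]
      .collect()
      .toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("groupByKey-sketch")
      .master("local[*]")
      .getOrCreate()
    try println(rowsPerName(spark))
    finally spark.stop()
  }
}
```

Note that the result of `groupByKey(_._1)` is a `KeyValueGroupedDataset`, so it has to be followed by something like `count()`, `mapGroups` or `reduceGroups`. If the intent was the older pair-RDD behaviour, where `groupByKey()` takes no arguments, one option is to go through `dfPlayers.rdd` before the `map`, since in Spark 2.x `DataFrame.map` returns a `Dataset` rather than an `RDD`.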