Cannot resolve overloaded method "groupByKey"

Asked: 2019-04-28 20:01:20

Tags: scala apache-spark intellij-idea apache-spark-sql bigdata

I am trying to compile the following code:

// Imports
import org.apache.spark.sql.{Row, SQLContext, SparkSession}
import org.apache.spark.sql.types._
import org.apache.spark.{SparkConf, SparkContext}

...

// Initialization
val conf = new SparkConf().setAppName("spark-test").setMaster("local")
val sc = new SparkContext(conf)
val sparkSession = SparkSession.builder.config(sc.getConf).getOrCreate()
import sparkSession.implicits._

...

val sqlContext = sparkSession
val dfPlayersT = sqlContext.createDataFrame(nPlayer,schemaN)

dfPlayersT.createOrReplaceTempView("tPlayers")
val dfPlayers = sqlContext.sql(
  """select age - min_age as exp, tPlayers.*
    |from tPlayers
    |join (select name, min(age) as min_age from tPlayers group by name) as t1
    |  on tPlayers.name = t1.name
    |order by tPlayers.name, exp""".stripMargin)


val pStats = dfPlayers.sort(dfPlayers("name"),dfPlayers("exp").asc)
  .map(x=>(x.getString(1),(x.getDouble(50),x.getDouble(40),x.getInt(2),
    x.getInt(3),Array(x.getDouble(31),x.getDouble(32),x.getDouble(33),
    x.getDouble(34),x.getDouble(35),x.getDouble(36),x.getDouble(37),
    x.getDouble(38),x.getDouble(39)),x.getInt(0))))
    .groupByKey()  // Error

But I get the following error:

Error:(217, 57) overloaded method value groupByKey with alternatives:
  [K](func: org.apache.spark.api.java.function.MapFunction[(String, (Double, Double, Int, Int, Array[Double], Int)),K], encoder: org.apache.spark.sql.Encoder[K])org.apache.spark.sql.KeyValueGroupedDataset[K,(String, (Double, Double, Int, Int, Array[Double], Int))]
  [K](func: ((String, (Double, Double, Int, Int, Array[Double], Int))) => K)(implicit evidence$4: org.apache.spark.sql.Encoder[K])org.apache.spark.sql.KeyValueGroupedDataset[K,(String, (Double, Double, Int, Int, Array[Double], Int))]
 cannot be applied to ()
        x.getDouble(38),x.getDouble(39)),x.getInt(0)))).groupByKey()

Here is my build.sbt file:

name := "ScalaHello"

version := "0.1"

scalaVersion := "2.12.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.2"
libraryDependencies += "org.apache.spark" %% "spark-catalyst" % "2.4.2"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.2"

I think the problem lies in how sparkSession is initialized, but I cannot find the mistake.

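1 Answer:

Answer 0: (score: 0)

groupByKey on a Dataset takes a function that extracts the key from each element; unlike the RDD method of the same name, it has no zero-argument overload. Depending on which field you want to group by, it should be one of: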

.groupByKey(_._1)

.groupByKey(_._2._1) 

.groupByKey(_._2._2)

...

.groupByKey(_._2._6)
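
To make the difference concrete, here is a minimal sketch of both routes. It assumes a simplified element type of (name, (score, age)) in place of the question's much wider tuple; the object name GroupByKeySketch, the helper names, and the sample rows are hypothetical and only serve the illustration. The first helper passes a key function to Dataset.groupByKey, the second drops down to the underlying RDD, where a zero-argument groupByKey() is available for key/value pairs.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object GroupByKeySketch {
  // Hypothetical, simplified element type: (name, (score, age)).
  type Pair = (String, (Double, Int))

  // Dataset API: groupByKey needs a function that picks the key out of each element.
  def sumScoreByName(ds: Dataset[Pair]): Map[String, Double] = {
    val spark = ds.sparkSession
    import spark.implicits._ // encoders for the key and the result tuples
    ds.groupByKey(_._1)
      .mapGroups((name: String, rows: Iterator[Pair]) => (name, rows.map(_._2._1).sum))
      .collect()
      .toMap
  }

  // RDD API: on an RDD of pairs, groupByKey() takes no arguments and
  // groups by the first tuple element.
  def groupOnRdd(ds: Dataset[Pair]): Map[String, Seq[(Double, Int)]] =
    ds.rdd.groupByKey().mapValues(_.toSeq).collect().toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("groupByKey-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows, not data from the question.
    val pairs = Seq(("alice", (1.0, 20)), ("alice", (2.0, 21)), ("bob", (3.0, 30))).toDS()

    println(sumScoreByName(pairs))
    println(groupOnRdd(pairs))
    spark.stop()
  }
}
```

If the intent in the question was the RDD-style grouping by name, either .groupByKey(_._1) on the Dataset or .rdd.groupByKey() on the mapped result should compile; which one fits better depends on whether the later steps work with Datasets or RDDs.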