Splitting an RDD gives "missing parameter type"

Time: 2018-03-22 03:40:45

Tags: hadoop apache-spark rdd

I am trying to split an RDD that was originally created from a DataFrame, and I am not sure why I get an error.

I am not writing out every column name here, but the SQL does include all of the columns, so there is nothing wrong with the SQL itself.

val df = sql("SELECT col1, col2, col3,... from tableName")
val rddF = df.toJavaRDD

rddF.take(1)
res46: Array[org.apache.spark.sql.Row] = Array([2017-02-26,100102-AF,100134402,119855,1004445,0.0000,0.0000,-3.3,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000,0.0000])

scala> rddF.map(x => x.split(","))
<console>:31: error: missing parameter type
       rddF.map(x => x.split(","))

Any ideas about the error? I am using Spark 2.2.0.

1 Answer:

Answer 0 (score: 1)

As you can see from res46: Array[org.apache.spark.sql.Row], rddF is an RDD of Row, not of String, and you cannot split a Row the way you split a String.

You can do the following: first render each Row as a String (or extract its fields), and then split.
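The pattern can be sketched as follows. Note this is a minimal illustration, not code from the answer: it uses a plain Seq[Any] to stand in for a Row (Row.mkString joins the column values the same way), so it runs without a Spark session; the equivalent Spark one-liner on the question's DataFrame is shown in the comment.

```scala
// In Spark, with the question's DataFrame `df`, the equivalent would be:
//   df.rdd.map(row => row.mkString(",").split(","))
// (df.rdd gives RDD[Row]; Row has mkString but no split method.)

// Stand-in for one Row's column values (sample values from the question's output):
val row: Seq[Any] = Seq("2017-02-26", "100102-AF", 100134402, -3.3)

// Join the columns into one String, then split it back into fields:
val fields: Array[String] = row.mkString(",").split(",")

println(fields.mkString(" | "))
```

Keep in mind that going through mkString/split loses the column types (everything becomes a String); if you need typed values, extracting fields with row.getString(i), row.getDouble(i), etc. is usually the better choice.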