Scala: GraphX: error: class Array takes type parameters

Time: 2018-03-06 02:56:57

Tags: scala apache-spark spark-graphx

I am trying to build an Edge RDD for GraphX. I read a CSV file into a DataFrame and then try to convert it to an Edge RDD:

val staticDataFrame = spark.
  read.
  option("header", true).
  option("inferSchema", true).
  csv("/projects/pdw/aiw_test/aiw/haris/Customers_DDSW-withDN$.csv")

val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]]  = 
  staticDataFrame.select(
    "dealer_customer_number",
    "parent_dealer_cust_number",
    "dealer_code"
  ).map{ (row: Array) => 
    Edge((
      row.getAs[Long]("dealer_customer_number"), 
      row.getAs[Long]("parent_dealer_cust_number"),
      row("dealer_code")
    ))
  }

But I get this error:

<console>:81: error: class Array takes type parameters
       val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]]  = staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").map((row: Array) => Edge((row.getAs[Long]("dealer_customer_number"), row.getAs[Long]("parent_dealer_cust_number"), row("dealer_code"))))
                                                                                                                                                                      ^

Result of:
staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").take(1)

res3: Array[org.apache.spark.sql.Row] = Array([0000101,null,B110])

1 answer:

Answer 0 (score: 0)

First, Array takes type parameters, so you would have to write Array[Something]. But that is probably not what you want anyway.

A DataFrame is a Dataset[Row], not a Dataset[Array[_]], so you have to change

.map{ (row: Array) => 

to

.map{ (row: Row) =>

Or just omit the type annotation entirely (it should be inferred):

.map{ row =>
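
For reference, here is a minimal sketch of how the whole conversion might look once the lambda takes a Row. It goes beyond the type annotation and rests on a few assumptions that the question does not confirm: the helper name toEdgeRDD is hypothetical, edges are assumed to point from dealer_customer_number to parent_dealer_cust_number with dealer_code as the edge attribute, and rows with a null in any of the three columns (such as the one shown by take(1)) are assumed to be droppable. It also goes through .rdd, because DataFrame.map returns a Dataset rather than an RDD, and it passes three arguments to Edge (srcId, dstId, attr) instead of a single tuple, so the result type becomes RDD[Edge[String]].

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.col

// Hypothetical helper: builds GraphX edges from the three columns in the question.
def toEdgeRDD(df: DataFrame): RDD[Edge[String]] =
  df.select(
      // Cast explicitly so the column types do not depend on what inferSchema guessed.
      col("dealer_customer_number").cast("long"),
      col("parent_dealer_cust_number").cast("long"),
      col("dealer_code").cast("string")
    )
    .na.drop()   // assumption: rows containing a null cannot form an edge
    .rdd         // RDD[Row]; avoids needing a Dataset Encoder for Edge
    .map { row: Row =>
      // Assumed direction: customer -> parent, with dealer_code as the attribute.
      Edge(row.getLong(0), row.getLong(1), row.getString(2))
    }
```

With the DataFrame from the question this would be called as val edgeRDD = toEdgeRDD(staticDataFrame). If rows with no parent need to be kept, the null handling would have to change, for example by mapping them to a placeholder vertex id.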