I am trying to build an Edge RDD for GraphX. I read a CSV file into a DataFrame and then try to convert it to an Edge RDD:
val staticDataFrame = spark.
read.
option("header", true).
option("inferSchema", true).
csv("/projects/pdw/aiw_test/aiw/haris/Customers_DDSW-withDN$.csv")
val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]] =
staticDataFrame.select(
"dealer_customer_number",
"parent_dealer_cust_number",
"dealer_code"
).map{ (row: Array) =>
Edge((
row.getAs[Long]("dealer_customer_number"),
row.getAs[Long]("parent_dealer_cust_number"),
row("dealer_code")
))
}
But I get this error:
<console>:81: error: class Array takes type parameters
val edgeRDD: RDD[Edge[(VertexId, VertexId, String)]] = staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").map((row: Array) => Edge((row.getAs[Long]("dealer_customer_number"), row.getAs[Long]("parent_dealer_cust_number"), row("dealer_code"))))
^
The result of
staticDataFrame.select("dealer_customer_number", "parent_dealer_cust_number", "dealer_code").take(1)
is
res3: Array[org.apache.spark.sql.Row] = Array([0000101,null,B110])
Answer 0 (score: 0)
First, Array takes type parameters, so you would have to write Array[Something]. But that is probably not what you want here.
A DataFrame is a Dataset[Row], not a Dataset[Array[_]], so you have to change
.map{ (row: Array) =>
to
.map{ (row: Row) =>
Or just omit the type annotation entirely (it should be inferred):
.map{ row =>
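For reference, here is a minimal sketch of how the whole conversion might look once the lambda takes a Row. It rests on several assumptions that are not confirmed by the question: that the edge attribute should simply be the dealer_code string (so the result is an RDD[Edge[String]]), that the two id columns can be cast to long, and that rows with a missing id should be skipped. Note that Edge takes the source id, destination id, and attribute as three separate arguments rather than one tuple. The object name EdgeRddSketch, the helper buildEdges, and the sample rows are hypothetical and only stand in for the CSV file. The sketch goes through .rdd before mapping, so no Dataset Encoder for Edge is required.

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.col

object EdgeRddSketch {
  // Hypothetical helper: builds edges from the three columns named in the question.
  def buildEdges(df: DataFrame): RDD[Edge[String]] =
    df.select(
        col("dealer_customer_number").cast("long"),
        col("parent_dealer_cust_number").cast("long"),
        col("dealer_code").cast("string")
      )
      .na.drop() // assumption: rows with a null id or code are skipped
      .rdd       // RDD[Row]; mapping here avoids needing an Encoder for Edge
      .map { (row: Row) =>
        // Edge(srcId, dstId, attr): three arguments, not a single tuple
        Edge(row.getLong(0), row.getLong(1), row.getString(2))
      }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("edge-rdd-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows standing in for the CSV file.
    val df = Seq(
      ("0000101", null: String, "B110"),
      ("0000102", "0000101", "B110")
    ).toDF("dealer_customer_number", "parent_dealer_cust_number", "dealer_code")

    buildEdges(df).collect().foreach(println)
    spark.stop()
  }
}
```

If null parents should instead become edges to some placeholder vertex, the na.drop() step would need to be replaced with whatever default fits your data.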