Error creating a graph in GraphX from edge/vertex input files

Time: 2015-11-06 12:32:09

Tags: scala apache-spark rdd spark-graphx

I get errors when I run the following code to create a graph in Spark GraphX. I run it through spark-shell with this command: ./bin/spark-shell -i ex.scala

Input:

My vertex file looks like this (each line is one vertex, given as a comma-separated list of strings):
word1,word2,word3
word1,word2,word3
...
My edge file looks like this (each line is one edge, e.g. from vertex 1 to vertex 2):
1,2
1,3

Code:

// Creating the vertex RDD (the input file has 300+ records; each record is a list of strings separated by ",").
// zipWithIndex assigns an index number to every entry, i.e. it numbers the rows.
val vRDD: RDD[(VertexId, Array[String])] = (vfile.map(line => line.split(","))).zipWithIndex().map(line => (line._2, line._1))

// Creating Edge RDD using input file
//val eRDD: RDD[Edge[Array[String]]] = (efile.map(line => line.split(",")))

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

// Graph creation
val graph = Graph(vRDD, eRDD)

Error:
<console>:52: error: type mismatch;
found   : Array[String]
required: org.apache.spark.graphx.Edge[Array[String]]
          val eRDD: RDD[Edge[Array[String]]] = (efile.map(line =>    line.split(",")))

<console>:57: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(org.apache.spark.graphx.VertexId,   org.apache.spark.graphx.VertexId)]
required: org.apache.spark.rdd.RDD[org.apache.spark.graphx.Edge[?]]
Error occurred in an application involving default arguments.
       val graph = Graph(vRDD, eRDD)

2 answers:

Answer 0 (score: 1)

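Graph expects an RDD of Edge objects rather than an RDD of tuples, and every Edge carries an attribute (attr). What type is your attr? Let's assume it is Int and initialize it to zero: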

Instead of:

val eRDD: RDD[(VertexId, VertexId)] = efile.map(line => line.split(","))

Try this:

val eRDD: RDD[Edge[Int]] = efile.map { line =>
  val vs = line.split(",")
  Edge(vs(0).toLong, vs(1).toLong, 0)
}
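
If the edge file may contain blank or malformed lines, the parsing step above can be made more tolerant. The following is only a minimal sketch, assuming a spark-shell session where sc is already defined; the sample lines are invented for illustration, and parseEdge is a hypothetical helper, not part of GraphX:

```scala
import org.apache.spark.graphx.Edge
import org.apache.spark.rdd.RDD
import scala.util.Try

// Invented sample lines standing in for the real edge file,
// including one blank line and one non-numeric id.
val efile = sc.parallelize(Seq("1,2", "1,3", "", "x,4"))

// Hypothetical helper: returns None for any line that is not
// exactly two comma-separated numeric ids.
def parseEdge(line: String): Option[Edge[Int]] =
  line.split(",").map(_.trim) match {
    case Array(src, dst) => Try(Edge(src.toLong, dst.toLong, 0)).toOption
    case _               => None
  }

// flatMap keeps the parsed edges and silently drops the lines that failed.
val eRDD: RDD[Edge[Int]] = efile.flatMap(line => parseEdge(line))
```

Dropping bad lines silently is a design choice; if malformed input should be treated as an error, let the exception propagate as in the shorter version above.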

Answer 1 (score: 0)

Based on the example you gave, I created two files, one with the vertices and one with the edges:

 
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val vfile = sc.textFile("vertices.txt")
val efile = sc.textFile("edges.txt")

Then create the vertex and edge RDDs:

 
val vRDD: RDD[(VertexId, Array[String])] = vfile.map(line => line.split(","))
                               .zipWithIndex()
                               .map(_.swap) // swap replaces the manual (line._2, line._1) mapping

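// Creating the edge RDD from the input file.
// No attribute is passed to Edge, so attr keeps its default value (null for this tuple type).
// A line without a comma does not match the pattern and throws a MatchError.
val eRDD: RDD[Edge[(VertexId, VertexId)]] = efile.map(line => {
  line.split(",", 2) match {
    case Array(n1, n2) => Edge(n1.toLong, n2.toLong)
  }
})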

Once the vertex and edge RDDs exist, you can create the graph:

 
val graph = Graph(vRDD, eRDD)
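
One way to check the result is to build the graph from a few in-memory lines and print its edges together with the endpoint attributes. The sketch below assumes a spark-shell session (so sc exists) and uses invented sample data in place of the real files. It also assumes that the ids in the edge file are 1-based row numbers, which the question does not state; zipWithIndex numbers rows from 0, so the index is shifted by one here. If the edge file is already 0-based, drop the + 1. The Int edge attribute follows the other answer.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

// Invented sample data standing in for vertices.txt and edges.txt.
val vfile = sc.parallelize(Seq("a,b,c", "d,e,f", "g,h,i"))
val efile = sc.parallelize(Seq("1,2", "1,3"))

// Assumption: edge ids are 1-based row numbers, so shift the 0-based index by one.
val vRDD: RDD[(VertexId, Array[String])] =
  vfile.map(_.split(",")).zipWithIndex().map { case (words, idx) => (idx + 1, words) }

// Each edge gets a placeholder Int attribute of 0.
val eRDD: RDD[Edge[Int]] = efile.map { line =>
  val Array(src, dst) = line.split(",", 2)
  Edge(src.trim.toLong, dst.trim.toLong, 0)
}

val graph: Graph[Array[String], Int] = Graph(vRDD, eRDD)

// Format each triplet on the executors, then print the strings on the driver.
graph.triplets
  .map(t => s"${t.srcId} [${t.srcAttr.mkString(",")}] -> ${t.dstId} [${t.dstAttr.mkString(",")}]")
  .collect()
  .foreach(println)
```

If an edge refers to an id that has no entry in the vertex RDD, Graph still creates that vertex and gives it the default attribute (null unless a default is passed), so a null word list in this output is a hint that the numbering of the two files does not line up.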