I am trying to learn Spark GraphX on Windows 10 by copying the code here. The code was developed with an older version of Spark, and I cannot find a solution for creating the vertices. Here is the code
import scala.util.MurmurHash
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
val path = "F:/Soft/spark/2008.csv"
val df_1 = spark.read.option("header", true).csv(path)
val flightsFromTo = df_1.select($"Origin",$"Dest")
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString))
// error caused by the following line
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))
Here is the error message:
<console>:57: error: missing parameter type
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x), x))
^
I think the syntax is outdated. I tried to find the current syntax in the official documents, but that did not help. The dataset can be downloaded from here.
Update
Basically, I am trying to create a Vertex RDD and an Edge RDD, and eventually build a graph, as shown in the tutorial. I am also new to the Map-Reduce paradigm.
Answer 0 (score: 4)
The following lines of code worked for me.
// import the newer hashing library - the code works without this too, it just gives a deprecation warning
import scala.util.hashing.MurmurHash3
// the Datasets are converted to RDDs here - the missing conversion was the cause of the error
val flightsFromTo = df_1.select($"Origin",$"Dest").rdd
val airportCodes = df_1.select($"Origin", $"Dest").flatMap(x => Iterable(x(0).toString, x(1).toString)).rdd
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash3.stringHash(x), x))
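Building on this, the edge RDD and the graph itself can be constructed the same way. The sketch below is only an illustration, not part of the original answer: it assumes the imports from the question (org.apache.spark.graphx._, org.apache.spark.rdd.RDD) are still in scope, and the constant edge attribute 1 and the defaultAirport fallback value are assumptions; the hashing matches the vertex ids above.
// Build directed edges keyed by the same MurmurHash3 ids used for the vertices;
// the Int attribute 1 simply marks one flight between the two airports.
val flightEdges: RDD[Edge[Int]] = flightsFromTo.map { row =>
  Edge(MurmurHash3.stringHash(row(0).toString),
       MurmurHash3.stringHash(row(1).toString),
       1)
}
// Fallback vertex attribute for any edge endpoint missing from airportVertices.
val defaultAirport = "nowhere"
val graph: Graph[String, Int] = Graph(airportVertices, flightEdges, defaultAirport)
// Quick sanity check: number of airports and number of flights.
println(s"Airports: ${graph.numVertices}, Flights: ${graph.numEdges}")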
Answer 1 (score: 1)
You can try:
val airportVertices: RDD[(VertexId, String)] = airportCodes.distinct().map(x => (MurmurHash.stringHash(x(0)), x(1)))
Answer 2 (score: 1)
// To apply map(), just try converting the variable to an RDD.
val airportVertices: RDD[(VertexId, String)] = airportCodes.rdd.distinct().map(x => (MurmurHash3.stringHash(x), x))
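Whichever variant is used, a quick sanity check in the spark-shell (using the definitions above) could look like the following; the actual values depend on the 2008.csv dataset.
// Peek at a few (VertexId, airportCode) pairs and count the distinct airports.
airportVertices.take(5).foreach(println)
println(airportVertices.count())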