Spark 2.1, Scala: I am converting GDELT data into GraphX format. However, the example listed here fails when creating hash values with MurmurHash3. I don't understand Scala's types well enough to diagnose this error message.
val eventsFromTo = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null")
eventsFromTo.show(5)
+-------------+----------+
| Actor1Name|Actor2Name|
+-------------+----------+
| SENATE| RUSSIAN|
| MEXICO| TEXAS|
| RUSSIAN| SENATE|
| VERMONT| CANADA|
|UNITED STATES| POLICE|
+-------------+----------+
only showing top 5 rows
val eventActors = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").flatMap(x => Iterable(x(0).toString,x(1).toString))
eventActors.show(5)
+-------+
| value|
+-------+
| SENATE|
|RUSSIAN|
| MEXICO|
| TEXAS|
|RUSSIAN|
+-------+
Then I try to convert it to GraphX:
val eventVertices: RDD[(VertexId, String)] = eventActors.distinct().map(x => (MurmurHash3.stringHash((x),x)))
<console>:265: error: missing parameter type
If I add a type annotation for x, I get this error instead:
<console>:265: error: type mismatch;
found : String
required: Int
val eventVertices: RDD[(VertexId, String)] = eventActors.distinct().map((x:String) => (MurmurHash3.stringHash((x),x)))
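The second error message points at the arguments to stringHash itself: scala.util.hashing.MurmurHash3 provides stringHash(str: String) and stringHash(str: String, seed: Int), so writing MurmurHash3.stringHash((x),x) passes x (a String) as the Int seed, producing "found: String, required: Int". The tuple parenthesis most likely belongs around the whole pair. A minimal sketch of that construction outside Spark (the sample actor name is illustrative):

```scala
import scala.util.hashing.MurmurHash3

// GraphX's VertexId is an alias for Long, so the Int hash is widened
// with toLong. The outer parentheses build the (id, attribute) pair;
// stringHash itself receives only the string.
val actor = "SENATE"
val vertex: (Long, String) = (MurmurHash3.stringHash(actor).toLong, actor)
```

The same shape drops into the map lambda once eventActors is an RDD[String].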
Answer 0 (score: 0)
I was missing ".rdd" to convert them to RDDs before executing map():

val eventsFromTo = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").rdd
val eventActors = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").flatMap(x => Iterable(x(0).toString,x(1).toString)).rdd
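Note that after adding .rdd, the map lambda from the question would still hand x to stringHash as its Int seed. A hedged sketch of the full vertex construction with both adjustments (it assumes a running SparkSession and the gdelt DataFrame from the question, so it is not runnable standalone):

```scala
import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
import scala.util.hashing.MurmurHash3

// Flatten both actor columns into one RDD of names, as in the question.
val eventActors = gdelt
  .select("Actor1Name", "Actor2Name")
  .where("actor1Name is not null and actor2name is not null")
  .flatMap(x => Iterable(x(0).toString, x(1).toString))
  .rdd

// Hash each distinct name into a VertexId (a Long), keeping the name
// as the vertex attribute; the tuple wraps the pair, not the arguments.
val eventVertices: RDD[(VertexId, String)] =
  eventActors.distinct().map(x => (MurmurHash3.stringHash(x).toLong, x))
```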