spark: 220: error: missing parameter type for "map"

Date: 2017-06-10 00:17:10

Tags: scala apache-spark

Spark 2.1, Scala: I am converting GDELT data into GraphX format. However, when I create hash values with MurmurHash3, the example listed here fails. I don't understand Scala types well enough to diagnose this error message.

val eventsFromTo = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null")
eventsFromTo.show(5)
+-------------+----------+
|   Actor1Name|Actor2Name|
+-------------+----------+
|       SENATE|   RUSSIAN|
|       MEXICO|     TEXAS|
|      RUSSIAN|    SENATE|
|      VERMONT|    CANADA|
|UNITED STATES|    POLICE|
+-------------+----------+
only showing top 5 rows

val eventActors = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").flatMap(x => Iterable(x(0).toString,x(1).toString))
eventActors.show(5)
+-------+
|  value|
+-------+
| SENATE|
|RUSSIAN|
| MEXICO|
|  TEXAS|
|RUSSIAN|
+-------+

Then I try to convert it to GraphX:

val eventVertices: RDD[(VertexId, String)] = eventActors.distinct().map(x => (MurmurHash3.stringHash((x),x)))
<console>:265: error: missing parameter type

If I add a type annotation, I then get this error:

<console>:265: error: type mismatch;
 found   : String
 required: Int
       val eventVertices: RDD[(VertexId, String)] = eventActors.distinct().map((x:String) => (MurmurHash3.stringHash((x),x)))
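
For reference, this second error comes from MurmurHash3's signature rather than from Spark: stringHash((x), x) parses as a single two-argument call, and the two-argument overload stringHash(str: String, seed: Int) expects an Int seed, hence "found: String, required: Int". A minimal sketch of the intended (hash, name) tuple, using the standard library's scala.util.hashing.MurmurHash3:

import scala.util.hashing.MurmurHash3

// stringHash(str) hashes with a default seed and returns an Int;
// the extra parentheses in stringHash((x), x) merely group x, so the
// second x lands in the Int seed parameter and fails to compile.
val x = "SENATE"
val vertex = (MurmurHash3.stringHash(x), x) // (Int hash, String name)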

1 Answer:

Answer 0 (score: 0)

I was missing ".rdd" to convert them to RDDs before executing map():
val eventsFromTo = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").rdd
val eventActors = gdelt.select("Actor1Name","Actor2Name").where("actor1Name is not null and actor2name is not null").flatMap(x => Iterable(x(0).toString,x(1).toString)).rdd
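
For completeness, a minimal sketch of the vertex construction once eventActors is an RDD[String], assuming the standard library's MurmurHash3 and widening the Int hash to the Long-backed VertexId:

import org.apache.spark.graphx.VertexId
import org.apache.spark.rdd.RDD
import scala.util.hashing.MurmurHash3

// RDD.map has a single signature, so the lambda's parameter type is
// inferred without an annotation, and the result is an RDD, which
// matches the declared RDD[(VertexId, String)].
val eventVertices: RDD[(VertexId, String)] =
  eventActors.distinct().map(x => (MurmurHash3.stringHash(x).toLong, x))

Note that the declared RDD[(VertexId, String)] could never have matched the original Dataset version anyway, since Dataset.map returns a Dataset, not an RDD.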