Why does using combinations give "java.io.NotSerializableException: scala.collection.TraversableOnce$FlattenOps$$anon$1"?

Date: 2017-05-06 16:14:41

Tags: scala apache-spark

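Question:

I have an RDD whose contents have the form Array[Array[String]], and I need to generate combinations of the strings within each inner array. But when I apply a map function, I get the following error: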

java.io.NotSerializableException: scala.collection.TraversableOnce$FlattenOps$$anon$1
Serialization stack:
    - object not serializable (class: scala.collection.TraversableOnce$FlattenOps$$anon$1, value: non-empty iterator)
    - element of array (index: 0)
    - array (class [Lscala.collection.Iterator;, size 10)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:324)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Background

Initially I had the following:

Array[org.apache.spark.sql.Row] = Array([cyber crimes ;; cyber security ;; review ;; india ;; instances ;; state ;; issue], [civil rights ;; case ;; instances ;; frequency])

When I cleaned it up with the following code:

words.map(r => r(0).asInstanceOf[String].split("\\;;").map(_.trim))

the result was:

Array[Array[String]] = Array(Array(cyber crimes, cyber security, review, india, instances, state, issue), Array(civil society, instances, frequency))

Now I need all possible combinations of the strings within each array, like:

Array[Array[String]] = Array(Array((cyber crimes, cyber security), (review, india), (instances, state), (issue,cyber crimes))....etc)

When I apply map to do this, it gives me the error above:

val combinations = cleanwords.map(r => r(0).asInstanceOf[String].combinations(2))

Can anyone help me get the desired result?

1 Answer:

Answer 0 (score: 1)

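The error probably occurs because you are trying to collect an RDD whose elements are iterators (produced by combinations). As the serialization stack shows, an iterator is not serializable, so the task results cannot be sent back to the driver. In addition, you need to call combinations directly on the array rather than on r(0), which is only the first string of the array: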

cleanwords.map(_.combinations(2).toArray).collect
// res47: Array[Array[Array[String]]] = Array(Array(Array(cyber crimes, cyber security), Array(cyber crimes, review), Array(cyber crimes, india) ..
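
To illustrate why combinations has to be called on the array itself, here is a minimal sketch using plain Scala collections (no Spark). The sample value row is hypothetical and only stands in for one element of cleanwords:

```scala
// Hypothetical sample standing in for one inner array of cleanwords.
val row: Array[String] = Array("cyber crimes", "cyber security", "review")

// row(0) is a single String, so combinations(2) works on its characters
// and yields two-character strings, not pairs of words.
val charPairs: List[String] = row(0).combinations(2).map(_.toString).toList

// Calling combinations on the array pairs up whole strings.
// With three elements this gives three pairs.
val wordPairs: List[List[String]] = row.combinations(2).map(_.toList).toList
```

The map(_.toString) call is only there so the sketch compiles the same way across Scala versions; the point is that charPairs contains character pairs while wordPairs contains the word pairs the question asks for.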

To get tuples back:

cleanwords.map(_.combinations(2).map(x => (x(0), x(1))).toArray).collect
// res60: Array[Array[(String, String)]] = Array(Array((cyber crimes,cyber security), (cyber crimes,review), (cyber crimes,india) ..
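
The serialization issue can also be checked without Spark. The sketch below is an assumption-laden illustration, not Spark's actual code path: it uses plain Java serialization (which the stack trace shows Spark's JavaSerializer relying on), and javaSerializable is a hypothetical helper introduced only for this example.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical helper: returns true if plain Java serialization accepts the value.
def javaSerializable(value: AnyRef): Boolean =
  try {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(value) finally out.close()
    true
  } catch {
    case _: NotSerializableException => false
  }

// Hypothetical sample standing in for one inner array of cleanwords.
val words: Array[String] = Array("review", "india", "state")

// combinations returns a lazy Iterator, which is not Serializable.
val lazyPairs = words.combinations(2)

// Materializing the iterator gives ordinary arrays, which are serializable.
val arrayPairs: Array[Array[String]] = words.combinations(2).toArray
val tuplePairs: Array[(String, String)] =
  words.combinations(2).map(x => (x(0), x(1))).toArray

val iteratorOk = javaSerializable(lazyPairs)  // rejected by Java serialization
val arraysOk = javaSerializable(arrayPairs)   // accepted
val tuplesOk = javaSerializable(tuplePairs)   // accepted
```

This is why both snippets above call toArray inside the map: each RDD element becomes a concrete array before collect has to serialize it.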