Spark: Frequent Pattern Mining: problem saving results

Date: 2016-02-04 22:56:51

Tags: apache-spark apache-spark-mllib

I am using Spark's FP-growth algorithm. I was getting OOM errors when calling collect, so I changed the code to save the results to a text file on HDFS instead of collecting them on the driver node. Here is the relevant code:

// Build the model:

import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()
  .setMinSupport(0.01)
  .setNumPartitions(10)
val model = fpg.run(transaction_distinct)

This is a transformation that should give me an RDD[String]:

val mymodel = model.freqItemsets.map { itemset =>
  val model_res = itemset.items.mkString("[", ",", "]") + ", " + itemset.freq
  model_res
}
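To sanity-check what that map emits without a running cluster, the same formatting logic can be exercised on plain Scala collections. This is a standalone sketch; the (items, freq) tuples below are placeholder data standing in for MLlib's FreqItemset objects, not the poster's actual transactions:

```scala
// Standalone sketch: reproduce the itemset-formatting logic from the map above
// on plain collections, so the output shape can be checked without Spark.
// Each (items, freq) tuple stands in for one FreqItemset from the model.
val fakeItemsets = Seq(
  (Array("a", "b"), 42L),
  (Array("c"), 7L)
)

// Same expression as in the map: bracketed item list, comma, then the count.
val lines = fakeItemsets.map { case (items, freq) =>
  items.mkString("[", ",", "]") + ", " + freq
}

lines.foreach(println)
```

Each output line has the form `[item1,item2], count`, which is what ends up in the text file on HDFS.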

Then I save the model results as a text file. Unfortunately, this is really slow!

mymodel.saveAsTextFile("fpm_model")

I get these errors:

16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError[akka.tcp://sparkDriver@ipaddress:46811] -> [akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720] akka.event.Logging$Error$NoCause$
16/02/04 14:47:28 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, hostname, 58683)
16/02/04 14:47:28 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@ipaddress:46811] ->[akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]

Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720
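Paired `Association failed` / `Connection refused` messages like the ones above typically mean an executor JVM has died (often killed for exceeding its memory limit) and the driver can no longer reach it, rather than a genuine network fault. Purely as an illustrative sketch, not the poster's actual configuration, the memory settings commonly raised in this situation on Spark 1.x look like:

```shell
# Illustrative only: the class name, master, jar and sizes are placeholders.
# --executor-memory grows the executor heap; the YARN overhead setting adds
# headroom for off-heap allocations (Spark 1.x property name).
spark-submit \
  --class com.example.FPGrowthJob \
  --master yarn \
  --executor-memory 8g \
  --conf spark.yarn.executor.memoryOverhead=2048 \
  fpgrowth-job.jar
```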

0 Answers:

No answers yet.