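I am using Spark's FP-growth algorithm. I was getting OOM errors when calling collect, so I changed my code to save the results to a text file on HDFS rather than collecting them on the driver node. Here is the relevant code: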
// Build the model:
import org.apache.spark.mllib.fpm.FPGrowth

val fpg = new FPGrowth()
  .setMinSupport(0.01)
  .setNumPartitions(10)
val model = fpg.run(transaction_distinct)
Here is a transformation that should give me an RDD[String]:
val mymodel = model.freqItemsets.map { itemset =>
  val model_res = itemset.items.mkString("[", ",", "]") + ", " + itemset.freq
  model_res
}
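To make the intended output format concrete, here is a minimal sketch of the same formatting applied to a plain Scala collection instead of an RDD, so it can be pasted into a Scala REPL without a cluster. The `Itemset` case class, the `formatItemset` helper, and the sample data are hypothetical stand-ins for illustration; only the `mkString` expression is taken from the code above.

```scala
// Hypothetical stand-in for the frequent-itemset records in model.freqItemsets
final case class Itemset(items: Array[String], freq: Long)

// Same formatting expression as in the map above
def formatItemset(itemset: Itemset): String =
  itemset.items.mkString("[", ",", "]") + ", " + itemset.freq

// Made-up sample data, for illustration only
val sample = Seq(Itemset(Array("a", "b"), 3L), Itemset(Array("c"), 5L))
val lines = sample.map(formatItemset)

lines.foreach(println)
// [a,b], 3
// [c], 5
```

So each line of the saved text file should hold one itemset followed by its frequency.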
Then I save the model results as a text file. Unfortunately, this is really slow!
mymodel.saveAsTextFile("fpm_model")
I get these errors:
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError[akka.tcp://sparkDriver@ipaddress:46811] -> [akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720] akka.event.Logging$Error$NoCause$
16/02/04 14:47:28 INFO BlockManagerMasterEndpoint: Removing block manager BlockManagerId(3, hostname, 58683)
16/02/04 14:47:28 INFO BlockManagerMaster: Removed 3 successfully in removeExecutor
16/02/04 14:47:28 ERROR ErrorMonitor: AssociationError [akka.tcp://sparkDriver@ipaddress:46811] ->[akka.tcp://sparkExecutor@hostname:39720]: Error [Association failed with [akka.tcp://sparkExecutor@hostname:39720]][akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkExecutor@hostname:39720]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: hostname/ipaddress:39720