Question

I am running a Spark application that reads a MySQL table (30 million rows) and finds patterns in it.

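import org.apache.spark.sql.SQLContext

// sc is the existing SparkContext; url is the JDBC URL, defined elsewhere
val sqlContext = new SQLContext(sc)

// Load the MySQL table over JDBC and expose it to SQL as "MyTable"
sqlContext
  .read.format("jdbc").option("driver", "com.mysql.jdbc.Driver")
  .option("url", url)
  .option("dbtable", "MyTable")
  .option("user", "MyUser").option("password", "MyPwd")
  .load().registerTempTable("MyTable")

// Replace every digit with the letter 'd', e.g. "AB12" -> "ABdd"
def getPattern(patent: String) = patent.replaceAll("\\d", "d")

// Spark 1.x API: DataFrame.map returns an RDD
sqlContext.sql("select distinct(code) from MyTable")
  .map(_.getString(0))  // the code column as a String
  .groupBy(getPattern)  // all codes sharing the same pattern
  .mapValues(_.size)    // number of codes per pattern
  .saveAsTextFile("/tmp/result")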
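As a rough illustration of what the job computes, here is an in-memory analogue using plain Scala collections instead of Spark. The object name PatternSketch, the helper patternCounts, and the sample codes are hypothetical; this sketches the logic of the pipeline only and does not reproduce the failing job.

```scala
object PatternSketch {
  // Same pattern function as in the question: every digit becomes the letter 'd'
  def getPattern(patent: String): String = patent.replaceAll("\\d", "d")

  // In-memory analogue of: select distinct(code) ... groupBy(getPattern).mapValues(_.size)
  def patternCounts(codes: Seq[String]): Map[String, Int] =
    codes.distinct                // like "select distinct(code)"
      .groupBy(getPattern)        // collects every code that shares a pattern
      .map { case (pattern, group) => pattern -> group.size }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample codes, not taken from MyTable
    val sample = Seq("AB12", "AB34", "AB12", "X9")
    println(patternCounts(sample))
  }
}
```

Here "AB12" and "AB34" both map to the pattern "ABdd", so that pattern is counted twice, while the duplicate "AB12" is dropped by distinct first.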

I am running the application on localhost. It fails with:

2016-09-30 15:11:47,812 [dispatcher-event-loop-4] WARN  HeartbeatReceiver - Removing executor driver with no recent heartbeats: 129149 ms exceeds timeout 120000 ms
2016-09-30 15:11:47,819 [dispatcher-event-loop-4] ERROR TaskSchedulerImpl - Lost executor driver on localhost: Executor heartbeat timed out after 129149 ms

Why does this happen? Is it a memory problem?

Or is there a problem with the MySQL connection?

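I know I can replace .groupBy(getPattern).mapValues(_.size) with .map(getPattern).countByValue, which is more efficient and avoids the error. However, this question is about understanding the error itself.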
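A minimal sketch contrasting the two formulations, again on plain Scala collections rather than Spark (the names CountSketch, viaGroupBy and viaCount, and the sample codes, are hypothetical). The groupBy version holds every code of a pattern in memory before taking its size, while the counting version keeps only one running counter per pattern. Note that in Spark, RDD.countByValue returns a local Map[T, Long] on the driver rather than an RDD, so the saveAsTextFile step would have to change as well.

```scala
object CountSketch {
  // Same pattern function as in the question
  def getPattern(patent: String): String = patent.replaceAll("\\d", "d")

  // groupBy-based count: builds a collection of all codes per pattern, then takes its size
  def viaGroupBy(codes: Seq[String]): Map[String, Long] =
    codes.groupBy(getPattern).map { case (pattern, group) => pattern -> group.size.toLong }

  // Counting-based version, analogous to .map(getPattern).countByValue:
  // streams over the codes and keeps only one counter per pattern
  def viaCount(codes: Seq[String]): Map[String, Long] =
    codes.iterator
      .map(getPattern)
      .foldLeft(Map.empty[String, Long]) { (acc, pattern) =>
        acc.updated(pattern, acc.getOrElse(pattern, 0L) + 1L)
      }

  def main(args: Array[String]): Unit = {
    val sample = Seq("AB12", "AB34", "X9") // hypothetical sample codes
    println(viaGroupBy(sample) == viaCount(sample)) // prints true
  }
}
```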

Spark: Lost executor driver on localhost
