SQLITE_ERROR: Connection is closed when connecting from Spark via JDBC to a SQLite database

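Asked: 2015-09-30 07:16:42

Tags: sqlite jdbc apache-spark apache-spark-sql

I am using Apache Spark 1.5.1 and trying to connect to a local SQLite database named clinton.db. Creating a DataFrame from a database table works fine, but when I run an operation on the resulting object I get the error below, which says "SQL error or missing database (Connection is closed)". Interestingly, I still get the result of the operation. Any idea what I can do to avoid the error?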

Command used to start spark-shell:

../spark/bin/spark-shell --master local[8] --jars ../libraries/sqlite-jdbc-3.8.11.1.jar --driver-class-path ../libraries/sqlite-jdbc-3.8.11.1.jar

Reading from the database:

val emails = sqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlite:../data/clinton.sqlite", "dbtable" -> "Emails")).load()

A simple count (fails):

emails.count

Error:

15/09/30 09:06:39 WARN JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1$$anonfun$8.apply(JDBCRDD.scala:358)
    at org.apache.spark.TaskContextImpl$$anon$1.onTaskCompletion(TaskContextImpl.scala:60)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:79)
    at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:77)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:77)
    at org.apache.spark.scheduler.Task.run(Task.scala:90)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
res1: Long = 7945

1 answer:

Answer 0 (score: 1)

I ran into the same error today. The important line is the one just before the exception:

15/11/30 12:13:02 INFO jdbc.JDBCRDD: closed connection

15/11/30 12:13:02 WARN jdbc.JDBCRDD: Exception closing statement
java.sql.SQLException: [SQLITE_ERROR] SQL error or missing database (Connection is closed)
    at org.sqlite.core.DB.newSQLException(DB.java:890)
    at org.sqlite.core.CoreStatement.internalClose(CoreStatement.java:109)
    at org.sqlite.jdbc3.JDBC3Statement.close(JDBC3Statement.java:35)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$$anon$1.org$apache$spark$sql$execution$datasources$jdbc$JDBCRDD$$anon$$close(JDBCRDD.scala:454)

So Spark successfully closes the JDBC connection and then fails to close the JDBC statement.

Looking at the source, close() is called twice:

Line 358 (org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD, Spark 1.5.1)

context.addTaskCompletionListener{ context => close() }

Line 469

override def hasNext: Boolean = {
  if (!finished) {
    if (!gotNext) {
      nextValue = getNext()
      if (finished) {
        close()
      }
      gotNext = true
    }
  }
  !finished
}
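To see why a second close() call produces this particular log sequence, here is a rough model that can be pasted into spark-shell or a plain Scala REPL. FakeConnection, FakeStatement and the log buffer are hypothetical stand-ins, not Spark or sqlite-jdbc classes. The sketch assumes that close() releases the statement before the connection (as the log order suggests) and that the driver rejects closing a statement once its connection is closed.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-ins for the JDBC objects; not Spark or sqlite-jdbc code.
class FakeConnection {
  var isClosed = false
  def close(): Unit = { isClosed = true }
}

class FakeStatement(conn: FakeConnection) {
  // Assumed driver behaviour: closing a statement on a closed connection fails.
  def close(): Unit = {
    if (conn.isClosed) throw new java.sql.SQLException("Connection is closed")
  }
}

val log  = ArrayBuffer.empty[String]
val conn = new FakeConnection
val stmt = new FakeStatement(conn)

// Unguarded close: statement first, then connection (assumed order).
def close(): Unit = {
  try stmt.close()
  catch { case e: Exception => log += s"WARN Exception closing statement: ${e.getMessage}" }
  conn.close()
  log += "INFO closed connection"
}

close() // first call: hasNext reaches the end of the result set
close() // second call: the task completion listener fires
log.foreach(println)
```

In this model the first call finishes cleanly and the warning only appears on the second call, which would explain why the count is still returned.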

If you look at the close() method (line 443):

def close() {
  if (closed) return

You can see that it checks the variable closed, but that value is never set to true.
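One way such a guard is usually made effective is to set the flag as soon as the method is entered. The following is only a sketch of that idea, not the actual Spark patch; GuardedResource and closeCalls are made-up names used to count how often the real cleanup would run.

```scala
// Hypothetical stand-in for the iterator inside JDBCRDD; not Spark's actual code.
class GuardedResource {
  private var closed = false
  var closeCalls = 0 // how many times the underlying resources were really released

  def close(): Unit = {
    if (closed) return
    closed = true    // the assignment that the snippet above never performs
    closeCalls += 1  // stands in for closing the result set, statement and connection
  }
}

val r = new GuardedResource
r.close() // e.g. from hasNext when the rows run out
r.close() // e.g. from the task completion listener; now a no-op
println(r.closeCalls) // prints 1
```

With the flag set, the second call returns immediately, so the statement is never closed against an already closed connection.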

If I am reading it correctly, this bug is still present in master. I have filed a bug report.