Spark SQL pushdown query with the MAX function against MS SQL Server

Time: 2018-01-04 11:54:41

Tags: sql sql-server apache-spark jdbc apache-spark-sql

I want to run the aggregate function MAX on the ID column of a table that resides in MS SQL Server. I am using Spark SQL 1.6 with the JDBC pushdown-query approach, because I do not want Spark SQL to pull all the rows over and compute MAX(ID) on the Spark side. However, when I execute the code below I get an exception, whereas a plain SELECT * FROM query passed the same way works as expected.

Code:

    def getMaxID(sqlContext: SQLContext, tableName: String) = {
      val pushdown_query = s"(SELECT MAX(ID) FROM ${tableName}) as t"
      val maxID = sqlContext.read
        .jdbc(url = getJdbcProp(sqlContext.sparkContext).toString,
              table = pushdown_query,
              properties = getDBConnectionProperties(sqlContext.sparkContext))
        .head()
        .getLong(0)
      maxID
    }

Exception:

Exception in thread "main" java.sql.SQLException: No column name was specified for column 1 of 't'.
    at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:372)
    at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
    at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2421)
    at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:671)
    at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:505)
    at net.sourceforge.jtds.jdbc.JtdsPreparedStatement.executeQuery(JtdsPreparedStatement.java:1029)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:124)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)

1 Answer:

Answer 0 (score: 2)

This exception has nothing to do with Spark. SQL Server requires every column of a derived table (the subquery you alias as t) to have a name, and the aggregate MAX(ID) yields an unnamed column, which is exactly what the error message "No column name was specified for column 1 of 't'" reports. You must provide an alias for the column:
    val pushdown_query = s"(SELECT MAX(ID) AS max_id FROM ${tableName}) as t"
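
For completeness, here is a minimal sketch of the corrected function, assuming the same getJdbcProp and getDBConnectionProperties helpers from the question (their definitions are not shown in the original post):

    import org.apache.spark.sql.SQLContext

    def getMaxID(sqlContext: SQLContext, tableName: String): Long = {
      // Alias the aggregate so the derived table "t" exposes a named column;
      // SQL Server rejects derived tables whose columns are unnamed.
      val pushdown_query = s"(SELECT MAX(ID) AS max_id FROM ${tableName}) as t"
      sqlContext.read
        .jdbc(url = getJdbcProp(sqlContext.sparkContext).toString,
              table = pushdown_query,
              properties = getDBConnectionProperties(sqlContext.sparkContext))
        .head()
        .getLong(0)
    }

The alias matters even before any rows are fetched: as the stack trace shows, Spark 1.6 resolves the schema in JDBCRDD.resolveTable by executing a zero-row probe of the form SELECT * FROM <table expression> WHERE 1=0, and that probe already fails on the unnamed column. One caveat: if the table can be empty, MAX(ID) returns NULL and getLong(0) will throw, so guard with head().isNullAt(0) in that case.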