df.sqlContext.sql() does not recognize the database table

Date: 2017-03-30 09:51:11

Tags: sql-server scala apache-spark apache-spark-sql

I ran the following code in my Spark environment:
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import java.util.Properties

val conf = new SparkConf().setAppName("test").setMaster("local").set("spark.driver.allowMultipleContexts", "true")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
// implicits can only be imported once the sqlContext value exists
import sqlContext.implicits._

// load the SQL Server table over JDBC into a DataFrame
val df = sqlContext.read.format("jdbc")
  .option("url", "jdbc:sqlserver://server_IP:port")
  .option("databaseName", "DB_name")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "tbl")
  .option("user", "uid")
  .option("password", "pwd")
  .load()

val df2 = df.sqlContext.sql("SELECT col1,col2 FROM tbl LIMIT 5")
exit()

When I try to execute the code above, I get the error "org.apache.spark.sql.AnalysisException: Table not found: tbl;". However, if I remove df2 and run the code, I can view the contents of table tbl successfully. What is the problem? I am using Spark 1.6.1, and I checked the documentation: the syntax I used to fire a SQL query through sqlContext looks correct to me; see "https://spark.apache.org/docs/1.6.0/sql-programming-guide.html", under the topic "Running SQL Queries Programmatically".

Below is the complete error trace:

conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@5eea8854
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@7790a6fb
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@a9f4621
df: org.apache.spark.sql.DataFrame = [col1: int, col2: string, col3: string, col4: string, col5: string, col6: string, col7: string, col8: string, col9: timestamp, col10: timestamp, col11: string, col12: string]
org.apache.spark.sql.AnalysisException: Table not found: tbl;

1 Answer:

Answer 0 (score: 1):

The df in your code is a DataFrame.

If you want to perform a select operation, use df.select() on the DataFrame directly, as sketched below.
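
A minimal sketch of that approach, assuming the df loaded in the question (the column names col1 and col2 are taken from the question's own query):

// select the columns directly on the DataFrame; no SQL string or table registration needed
val selected = df.select("col1", "col2").limit(5)
selected.show()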

If you want to execute the query with sqlContext.sql(), first register the DataFrame as a temporary table via df.registerTempTable(tableName: String), then query that name, as in the sketch below.
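
A minimal sketch of that route, continuing from the question's code (the temp table name "tbl" is arbitrary here; it only has to match the name used inside the SQL string):

// register the DataFrame under a name that sqlContext's SQL parser can resolve
df.registerTempTable("tbl")

// the table name is now known to the catalog, so the query succeeds
val df2 = sqlContext.sql("SELECT col1, col2 FROM tbl LIMIT 5")
df2.show()

Note that registerTempTable is the Spark 1.x API used here because the question targets Spark 1.6.1; in Spark 2.x it is deprecated in favour of createOrReplaceTempView.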