If I have a DataFrame created as follows:
df = spark.table("tblName")
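Is there any way to get tblName back from df?
Answer 0: (score: 0)
You can extract it from the plan. Note that this relies on Spark internals, so it may behave differently across Spark versions: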
df.logicalPlan().argString().replace("`","")
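Answer 1: (score: 0)
We can extract the table name from a DataFrame by walking its logical plans. The function below checks both the parsed (unresolved) logical plan and the optimized plan, and returns the first catalog table identifier it finds.
Proceed as follows: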
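import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.catalyst.catalog.HiveTableRelation
import org.apache.spark.sql.execution.datasources.LogicalRelation

// Returns the identifier of the first catalog table found in the plans.
// Throws NoSuchElementException if df is not backed by a catalog table.
def getTableName(df: DataFrame): String = {
  Seq(df.queryExecution.logical, df.queryExecution.optimizedPlan).flatMap { plan =>
    plan.collect {
      // Data source tables; relations without catalog metadata are skipped
      case LogicalRelation(_, _, Some(catalogTable), _) => catalogTable.identifier.toString
      // Hive tables
      case hive: HiveTableRelation => hive.tableMeta.identifier.toString
    }
  }.head
}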
scala> val df = spark.table("db.table")
scala> getTableName(df)
res: String = `db`.`table`
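The identifier above comes back backtick-quoted. As a minimal sketch of splitting such a string into its database and table parts, the following helper can be used. The name split_identifier is hypothetical, and the code assumes the quoted parts contain no embedded backticks.

```python
import re

# One identifier part: either backtick-quoted (may contain dots) or bare.
# Assumption: quoted parts contain no embedded backticks.
_PART = re.compile(r"`([^`]*)`|([^.`]+)")


def split_identifier(identifier):
    """Split a string such as `db`.`table` into its unquoted parts."""
    parts = []
    for quoted, bare in _PART.findall(identifier):
        # Exactly one of the two groups is non-empty for each match.
        parts.append(quoted if quoted else bare)
    return parts


if __name__ == "__main__":
    print(split_identifier("`db`.`table`"))
```

Splitting on the quoted parts rather than on every dot keeps a quoted name that itself contains a dot in one piece.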
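Answer 2: (score: -1)
You can create a table from df. However, if a local or global temporary view with that name already exists, you should either drop it before creating a view with the same name (sqlContext.dropTempTable) or use the create-or-replace methods on the DataFrame (df.createOrReplaceGlobalTempView or df.createOrReplaceTempView). If you register the table with registerTempTable (deprecated since Spark 2.0 in favor of createOrReplaceTempView), you can reuse an existing name without an error.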
#Create data frame
>>> d = [('Alice', 1)]
>>> test_df = spark.createDataFrame(sc.parallelize(d), ['name','age'])
>>> test_df.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create tables
>>> test_df.createTempView("tbl1")
>>> test_df.registerTempTable("tbl2")
>>> sqlContext.tables().show()
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| | tbl1| true|
| | tbl2| true|
+--------+---------+-----------+
#create data frame from tbl1
>>> df = spark.table("tbl1")
>>> df.show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create tbl1 again from the df data frame; this raises an error
>>> df.createTempView("tbl1")
raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: "Temporary view 'tbl1' already exists;"
#drop and create again
>>> sqlContext.dropTempTable('tbl1')
>>> df.createTempView("tbl1")
>>> spark.sql('select * from tbl1').show()
+-----+---+
| name|age|
+-----+---+
|Alice| 1|
+-----+---+
#create data frame from tbl2 and replace name value
>>> df = spark.table("tbl2")
>>> df = df.replace('Alice', 'Bob')
>>> df.show()
+----+---+
|name|age|
+----+---+
| Bob| 1|
+----+---+
#create tbl2 again from the df data frame; no error this time
>>> df.registerTempTable("tbl2")
>>> spark.sql('select * from tbl2').show()
+----+---+
|name|age|
+----+---+
| Bob| 1|
+----+---+
Answer 3: (score: -1)
You can call explain on it to retrieve the physical plan, which gives you information you can use to recover the original table name.
scala> val df = sqlContext.table("testtable")
df: org.apache.spark.sql.DataFrame = [id: bigint, name: string, ssn: string]
scala> df.explain
== Physical Plan ==
Scan ParquetRelation: default.testtable[id#0L,name#1,ssn#2] InputPaths: hdfs://user/hive/warehouse/testtable
or
== Physical Plan ==
HiveTableScan [id#0L,name#1,ssn#2], MetastoreRelation hive_sample_db, testtable, None
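Once you have the physical plan as a string, just manipulate it to recover the original table name. Note that explain prints the plan to stdout rather than returning it; df.queryExecution.executedPlan.toString gives you the plan as a string.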
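As a rough sketch of that string manipulation, the following pulls the database and table name out of plan text in either of the two layouts shown above. The function name table_from_plan and both regular expressions are assumptions based only on those two sample outputs; other Spark versions print plans differently, so the patterns would need adjusting.

```python
import re

# Patterns for the two plan layouts shown above (assumed from the samples only).
_PARQUET_SCAN = re.compile(r"Scan ParquetRelation: (?:(\w+)\.)?(\w+)\[")
_HIVE_SCAN = re.compile(r"MetastoreRelation (\w+), (\w+)")


def table_from_plan(plan_text):
    """Return (database, table) found in plan_text, or None if nothing matches.

    database is None when the plan shows a table name without a database.
    """
    for pattern in (_PARQUET_SCAN, _HIVE_SCAN):
        match = pattern.search(plan_text)
        if match:
            return match.group(1), match.group(2)
    return None


if __name__ == "__main__":
    sample = (
        "== Physical Plan ==\n"
        "HiveTableScan [id#0L,name#1,ssn#2], "
        "MetastoreRelation hive_sample_db, testtable, None"
    )
    print(table_from_plan(sample))
```

Returning None instead of raising lets the caller decide what to do with a DataFrame that does not scan a named table.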