I am running a query to generate a Spark DataFrame.
val a = hc.sql("describe extended spark_test")
Here hc is a HiveContext and spark_test is the table name.
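For context, hc is created the usual way in Spark 1.6 (a minimal sketch; sc is the SparkContext that spark-shell provides):

import org.apache.spark.sql.hive.HiveContext

// A HiveContext is needed so that Hive DDL such as DESCRIBE EXTENDED works.
val hc = new HiveContext(sc)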
scala> a.show
+--------------------+--------------------+-------+
| col_name| data_type|comment|
+--------------------+--------------------+-------+
| id| int| null|
| name| string| null|
|Detailed Table In...|Table(tableName:s...| |
+--------------------+--------------------+-------+
My main goal is to fetch certain table parameter values from the data_type column of the row whose col_name is "Detailed Table Information".
import hc.implicits._  // needed for the $"..." column syntax
val b = a.filter($"col_name" === "Detailed Table Information").select($"data_type")
b.show(false)
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|data_type |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Table(tableName:spark_test, dbName:default, owner:null, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/home/cloudera/spark_test.txt, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim=,, serialization.format=,}), bucketCols:null, sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1549272617}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
I want to read the dbName and owner values from this output and process them further (e.g. dbName should come out as "default").
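For illustration, this is roughly the kind of extraction I have in mind (an untested sketch that assumes the Thrift-style "key:value, ..." layout shown above; the field helper is my own):

// Pull the single Detailed Table Information string out of the DataFrame.
val info = b.first().getString(0)

// Grab one key's value from the "key:value, ..." layout with a regex.
def field(name: String): Option[String] =
  (name + ":([^,)]+)").r.findFirstMatchIn(info).map(_.group(1))

val dbName = field("dbName") // expected: Some("default")
val owner  = field("owner")  // here: Some("null")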
Please suggest a suitable approach. I am using Spark 1.6.
Any help is appreciated. Thanks!