Filtering a column's values in a dataframe

Time: 2019-02-04 13:42:20

Tags: apache-spark apache-spark-sql

I am running a query to generate a Spark dataframe.

val a = hc.sql("describe extended spark_test")

hc - HiveContext, spark_test - table name

scala> a.show

+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                  id|                 int|   null|
|                name|              string|   null|
|Detailed Table In...|Table(tableName:s...|       |
+--------------------+--------------------+-------+

My main goal is to get certain table parameter values from the data_type column, in the row where col_name is "Detailed Table Information".

val b = a.filter($"col_name" === "Detailed Table Information").select($"data_type")

b.show(false)

+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|data_type                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|Table(tableName:spark_test, dbName:default, owner:null, createTime:0, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:id, type:int, comment:null), FieldSchema(name:name, type:string, comment:null)], location:hdfs://quickstart.cloudera:8020/home/cloudera/spark_test.txt, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:0, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim=,, serialization.format=,}), bucketCols:null, sortCols:null, parameters:null), partitionKeys:[], parameters:{EXTERNAL=TRUE, transient_lastDdlTime=1549272617}, viewOriginalText:null, viewExpandedText:null, tableType:EXTERNAL_TABLE)|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I want to read the dbName and owner parameters from this output and process them further (e.g., dbName should be "default").

Please suggest a suitable approach. I am using Spark 1.6.
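Since the entire Table(...) description comes back as a single string, one possible approach is to collect that string and pull individual key:value pairs out of it with a regular expression. The sketch below is just that, a sketch: tableParam is a hypothetical helper, and the parsing assumes the key:value layout shown in the output above, which is an internal detail of Hive's describe output and may vary across versions.

// Collect the single description string from the filtered dataframe.
val info = b.first().getString(0)

// Hypothetical helper: find "key:value" in the description string,
// where the value runs until the next comma or closing parenthesis.
def tableParam(key: String): Option[String] =
  (key + ":([^,)]+)").r.findFirstMatchIn(info).map(_.group(1).trim)

val dbName = tableParam("dbName")   // Some("default") for the output above
val owner  = tableParam("owner")    // Some("null") for the output above

The Option result can then be checked before further processing, e.g. dbName == Some("default") to verify the table lives in the default database.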

Any help is appreciated. Thanks!

0 Answers:

There are no answers