Showing the status of a Hive query in PySpark

Asked: 2017-05-16 18:58:59

Tags: hadoop apache-spark hive pyspark

I am running a Hive query from a Spark application (spark

spark.sql('SELECT * FROM SOME_TABLE').show()

Is there a parameter in the sql function, or a configuration setting, that prints the status similar to what the Hive CLI shows?

Hadoop job information for Stage-1: number of mappers: 1193; number of reducers: 1099
2017-05-16 14:54:38,165 Stage-1 map = 0%,  reduce = 0%
2017-05-16 14:54:49,625 Stage-1 map = 1%,  reduce = 0%, Cumulative CPU 213.84 sec
2017-05-16 14:54:50,678 Stage-1 map = 13%,  reduce = 0%, Cumulative CPU 4495.91 sec
2017-05-16 14:54:51,729 Stage-1 map = 15%,  reduce = 0%, Cumulative CPU 5081.18 sec
2017-05-16 14:54:52,778 Stage-1 map = 17%,  reduce = 0%, Cumulative CPU 5244.48 sec
2017-05-16 14:54:53,818 Stage-1 map = 34%,  reduce = 0%, Cumulative CPU 7186.78 sec
2017-05-16 14:54:54,851 Stage-1 map = 46%,  reduce = 0%, Cumulative CPU 7702.71 sec
2017-05-16 14:54:55,887 Stage-1 map = 51%,  reduce = 0%, Cumulative CPU 7968.09 sec
2017-05-16 14:54:56,919 Stage-1 map = 54%,  reduce = 0%, Cumulative CPU 8325.11 sec

1 Answer:

Answer 0 (score: 0)

Yes, there are a couple of ways you can see the status.

1) To see a [fairly verbose] status of running jobs, change the log level to "INFO": spark.sparkContext.setLogLevel("INFO") (see the sketch after this list)

2) Use the Spark or YARN UI (typically port 18088 for the Spark History Server, 4040 for the UI of a live application, and 8088 for the YARN ResourceManager)

The event log in the UI will show you everything you need to know, and the progress bars are even simpler and more visual.
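
For option 1), here is a minimal sketch of what that looks like in a PySpark session (SOME_TABLE is taken from the question; the rest is standard Spark API):

from pyspark.sql import SparkSession

# Build (or reuse) a Hive-enabled session.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# INFO logging prints scheduler and stage activity to the console.
# Note: in many Spark versions the console progress bar
# (spark.ui.showConsoleProgress) is only drawn at WARN level or above,
# so INFO trades the bar for detailed log lines.
spark.sparkContext.setLogLevel("INFO")

spark.sql('SELECT * FROM SOME_TABLE').show()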

Related docs: https://spark.apache.org/docs/latest/monitoring.html
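
The monitoring page linked above also documents a REST API under /api/v1 that exposes the same job and stage metrics as the UI, which is handy for scripted progress checks. A hedged sketch of polling stage progress from Python (the host, port, and field names follow the docs for recent Spark versions; verify them against your release, and requests is an extra dependency):

import requests

# The live application UI (port 4040 by default) serves JSON under /api/v1.
base = "http://localhost:4040/api/v1"

# Find the running application, then print per-stage task progress.
app_id = requests.get(base + "/applications").json()[0]["id"]
for stage in requests.get(base + "/applications/" + app_id + "/stages").json():
    print(stage["status"], stage["name"],
          "%d/%d tasks" % (stage["numCompleteTasks"], stage["numTasks"]))

This is essentially what the UI progress bars display, in a form you can poll from your own tooling.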