Question

I have a Python script that I will run with PySpark. The Python file looks like this:

#!/usr/bin/env python

from datetime import datetime
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

df = sqlContext.sql("select id, name, start_date from testing.user_123")

hivedb='test'
table='abc_123'

# Register the Data Frame as a TempTable
df.registerTempTable('mytempTable')

# Create Table in Hive using the temptable
status = 'success'
try:
    sqlContext.sql("create table {}.`{}` as select * from mytempTable".format(hivedb, table))
except Exception:
    # Any failure of the CREATE TABLE statement is recorded in status
    status = 'fail'

sc.stop()

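I get the desired result. Now, when I run this Python file with spark-submit from a shell script, the status I get back is always success.

I want the execution of the Python script to count as failed if status is 'fail', and as successful if status is 'success'.

What do I need to change in the script to get the expected result?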

Answer 1

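Since hivedb and table are both hard-coded and mytempTable already exists, "create table {}.`{}` as select * from mytempTable" will always succeed; if it finds no values, it just creates an empty table. You need a different condition to check; maybe the length of the result of the select query?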
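One way to read the "different condition" suggested here is to look at the row count of the DataFrame before creating the table. The following is only a sketch under that assumption: `status_from_row_count` and `min_rows` are hypothetical names, not part of the question's script, and the only Spark call relied on is `DataFrame.count()`.

```python
def status_from_row_count(df, min_rows=1):
    """Return 'success' only if df has at least min_rows rows.

    df is assumed to be a Spark DataFrame (or any object with a
    count() method that returns the number of rows).
    """
    return 'success' if df.count() >= min_rows else 'fail'
```

In the question's script this would be called as `status = status_from_row_count(df)` after the select. Note that `count()` triggers a Spark job, so it adds a pass over the data.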

Answer 2

Just add an assert statement at the end of the Python script. It will make the Python script fail if the value of the status variable is not 'success'.

assert status == 'success', 'status should be success'
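An alternative to assert is to set the process exit code explicitly, which keeps working when Python runs with -O (that flag strips assert statements). The sketch below is an assumption about how this could be wired up, not something from the original answers; `run_step` and `exit_code_for` are hypothetical helper names.

```python
import sys


def run_step(action):
    """Call action() and return 'success' or 'fail', like the question's try/except."""
    try:
        action()
    except Exception as exc:
        # Report the reason on stderr so it shows up in the job logs
        print("step failed: {}".format(exc), file=sys.stderr)
        return 'fail'
    return 'success'


def exit_code_for(status):
    """Map the status string to a process exit code (0 means success)."""
    return 0 if status == 'success' else 1


# Intended use at the very end of the script, after sc.stop():
#     sys.exit(exit_code_for(status))
```

The calling shell script can then check `$?` after spark-submit returns. Whether the driver's exit code reaches the shell may depend on the deploy mode, so this is worth verifying on the target cluster.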

Make Python script execution succeed/fail based on a condition
