Container killed by the ApplicationMaster with exit code 143 in a Python script

Date: 2019-07-29 08:47:28

Tags: hadoop hive pyspark oozie

I get the following error when launching a PySpark script from a shell action in Oozie.

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

In the script I read a record from a Hive table and insert a single record into another Hive table.

I tried giving the job more memory, but it did not help. I don't even believe memory is the problem, since I am only adding a single record to a Hive table!
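For reference, `settings` is the list of (key, value) Spark properties passed to `setAll()`. The memory-related entries I tried raising look roughly like the sketch below; the exact values are placeholders, not the ones from my cluster.

    # Rough sketch of the memory-related properties passed to setAll();
    # the values here are placeholders, not the actual cluster values.
    settings = [
        ("spark.driver.memory", "2g"),
        ("spark.executor.memory", "4g"),
        ("spark.executor.memoryOverhead", "1024"),  # extra headroom for the YARN container, in MiB
    ]

The relevant part of the script follows.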

    import sys
    from datetime import datetime

    from pyspark import SparkConf
    from pyspark.sql import SparkSession

    # "settings" is a list of (key, value) Spark properties built earlier in the script
    spark_conf = SparkConf().setAppName("KB").setAll(settings)
    spark = SparkSession.builder \
        .config(conf=spark_conf) \
        .enableHiveSupport() \
        .getOrCreate()

    id = sys.argv[1]
    importdate = datetime.now().strftime("%Y%m%d")
    query1 = "SELECT * FROM knowledge_blocks.ingestion WHERE id = " + id

    # read the single source record from Hive
    ingested = spark.sql(query1)
    title = ingested.select('title').collect()[0]['title']
    content = ingested.select('content').collect()[0]['content']

    ......

    query2 = ("INSERT INTO knowledge_blocks.preprocessed PARTITION "
              "(creation_date='" + importdate + "') VALUES ('" + str(id) + "', "
              "'" + tokens_title + "', "
              "'" + tokens_content + "', '" + tokens_content_soft + "')")

    # build a one-row DataFrame and append it to the partitioned Hive table
    rows = zip([id], [tokens_title], [tokens_content], [tokens_content_soft],
               [importdate])

    df_spark = spark.createDataFrame(rows, schema=['id', 'tokens_title', 'tokens_content', 'tokens_content_soft', 'creation_date'])
    df_spark.show()
    df_spark.write.saveAsTable('knowledge_blocks.preprocessed', format='hive', mode='append', partitionBy='creation_date')

    spark.stop()

0 Answers:

No answers yet