I get the following error when launching a PySpark script via a shell action in Oozie:
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
In the script I read from one Hive table and add a single record to another. I tried allocating more memory, but it didn't help. I'm not even convinced this is a memory problem, since all I'm doing is adding one record to a Hive table!
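The `settings` list passed to `setAll` below is not shown in the snippet. Purely as a hypothetical illustration of what "setting more memory" could look like, the memory-related entries might resemble this (the property names are standard Spark configuration keys; the values are invented):

```python
# Hypothetical memory settings as (key, value) pairs for SparkConf.setAll.
# Values are examples only; tune them to the cluster.
settings = [
    ("spark.executor.memory", "4g"),          # heap per executor
    ("spark.executor.memoryOverhead", "1g"),  # off-heap overhead YARN accounts for
    ("spark.driver.memory", "2g"),
]

conf = dict(settings)
print(conf["spark.executor.memory"], conf["spark.executor.memoryOverhead"])
```

Note that on YARN the container size is roughly the executor heap plus the overhead, so raising only `spark.executor.memory` while the overhead stays small may still leave the container undersized.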
import sys
from datetime import datetime
from pyspark import SparkConf
from pyspark.sql import SparkSession

spark_conf = SparkConf().setAppName("KB").setAll(settings)
spark = SparkSession.builder \
    .config(conf=spark_conf) \
    .enableHiveSupport() \
    .getOrCreate()
id = sys.argv[1]
importdate = datetime.now().strftime("%Y%m%d")

query1 = "SELECT * FROM knowledge_blocks.ingestion WHERE id = " + id
ingested = spark.sql(query1)

# Collect once instead of running a separate Spark job per column.
first_row = ingested.select('title', 'content').collect()[0]
title = first_row['title']
content = first_row['content']
......
# Note: query2 is built here but never executed in the snippet.
query2 = ("INSERT INTO knowledge_blocks.preprocessed PARTITION "
          "(creation_date='" + importdate + "') "
          "VALUES ('" + str(id) + "', '" + tokens_title + "', "
          "'" + tokens_content + "', '" + tokens_content_soft + "')")

rows = zip([id], [tokens_title], [tokens_content], [tokens_content_soft],
           [importdate])
df_spark = spark.createDataFrame(rows, schema=['id','tokens_title','tokens_content','tokens_content_soft','creation_date'])
df_spark.show()
df_spark.write.saveAsTable('knowledge_blocks.preprocessed', format='hive', mode='append', partitionBy='creation_date')
spark.stop()
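As background on the error message itself: a container exit code of 143 means the process received SIGTERM (128 + signal 15), which YARN sends when it kills a container, commonly (though not only) because the container exceeded its memory allocation or the job was killed externally. A quick sanity check of the arithmetic:

```python
import signal

# 143 = 128 + SIGTERM(15): the container was terminated by a SIGTERM,
# the signal YARN uses when it kills a container.
print(128 + signal.SIGTERM)  # -> 143
```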