Custom AWS Glue PySpark job fails to write data to S3

Time: 2020-05-14 03:31:03

Tags: pyspark aws-glue

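I am trying to read a 10 MB Parquet file from S3, apply some transformations, and write the result back to another S3 location in Parquet format. However, after the file is read, I see no activity in the bytes-written section of the job run metrics graph. There are no errors in the driver or executor logs, and the job keeps running until it times out.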

# Read the source data. Parquet files carry their own schema,
# so the inferSchema option (a CSV/JSON reader option) is dropped here.
df_data_curated = (
    spark.read
    .format("parquet")
    .load(src_loc)
)

df_data_curated.write.mode("overwrite").parquet(stage_loc)

src_loc and stage_loc are defined correctly.

Here is a snapshot of the metrics. [Two screenshots of the job run metrics] After a while, I cancelled the job run. Here are some of the driver logs.

20/05/14 02:42:31 INFO DAGScheduler: Got job 2 (run at ThreadPoolExecutor.java:1149) with 3 output partitions
20/05/14 02:42:31 INFO DAGScheduler: Got job 3 (run at ThreadPoolExecutor.java:1149) with 3 output partitions
20/05/14 02:42:32 INFO DAGScheduler: Got job 4 (run at ThreadPoolExecutor.java:1149) with 3 output partitions
20/05/14 02:42:32 INFO DAGScheduler: Got job 5 (run at ThreadPoolExecutor.java:1149) with 3 output partitions
20/05/14 02:42:35 INFO DAGScheduler: Job 5 finished: run at ThreadPoolExecutor.java:1149, took 3.815797 s
20/05/14 02:42:35 INFO DAGScheduler: Got job 6 (run at ThreadPoolExecutor.java:1149) with 40 output partitions
20/05/14 02:42:36 INFO DAGScheduler: Job 2 finished: run at ThreadPoolExecutor.java:1149, took 4.144254 s
20/05/14 02:42:36 INFO DAGScheduler: Got job 7 (run at ThreadPoolExecutor.java:1149) with 40 output partitions
20/05/14 02:42:36 INFO DAGScheduler: Job 3 finished: run at ThreadPoolExecutor.java:1149, took 4.624045 s
20/05/14 02:42:36 INFO DAGScheduler: Got job 8 (run at ThreadPoolExecutor.java:1149) with 40 output partitions
20/05/14 02:42:36 INFO DAGScheduler: Job 4 finished: run at ThreadPoolExecutor.java:1149, took 4.894199 s
20/05/14 02:42:36 INFO DAGScheduler: Got job 9 (run at ThreadPoolExecutor.java:1149) with 40 output partitions
20/05/14 02:43:02 INFO DAGScheduler: Job 6 finished: run at ThreadPoolExecutor.java:1149, took 26.845351 s
20/05/14 02:43:02 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
20/05/14 02:43:03 INFO DAGScheduler: Job 8 finished: run at ThreadPoolExecutor.java:1149, took 26.532114 s
20/05/14 02:43:03 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
20/05/14 02:43:03 INFO DAGScheduler: Job 7 finished: run at ThreadPoolExecutor.java:1149, took 27.716202 s
20/05/14 02:43:03 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
20/05/14 02:43:04 INFO DAGScheduler: Job 9 finished: run at ThreadPoolExecutor.java:1149, took 27.183140 s
20/05/14 02:43:04 WARN SparkConf: The configuration key 'spark.yarn.executor.memoryOverhead' has been deprecated as of Spark 2.3 and may be removed in the future. Please use the new key 'spark.executor.memoryOverhead' instead.
20/05/14 02:43:06 INFO DAGScheduler: Got job 10 (parquet at NativeMethodAccessorImpl.java:0) with 2 output partitions
20/05/14 02:43:28 WARN ServletHandler: Error for /api/v1/applications/application_1589423933964_0001
20/05/14 02:43:28 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:43:28 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:43:28 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:43:28 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:43:28 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:43:28 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:43:28 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:45:29 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:45:29 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:45:29 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:45:29 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:45:29 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:45:29 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:45:29 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:45:29 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:47:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:47:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:47:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:47:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:47:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:47:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:47:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:47:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:49:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:49:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:49:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:49:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:49:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:49:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:49:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:49:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:51:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:51:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:51:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:51:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:51:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:51:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:51:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:51:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:53:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:53:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:53:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:53:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:53:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:53:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:53:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:53:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:55:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:55:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:55:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:55:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:55:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:55:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:55:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:55:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:57:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:57:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:57:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:57:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:57:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:57:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:57:30 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:57:30 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:59:31 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:59:31 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:59:31 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
20/05/14 02:59:31 WARN HttpChannel: //ip-172-31-12-184.eu-west-3.compute.internal:37079/api/v1/applications/application_1589423933964_0001?proxyapproved=true
20/05/14 02:59:31 WARN ServletHandler: /api/v1/applications/application_1589423933964_0001
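
One way to read a driver log like this is to pair each "Got job N" line with its "Job N finished" line and see which jobs never complete. Below is a minimal sketch in plain Python. The helper name `unfinished_jobs` and the `sample_log` variable are hypothetical, and the sample contains only a few lines copied from the excerpt above. Applied to the full excerpt, job 10 (the `parquet` call) appears to be the only job with no matching "finished" line, although the excerpt is truncated, so this is an observation and not a diagnosis.

```python
import re

# Patterns for the two DAGScheduler messages that appear in the excerpt.
GOT_JOB = re.compile(r"DAGScheduler: Got job (\d+) \((.+?)\) with (\d+) output partitions")
JOB_DONE = re.compile(r"DAGScheduler: Job (\d+) finished")


def unfinished_jobs(lines):
    """Return {job_id: call_site} for jobs that started but have no 'finished' line."""
    started = {}
    for line in lines:
        match = GOT_JOB.search(line)
        if match:
            started[int(match.group(1))] = match.group(2)
            continue
        match = JOB_DONE.search(line)
        if match:
            # Drop the job once its completion message is seen.
            started.pop(int(match.group(1)), None)
    return started


# A few lines copied from the driver log above (not the full log).
sample_log = """\
20/05/14 02:42:32 INFO DAGScheduler: Got job 5 (run at ThreadPoolExecutor.java:1149) with 3 output partitions
20/05/14 02:42:35 INFO DAGScheduler: Job 5 finished: run at ThreadPoolExecutor.java:1149, took 3.815797 s
20/05/14 02:43:06 INFO DAGScheduler: Got job 10 (parquet at NativeMethodAccessorImpl.java:0) with 2 output partitions
20/05/14 02:43:28 WARN ServletHandler: Error for /api/v1/applications/application_1589423933964_0001
""".splitlines()

for job_id, call_site in unfinished_jobs(sample_log).items():
    print(f"job {job_id} ({call_site}) has no 'finished' line")
```

The same function can be pointed at a downloaded log file by passing the open file object, since it only iterates over lines.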

0 answers:

No answers