Out of memory when converting CSV files to Parquet with PySpark

Time: 2019-05-15 09:33:09

Tags: apache-spark pyspark aws-glue

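I keep getting errors when trying to convert CSV data to Parquet. My guess is that PySpark tries to read all of the data before writing it out. I also added a repartition, which may have made things worse. Is there a workaround?

In case it matters, I am using AWS Glue. Is it possible to make PySpark process the data in batches, for example read at most n GB of data, convert it to Parquet, and then continue with the next batch?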

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.sql.functions import date_format, to_date, to_timestamp
from pyspark.sql.types import DecimalType, IntegerType

DATETIME_FORMAT_STR = "yyyy-MM-dd'T'HH:mm:ss"
DATE_FORMAT_STR = "yyyy-MM-dd"

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

inputGDF = glueContext.create_dynamic_frame_from_options(connection_type = "s3", connection_options = {"paths": ["s3://xxxxxx"], "recurse": True}, format = "csv", format_options = {"withHeader": True})

holidaysGDF = glueContext.create_dynamic_frame.from_catalog(database = "xxxxxx", table_name = "holidays_clean")
holidaysDF = holidaysGDF.toDF()
holidaysDF.createOrReplaceTempView("holidays")

if bool(inputGDF.toDF().head(1)):
    print("Writing ...")
    df = inputGDF.toDF()

    df = df \
      .drop("createdat") \
      .drop("updatedat") \
      .withColumn("querydatetime", to_date(df["querydatetime"], DATE_FORMAT_STR)) \
      .withColumn("agent", df["agent"].cast(IntegerType())) \
      .withColumn("queryoutbounddate", to_date(df["queryoutbounddate"], DATE_FORMAT_STR)) \
      .withColumn("queryinbounddate", to_date(df["queryinbounddate"], DATE_FORMAT_STR)) \
      .withColumn("price", df["price"].cast(DecimalType(10, 2))) \
      .withColumn("outdeparture", to_timestamp(df["outdeparture"], DATETIME_FORMAT_STR)) \
      .withColumn("indeparture", to_timestamp(df["indeparture"], DATETIME_FORMAT_STR)) \
      .withColumn("querydestinationplace", df["querydestinationplace"].cast(IntegerType())) \
      .withColumn("numberoutstops", df["numberoutstops"].cast(IntegerType()))

    df.createOrReplaceTempView("flights")

    df = spark.sql("""
      SELECT
        /*+ BROADCAST(h) */
        CONCAT(f.outboundlegid, '-', f.inboundlegid, '-', f.agent) AS key,
        f.querydatetime,

        f.agent,
        f.queryoutbounddate,
        f.queryinbounddate,
        f.price,
        f.outdeparture,
        f.indeparture,
        f.querydestinationplace,
        f.numberoutstops,
        CASE WHEN type = 'HOLIDAY' AND (outdeparture BETWEEN start AND end)
          THEN true
          ELSE false
          END out_is_holiday,
        CASE WHEN type = 'LONG_WEEKENDS' AND (outdeparture BETWEEN start AND end)
          THEN true
          ELSE false
          END out_is_longweekends,
        CASE WHEN type = 'HOLIDAY' AND (indeparture BETWEEN start AND end)
          THEN true
          ELSE false
          END in_is_holiday,
        CASE WHEN type = 'LONG_WEEKENDS' AND (indeparture BETWEEN start AND end)
          THEN true
          ELSE false
          END in_is_longweekends
      FROM flights f
      CROSS JOIN holidays h
    """)

    df \
      .repartition("querydestinationplace", "querydatetime") \
      .write \
      .mode("append") \
      .partitionBy(["querydestinationplace", "querydatetime"]) \
      .parquet("s3://xxxxxx/flights-optimized")
else:
    print("Nothing to write ...")

job.commit()

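In the worst case, I am thinking of using Glue bookmarks to batch the data: start with a small dataset, then slowly add data chunk by chunk. But I don't think this is the usual way to do big data processing?

Some logs from near the end of the run: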
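
To make the batching idea concrete, here is a minimal sketch of how the input files could be split into chunks of at most n bytes before each chunk is converted. It is plain Python, not a Glue or Spark API: the helper name `batch_by_size`, the file names, and the sizes are all hypothetical, and in practice the sizes would have to come from an S3 listing.

```python
from typing import Iterable, Iterator, List, Tuple


def batch_by_size(files: Iterable[Tuple[str, int]], max_bytes: int) -> Iterator[List[str]]:
    """Group (path, size_in_bytes) pairs into batches of at most max_bytes.

    A single file larger than max_bytes still gets a batch of its own.
    """
    batch: List[str] = []
    batch_bytes = 0
    for path, size in files:
        # Close the current batch if adding this file would exceed the cap.
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        yield batch


# Hypothetical listing of input files with their sizes in bytes.
files = [
    ("s3://xxxxxx/a.csv", 400),
    ("s3://xxxxxx/b.csv", 700),
    ("s3://xxxxxx/c.csv", 300),
    ("s3://xxxxxx/d.csv", 1200),
]

batches = list(batch_by_size(files, max_bytes=1000))
for paths in batches:
    print(paths)
```

Each batch could then be passed as the `paths` list of a separate read-and-write pass. Whether that is any better than relying on Glue bookmarks is exactly what I am unsure about.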

19/05/14 20:08:38 INFO Executor: Running task 79.1 in stage 7.0 (TID 1608)
19/05/14 20:08:38 INFO Executor: Running task 77.1 in stage 7.0 (TID 1609)
19/05/14 20:08:38 INFO Executor: Running task 78.1 in stage 7.0 (TID 1610)
19/05/14 20:08:38 INFO MapOutputTrackerWorker: Updating epoch to 3 and clearing cache
19/05/14 20:08:38 INFO TorrentBroadcast: Started reading broadcast variable 13
19/05/14 20:08:38 INFO TransportClientFactory: Successfully created connection to ip-172-31-28-110.ap-southeast-1.compute.internal/172.31.28.110:35129 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:08:38 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 47.0 KB, free 2.8 GB)
19/05/14 20:08:38 INFO TorrentBroadcast: Reading broadcast variable 13 took 358 ms
19/05/14 20:08:38 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 117.5 KB, free 2.8 GB)
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 2, fetching them
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 2, fetching them
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 2, fetching them
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Don't have map outputs for shuffle 2, fetching them
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Doing the fetch; tracker endpoint = NettyRpcEndpointRef(spark://MapOutputTracker@172.31.20.117:43577)
19/05/14 20:08:39 INFO MapOutputTrackerWorker: Got the output locations
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Getting 73 non-empty blocks out of 672 blocks
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Getting 65 non-empty blocks out of 672 blocks
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Getting 81 non-empty blocks out of 672 blocks
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Getting 79 non-empty blocks out of 672 blocks
19/05/14 20:08:39 INFO TransportClientFactory: Successfully created connection to ip-172-31-27-224.ap-southeast-1.compute.internal/172.31.27.224:7337 after 2 ms (0 ms spent in bootstraps)
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 34 ms
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 26 ms
19/05/14 20:08:39 INFO TransportClientFactory: Successfully created connection to ip-172-31-18-221.ap-southeast-1.compute.internal/172.31.18.221:7337 after 39 ms (0 ms spent in bootstraps)
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 52 ms
19/05/14 20:08:39 INFO TransportClientFactory: Successfully created connection to ip-172-31-22-149.ap-southeast-1.compute.internal/172.31.22.149:7337 after 38 ms (0 ms spent in bootstraps)
19/05/14 20:08:39 INFO ShuffleBlockFetcherIterator: Started 1 remote fetches in 59 ms
19/05/14 20:08:40 INFO CodeGenerator: Code generated in 240.969437 ms
19/05/14 20:08:40 INFO CodeGenerator: Code generated in 18.718343 ms
19/05/14 20:08:40 INFO TransportClientFactory: Successfully created connection to ip-172-31-18-192.ap-southeast-1.compute.internal/172.31.18.192:7337 after 2 ms (0 ms spent in bootstraps)
19/05/14 20:08:40 INFO TransportClientFactory: Successfully created connection to ip-172-31-20-220.ap-southeast-1.compute.internal/172.31.20.220:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:08:40 INFO TransportClientFactory: Successfully created connection to ip-172-31-28-110.ap-southeast-1.compute.internal/172.31.28.110:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:08:42 INFO TransportClientFactory: Successfully created connection to ip-172-31-20-239.ap-southeast-1.compute.internal/172.31.20.239:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:08:43 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (0 time so far)
19/05/14 20:08:43 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (0 time so far)
19/05/14 20:08:43 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (0 time so far)
19/05/14 20:08:45 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (0 time so far)
19/05/14 20:08:45 INFO TransportClientFactory: Successfully created connection to ip-172-31-18-252.ap-southeast-1.compute.internal/172.31.18.252:7337 after 2 ms (0 ms spent in bootstraps)
19/05/14 20:08:45 INFO TransportClientFactory: Successfully created connection to ip-172-31-20-195.ap-southeast-1.compute.internal/172.31.20.195:7337 after 7 ms (0 ms spent in bootstraps)
19/05/14 20:08:46 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (1 time so far)
19/05/14 20:08:46 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (1 time so far)
19/05/14 20:08:46 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (1 time so far)
19/05/14 20:08:46 INFO TransportClientFactory: Successfully created connection to ip-172-31-27-71.ap-southeast-1.compute.internal/172.31.27.71:7337 after 2 ms (0 ms spent in bootstraps)
19/05/14 20:08:47 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (1 time so far)
19/05/14 20:08:49 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (2 times so far)
19/05/14 20:08:49 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (2 times so far)
19/05/14 20:08:49 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (2 times so far)
19/05/14 20:08:50 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (2 times so far)
19/05/14 20:08:51 INFO TransportClientFactory: Successfully created connection to ip-172-31-27-223.ap-southeast-1.compute.internal/172.31.27.223:7337 after 0 ms (0 ms spent in bootstraps)
19/05/14 20:08:51 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (3 times so far)
19/05/14 20:08:51 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (3 times so far)
19/05/14 20:08:51 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (3 times so far)
19/05/14 20:08:52 INFO TransportClientFactory: Successfully created connection to ip-172-31-23-101.ap-southeast-1.compute.internal/172.31.23.101:7337 after 15 ms (0 ms spent in bootstraps)
19/05/14 20:08:52 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (3 times so far)
19/05/14 20:08:52 INFO TransportClientFactory: Successfully created connection to ip-172-31-28-168.ap-southeast-1.compute.internal/172.31.28.168:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:08:53 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (4 times so far)
19/05/14 20:08:53 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (4 times so far)
19/05/14 20:08:53 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 704.0 MB to disk (4 times so far)
19/05/14 20:08:54 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (4 times so far)
19/05/14 20:08:55 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (5 times so far)
19/05/14 20:08:55 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (5 times so far)
19/05/14 20:08:56 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 704.0 MB to disk (5 times so far)
19/05/14 20:08:57 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (5 times so far)
19/05/14 20:08:58 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (6 times so far)
19/05/14 20:08:58 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (6 times so far)
19/05/14 20:08:59 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 704.0 MB to disk (6 times so far)
19/05/14 20:08:59 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (6 times so far)
19/05/14 20:09:00 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (7 times so far)
19/05/14 20:09:01 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (7 times so far)
19/05/14 20:09:01 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (7 times so far)
19/05/14 20:09:01 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (7 times so far)
19/05/14 20:09:02 INFO TransportClientFactory: Successfully created connection to ip-172-31-21-133.ap-southeast-1.compute.internal/172.31.21.133:7337 after 0 ms (0 ms spent in bootstraps)
19/05/14 20:09:03 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (8 times so far)
19/05/14 20:09:03 INFO TransportClientFactory: Successfully created connection to ip-172-31-30-63.ap-southeast-1.compute.internal/172.31.30.63:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:09:04 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 704.0 MB to disk (8 times so far)
19/05/14 20:09:04 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 704.0 MB to disk (8 times so far)
19/05/14 20:09:05 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (8 times so far)
19/05/14 20:09:05 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (9 times so far)
19/05/14 20:09:06 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (9 times so far)
19/05/14 20:09:06 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 704.0 MB to disk (9 times so far)
19/05/14 20:09:07 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (9 times so far)
19/05/14 20:09:07 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (10 times so far)
19/05/14 20:09:08 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (10 times so far)
19/05/14 20:09:09 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (10 times so far)
19/05/14 20:09:09 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (10 times so far)
19/05/14 20:09:10 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (11 times so far)
19/05/14 20:09:10 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (11 times so far)
19/05/14 20:09:11 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (11 times so far)
19/05/14 20:09:11 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (11 times so far)
19/05/14 20:09:12 INFO TransportClientFactory: Successfully created connection to ip-172-31-18-91.ap-southeast-1.compute.internal/172.31.18.91:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:09:12 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (12 times so far)
19/05/14 20:09:13 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (12 times so far)
19/05/14 20:09:13 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (12 times so far)
19/05/14 20:09:13 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (12 times so far)
19/05/14 20:09:15 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (13 times so far)
19/05/14 20:09:15 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (13 times so far)
19/05/14 20:09:15 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (13 times so far)
19/05/14 20:09:16 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (13 times so far)
19/05/14 20:09:17 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (14 times so far)
19/05/14 20:09:17 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (14 times so far)
19/05/14 20:09:18 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (14 times so far)
19/05/14 20:09:18 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (14 times so far)
19/05/14 20:09:19 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (15 times so far)
19/05/14 20:09:19 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (15 times so far)
19/05/14 20:09:20 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (15 times so far)
19/05/14 20:09:20 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (15 times so far)
19/05/14 20:09:21 INFO TransportClientFactory: Successfully created connection to ip-172-31-22-100.ap-southeast-1.compute.internal/172.31.22.100:7337 after 1 ms (0 ms spent in bootstraps)
19/05/14 20:09:21 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (16 times so far)
19/05/14 20:09:22 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (16 times so far)
19/05/14 20:09:22 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (16 times so far)
19/05/14 20:09:22 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (16 times so far)
19/05/14 20:09:23 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (17 times so far)
19/05/14 20:09:24 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (17 times so far)
19/05/14 20:09:24 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (17 times so far)
19/05/14 20:09:24 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (17 times so far)
19/05/14 20:09:25 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (18 times so far)
19/05/14 20:09:26 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (18 times so far)
19/05/14 20:09:26 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (18 times so far)
19/05/14 20:09:27 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (18 times so far)
19/05/14 20:09:28 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (19 times so far)
19/05/14 20:09:28 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (19 times so far)
19/05/14 20:09:28 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (19 times so far)
19/05/14 20:09:29 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (19 times so far)
19/05/14 20:09:30 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (20 times so far)
19/05/14 20:09:30 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (20 times so far)
19/05/14 20:09:31 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (20 times so far)
19/05/14 20:09:31 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (20 times so far)
19/05/14 20:09:31 INFO TransportClientFactory: Successfully created connection to ip-172-31-17-107.ap-southeast-1.compute.internal/172.31.17.107:7337 after 0 ms (0 ms spent in bootstraps)
19/05/14 20:09:32 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (21 times so far)
19/05/14 20:09:32 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (21 times so far)
19/05/14 20:09:33 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (21 times so far)
19/05/14 20:09:33 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (21 times so far)
19/05/14 20:09:34 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (22 times so far)
19/05/14 20:09:35 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (22 times so far)
19/05/14 20:09:35 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (22 times so far)
19/05/14 20:09:35 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (22 times so far)
19/05/14 20:09:37 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (23 times so far)
19/05/14 20:09:37 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (23 times so far)
19/05/14 20:09:38 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (23 times so far)
19/05/14 20:09:38 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (23 times so far)
19/05/14 20:09:40 INFO TransportClientFactory: Successfully created connection to ip-172-31-31-77.ap-southeast-1.compute.internal/172.31.31.77:7337 after 2 ms (0 ms spent in bootstraps)
19/05/14 20:09:40 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (24 times so far)
19/05/14 20:09:40 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (24 times so far)
19/05/14 20:09:40 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (24 times so far)
19/05/14 20:09:41 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (24 times so far)
19/05/14 20:09:43 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (25 times so far)
19/05/14 20:09:44 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (25 times so far)
19/05/14 20:09:45 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (25 times so far)
19/05/14 20:09:45 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (25 times so far)
19/05/14 20:09:48 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (26 times so far)
19/05/14 20:09:48 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (26 times so far)
19/05/14 20:09:48 INFO TransportClientFactory: Successfully created connection to ip-172-31-28-204.ap-southeast-1.compute.internal/172.31.28.204:7337 after 0 ms (0 ms spent in bootstraps)
19/05/14 20:09:48 INFO TransportClientFactory: Successfully created connection to ip-172-31-25-246.ap-southeast-1.compute.internal/172.31.25.246:7337 after 0 ms (0 ms spent in bootstraps)
19/05/14 20:09:49 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (26 times so far)
19/05/14 20:09:49 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (26 times so far)
19/05/14 20:09:51 INFO UnsafeExternalSorter: Thread 52 spilling sort data of 672.0 MB to disk (27 times so far)
19/05/14 20:09:52 INFO UnsafeExternalSorter: Thread 55 spilling sort data of 672.0 MB to disk (27 times so far)
19/05/14 20:09:52 INFO UnsafeExternalSorter: Thread 54 spilling sort data of 672.0 MB to disk (27 times so far)
19/05/14 20:09:53 INFO UnsafeExternalSorter: Thread 53 spilling sort data of 672.0 MB to disk (27 times so far)
19/05/14 20:09:54 INFO Executor: Executor is trying to kill task 79.1 in stage 7.0 (TID 1608), reason: stage cancelled
19/05/14 20:09:54 INFO Executor: Executor is trying to kill task 78.1 in stage 7.0 (TID 1610), reason: stage cancelled
19/05/14 20:09:54 INFO Executor: Executor is trying to kill task 76.1 in stage 7.0 (TID 1611), reason: stage cancelled
19/05/14 20:09:54 INFO Executor: Executor is trying to kill task 77.1 in stage 7.0 (TID 1609), reason: stage cancelled
19/05/14 20:09:54 INFO Executor: Executor killed task 77.1 in stage 7.0 (TID 1609), reason: stage cancelled
19/05/14 20:09:54 INFO Executor: Executor killed task 78.1 in stage 7.0 (TID 1610), reason: stage cancelled
19/05/14 20:09:54 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
19/05/14 20:09:55 INFO CoarseGrainedExecutorBackend: Driver from 172.31.20.117:43577 disconnected during shutdown
19/05/14 20:09:55 INFO CoarseGrainedExecutorBackend: Driver from 172.31.20.117:43577 disconnected during shutdown
19/05/14 20:09:55 INFO GlueCloudwatchSink: CloudwatchSink: SparkContext stopped - not reporting metrics now.
19/05/14 20:09:55 ERROR TransportResponseHandler: Still have 2 requests outstanding when connection from ip-172-31-31-77.ap-southeast-1.compute.internal/172.31.31.77:7337 is closed
19/05/14 20:09:55 INFO RetryingBlockFetcher: Retrying fetch (1/3) for 2 outstanding blocks after 5000 ms
19/05/14 20:09:55 INFO Executor: Executor killed task 79.1 in stage 7.0 (TID 1608), reason: stage cancelled
19/05/14 20:09:55 INFO TransportClientFactory: Found inactive connection to /172.31.20.117:43577, creating a new one.
19/05/14 20:09:55 INFO MemoryStore: MemoryStore cleared
19/05/14 20:09:55 INFO BlockManager: BlockManager stopped
19/05/14 20:09:55 WARN OneWayOutboxMessage: Failed to send one-way RPC.
java.io.IOException: Failed to connect to /172.31.20.117:43577
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:232)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:182)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: /172.31.20.117:43577
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:257)
at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:291)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:631)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:131)
at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:144)
... 1 more
19/05/14 20:09:55 INFO ShutdownHookManager: Shutdown hook called
End of LogType:stderr

0 answers:

No answers yet