I have a Spark job that reads data from several Redshift tables, applies some transformations such as join and groupby, and applies some UDFs to a few columns.
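I have run Spark standalone on my local machine and it works fine. But when I run it on an AWS cluster, it gets stuck on the UDFs. I know this because when I tried removing the UDFs, the job worked.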
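Related to this, I found this.
I need to use the UDFs, but when I do, the job sits on that task for 2 to 3 hours and then simply stops, and the Spark job finishes without any errors.
Has anyone run into something similar? Any help would be greatly appreciated.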
Edit
When I remove the UDFs, the job works fine.
With the UDFs, however, a few tasks remain stuck. Here is the end of the log:
stdout log:
2017-06-07T09:18:01.929+0000: [GC (Allocation Failure) 2017-06-07T09:18:01.929+0000: [ParNew: 66492K->2341K(72512K), 0.0024644 secs] 648682K->584531K(1042416K), 0.0025210 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2017-06-07T09:18:01.962+0000: [GC (Allocation Failure) 2017-06-07T09:18:01.962+0000: [ParNew: 66758K->2487K(72512K), 0.0022863 secs] 648948K->584677K(1042416K), 0.0023321 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-06-07T09:18:02.001+0000: [GC (Allocation Failure) 2017-06-07T09:18:02.001+0000: [ParNew: 66999K->3757K(72512K), 0.0028101 secs] 649189K->585953K(1042416K), 0.0028601 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-06-07T09:18:02.030+0000: [GC (Allocation Failure) 2017-06-07T09:18:02.030+0000: [ParNew: 68269K->2462K(72512K), 0.0019834 secs] 650465K->584706K(1042416K), 0.0020289 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2017-06-07T09:18:02.130+0000: [GC (Allocation Failure) 2017-06-07T09:18:02.130+0000: [ParNew: 66974K->6797K(72512K), 0.0038833 secs] 649218K->589043K(1042416K), 0.0039409 secs] [Times: user=0.03 sys=0.00, real=0.01 secs]
2017-06-07T09:18:02.309+0000: [GC (Allocation Failure) 2017-06-07T09:18:02.309+0000: [ParNew: 71311K->8000K(72512K), 0.0209973 secs] 653556K->595016K(1042416K), 0.0210531 secs] [Times: user=0.10 sys=0.00, real=0.02 secs]
2017-06-07T09:18:02.331+0000: [GC (GCLocker Initiated GC) 2017-06-07T09:18:02.331+0000: [ParNew: 8632K->3373K(72512K), 0.0131140 secs] 595648K->595234K(1042416K), 0.0131557 secs] [Times: user=0.08 sys=0.00, real=0.02 secs]
2017-06-07T09:22:28.879+0000: [GC (Allocation Failure) 2017-06-07T09:22:28.879+0000: [ParNew: 67885K->1862K(72512K), 0.0018928 secs] 659746K->593723K(1042416K), 0.0019463 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2017-06-07T09:27:48.879+0000: [GC (Allocation Failure) 2017-06-07T09:27:48.879+0000: [ParNew: 66374K->1231K(72512K), 0.0014260 secs] 658235K->593093K(1042416K), 0.0014730 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-06-07T09:33:08.879+0000: [GC (Allocation Failure) 2017-06-07T09:33:08.879+0000: [ParNew: 65743K->1075K(72512K), 0.0016924 secs] 657605K->592937K(1042416K), 0.0017409 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
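The GC entries in the stdout log are easier to judge once heap occupancy, pause times, and the time between collections are pulled out. The following is a small stdlib-only sketch; the regex is written against the exact ParNew line format shown in this log (which looks like the output of the JVM flags -XX:+PrintGCDetails and -XX:+PrintGCDateStamps, an assumption), and the names parse_gc_log, gaps_between and GcEvent are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List

# One young-generation (ParNew) collection in the format seen in the stdout log.
# \s+ is used between tokens so entries that were wrapped across lines still match.
GC_ENTRY = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}):\s+"
    r"\[GC \((?P<cause>[^)]*)\)\s+\S+:\s+"
    r"\[ParNew:\s+\d+K->\d+K\(\d+K\),\s+[\d.]+\s+secs\]\s+"
    r"(?P<heap_before>\d+)K->(?P<heap_after>\d+)K\((?P<heap_cap>\d+)K\),\s+"
    r"(?P<pause>[\d.]+)\s+secs\]"
)


@dataclass
class GcEvent:
    timestamp: datetime
    cause: str
    heap_after_kb: int      # total heap in use after the collection
    heap_capacity_kb: int   # total committed heap
    pause_secs: float       # wall-clock pause of the whole collection


def parse_gc_log(text: str) -> List[GcEvent]:
    """Extract every ParNew collection found in the text."""
    return [
        GcEvent(
            timestamp=datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S.%f%z"),
            cause=m["cause"],
            heap_after_kb=int(m["heap_after"]),
            heap_capacity_kb=int(m["heap_cap"]),
            pause_secs=float(m["pause"]),
        )
        for m in GC_ENTRY.finditer(text)
    ]


def gaps_between(events: List[GcEvent]) -> List[float]:
    """Seconds elapsed between consecutive collections."""
    return [(b.timestamp - a.timestamp).total_seconds() for a, b in zip(events, events[1:])]


# The last three entries of the stdout log, copied verbatim.
SAMPLE = """\
2017-06-07T09:22:28.879+0000: [GC (Allocation Failure) 2017-06-07T09:22:28.879+0000: [ParNew: 67885K->1862K(72512K), 0.0018928 secs] 659746K->593723K(1042416K), 0.0019463 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
2017-06-07T09:27:48.879+0000: [GC (Allocation Failure) 2017-06-07T09:27:48.879+0000: [ParNew: 66374K->1231K(72512K), 0.0014260 secs] 658235K->593093K(1042416K), 0.0014730 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2017-06-07T09:33:08.879+0000: [GC (Allocation Failure) 2017-06-07T09:33:08.879+0000: [ParNew: 65743K->1075K(72512K), 0.0016924 secs] 657605K->592937K(1042416K), 0.0017409 secs] [Times: user=0.01 sys=0.00, real=0.01 secs]
"""

if __name__ == "__main__":
    events = parse_gc_log(SAMPLE)
    for ev in events:
        pct = 100 * ev.heap_after_kb / ev.heap_capacity_kb
        print(f"{ev.timestamp:%H:%M:%S} {ev.cause}: heap {pct:.0f}% used, pause {ev.pause_secs * 1000:.1f} ms")
    print("gaps between collections (s):", gaps_between(events))
```

In those last three entries the heap stays at roughly 57% of 1042416K, each pause is under 2 ms, and collections are exactly 320 s apart, which is consistent with the executor JVM being mostly idle, rather than struggling with GC, while the task hangs.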
stderr log:
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1200 blocks
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks out of 1200 blocks
17/06/07 09:18:02 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms
17/06/07 09:18:02 INFO CodeGenerator: Code generated in 29.410073 ms
17/06/07 09:18:02 INFO CodeGenerator: Code generated in 8.06304 ms
17/06/07 09:18:02 INFO CodeGenerator: Code generated in 12.481201 ms
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_10 stored as values in memory (estimated size 928.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_9 stored as values in memory (estimated size 928.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_3 stored as values in memory (estimated size 904.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_15 stored as values in memory (estimated size 904.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_13 stored as values in memory (estimated size 888.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_5 stored as values in memory (estimated size 904.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_12 stored as values in memory (estimated size 904.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO MemoryStore: Block rdd_368_8 stored as values in memory (estimated size 928.0 B, free 5.4 GB)
17/06/07 09:18:02 INFO CodeGenerator: Code generated in 17.574289 ms
17/06/07 09:18:02 INFO CodeGenerator: Code generated in 8.639658 ms
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Could not find valid SPARK_HOME while searching ['/mnt/yarn/usercache/hadoop/appcache/application_1496824216933_0005', '/mnt/yarn/usercache/hadoop/filecache/157/pyspark.zip/pyspark']
Why does using UDFs produce the message "Could not find valid SPARK_HOME while searching ..."?
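The two paths in that message appear to be the candidates that pyspark's find_spark_home.py module searches: the parent of the container's working directory and the directory of the pyspark package itself, here inside pyspark.zip. As I understand that module, it trusts SPARK_HOME if the variable is set and otherwise accepts a directory only if it contains bin/spark-submit plus a jars/ (or assembly/) directory. The following is a simplified stdlib re-creation of that check, written from that understanding and possibly differing between Spark versions; is_spark_home, find_spark_home and the helper make_fake_spark_home here are hypothetical stand-ins, not pyspark's API.

```python
import os
import sys
import tempfile
from typing import List, Mapping, Optional


def is_spark_home(path: str) -> bool:
    """Simplified version of the check pyspark appears to apply: the directory
    must contain bin/spark-submit plus a jars/ or assembly/ directory."""
    return os.path.isfile(os.path.join(path, "bin", "spark-submit")) and (
        os.path.isdir(os.path.join(path, "jars"))
        or os.path.isdir(os.path.join(path, "assembly"))
    )


def find_spark_home(candidates: List[str], env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Trust SPARK_HOME from the environment if present; otherwise return the first
    candidate that looks like a Spark installation. Unlike pyspark (which exits),
    this sketch returns None on failure so it can be exercised in tests."""
    env = os.environ if env is None else env
    if "SPARK_HOME" in env:
        return env["SPARK_HOME"]
    paths = [os.path.abspath(p) for p in candidates]
    for path in paths:
        if is_spark_home(path):
            return path
    print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
    return None


def make_fake_spark_home(root: str) -> None:
    """Hypothetical test helper: create the minimal layout is_spark_home() accepts."""
    os.makedirs(os.path.join(root, "bin"), exist_ok=True)
    open(os.path.join(root, "bin", "spark-submit"), "w").close()
    os.makedirs(os.path.join(root, "jars"), exist_ok=True)


if __name__ == "__main__":
    # A bare directory stands in for a YARN container's working directory:
    # it has no bin/spark-submit, so the search fails and the message goes to stderr.
    with tempfile.TemporaryDirectory() as container_dir:
        print(find_spark_home([container_dir], env={}))  # prints None
```

Under this reading, neither an appcache directory nor a path inside a zip file contains bin/spark-submit, so the check can only succeed in a YARN container if SPARK_HOME is present in the Python worker's environment.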