Reading and looping through an S3 folder

Time: 2016-05-12 18:25:53

Tags: python amazon-web-services amazon-s3

I am using mapper and reducer Python scripts. In my reducer, I want to loop over a directory in an S3 bucket, read all the data from the text files in it, and put that data into a list.

After that, I will compare the strings coming from stdin against the strings in that list.
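
A minimal sketch of the comparison step described above, assuming an exact-match comparison. The names `load_reference` and `matching_lines` are hypothetical, and the real comparison in the reducer (which is elided in the script) may differ, for example by using a merge distance.

```python
def load_reference(lines):
    """Collect non-empty, right-stripped lines into a set for fast lookups."""
    return set(line.rstrip() for line in lines if line.strip())


def matching_lines(stdin_lines, reference):
    """Yield each input line whose stripped form appears in the reference set."""
    for line in stdin_lines:
        candidate = line.rstrip()
        if candidate in reference:
            yield candidate


if __name__ == "__main__":
    # Hypothetical data standing in for the S3 part files and for sys.stdin.
    reference = load_reference(["alpha\n", "beta\n", "\n"])
    for hit in matching_lines(["beta\n", "gamma\n"], reference):
        print(hit)
```

In a reducer, `sys.stdin` would take the place of the second list. A set is used here instead of a list so that each membership test does not scan every stored line.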

Here is what I have done. It works locally, but it does not work on Amazon EMR.

#!/usr/bin/env python
import sys, os, re
# from pprint import pprint
# from glob import glob
import urllib2
import xml.etree.ElementTree as ET

dictionary = {}
parts_list = []
# MERGEDISTANCE  = 100
MERGEDISTANCE = int(sys.argv[1])
# index_file =  open('index.txt', 'r')

f = urllib2.urlopen("https://s3.amazonaws.com/source123")

tree = ET.parse(f)
root = tree.getroot()

for child in root.findall('{http://s3.amazonaws.com/doc/2006-03-01/}Contents'):
    for key in child.findall("{http://s3.amazonaws.com/doc/2006-03-01/}Key"):
        if key.text.startswith("output/part-"):
            key = key.text.replace("output/", "")
            parts_list.append(key)

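def buildindex():
    # Stream every line of every part file, stripped of trailing whitespace.
    for part in parts_list:
        f = urllib2.urlopen("https://s3.amazonaws.com/source123/output/" + part)
        try:
            for line in f.readlines():
                yield line.rstrip()
        finally:
            # Close the connection even if the consumer stops early.
            f.close()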


for suspicious_line in sys.stdin:
 .....
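
The listing step in the script can be separated from the network call so that it can be checked offline. The sketch below is an illustration, not the script's actual code: `part_keys` and `SAMPLE` are hypothetical names, the sample XML is made up, and the function returns full keys rather than stripping the `output/` prefix. It also reports the `IsTruncated` flag, because S3 returns at most 1000 keys per listing response, so a single request may miss part files in a large bucket.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 bucket listings, as in the script above.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

# Hypothetical, hand-written listing used only to exercise the parser.
SAMPLE = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<IsTruncated>false</IsTruncated>"
    "<Contents><Key>output/part-00000</Key></Contents>"
    "<Contents><Key>mapper.py</Key></Contents>"
    "</ListBucketResult>"
)


def part_keys(listing_xml, prefix="output/part-"):
    """Return (keys, truncated) for a ListBucketResult XML string."""
    root = ET.fromstring(listing_xml)
    keys = [
        key.text
        for key in root.iter(S3_NS + "Key")
        if key.text and key.text.startswith(prefix)
    ]
    # IsTruncated is "true" when more keys remain beyond this response.
    truncated = root.findtext(S3_NS + "IsTruncated", default="false") == "true"
    return keys, truncated


if __name__ == "__main__":
    print(part_keys(SAMPLE))
```

Note that fetching the listing over plain HTTPS without credentials, as the script does, only works if the bucket allows public listing.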

System log

2016-05-12 18:04:37,323 INFO com.amazon.ws.emr.hadoop.fs.EmrFileSystem (main): Consistency disabled, using com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem as filesystem implementation
2016-05-12 18:04:37,559 INFO amazon.emr.metrics.MetricsSaver (main): MetricsConfigRecord disabledInCluster: false instanceEngineCycleSec: 60 clusterEngineCycleSec: 60 disableClusterEngine: true maxMemoryMb: 3072 maxInstanceCount: 500 lastModified: 1463075998666 
2016-05-12 18:04:37,559 INFO amazon.emr.metrics.MetricsSaver (main): Created MetricsSaver j-1CJMJ1D4HZ0N7:i-236379fb:RunJar:10872 period:60 /mnt/var/em/raw/i-236379fb_20160512_RunJar_10872_raw.bin
2016-05-12 18:04:39,518 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-47-61.us-west-2.compute.internal/172.31.47.61:8032
2016-05-12 18:04:39,703 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-47-61.us-west-2.compute.internal/172.31.47.61:8032
2016-05-12 18:04:40,199 INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem (main): Opening 's3://source123/mapper.py' for reading
2016-05-12 18:04:40,335 INFO amazon.emr.metrics.MetricsSaver (main): Thread 1 created MetricsLockFreeSaver 1
2016-05-12 18:04:40,522 INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem (main): Opening 's3://source123/suspicious_reducer.py' for reading
2016-05-12 18:04:40,727 INFO com.hadoop.compression.lzo.GPLNativeCodeLoader (main): Loaded native gpl library
2016-05-12 18:04:40,729 INFO com.hadoop.compression.lzo.LzoCodec (main): Successfully loaded & initialized native-lzo library [hadoop-lzo rev 426d94a07125cf9447bb0c2b336cf10b4c254375]
2016-05-12 18:04:41,440 INFO com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem (main): listStatus s3://source123/suspicious-input with recursive false
2016-05-12 18:04:41,652 INFO org.apache.hadoop.mapred.FileInputFormat (main): Total input paths to process : 21
2016-05-12 18:04:41,742 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): number of splits:83
2016-05-12 18:04:41,994 INFO org.apache.hadoop.mapreduce.JobSubmitter (main): Submitting tokens for job: job_1463075989014_0001
2016-05-12 18:04:42,295 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl (main): Submitted application application_1463075989014_0001
2016-05-12 18:04:42,398 INFO org.apache.hadoop.mapreduce.Job (main): The url to track the job: http://ip-172-31-47-61.us-west-2.compute.internal:20888/proxy/application_1463075989014_0001/
2016-05-12 18:04:42,401 INFO org.apache.hadoop.mapreduce.Job (main): Running job: job_1463075989014_0001
2016-05-12 18:04:50,507 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1463075989014_0001 running in uber mode : false
2016-05-12 18:04:50,508 INFO org.apache.hadoop.mapreduce.Job (main):  map 0% reduce 0%
2016-05-12 18:05:09,683 INFO org.apache.hadoop.mapreduce.Job (main):  map 1% reduce 0%
2016-05-12 18:05:12,720 INFO org.apache.hadoop.mapreduce.Job (main):  map 2% reduce 0%
2016-05-12 18:05:13,728 INFO org.apache.hadoop.mapreduce.Job (main):  map 4% reduce 0%
2016-05-12 18:05:15,746 INFO org.apache.hadoop.mapreduce.Job (main):  map 5% reduce 0%
2016-05-12 18:05:16,760 INFO org.apache.hadoop.mapreduce.Job (main):  map 11% reduce 0%
2016-05-12 18:05:17,769 INFO org.apache.hadoop.mapreduce.Job (main):  map 20% reduce 0%
2016-05-12 18:05:18,777 INFO org.apache.hadoop.mapreduce.Job (main):  map 30% reduce 0%
2016-05-12 18:05:19,783 INFO org.apache.hadoop.mapreduce.Job (main):  map 36% reduce 0%
2016-05-12 18:05:20,790 INFO org.apache.hadoop.mapreduce.Job (main):  map 40% reduce 0%
2016-05-12 18:05:21,796 INFO org.apache.hadoop.mapreduce.Job (main):  map 46% reduce 0%
2016-05-12 18:05:22,804 INFO org.apache.hadoop.mapreduce.Job (main):  map 55% reduce 0%
2016-05-12 18:05:23,826 INFO org.apache.hadoop.mapreduce.Job (main):  map 76% reduce 0%
2016-05-12 18:05:24,835 INFO org.apache.hadoop.mapreduce.Job (main):  map 83% reduce 0%
2016-05-12 18:05:25,842 INFO org.apache.hadoop.mapreduce.Job (main):  map 84% reduce 0%
2016-05-12 18:05:27,854 INFO org.apache.hadoop.mapreduce.Job (main):  map 86% reduce 0%
2016-05-12 18:05:30,872 INFO org.apache.hadoop.mapreduce.Job (main):  map 87% reduce 0%
2016-05-12 18:05:31,878 INFO org.apache.hadoop.mapreduce.Job (main):  map 89% reduce 0%
2016-05-12 18:05:32,884 INFO org.apache.hadoop.mapreduce.Job (main):  map 92% reduce 0%
2016-05-12 18:05:33,891 INFO org.apache.hadoop.mapreduce.Job (main):  map 96% reduce 0%
2016-05-12 18:05:35,903 INFO org.apache.hadoop.mapreduce.Job (main):  map 99% reduce 4%
2016-05-12 18:05:36,908 INFO org.apache.hadoop.mapreduce.Job (main):  map 99% reduce 5%
2016-05-12 18:05:37,915 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 9%
2016-05-12 18:05:37,921 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000008_0, Status : FAILED
2016-05-12 18:05:37,954 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000021_0, Status : FAILED
2016-05-12 18:05:37,962 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000025_0, Status : FAILED
2016-05-12 18:05:37,963 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000019_0, Status : FAILED
2016-05-12 18:05:37,964 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000022_0, Status : FAILED
2016-05-12 18:05:37,966 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000006_0, Status : FAILED
2016-05-12 18:05:37,967 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000007_0, Status : FAILED
2016-05-12 18:05:37,969 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000014_0, Status : FAILED
2016-05-12 18:05:37,971 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000015_0, Status : FAILED
2016-05-12 18:05:37,972 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000016_0, Status : FAILED
2016-05-12 18:05:38,979 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 7%
2016-05-12 18:05:38,983 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000011_0, Status : FAILED
2016-05-12 18:05:38,986 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000005_0, Status : FAILED
2016-05-12 18:05:38,987 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000020_0, Status : FAILED
2016-05-12 18:05:38,989 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000017_0, Status : FAILED
2016-05-12 18:05:38,991 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000024_0, Status : FAILED
2016-05-12 18:05:38,992 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000003_0, Status : FAILED
2016-05-12 18:05:39,003 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000012_0, Status : FAILED
2016-05-12 18:05:39,004 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000010_0, Status : FAILED
2016-05-12 18:05:39,006 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000018_0, Status : FAILED
2016-05-12 18:05:39,008 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000009_0, Status : FAILED
2016-05-12 18:05:40,024 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 0%
2016-05-12 18:05:40,028 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000028_0, Status : FAILED
2016-05-12 18:05:40,032 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000023_0, Status : FAILED
2016-05-12 18:05:40,034 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000001_0, Status : FAILED
2016-05-12 18:05:40,037 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000004_0, Status : FAILED
2016-05-12 18:05:40,038 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000000_0, Status : FAILED
2016-05-12 18:05:40,040 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000002_0, Status : FAILED
2016-05-12 18:05:40,043 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000026_0, Status : FAILED
2016-05-12 18:05:40,046 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000013_0, Status : FAILED
2016-05-12 18:05:41,055 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000029_0, Status : FAILED
2016-05-12 18:05:42,063 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000027_0, Status : FAILED
2016-05-12 18:05:42,070 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000030_0, Status : FAILED
2016-05-12 18:05:44,083 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000031_0, Status : FAILED
2016-05-12 18:05:44,084 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000032_0, Status : FAILED
2016-05-12 18:05:45,091 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000033_0, Status : FAILED
2016-05-12 18:05:47,104 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000034_0, Status : FAILED
2016-05-12 18:05:51,128 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000005_1, Status : FAILED
2016-05-12 18:05:51,130 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000020_1, Status : FAILED
2016-05-12 18:05:51,131 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000010_1, Status : FAILED
2016-05-12 18:05:51,133 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000011_1, Status : FAILED
2016-05-12 18:05:51,134 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000017_1, Status : FAILED
2016-05-12 18:05:51,135 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000003_1, Status : FAILED
2016-05-12 18:05:52,142 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000012_1, Status : FAILED
2016-05-12 18:05:52,144 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000016_1, Status : FAILED
2016-05-12 18:05:52,147 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000024_1, Status : FAILED
2016-05-12 18:05:53,154 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000021_1, Status : FAILED
2016-05-12 18:05:53,156 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000025_1, Status : FAILED
2016-05-12 18:05:53,158 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000022_1, Status : FAILED
2016-05-12 18:05:53,159 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000019_1, Status : FAILED
2016-05-12 18:05:53,160 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000007_1, Status : FAILED
2016-05-12 18:05:53,162 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000009_1, Status : FAILED
2016-05-12 18:05:53,163 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000015_1, Status : FAILED
2016-05-12 18:05:53,164 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000014_1, Status : FAILED
2016-05-12 18:05:53,165 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000001_1, Status : FAILED
2016-05-12 18:05:53,167 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000006_1, Status : FAILED
2016-05-12 18:05:54,184 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000002_1, Status : FAILED
2016-05-12 18:05:54,186 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000023_1, Status : FAILED
2016-05-12 18:05:54,188 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000008_1, Status : FAILED
2016-05-12 18:05:54,199 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000004_1, Status : FAILED
2016-05-12 18:05:54,200 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000000_1, Status : FAILED
2016-05-12 18:05:54,202 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000013_1, Status : FAILED
2016-05-12 18:05:54,203 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000026_1, Status : FAILED
2016-05-12 18:05:54,205 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000028_1, Status : FAILED
2016-05-12 18:05:54,213 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000029_1, Status : FAILED
2016-05-12 18:05:55,222 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000030_1, Status : FAILED
2016-05-12 18:05:55,224 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000018_1, Status : FAILED
2016-05-12 18:05:55,225 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000027_1, Status : FAILED
2016-05-12 18:05:55,226 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000032_1, Status : FAILED
2016-05-12 18:05:57,237 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000033_1, Status : FAILED
2016-05-12 18:05:57,239 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000031_1, Status : FAILED
2016-05-12 18:05:58,245 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000034_1, Status : FAILED
2016-05-12 18:06:03,271 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000003_2, Status : FAILED
2016-05-12 18:06:03,272 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000020_2, Status : FAILED
2016-05-12 18:06:04,279 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000015_2, Status : FAILED
2016-05-12 18:06:04,280 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000005_2, Status : FAILED
2016-05-12 18:06:05,287 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000010_2, Status : FAILED
2016-05-12 18:06:05,288 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000011_2, Status : FAILED
2016-05-12 18:06:05,289 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000017_2, Status : FAILED
2016-05-12 18:06:05,290 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000018_2, Status : FAILED
2016-05-12 18:06:05,292 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000009_2, Status : FAILED
2016-05-12 18:06:05,293 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000016_2, Status : FAILED
2016-05-12 18:06:05,294 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000012_2, Status : FAILED
2016-05-12 18:06:05,295 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000001_2, Status : FAILED
2016-05-12 18:06:06,302 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000007_2, Status : FAILED
2016-05-12 18:06:06,303 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000006_2, Status : FAILED
2016-05-12 18:06:06,304 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000024_2, Status : FAILED
2016-05-12 18:06:06,306 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000026_2, Status : FAILED
2016-05-12 18:06:06,307 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000014_2, Status : FAILED
2016-05-12 18:06:07,314 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000022_2, Status : FAILED
2016-05-12 18:06:07,316 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000028_2, Status : FAILED
2016-05-12 18:06:07,319 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000019_2, Status : FAILED
2016-05-12 18:06:07,320 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000025_2, Status : FAILED
2016-05-12 18:06:08,325 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 3%
2016-05-12 18:06:08,329 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000021_2, Status : FAILED
2016-05-12 18:06:08,331 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000032_2, Status : FAILED
2016-05-12 18:06:08,332 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000013_2, Status : FAILED
2016-05-12 18:06:08,333 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000029_2, Status : FAILED
2016-05-12 18:06:08,335 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000002_2, Status : FAILED
2016-05-12 18:06:08,336 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000000_2, Status : FAILED
2016-05-12 18:06:08,337 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000027_2, Status : FAILED
2016-05-12 18:06:08,338 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000004_2, Status : FAILED
2016-05-12 18:06:08,340 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000030_2, Status : FAILED
2016-05-12 18:06:08,341 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000008_2, Status : FAILED
2016-05-12 18:06:09,346 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 0%
2016-05-12 18:06:09,348 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000023_2, Status : FAILED
2016-05-12 18:06:09,349 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000034_2, Status : FAILED
2016-05-12 18:06:09,351 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000031_2, Status : FAILED
2016-05-12 18:06:10,357 INFO org.apache.hadoop.mapreduce.Job (main): Task Id : attempt_1463075989014_0001_r_000033_2, Status : FAILED
2016-05-12 18:06:16,393 INFO org.apache.hadoop.mapreduce.Job (main):  map 100% reduce 100%
2016-05-12 18:06:16,400 INFO org.apache.hadoop.mapreduce.Job (main): Job job_1463075989014_0001 failed with state FAILED due to: Task failed task_1463075989014_0001_r_000020
Job failed as tasks failed. failedMaps:0 failedReduces:1

2016-05-12 18:06:16,531 INFO org.apache.hadoop.mapreduce.Job (main): Counters: 44
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=12207382
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=9296
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=83
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
        S3: Number of bytes read=2432462
        S3: Number of bytes written=0
        S3: Number of read operations=0
        S3: Number of large read operations=0
        S3: Number of write operations=0
    Job Counters 
        Failed reduce tasks=106
        Killed map tasks=1
        Killed reduce tasks=34
        Launched map tasks=83
        Launched reduce tasks=140
        Data-local map tasks=83
        Total time spent by all maps in occupied slots (ms)=81896490
        Total time spent by all reduces in occupied slots (ms)=133401060
        Total time spent by all map tasks (ms)=1819922
        Total time spent by all reduce tasks (ms)=1482234
        Total vcore-milliseconds taken by all map tasks=1819922
        Total vcore-milliseconds taken by all reduce tasks=1482234
        Total megabyte-milliseconds taken by all map tasks=2620687680
        Total megabyte-milliseconds taken by all reduce tasks=4268833920
    Map-Reduce Framework
        Map input records=33221
        Map output records=62063
        Map output bytes=5121017
        Map output materialized bytes=1515128
        Input split bytes=9296
        Combine input records=0
        Spilled Records=62063
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=21830
        CPU time spent (ms)=104080
        Physical memory (bytes) snapshot=40538492928
        Virtual memory (bytes) snapshot=170405302272
        Total committed heap usage (bytes)=46246920192
    File Input Format Counters 
        Bytes Read=2432462
2016-05-12 18:06:16,531 ERROR org.apache.hadoop.streaming.StreamJob (main): Job not successful!

Part of the stderr log

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:332)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:244)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
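
The excerpt above only shows that the reducer subprocess exited with code 1, not which Python error caused it. One possible cause, not confirmed by the logs shown, is that `int(sys.argv[1])` raises when the reducer is launched without an argument or with a non-numeric one. The sketch below guards against that case under this assumption. `parse_merge_distance` is a hypothetical helper, and the default of 100 is taken from the commented-out line in the script.

```python
import sys


def parse_merge_distance(argv, default=100):
    """Return the merge distance from argv[1], or the default if absent or invalid."""
    try:
        return int(argv[1])
    except (IndexError, ValueError) as exc:
        # Under Hadoop Streaming, the script's stderr typically ends up in
        # the task attempt logs, so this message helps with diagnosis.
        sys.stderr.write(
            "merge distance not usable (%r); using %d\n" % (exc, default)
        )
        return default


if __name__ == "__main__":
    MERGEDISTANCE = parse_merge_distance(sys.argv)
    sys.stderr.write("MERGEDISTANCE = %d\n" % MERGEDISTANCE)
```

If the argument is not the problem, the Python traceback of the failing reducer should still appear in the stderr log of an individual failed task attempt, since an uncaught exception is written to stderr before the interpreter exits with status 1.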

Any clues are welcome.

0 answers:

No answers