我有一个R脚本在R Colsole中运行得非常好,但是当我在Hadoop流中运行时,它在Map阶段失败并出现以下错误。找到任务尝试日志
我有Hadoop流媒体命令:
/home/Bibhu/hadoop-0.20.2/bin/hadoop jar \
/home/Bibhu/hadoop-0.20.2/contrib/streaming/*.jar \
-input hdfs://localhost:54310/user/Bibhu/BookTE1.csv \
-output outsid -mapper `pwd`/code1.sh
stderr logs
Loading required package: class
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
no lines available in input
Calls: read.csv -> read.table
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
系统日志日志
2013-07-03 19:32:36,080 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2013-07-03 19:32:36,654 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2013-07-03 19:32:36,675 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 100
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: data buffer = 79691776/99614720
2013-07-03 19:32:36,835 INFO org.apache.hadoop.mapred.MapTask: record buffer = 262144/327680
2013-07-03 19:32:36,899 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed exec [/home/Bibhu/Downloads/SentimentAnalysis/Sid/smallFile/code1.sh]
2013-07-03 19:32:37,256 INFO org.apache.hadoop.streaming.PipeMapRed: Records R/W=0/1
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: MRErrorThread done
2013-07-03 19:32:38,509 INFO org.apache.hadoop.streaming.PipeMapRed: PipeMapRed failed!
2013-07-03 19:32:38,557 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
2013-07-03 19:32:38,631 INFO org.apache.hadoop.mapred.TaskRunner: Runnning cleanup for the task
答案 0 :(得分:0)
hadoop-streaming-1.0.4.jar
答案 1 :(得分:0)
您需要从映射器和缩减器中查找日志,因为这是作业失败的地方(如java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
所示)。这表示你的R脚本崩溃了。
如果您正在使用Hortonworks Hadoop发行版,最简单的方法是打开您的工作历史。它应该在http://127.0.0.1:19888/jobhistory
。应该可以使用命令行在文件系统中找到日志,但我还没找到。
http://127.0.0.1:19888/jobhistory
你应该看到一个类似
的页面Log Type: stderr
Log Length: 418
Traceback (most recent call last):
File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 45, in <module>
mapper()
File "/hadoop/yarn/local/usercache/root/appcache/application_1404203309115_0003/container_1404203309115_0003_01_000002/./mapper.py", line 37, in mapper
for record in reader:
_csv.Error: newline inside string
这是我的Python脚本的错误,R的错误看起来有点不同。
来源:http://hortonworks.com/community/forums/topic/map-reduce-job-log-files/
答案 2 :(得分:-2)
今晚我收到同样的错误,同时用R
开发Map Reduce Streaming作业。
我正在开发一个10节点集群,每个集群有12个核心,并尝试在提交时提供:
-D mapred.map.tasks=200\
-D mapred.reduce.tasks=200
虽然我将这些内容更改为
,但作业已成功完成-D mapred.map.tasks=10\
-D mapred.reduce.tasks=10
这是一个神秘的解决方案,今晚可能还会出现更多情况。但是,如果有任何读者可以解释,请做!