I am using the code at [1] to launch a MapReduce job from Python. The problem is that I get the correct output data in the stderr field [3] instead of in the stdout field [2]. Why is the correct data going to stderr? Am I using Popen.communicate correctly? Is there a better way to launch a Java execution from Python (and not Jython)?
[1] The snippet I use to launch the job in Hadoop:
import shlex
import subprocess

command = "/home/xubuntu/Programs/hadoop/bin/hadoop jar /home/xubuntu/Programs/hadoop/medusa-java.jar mywordcount -Dfile.path=/home/xubuntu/Programs/medusa-2.0/temp/1443004585/job.attributes /input1 /output1"

def launch():  # this block runs inside a function in my code
    try:
        process = subprocess.Popen(shlex.split(command),
                                   stdout=subprocess.PIPE,
                                   stderr=subprocess.PIPE)
        out, err = process.communicate()
        print("Out %s" % out)
        print("Error %s" % err)
        if len(err) > 0:  # there is an exception
            # print("Going to launch exception")
            raise ValueError("Exception:\n" + err)
    except ValueError as e:
        return e.message
    return out
[2] Output in stdoutdata:
[2015-09-23 07:16:13,220: WARNING/Worker-17] Out My Setup
My get job name
My get job name
My get job name
org.apache.hadoop.mapreduce.lib.partition.HashPartitioner
---> Job 0: /input1, : /output1-1443006949
10.10.5.192
10.10.5.192:8032
[3] Output in the stderrdata field:
[2015-09-23 07:16:13,221: WARNING/Worker-17] Error 15/09/23 07:15:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/23 07:15:53 INFO client.RMProxy: Connecting to ResourceManager at /10.10.5.192:8032
15/09/23 07:15:54 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 07:15:54 INFO input.FileInputFormat: Total input paths to process : 4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: number of splits:4
15/09/23 07:15:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1442999930174_0009
15/09/23 07:15:54 INFO impl.YarnClientImpl: Submitted application application_1442999930174_0009
15/09/23 07:15:54 INFO mapreduce.Job: The url to track the job: http://hadoop-coc-1:9046/proxy/application_1442999930174_0009/
15/09/23 07:15:54 INFO mapreduce.Job: Running job: job_1442999930174_0009
15/09/23 07:16:00 INFO mapreduce.Job: Job job_1442999930174_0009 running in uber mode : false
15/09/23 07:16:00 INFO mapreduce.Job: map 0% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: map 100% reduce 0%
15/09/23 07:16:13 INFO mapreduce.Job: Job job_1442999930174_0009 completed successfully
15/09/23 07:16:13 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=423900
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=472
HDFS: Number of bytes written=148
HDFS: Number of read operations=20
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=41232
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=41232
Total vcore-seconds taken by all map tasks=41232
Total megabyte-seconds taken by all map tasks=42221568
Map-Reduce Framework
Map input records=34
Map output records=34
Input split bytes=406
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=532
CPU time spent (ms)=1320
Physical memory (bytes) snapshot=245039104
Virtual memory (bytes) snapshot=1272741888
Total committed heap usage (bytes)=65273856
File Input Format Counters
Answer 0 (score: 1)
Hadoop (or rather Log4j) simply logs all [INFO] messages to stderr. From their configuration entry:

By default, Hadoop logs messages to Log4j. Log4j is configured via log4j.properties on the classpath. This file defines both what is logged and where. For applications, the default root logger is "INFO,console", which logs all messages at level INFO and above to the console's stderr. Servers log to the "INFO,DRFA", which logs to a file that is rolled daily. Log files are named $HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-<server>.log.
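Given that, a non-empty stderr does not necessarily mean the job failed; it is mostly Log4j chatter. On the Python side you could key the error check off the process's exit status instead of the length of stderr. A minimal sketch under that assumption (run_job and its command argument are placeholder names, not from your code):

import shlex
import subprocess

def run_job(command):
    # Hadoop's INFO-level logging lands on stderr, so capture it
    # separately and rely on the exit code to detect real failures.
    process = subprocess.Popen(shlex.split(command),
                               stdout=subprocess.PIPE,
                               stderr=subprocess.PIPE)
    out, err = process.communicate()
    if process.returncode != 0:  # non-zero exit code signals an actual failure
        raise ValueError("Hadoop job failed:\n" + err)
    return out  # the job's own stdout; the Log4j messages stay in err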
I have never tried redirecting the logs to stdout, so I can't really help there, but a promising answer from another user suggests:
// Answer by Rajkumar Singh
// To get your stdout and log messages on the console you can use the
// Apache Commons Logging framework in your mapper and reducer.
import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Concrete type parameters filled in here for illustration.
public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    public static final Log log = LogFactory.getLog(MyMapper.class);

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Log to the task's stdout file
        System.out.println("Map key " + key);
        // Log to the syslog file
        log.info("Map key " + key);
        if (log.isDebugEnabled()) {
            log.debug("Map key " + key);
        }
        context.write(key, value);
    }
}
I suggest giving it a try.
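One caveat: in a distributed job, the System.out.println calls above end up in the per-task stdout file inside the container's log directory (browsable through the job's tracking URL), not in the stdout of the hadoop client process that your Python script captures.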