Passing multiple paths to cmdenv in Hadoop streaming

Asked: 2014-10-15 14:24:13

Tags: hadoop-streaming

I am using the Hadoop streaming jar and trying to use -cmdenv to pass an environment variable that points to multiple paths:

hadoop jar ../hadoop-streaming.jar \
-libjars .../something.jar \
-inputFormat ..CustomInputFormat \
-file mapper.py \
-file stream.py \
-cacheFile ../files#aliasname \
-cmdenv LD_LIBRARY_PATH=/home/files/1/xyz:/home/files/2/:/home/files/1/abc/ \
-mapper mapper.py \
-input hdfs:/inputfiles/ \
-output hdfs:/outputfiles/ \
-reducer NONE \
-verbose

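I have two questions:

1. The mapper script cannot see the environment variable defined with -cmdenv.
2. Can I provide an HDFS directory as a path in the environment variable?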

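When I run the hadoop command, the application logs show an error raised while loading shared libraries: "error: xyz.so: cannot open shared object file: No such file or directory".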

Also, I can see the environment variable in the streaming job output: STREAM: stream.addenvironment=HADOOP_ROOT_LOGGER= LD_LIBRARY_PATH=<path1>/:<path2>/..

Can you tell me where I am going wrong?
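
One way to narrow down both questions is to have the mapper report what it actually receives. The following is a minimal diagnostic sketch, not part of the original job: `describe_library_path` is a hypothetical helper that splits LD_LIBRARY_PATH, resolves each entry against the task's working directory, and checks whether it exists as a directory on the local filesystem of the node. It assumes the variable reaches the task via -cmdenv and relies on the fact that the dynamic loader searches local directories only.

```python
import os


def describe_library_path(environ=None, cwd=None):
    """Return (entry, resolved_path, exists) for each LD_LIBRARY_PATH entry.

    Hypothetical helper for debugging only. Relative entries are resolved
    against cwd, which for a streaming task is its working directory.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    raw = environ.get("LD_LIBRARY_PATH", "")
    report = []
    for entry in raw.split(":"):
        if not entry:
            continue  # empty segments (e.g. a trailing ':') are not reported here
        # os.path.join leaves absolute entries untouched
        resolved = os.path.normpath(os.path.join(cwd, entry))
        report.append((entry, resolved, os.path.isdir(resolved)))
    return report
```

A mapper could call this once at startup and write each tuple to stderr, since stdout is reserved for mapper output. An entry reported as missing would be consistent with the "cannot open shared object file" error above.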

1 Answer:

Answer 0: (score: 2)

I resolved the issue by uploading the C code and its dependency libraries to HDFS, adding them to the job with -cacheFile under symlink names, and listing those symlink names in the environment variable passed via -cmdenv. See below.

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
 -libjars /<path>/jars/Hadoop_Streaming.jar \
 -inputformat com.hadoop.IMCFInputFormat \
 -outputformat org.apache.hadoop.mapred.TextOutputFormat \
 -file mapper.sh \
 -file stream.py \
 -cacheFile hdfs:/<hdfspath>/IMCF/Common/#IMCF_Common \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/Common/#IMCF_STMC_Common \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarLabResults/#IMCF_STMC_StarLabResults \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMeasureMembership/#IMCF_STMC_StarMeasureMembership \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMedicalCase/#IMCF_STMC_StarMedicalCase \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMedicalClaim/#IMCF_STMC_StarMedicalClaim \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarPrcdrTracking/#IMCF_STMC_StarPrcdrTracking \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarRxClaim/#IMCF_STMC_StarRxClaim \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/Stars/#IMCF_STMC_Stars \
 -cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarDerived/#IMCF_STMC_StarDerived \
 -cacheFile hdfs:/<hdfspath>/lkup_files/MFW_meas_comp_def.lkup#MFW_meas_comp_def.lkup \
 -cacheFile hdfs:/<hdfspath>/lkup_files/MFW_msr_cdset_x_cdset_lst.lkup#MFW_msr_cdset_x_cdset_lst.lkup \
 -cacheFile hdfs:/<hdfspath>/lkup_files/MFW_msr_criteria.lkup#MFW_msr_criteria.lkup \
 -cacheFile hdfs:/<hdfspath>/lkup_files/MFW_run_params.properties#MFW_run_params_HMO.env \
 -cacheFile hdfs:/<hdfspath>/IMCF/MainModule/imcf.exe#imcf.exe \
 -mapper mapper.sh \
 -input /<hdfspath>/Stars_Ext_Tbl/IMCF_CIF_EXT/ \
 -output /<hdfspath>/stream_output/ \
 -reducer NONE \
 -cmdenv LD_LIBRARY_PATH=IMCF_Common:IMCF_STMC_Common:IMCF_STMC_StarLabResults:IMCF_STMC_StarMeasureMembership:IMCF_STMC_StarMedicalCase:IMCF_STMC_StarMedicalClaim:IMCF_STMC_StarPrcdrTracking:IMCF_STMC_StarRxClaim:IMCF_STMC_Stars:IMCF_STMC_StarDerived \
 -verbose
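
In the command above, every library directory appears twice: once as the `#symlink` fragment of a -cacheFile URI and once as an entry in LD_LIBRARY_PATH, and the two have to match. As a rough sketch of keeping them in sync, the hypothetical helper below (`build_cache_options` is my own name, not a Hadoop API) generates both sets of arguments from one list. It only covers the library directories; the lookup files and imcf.exe would still be added as separate -cacheFile options.

```python
def build_cache_options(library_dirs):
    """Build -cacheFile arguments and a matching -cmdenv LD_LIBRARY_PATH.

    library_dirs is a list of (symlink_name, hdfs_uri) pairs. This is an
    illustrative helper, not part of Hadoop streaming itself.
    """
    args = []
    for link_name, uri in library_dirs:
        # URI#name asks the distributed cache to expose the path as ./name
        args += ["-cacheFile", "%s#%s" % (uri, link_name)]
    # Relative entries: the symlinks live in the task's working directory
    library_path = ":".join(link_name for link_name, _ in library_dirs)
    args += ["-cmdenv", "LD_LIBRARY_PATH=" + library_path]
    return args


# Usage with two of the directories from the command above
example = build_cache_options([
    ("IMCF_Common", "hdfs:/<hdfspath>/IMCF/Common/"),
    ("IMCF_STMC_Common", "hdfs:/<hdfspath>/IMCF/STMC/Common/"),
])
```

The returned list can be appended to the rest of the `hadoop jar` argument list, for example when launching the job through `subprocess`.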