I am using the Hadoop streaming jar and trying to use -cmdenv:
hadoop jar ../hadoop-streaming.jar \
-libjars .../something.jar \
-inputFormat ..CustomInputFormat \
-file mapper.py \
-file stream.py \
-cacheFile ../files#aliasname \
-cmdenv LD_LIBRARY_PATH=/home/files/1/xyz:/home/files/2/:/home/files/1/abc/ \
-mapper mapper.py \
-input hdfs:/inputfiles/ \
-output hdfs:/outputfiles/ \
-reducer NONE \
-verbose
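I have a few questions.
1. The mapper script cannot see the environment variable defined with -cmdenv.
2. Can I use an HDFS directory as a path in the environment variable?
When I run the hadoop command, the application log shows a failure while loading shared libraries: "error: xyz.so: cannot open shared object file: No such file or directory".
I can also see the environment variable in the streaming job output: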
STREAM:stream.addenvironment=HADOOP_ROOT_LOGGER= LD_LIBRARY_PATH=<path1>/:<path2>/..
Please tell me where I am going wrong.
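One way to narrow down the first question is to have the mapper report what it actually sees. The following is a minimal sketch, not part of the original post: the helper names `describe_library_path` and `log_library_path` are hypothetical, and it assumes the mapper is a Python script such as mapper.py. It writes to stderr, which normally ends up in the task logs, and checks whether each LD_LIBRARY_PATH entry exists on the node where the task runs.

```python
import os
import sys


def describe_library_path(environ=None, cwd=None):
    """Return (entry, resolved_path, exists) for each LD_LIBRARY_PATH entry.

    Relative entries are resolved against the working directory, which is
    how the dynamic linker treats them as well.
    """
    environ = os.environ if environ is None else environ
    cwd = os.getcwd() if cwd is None else cwd
    raw = environ.get("LD_LIBRARY_PATH", "")
    report = []
    for entry in raw.split(":"):
        if not entry:
            continue  # empty segments are skipped here for simplicity
        resolved = os.path.normpath(os.path.join(cwd, entry))
        report.append((entry, resolved, os.path.isdir(resolved)))
    return report


def log_library_path(stream=sys.stderr):
    """Write one diagnostic line per entry; stdout is reserved for map output."""
    report = describe_library_path()
    if not report:
        stream.write("LD_LIBRARY_PATH is not set or empty\n")
    for entry, resolved, exists in report:
        status = "found" if exists else "MISSING"
        stream.write("LD_LIBRARY_PATH entry %r -> %s (%s)\n" % (entry, resolved, status))


if __name__ == "__main__":
    log_library_path()
```

If the variable shows up but every entry is reported as MISSING, the problem is likely that the directories do not exist on the local filesystem of the worker node, rather than that -cmdenv was ignored.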
Answer 0 (score: 2)
I resolved the issue by uploading the C code and its dependency libraries to HDFS, adding them to the job with -cacheFile so that each one gets a symlink, and listing those symlink names in LD_LIBRARY_PATH via -cmdenv. See below.
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
-libjars /<path>/jars/Hadoop_Streaming.jar \
-inputformat com.hadoop.IMCFInputFormat \
-outputformat org.apache.hadoop.mapred.TextOutputFormat \
-file mapper.sh \
-file stream.py \
-cacheFile hdfs:/<hdfspath>/IMCF/Common/#IMCF_Common \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/Common/#IMCF_STMC_Common \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarLabResults/#IMCF_STMC_StarLabResults \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMeasureMembership/#IMCF_STMC_StarMeasureMembership \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMedicalCase/#IMCF_STMC_StarMedicalCase \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarMedicalClaim/#IMCF_STMC_StarMedicalClaim \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarPrcdrTracking/#IMCF_STMC_StarPrcdrTracking \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarRxClaim/#IMCF_STMC_StarRxClaim \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/Stars/#IMCF_STMC_Stars \
-cacheFile hdfs:/<hdfspath>/IMCF/STMC/StarDerived/#IMCF_STMC_StarDerived \
-cacheFile hdfs:/<hdfspath>/lkup_files/MFW_meas_comp_def.lkup#MFW_meas_comp_def.lkup \
-cacheFile hdfs:/<hdfspath>/lkup_files/MFW_msr_cdset_x_cdset_lst.lkup#MFW_msr_cdset_x_cdset_lst.lkup \
-cacheFile hdfs:/<hdfspath>/lkup_files/MFW_msr_criteria.lkup#MFW_msr_criteria.lkup \
-cacheFile hdfs:/<hdfspath>/lkup_files/MFW_run_params.properties#MFW_run_params_HMO.env \
-cacheFile hdfs:/<hdfspath>/IMCF/MainModule/imcf.exe#imcf.exe \
-mapper mapper.sh \
-input /<hdfspath>/Stars_Ext_Tbl/IMCF_CIF_EXT/ \
-output /<hdfspath>/stream_output/ \
-reducer NONE \
-cmdenv LD_LIBRARY_PATH=IMCF_Common:IMCF_STMC_Common:IMCF_STMC_StarLabResults:IMCF_STMC_StarMeasureMembership:IMCF_STMC_StarMedicalCase:IMCF_STMC_StarMedicalClaim:IMCF_STMC_StarPrcdrTracking:IMCF_STMC_StarRxClaim:IMCF_STMC_Stars:IMCF_STMC_StarDerived \
-verbose
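In the command above, every library alias has to appear twice: once after the `#` in a -cacheFile option and once in LD_LIBRARY_PATH. A small helper can keep the two in sync. This is only a sketch under stated assumptions: the function name `build_library_args` is hypothetical, the `<hdfspath>` placeholder is kept exactly as in the answer, and only the -cacheFile and -cmdenv options already used above are generated.

```python
def build_library_args(library_dirs):
    """Build -cacheFile and -cmdenv arguments from (hdfs_uri, alias) pairs.

    library_dirs is a list of (hdfs_uri, alias) tuples in the order the
    dynamic linker should search them. Non-library cache files (lookup
    files, the executable) are not handled here and must be added separately.
    """
    args = []
    for uri, alias in library_dirs:
        # "uri#alias" asks for a symlink named `alias` in the task directory.
        args += ["-cacheFile", "%s#%s" % (uri, alias)]
    aliases = [alias for _, alias in library_dirs]
    # Relative entries: the linker resolves them against the task's cwd.
    args += ["-cmdenv", "LD_LIBRARY_PATH=" + ":".join(aliases)]
    return args


if __name__ == "__main__":
    example = [
        ("hdfs:/<hdfspath>/IMCF/Common/", "IMCF_Common"),
        ("hdfs:/<hdfspath>/IMCF/STMC/Common/", "IMCF_STMC_Common"),
    ]
    for arg in build_library_args(example):
        print(arg)
```

The resulting list can be spliced into the argument list of the hadoop jar call. The relative names work because the symlinks are created in the directory the mapper runs from, so the path entries do not need to be absolute.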