mrjob virtualenv error on a Hadoop cluster: Permission denied

Time: 2015-04-10 23:50:45

Tags: python hadoop pip virtualenv mrjob

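I work at a large enterprise organization that has a Hadoop cluster. I had the administrators install virtualenv on all of the Hadoop worker nodes so that I can submit Python jobs through mrjob with dependencies that may not be present on the workers. Following the mrjob documentation, this is what my mrjob.conf file looks like: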

runners:
  hadoop:
    setup:
    - virtualenv venv
    - . venv/bin/activate
    - pip install nltk

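I have a simple job that uses the nltk package. I can verify that these setup commands run on the worker nodes (if I add simple commands that write to files in /tmp, they work). However, I get the following error: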

New python executable in venv/bin/python
Installing setuptools............done.
Installing pip...
  Error [Errno 13] Permission denied while executing command /storage5/hadoop/map...env/bin/easy_install /usr/share/python-virtualenv/pip-1.1.tar.gz
...Installing pip...done.
Traceback (most recent call last):
  File "/usr/bin/virtualenv", line 3, in <module>
    virtualenv.main()
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 938, in main
    never_download=options.never_download)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 1054, in create_environment
    install_pip(py_executable, search_dirs=search_dirs, never_download=never_download)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 643, in install_pip
    filter_stdout=_filter_setup)
  File "/usr/lib/python2.7/dist-packages/virtualenv.py", line 976, in call_subprocess
    cwd=cwd, env=env)
  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 13] Permission denied
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

What could be causing this error?

1 answer:

Answer 0 (score: 0)

Thanks for the idea of deploying packages to the cluster this way.

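As for your problem, it looks like a permissions issue on the directory: the user running the task does not seem to have write permission in the location where virtualenv is creating venv.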
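One way to test that guess from inside a task is a small diagnostic like the sketch below. It is not part of mrjob or virtualenv; the function name check_dir and the idea of running it from the task's working directory are assumptions made for illustration. It checks two things separately: whether a file can be created in the directory, and whether a script created there can be executed. The second check is included because Errno 13 raised while launching a subprocess can also mean the file is not allowed to run, for example when the filesystem is mounted noexec, rather than that the directory is not writable.

```python
import os
import stat
import subprocess
import tempfile


def check_dir(path="."):
    """Report whether `path` is writable and whether a script created
    there can be executed. Hypothetical helper, for diagnosis only."""
    result = {
        "path": os.path.abspath(path),
        "writable": False,
        "executable": False,
    }
    try:
        # Creating a temp file fails with OSError if the directory
        # is missing or not writable by the current user.
        fd, script = tempfile.mkstemp(suffix=".sh", dir=path)
    except OSError:
        return result
    result["writable"] = True
    try:
        with os.fdopen(fd, "w") as f:
            f.write("#!/bin/sh\nexit 0\n")
        os.chmod(script, stat.S_IRWXU)  # rwx for the owner only
        try:
            # Exit status 0 means the script could actually be run.
            result["executable"] = subprocess.call([script]) == 0
        except OSError:
            # Errno 13 at this point means exec itself was refused.
            result["executable"] = False
    finally:
        os.remove(script)
    return result


if __name__ == "__main__":
    print(check_dir("."))
```

If writable comes back True but executable is False, the problem is more likely how the task directory is mounted or what permissions the created files receive than a missing write permission.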