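Goal: read a remote file stored in HDFS from my laptop using pydoop. I am using PyCharm Professional Edition and Cloudera CDH 5.4.
PyCharm configuration on my laptop: under Settings > Project Interpreter, I have pointed the Python interpreter at the one on the remote server: ssh://remote-server-ip-address:port-number/home/ashish/anaconda/bin/python2.7
There is a file stored in HDFS at /home/ashish/pencil/someFileName.txt
I installed pydoop on the remote server with pip install pydoop, and then wrote this code to read the file from its HDFS location: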
import pydoop.hdfs as hdfs
with hdfs.open('/home/ashish/pencil/someFileName.txt') as file:
    for line in file:
        print(line,'\n')
When I run it, I get this error:
Traceback (most recent call last):
  File "/home/ashish/PyCharm_proj/Remote_Server_connect/hdfsConxn.py", line 7, in <module>
    import pydoop.hdfs as hdfs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/__init__.py", line 82, in <module>
    from . import common, path
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/path.py", line 28, in <module>
    from . import common, fs as hdfs_fs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/fs.py", line 34, in <module>
    from .core import core_hdfs_fs
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/core/__init__.py", line 49, in <module>
    _CORE_MODULE = init(backend=HDFS_CORE_IMPL)
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/hdfs/core/__init__.py", line 29, in init
    jvm.load_jvm_lib()
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/utils/jvm.py", line 33, in load_jvm_lib
    java_home = get_java_home()
  File "/home/ashish/anaconda/lib/python2.7/site-packages/pydoop/utils/jvm.py", line 28, in get_java_home
    raise RuntimeError("java home not found, try setting JAVA_HOME")
RuntimeError: java home not found, try setting JAVA_HOME
Process finished with exit code 1
My guess is that it may be unable to find py4j. py4j is located at
/home/ashish/anaconda/lib/python2.7/site-packages/py4j
When I echo JAVA_HOME on the remote server,
echo $JAVA_HOME
I get this location:
/usr/java/jdk1.7.0_67-cloudera
I am new to Python programming and to CentOS setup. Please suggest what I should do to fix this.
Thanks
Answer 0 (score: 1)
OK, it looks like I solved it. What I did was use
sys.path.append('/usr/java/jdk1.7.0_67-cloudera')
and I updated the code:
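import os, sys
# Add the JDK directory to Python's module search path
sys.path.append('/usr/java/jdk1.7.0_67-cloudera')
input_file = '/home/ashish/pencil/someData.txt'
# Read the file line by line and print each line (Python 2 print statement)
with open(input_file) as f:
    for line in f:
        print line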
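This code reads the file from HDFS on the remote server and prints the output in the PyCharm console on my laptop.
By using sys.path.append(), you do not have to manually edit the hadoop.sh file and risk conflicts with other Java configuration files.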
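A possible alternative, sketched under assumptions: sys.path only controls where Python searches for modules, whereas the error message in the question asks for the JAVA_HOME environment variable. A process started by a remote interpreter over SSH does not necessarily inherit the variables of an interactive login shell, which may be why echo $JAVA_HOME works in a terminal while the script fails. The helper name ensure_java_home is hypothetical, the path is the one reported in the question, and the pydoop lines are left commented out because they need a working Hadoop installation.

```python
import os

# Assumption: this is the JDK path that `echo $JAVA_HOME` reported on the
# remote server in the question; adjust it for your own installation.
ASSUMED_JAVA_HOME = "/usr/java/jdk1.7.0_67-cloudera"


def ensure_java_home(fallback):
    """Set JAVA_HOME for this process if it is missing, and return its value."""
    if not os.environ.get("JAVA_HOME"):
        os.environ["JAVA_HOME"] = fallback
    return os.environ["JAVA_HOME"]


java_home = ensure_java_home(ASSUMED_JAVA_HOME)
print("JAVA_HOME seen by this interpreter: " + java_home)

# Import pydoop only after JAVA_HOME is set, because the traceback shows the
# lookup happening at import time:
# import pydoop.hdfs as hdfs
# with hdfs.open('/home/ashish/pencil/someFileName.txt') as f:
#     for line in f:
#         print(line)
```

The same variable can also be supplied through the Environment variables field of the PyCharm run configuration, which avoids hard-coding the path in the script.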
Answer 1 (score: 0)
You can try setting JAVA_HOME in hadoop-env.sh (it is commented out by default).
Change:
# The java implementation to use. Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
To:
# The java implementation to use. Required.
export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Or whatever your Java installation directory is.
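If you are unsure which directory to use, a quick check like the following may help before editing hadoop-env.sh. It is only a sketch: the function name looks_like_java_home is hypothetical, the candidate path is the one from the question, and the bin/java layout is an assumption that holds for typical Linux JDK and JRE installations.

```python
import os


def looks_like_java_home(path):
    """Return True if `path` contains a bin/java file, as a JDK or JRE directory would."""
    return os.path.isfile(os.path.join(path, "bin", "java"))


# Hypothetical candidate; replace it with your own installation directory.
candidate = "/usr/java/jdk1.7.0_67-cloudera"
if looks_like_java_home(candidate):
    # Print the line that could be placed in hadoop-env.sh
    print("export JAVA_HOME=" + candidate)
else:
    print("No bin/java found under " + candidate)
```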