I'm trying to build a Python program that connects to the Hadoop file system on a remote machine and uploads files to and downloads files from it. The program currently looks like this (with IP standing in for my remote machine's IP):
from hdfs import InsecureClient
client = InsecureClient('http://IP:9870', user='hadoop')
path = client.resolve('storage/')
client.makedirs(path, permission=int(755))
client.upload(path, '/home/storage/model1.h5')
client.download('storage/' + 'model1.h5', '../storage/model1.h5')
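As far as I can tell, resolve, makedirs, and list are served entirely by the namenode over WebHDFS, whereas upload and download get redirected to a datanode. Namenode-only calls like the following sketch (IP again a placeholder) are the kind that do work for me, which matches the mkdirs/listStatus entries in the audit log further down:

from hdfs import InsecureClient

client = InsecureClient('http://IP:9870', user='hadoop')  # IP = placeholder

# Both of these talk only to the namenode on the published 9870 port:
print(client.resolve('storage/'))  # absolute HDFS path the client will use
print(client.list('storage/'))     # lists the directory created by makedirs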
The makedirs call goes through successfully, but uploading the file then fails with this error:
Traceback (most recent call last):
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/lib/python/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/lib/python/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/adrian/proyectos/iot-ai-engine/src/hadoop.py", line 6, in <module>
    client.upload(path,'/home/adrian/proyectos/iot-ai-engine/storage/model1.h5')
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 611, in upload
    raise err
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 600, in upload
    _upload(path_tuple)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 531, in _upload
    self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 477, in write
    consumer(data)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 469, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 214, in _request
    **kwargs
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/adapters.py", line 467, in send
    low_conn.endheaders()
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 183, in connect
    conn = self._new_conn()
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fa6cf241940>: Failed to establish a new connection: [Errno -2] Name or service not known
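From reading the WebHDFS docs, a file write is a two-step protocol: the client sends op=CREATE to the namenode, which answers with a 307 redirect whose Location header names the datanode to write to, and the gaierror above looks like it happens while following that redirect. A sketch like this (same IP placeholder, path as created above) should reveal which hostname the client is being redirected to:

import requests

# Ask the namenode where an upload would go, without following the redirect.
resp = requests.put(
    'http://IP:9870/webhdfs/v1/user/hadoop/storage/model1.h5'
    '?op=CREATE&user.name=hadoop',
    allow_redirects=False,
)
print(resp.status_code)              # expect 307 (temporary redirect)
print(resp.headers.get('Location'))  # datanode address the client must reach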
The namenode Docker container's log isn't much help either:
2019-08-08 10:18:17 INFO audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE) ip=/{ip} cmd=mkdirs src=/user/hadoop/storage dst=null perm=hadoop:supergroup:rwxr-xr-x proto=webhdfs
2019-08-08 10:18:18 INFO audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE) ip=/{ip} cmd=listStatus src=/user/hadoop/storage dst=null perm=null proto=webhdfs
2019-08-08 10:18:18 INFO audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE) ip=/{ip} cmd=delete src=/user/hadoop/storage/model1.h5 dst=null perm=null proto=webhdfs
What am I doing wrong here?
The HDFS ecosystem is built from the following docker-compose.yaml file:
version: "2"
services:
  namenode:
    image: flokkr/hadoop:latest
    hostname: namenode
    command: ["hdfs","namenode"]
    ports:
      - 50070:50070
      - 9870:9870
    env_file:
      - ./compose-config
    environment:
      NAMENODE_INIT: "hdfs dfs -chmod 777 /"
      ENSURE_NAMENODE_DIR: "/tmp/hadoop-hadoop/dfs/name"
  datanode:
    command: ["hdfs","datanode"]
    image: flokkr/hadoop:latest
    env_file:
      - ./compose-config
  resourcemanager:
    image: flokkr/hadoop:latest
    hostname: resourcemanager
    command: ["yarn", "resourcemanager"]
    ports:
      - 8088:8088
    env_file:
      - ./compose-config
  nodemanager:
    image: flokkr/hadoop-yarn-nodemanager:latest
    command: ["yarn", "nodemanager"]
    env_file:
      - ./compose-config
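One thing I notice: the compose file publishes the namenode's ports (50070, 9870) but nothing from the datanode, and the datanode service sets no hostname, so its container name is presumably only resolvable inside the compose network. From my machine, a lookup like this should reproduce the same errno as in the traceback ('datanode' is just my guess at what the redirect contains):

import socket

try:
    # 9864 is the default datanode HTTP port in Hadoop 3;
    # 'datanode' is a hypothetical stand-in for the redirect target.
    socket.getaddrinfo('datanode', 9864, proto=socket.IPPROTO_TCP)
except socket.gaierror as err:
    print(err)  # [Errno -2] Name or service not known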
The compose-config file is as follows:
CORE-SITE.XML_fs.default.name=hdfs://namenode:9000
CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:9000
HDFS-SITE.XML_dfs.replication=1
LOG4J.PROPERTIES_log4j.rootLogger=INFO, stdout
LOG4J.PROPERTIES_log4j.appender.stdout=org.apache.log4j.ConsoleAppender
LOG4J.PROPERTIES_log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
MAPRED-SITE.XML_mapreduce.framework.name=yarn
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false