Connection error when uploading a file with the HDFS Python package

Time: 2019-08-08 10:31:47

Tags: python apache hadoop hdfs webhdfs

I am trying to write a Python program that connects to the Hadoop file system on a remote machine, and uploads files to and downloads files from it. The program currently looks like this (where IP stands for my remote machine's IP address):

from hdfs import InsecureClient

# IP is the remote machine's address; note the scheme must be http://
client = InsecureClient('http://IP:9870', user='hadoop')

path = client.resolve('storage/')
client.makedirs(path, permission=755)  # created as rwxr-xr-x
client.upload(path, '/home/storage/model1.h5')

client.download('storage/' + 'model1.h5', '../storage/model1.h5')
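(As a side note, `InsecureClient` expects a complete `http://host:port` base URL; a small sketch of building and sanity-checking one — the host value here is a placeholder, not taken from the post:)

```python
from urllib.parse import urlparse

def webhdfs_base_url(host, port=9870):
    """Build the base URL InsecureClient expects (9870 is the default
    WebHDFS HTTP port in Hadoop 3.x)."""
    url = f'http://{host}:{port}'
    parsed = urlparse(url)
    # A malformed scheme such as 'http:/host' would leave hostname empty.
    if parsed.scheme != 'http' or not parsed.hostname:
        raise ValueError(f'malformed base URL: {url}')
    return url
```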

I get through the makedirs command successfully, but then the upload fails with this error:

Traceback (most recent call last):
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 160, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw)
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/util/connection.py", line 57, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/ptvsd_launcher.py", line 43, in <module>
    main(ptvsdArgs)
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/lib/python/ptvsd/__main__.py", line 432, in main
    run()
  File "/home/adrian/.vscode/extensions/ms-python.python-2019.8.29288/pythonFiles/lib/python/ptvsd/__main__.py", line 316, in run_file
    runpy.run_path(target, run_name='__main__')
  File "/usr/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/adrian/proyectos/iot-ai-engine/src/hadoop.py", line 6, in <module>
    client.upload(path,'/home/adrian/proyectos/iot-ai-engine/storage/model1.h5')
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 611, in upload
    raise err
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 600, in upload
    _upload(path_tuple)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 531, in _upload
    self.write(_temp_path, wrap(reader, chunk_size, progress), **kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 477, in write
    consumer(data)
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 469, in consumer
    data=(c.encode(encoding) for c in _data) if encoding else _data,
  File "/home/adrian/.local/lib/python3.6/site-packages/hdfs/client.py", line 214, in _request
    **kwargs
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/adrian/.local/lib/python3.6/site-packages/requests/adapters.py", line 467, in send
    low_conn.endheaders()
  File "/usr/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 183, in connect
    conn = self._new_conn()
  File "/home/adrian/.local/lib/python3.6/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fa6cf241940>: Failed to establish a new connection: [Errno -2] Name or service not known
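The `gaierror` at the root of this traceback comes from `socket.getaddrinfo`, i.e. a host name the client machine cannot resolve. A quick way to reproduce that check in isolation (a diagnostic sketch, not part of the original program):

```python
import socket

def resolves(host, port=9870):
    """Return True if the host name resolves, mirroring the
    socket.getaddrinfo call that raised gaierror in the traceback."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        return True
    except socket.gaierror:
        return False
```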

The logs from the namenode Docker container are not very helpful either:

2019-08-08 10:18:17 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=mkdirs  src=/user/hadoop/storage    dst=null    perm=hadoop:supergroup:rwxr-xr-x    proto=webhdfs

2019-08-08 10:18:18 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=listStatus  src=/user/hadoop/storage    dst=null    perm=null   proto=webhdfs

2019-08-08 10:18:18 INFO  audit:8042 - allowed=true ugi=hadoop (auth:SIMPLE)    ip=/{ip}    cmd=delete  src=/user/hadoop/storage/model1.h5  dst=null    perm=null   proto=webhdfs

What am I doing wrong here?


The HDFS stack is built with the following docker-compose.yaml file:

version: "2"
services:
   namenode:
      image: flokkr/hadoop:latest
      hostname: namenode
      command: ["hdfs","namenode"]
      ports:
         - 50070:50070
         - 9870:9870
      env_file:
        - ./compose-config
      environment:
          NAMENODE_INIT: "hdfs dfs -chmod 777 /"
          ENSURE_NAMENODE_DIR: "/tmp/hadoop-hadoop/dfs/name"
   datanode:
      command: ["hdfs","datanode"]
      image: flokkr/hadoop:latest
      env_file:
        - ./compose-config
   resourcemanager:
      image: flokkr/hadoop:latest
      hostname: resourcemanager
      command: ["yarn", "resourcemanager"]
      ports:
         - 8088:8088
      env_file:
        - ./compose-config
   nodemanager:
      image: flokkr/hadoop-yarn-nodemanager:latest
      command: ["yarn", "nodemanager"]
      env_file:
        - ./compose-config

The compose-config file is as follows:

CORE-SITE.XML_fs.default.name=hdfs://namenode:9000
CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000
HDFS-SITE.XML_dfs.namenode.rpc-address=namenode:9000
HDFS-SITE.XML_dfs.replication=1
LOG4J.PROPERTIES_log4j.rootLogger=INFO, stdout
LOG4J.PROPERTIES_log4j.appender.stdout=org.apache.log4j.ConsoleAppender
LOG4J.PROPERTIES_log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
LOG4J.PROPERTIES_log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
MAPRED-SITE.XML_mapreduce.framework.name=yarn
YARN-SITE.XML_yarn.resourcemanager.hostname=resourcemanager
YARN-SITE.XML_yarn.nodemanager.pmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.delete.debug-delay-sec=600
YARN-SITE.XML_yarn.nodemanager.vmem-check-enabled=false
YARN-SITE.XML_yarn.nodemanager.aux-services=mapreduce_shuffle
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-applications=10000
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.maximum-am-resource-percent=0.1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.resource-calculator=org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.queues=default
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.user-limit-factor=1
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.maximum-capacity=100
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.state=RUNNING
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_submit_applications=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.root.default.acl_administer_queue=*
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.node-locality-delay=40
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings=
CAPACITY-SCHEDULER.XML_yarn.scheduler.capacity.queue-mappings-override.enable=false
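(As an aside, the flokkr images render these `FILE.EXT_key=value` environment variables into the corresponding Hadoop config files; a rough sketch of that naming convention follows — my own illustration of the mapping, not the image's actual code:)

```python
def parse_conf_line(line):
    """Split e.g. 'CORE-SITE.XML_fs.defaultFS=hdfs://namenode:9000'
    into (config file, property key, value). Assumes the file-name
    prefix itself contains no underscore, which holds for the
    standard Hadoop *-site.xml names used above."""
    name, _, value = line.partition('=')       # split off the value
    conf_file, _, key = name.partition('_')    # split file prefix from key
    return conf_file, key, value
```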

0 Answers:

No answers