Writing to HDFS with Python

Date: 2016-05-03 16:45:56

Tags: python hadoop hdfs

I am trying to write to HDFS from Python. Right now I am using https://hdfscli.readthedocs.io/en/latest/quickstart.html, but for large files I get the following error:

  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/hdfs/client.py", line 400, in write
    consumer(data)
  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/hdfs/client.py", line 394, in consumer
    auth=False,
  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/hdfs/client.py", line 179, in _request
    **kwargs
  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/requests/sessions.py", line 465, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/requests/sessions.py", line 573, in send
    r = adapter.send(request, **kwargs)
  File "/home/edge7/venv-dev/local/lib/python2.7/site-packages/requests/adapters.py", line 415, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', gaierror(-2, 'Name or service not known'))

My code for writing is very simple:

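from hdfs import InsecureClient
# Connect to the NameNode's WebHDFS endpoint as user "hdfs" (no Kerberos).
client = InsecureClient('http://xxxxxxx.co:50070', user='hdfs')
client.write("/tmp/a", stringToWrite)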

Can anyone suggest a decent package for writing to HDFS? Cheers

1 answer:

Answer 0 (score: 0)

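Judging by the stack trace, this looks like it could be security-related. Are you sure you should be using InsecureClient rather than the Kerberos client? Also, keep in mind that the library is just a binding for HttpFS, so testing by hand with Postman or curl lets you debug any problem on the cluster side.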
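
The manual test suggested above can also be scripted. What follows is a minimal sketch, not the library's own code. It assumes Python 3 (the traceback in the question is from Python 2.7) and a cluster that exposes the standard WebHDFS REST endpoint, where a CREATE request to the NameNode is answered with a 307 redirect whose Location header points at a DataNode. The function names are hypothetical helpers introduced for illustration. The gaierror in the traceback is raised by hostname resolution, so one thing worth checking is whether the host named in that redirect can be resolved from the client machine.

```python
import http.client
import socket
from urllib.parse import urlsplit


def first_step_location(namenode_host, port, hdfs_path, user):
    """Send the first WebHDFS CREATE request and return its Location header.

    http.client does not follow redirects, so the NameNode's 307 response
    (which names the DataNode to write to) stays visible. No file data is sent.
    """
    conn = http.client.HTTPConnection(namenode_host, port, timeout=10)
    try:
        url = "/webhdfs/v1{}?op=CREATE&user.name={}".format(hdfs_path, user)
        conn.request("PUT", url)
        return conn.getresponse().getheader("Location")
    finally:
        conn.close()


def redirect_host(location):
    """Return the hostname part of a redirect Location URL."""
    return urlsplit(location).hostname


def can_resolve(hostname):
    """Return True if this machine can resolve hostname to an address."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Host, port, path and user are taken from the question; adjust as needed.
    location = first_step_location("xxxxxxx.co", 50070, "/tmp/a", "hdfs")
    print("Redirect target:", location)
    if location:
        host = redirect_host(location)
        print(host, "resolvable from this machine:", can_resolve(host))
```

If the redirect target turns out not to be resolvable from the client, that would be consistent with the "Name or service not known" error, and the fix would likely be on the network or DNS side rather than in the Python code. If it does resolve, the security configuration mentioned above is the next thing to examine.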