Loading data from the host into HDFS with Python: "No such file or directory" error

Asked: 2017-09-29 12:28:21

Tags: python hadoop hdfs

I want to use a Python script to load the file ad.py from my host (Ubuntu 16.04 LTS) into the input folder in HDFS. This is what I have found so far:

import subprocess


def run_cmd(args_list):
    """
    run linux commands
    """
    # import subprocess
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err


run_cmd(['hdfs', 'dfs', '-put', '/home/mernst/Desktop/ad.py', 'hdfs://localhost:55760/input'])
(ret, out, err)= run_cmd(['hdfs', 'dfs', '-ls', 'hdfs://localhost:55760/input'])

print(out)

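If I save the code above in a file (named, for example, myfile.py) and run it in bash with python myfile.py, the file ad.py is not loaded into the input folder, but at least the list command works, and I can see which files are stored in HDFS: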

python python-hadoop.py
Running system command: hdfs dfs -put /home/mernst/Desktop/ad.py /usr/local/bin/hdfs/input
Running system command: hdfs dfs -ls hdfs://localhost:55760/input
Found 4 items
drwxr-xr-x   - hduser supergroup          0 2017-09-21 15:59 hdfs://localhost:55760/input/1st_test
-rw-r--r--   1 hduser supergroup        393 2017-09-20 10:28 hdfs://localhost:55760/input/PySpark.txt
-rw-r--r--   1 hduser supergroup         14 2017-09-19 14:50 hdfs://localhost:55760/input/file.txt
-rw-r--r--   1 hduser supergroup         46 2017-09-28 09:57 hdfs://localhost:55760/input/streaming_kmeans_data_test.txt
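
Note that the script discards the return value of the -put call, so if that command fails, nothing is printed. A minimal sketch of a variant that surfaces the exit code and stderr is shown here; the name run_cmd_checked is hypothetical and not part of the original script, and the commented-out call simply reuses the paths from the question.

```python
import subprocess


def run_cmd_checked(args_list):
    """Run a command and print its stderr if it exits with a non-zero code."""
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    if proc.returncode != 0:
        # Show what the command wrote to stderr so a failed step is not silent
        print('Command failed with exit code {0}: {1}'.format(
            proc.returncode, s_err.decode('utf-8', 'replace')))
    return proc.returncode, s_output, s_err


# Usage with the command from the question (requires the hdfs CLI on PATH):
# run_cmd_checked(['hdfs', 'dfs', '-put', '/home/mernst/Desktop/ad.py',
#                  'hdfs://localhost:55760/input'])
```

With this in place, whatever message hdfs dfs -put writes on failure would appear in the console instead of being dropped.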

However, if I run the script with sudo myfile.py, the following error occurs:

Traceback (most recent call last):
  File "python-hadoop.py", line 18, in <module>
    (ret, out, err)= run_cmd(['hdfs', 'dfs', '-ls', 'hdfs://localhost:55760/input'])
  File "python-hadoop.py", line 11, in run_cmd
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)#cwd = "/usr/lib/jvm/java-8-oracle/jre/")
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

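I googled the error, and it seems to occur when the args_list argument passed to subprocess.Popen is a string rather than a list. That is not the case in myfile.py, though, so I really don't know what is wrong here. Any help would be great.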
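
For reference, subprocess.Popen also raises Errno 2 when the executable named by the first list item cannot be found on the PATH of the process. One possible explanation, not confirmed anywhere in this post, is that sudo runs the script with a different PATH (many sudoers configurations set secure_path) that does not include the directory containing hdfs. The sketch below assumes Python 3.3+ for shutil.which; the helper names find_executable and try_run are hypothetical.

```python
import errno
import os
import shutil
import subprocess


def find_executable(name):
    """Return the full path of `name` if it is on the current PATH, else None."""
    return shutil.which(name)


def try_run(args_list):
    """Run a command; return its exit code, or None if the executable is missing."""
    try:
        proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            # Errno 2: the program named by args_list[0] could not be located
            print('Executable not found: {0!r}'.format(args_list[0]))
            print('PATH seen by this process: {0}'.format(os.environ.get('PATH')))
            return None
        raise
    proc.communicate()
    return proc.returncode


# Prints the resolved path of hdfs, or None if it is not on this process's PATH
print(find_executable('hdfs'))
```

Running this once normally and once under sudo would show whether hdfs resolves in both environments. If it does not resolve under sudo, passing the absolute path of the hdfs executable as the first list item is one way to sidestep the PATH lookup.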

P.S.: I tried shell=True in subprocess.Popen, but without success.
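
A side note on shell=True: on POSIX systems, combining it with a list does not behave like the list form without a shell. Only the first item is run as the command, and the remaining items become arguments to the shell itself. The sketch below illustrates this with echo rather than hdfs, so it can be run anywhere a POSIX shell is available (an assumption about the environment).

```python
import subprocess

# With shell=True, only 'echo' is executed as the command;
# 'hello' is handed to the shell as an extra argument and never reaches echo.
as_list = subprocess.check_output(['echo', 'hello'], shell=True)

# A single string is passed to the shell as one complete command line.
as_string = subprocess.check_output('echo hello', shell=True)

print(repr(as_list))
print(repr(as_string))
```

So if shell=True is used at all, the command would need to be passed as one string. It would still rely on the same PATH as the calling process, so it is unlikely to help if the executable cannot be found.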

0 answers:

No answers yet.