I want to use a Python script to load the file ad.py from my host (Ubuntu 16.04 LTS) into the input folder in HDFS. This is what I have found so far:
import subprocess

def run_cmd(args_list):
    """
    run linux commands
    """
    # import subprocess
    print('Running system command: {0}'.format(' '.join(args_list)))
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    s_return = proc.returncode
    return s_return, s_output, s_err

run_cmd(['hdfs', 'dfs', '-put', '/home/mernst/Desktop/ad.py', 'hdfs://localhost:55760/input'])
(ret, out, err) = run_cmd(['hdfs', 'dfs', '-ls', 'hdfs://localhost:55760/input'])
print(out)
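If I save the code above in a file (for example, myfile.py) and run it in bash with python myfile.py, the file ad.py is not loaded into the input folder, but at least the list command works and I can see which files are stored in HDFS: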
python python-hadoop.py
Running system command: hdfs dfs -put /home/mernst/Desktop/ad.py /usr/local/bin/hdfs/input
Running system command: hdfs dfs -ls hdfs://localhost:55760/input
Found 4 items
drwxr-xr-x   - hduser supergroup          0 2017-09-21 15:59 hdfs://localhost:55760/input/1st_test
-rw-r--r--   1 hduser supergroup        393 2017-09-20 10:28 hdfs://localhost:55760/input/PySpark.txt
-rw-r--r--   1 hduser supergroup         14 2017-09-19 14:50 hdfs://localhost:55760/input/file.txt
-rw-r--r--   1 hduser supergroup         46 2017-09-28 09:57 hdfs://localhost:55760/input/streaming_kmeans_data_test.txt
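The script ignores the return value of the -put call, so whatever hdfs wrote to stderr for that command never gets printed. Below is a minimal sketch of how the exit code and stderr could be surfaced. The helper name `report` is hypothetical, and the failing command is a stand-in child Python process rather than hdfs, so that the snippet runs on any machine.

```python
import subprocess
import sys


def run_cmd(args_list):
    """Run a command and return (returncode, stdout, stderr)."""
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    s_output, s_err = proc.communicate()
    return proc.returncode, s_output, s_err


def report(args_list):
    """Hypothetical helper: run a command and print its stderr if it failed."""
    ret, out, err = run_cmd(args_list)
    if ret != 0:
        print('Command failed with exit code {0}: {1}'.format(ret, ' '.join(args_list)))
        print(err.decode('utf-8', 'replace'))
    return ret, out, err


# Stand-in for the failing `hdfs dfs -put ...` call: a child process that
# writes a message to stderr and exits with status 1.
demo = [sys.executable, '-c', 'import sys; sys.stderr.write("put failed\\n"); sys.exit(1)']
ret, out, err = report(demo)
```

In the script from the question, the same helper would be called as report(['hdfs', 'dfs', '-put', '/home/mernst/Desktop/ad.py', 'hdfs://localhost:55760/input']), which should show the reason hdfs gives for not storing the file.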
However, if I run the script with sudo myfile.py, the following error occurs:
Traceback (most recent call last):
  File "python-hadoop.py", line 18, in <module>
    (ret, out, err)= run_cmd(['hdfs', 'dfs', '-ls', 'hdfs://localhost:55760/input'])
  File "python-hadoop.py", line 11, in run_cmd
    proc = subprocess.Popen(args_list, stdout=subprocess.PIPE, stderr=subprocess.PIPE)#cwd = "/usr/lib/jvm/java-8-oracle/jre/")
  File "/usr/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
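I googled the error, and it seems to occur when the args_list argument passed to subprocess.Popen is a string rather than a list. That is not the case in myfile.py, though, so I really don't know what is wrong here. Any help would be great.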
P.S.: I tried shell=True in subprocess.Popen, but without success.
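One thing that may be worth checking: subprocess.Popen also raises OSError: [Errno 2] when the executable itself (here hdfs) cannot be found on the PATH of the running process, and sudo commonly runs commands with a different PATH than the user's own shell. The sketch below assumes this is relevant here, which the output above does not confirm. It searches PATH by hand so that it works on Python 2.7 as well as Python 3; the helper name `find_on_path` is hypothetical.

```python
import os


def find_on_path(program, path=None):
    """Hypothetical helper: return the full path of `program`, or None if it is not on PATH."""
    if path is None:
        path = os.environ.get('PATH', os.defpath)
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, program)
        # Accept only regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


print('PATH seen by this process: {0}'.format(os.environ.get('PATH')))
print('hdfs resolves to: {0}'.format(find_on_path('hdfs')))
```

Running this once as the normal user and once under sudo, then comparing the two outputs, would show whether hdfs is visible in both environments. If it is missing under sudo, passing the absolute path of the hdfs binary as the first element of args_list is one possible workaround.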