Is there a way to generate a file directly on HDFS?
I want to avoid creating a local file and then copying it to HDFS via the hdfs command line, e.g.:
hdfs dfs -put - "file_name.csv"
Or is there a Python library for this?
Answer 0 (score: 0)
Have you tried HdfsCli?
Quoting its section Reading and Writing files:
# Loading a file in memory.
with client.read('features') as reader:
    features = reader.read()

# Directly deserializing a JSON object.
with client.read('model.json', encoding='utf-8') as reader:
    from json import load
    model = load(reader)
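Since the question is about writing rather than reading, here is a minimal sketch of the corresponding write path (the endpoint, user, and file name are assumptions for illustration; adjust for your cluster):

from hdfs import InsecureClient

# Connect over WebHDFS; host, port, and user here are assumed values.
client = InsecureClient('http://namenode:50070', user='myuser')

# Stream text straight into an HDFS file -- no local copy is created.
with client.write('file_name.csv', encoding='utf-8') as writer:
    writer.write('col1,col2\n')
    writer.write('1,2\n')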
Answer 1 (score: 0)
When I use the hdfscli write method it is extremely slow. Is there a way to speed it up?
import csv
from tqdm import tqdm

with client.write(conf.hdfs_location + '/' + conf.filename, encoding='utf-8', buffersize=10000000) as f:
    writer = csv.writer(f, delimiter=conf.separator)
    # tqdm needs an iterable, so wrap the row count in range().
    for i in tqdm(range(10000000000)):
        row = [column.get_value() for column in conf.columns]
        writer.writerow(row)
Many thanks.
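A likely cause of the slowness is the per-row overhead of pushing many tiny writes over the WebHDFS stream. Below is a minimal sketch of batching rows in memory and flushing them in larger chunks (the path, row data, and batch size are assumptions to tune, not part of the original code):

import csv
import io

BATCH = 10000  # rows per flush; an assumed value, tune for your workload

with client.write('/user/myuser/out.csv', encoding='utf-8') as f:
    buf = io.StringIO()
    writer = csv.writer(buf)
    for i in range(1000000):
        writer.writerow([i, i * 2])
        if (i + 1) % BATCH == 0:
            # One large write instead of thousands of small ones.
            f.write(buf.getvalue())
            buf.seek(0)
            buf.truncate(0)
    f.write(buf.getvalue())  # flush any remaining rows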
Answer 2 (score: 0)
hdfs dfs -put does not require you to create the file locally. There is also no need to create a zero-byte file on hdfs (touchz) and then append to it (appendToFile). You can write a file directly on hdfs:
hadoop fs -put - /user/myuser/testfile
Hit enter, then type the text you want to put into the file at the prompt. When you are done, press Ctrl+D.
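If you want to drive that same stdin-based put from Python instead of an interactive shell, here is a minimal sketch with subprocess (the path and data are assumptions for illustration):

from subprocess import PIPE, Popen

# Pipe bytes straight into "hadoop fs -put -"; nothing is written to local disk.
put = Popen(["hadoop", "fs", "-put", "-", "/user/myuser/testfile"], stdin=PIPE)
put.communicate(input=b"some text for the file\n")
if put.returncode != 0:
    raise RuntimeError("hadoop fs -put failed")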
Answer 3 (score: 0)
Two ways to write a local file to hdfs with Python:
One way is to use the hdfs Python package:
Code snippet:
from hdfs import InsecureClient

# Connect to the namenode over WebHDFS and upload the local file.
hdfsclient = InsecureClient('http://localhost:50070', user='madhuc')
hdfspath = "/user/madhuc/hdfswritedata/"
localpath = "/home/madhuc/sample.csv"
hdfsclient.upload(hdfspath, localpath)
Output location: '/user/madhuc/hdfswritedata/sample.csv'
The other way is to use the subprocess Python package with PIPE:
Code snippet:
from subprocess import PIPE, Popen

localpath = "/home/madhuc/sample.csv"
hdfspath = "/user/madhuc/hdfswritedata/"

# Put the local file into hdfs and only report success if the command succeeded.
put = Popen(["hadoop", "fs", "-put", localpath, hdfspath], stdin=PIPE, bufsize=-1)
put.communicate()
if put.returncode == 0:
    print("File Saved Successfully")