How do I get the list of subfolders in an HDFS folder?

Time: 2018-09-19 07:38:55

Tags: r apache-spark sparklyr

Suppose my Parquet files are stored as follows:

hdfs://root/folder1/pqt1.pqt
hdfs://root/folder2/pqt2.pqt
hdfs://root/folder3/pqt3.pqt
hdfs://root/folder4/part1/pqt4part1.pqt
hdfs://root/folder4/part2/pqt4part1.pqt
...

How can I list the subfolders of 'hdfs://root' from R using sparklyr? The desired output is (without recursion):

hdfs://root/folder1/
hdfs://root/folder2/
hdfs://root/folder3/
hdfs://root/folder4/
...

And with recursion:

hdfs://root/folder1/
hdfs://root/folder2/
hdfs://root/folder3/
hdfs://root/folder4/
hdfs://root/folder4/part1/
hdfs://root/folder4/part2/
...

1 answer:

Answer 0: (score: 0)

Base R may already be enough:

list.dirs(path = "hdfs://root", full.names = TRUE, recursive = TRUE)
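
Note that base R's list.dirs reads the local filesystem, so the call above is only likely to help when HDFS is mounted locally. As an alternative, here is a minimal and untested sketch that asks the Hadoop FileSystem API for the directory listing through sparklyr's invoke interface. The helper name list_hdfs_dirs and the connection object sc are assumptions made for illustration, not part of sparklyr or of the question, and the returned paths will most likely come back without the trailing slash shown in the desired output.

```r
# Hypothetical helper (not part of sparklyr): list the subdirectories of an
# HDFS path by calling the Hadoop FileSystem API on the Spark driver's JVM.
list_hdfs_dirs <- function(sc, root, recursive = FALSE) {
  # Hadoop configuration of the running Spark context
  hconf <- sparklyr::invoke(sparklyr::spark_context(sc), "hadoopConfiguration")

  # org.apache.hadoop.fs.Path for the folder, and the FileSystem that owns it
  path <- sparklyr::invoke_new(sc, "org.apache.hadoop.fs.Path", root)
  fs <- sparklyr::invoke(path, "getFileSystem", hconf)

  # One FileStatus object per direct child (files and directories)
  statuses <- sparklyr::invoke(fs, "listStatus", path)

  dirs <- character(0)
  for (status in statuses) {
    if (sparklyr::invoke(status, "isDirectory")) {
      dir <- sparklyr::invoke(sparklyr::invoke(status, "getPath"), "toString")
      dirs <- c(dirs, dir)
      if (recursive) {
        # Depth-first descent into each subdirectory
        dirs <- c(dirs, list_hdfs_dirs(sc, dir, recursive = TRUE))
      }
    }
  }
  dirs
}

# Usage, assuming an existing sparklyr connection `sc`:
# list_hdfs_dirs(sc, "hdfs://root")                    # direct subfolders only
# list_hdfs_dirs(sc, "hdfs://root", recursive = TRUE)  # all nested subfolders
```

Each invoke call is a round trip to the JVM, so for a very deep or wide tree the recursive version may be slow; it is meant to show the idea rather than to be an optimized listing.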