Question

I am currently listing files in Azure Datalake Store gen1 successfully with the following command:

dbutils.fs.ls('mnt/dbfolder1/projects/clients')

The structure of this folder is:

- client_comp_automotive_1.json [File]
- client_comp_automotive_2.json [File]
- client_comp_automotive_3.json [File]
- client_comp_automotive_4.json [File]
- PROCESSED [Folder]

I want to loop through the (.json) files in this folder and process them one by one, so that I can act on errors and move each successfully processed file to a subfolder.

How can I do this in Python? I have tried:

folder = dbutils.fs.ls('mnt/dbfolder1/projects/clients')
files = [f for f in os.listdir(folder) if os.path.isfile(f)]

But this does not work: os is unknown. How can I do this within Databricks?

Answer 1

Even though I searched for two days, the answer turned out to be quite simple:

files = dbutils.fs.ls('mnt/dbfolder1/projects/clients')

for fi in files: 
  print(fi.path)
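To cover the rest of the question (process the files one at a time, act on errors, move successes into the subfolder), the loop above could be extended roughly as follows. This is only a sketch under assumptions: `process_json_files` and `process_one` are hypothetical names, `fs` is expected to be `dbutils.fs`, and the target subfolder is taken to be the `PROCESSED` folder from the question. It relies only on `ls`, `mv`, and the `path` and `name` fields of the listing entries.

```python
def process_json_files(fs, folder, process_one, done_subfolder="PROCESSED"):
    """Process every .json file in `folder`, one at a time.

    `fs` is assumed to behave like dbutils.fs (ls and mv);
    `process_one` is a hypothetical callback that raises on failure.
    Returns a list of (path, exception) pairs for the files that failed.
    """
    failed = []
    for info in fs.ls(folder):
        # Skip sub-folders such as PROCESSED/ and any non-JSON files.
        if not info.name.endswith(".json"):
            continue
        try:
            process_one(info.path)
        except Exception as exc:
            # Leave the file in place so it can be inspected or retried.
            failed.append((info.path, exc))
            continue
        # Move only successfully processed files into the subfolder.
        fs.mv(info.path, f"{folder.rstrip('/')}/{done_subfolder}/{info.name}")
    return failed

# In a Databricks notebook this might be called as:
# failed = process_json_files(dbutils.fs, "mnt/dbfolder1/projects/clients", my_handler)
```

Passing `dbutils.fs` in as an argument keeps the function free of notebook globals, so it can be tried outside Databricks with a stand-in object.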

Answer 2

Scala version of the same (with an ADLS path):

val dirList = dbutils.fs.ls("abfss://<container>@<storage_account>.dfs.core.windows.net/<DIR_PATH>/")

// option1
dirList.foreach(println)

// option2
for (dir <- dirList) println(dir.name)

Answer 3

Another way, which carries over seamlessly to a local Python installation, is:

import os
os.listdir("/dbfs/mnt/projects/clients/")
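Building on this, the list comprehension from the question can be made to work with `os` once it receives a path string rather than the result of `dbutils.fs.ls`, and once each name is joined with the folder before the `isfile` check. A minimal sketch, assuming the `/dbfs` local mount is available on the cluster; `list_json_files` is a hypothetical helper name:

```python
import os

def list_json_files(folder):
    """Return the sorted names of the .json files directly inside `folder`."""
    return sorted(
        name
        for name in os.listdir(folder)
        # os.listdir returns bare names, so join with the folder before testing.
        if name.endswith(".json") and os.path.isfile(os.path.join(folder, name))
    )

# On Databricks this might be called as:
# files = list_json_files("/dbfs/mnt/projects/clients/")
```

Because it only uses the standard library, the same function runs unchanged against a local directory.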

How to loop through Azure Datalake Store files in Azure Databricks
