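I am making REST API calls to fetch the folders of a SharePoint document library.
I want to recursively get every folder path in the whole directory tree.
I wrote a function that returns the list of subfolders of a given folder, but I am not sure how to walk down to the Nth level and collect all of the folder paths.
For example, suppose the current SharePoint document library has the structure shown in the JSON below (fo = folder; f = file):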
{
    "root": [
        {
            "fo1": {
                "fo1": "f1",
                "fo2": ["f1", "f2"]
            },
            "fo2": ["fi1", "fi2"]
        },
        "fi1", "fi2"
    ]
}
In the example above, I want a list of the paths of all folders/directories. For example, the output should be:
["/root/fo1/", "/root/fo1/fo1/", "/root/fo1/fo2/", "/root/fo2/"]
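Because this is a REST API call, I do not know the structure until I run the query to get the subfolders and then go into each subfolder to get its own subfolders.
The functions I have written so far (below) only fetch data one level down (the subfolders), because they are based on a nested loop rather than on recursion. How can I implement a recursive solution that returns a list of all unique folder paths?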
def print_root_contents(ctx):
    try:
        list_object = ctx.web.lists.get_by_title('Documents')
        folder = list_object.root_folder
        ctx.load(folder)
        ctx.execute_query()

        folders = folder.folders
        ctx.load(folders)
        ctx.execute_query()

        for myfolder in folders:
            print("For Folder : {0}".format(myfolder.properties["Name"]))
            folder_list, files_list = print_folder_contents(ctx, myfolder.properties["Name"])
            print("Sub folders - ", folder_list)
            print("Files - ", files_list)
    except Exception as e:
        print('Problem printing out library contents: ', e)


def print_folder_contents(ctx, folder_name):
    try:
        folder = ctx.web.get_folder_by_server_relative_url("/sites/abc/Shared Documents/" + folder_name + "/")
        ctx.load(folder)
        ctx.execute_query()

        # Folders
        fold_names = []
        sub_folders = folder.folders
        ctx.load(sub_folders)
        ctx.execute_query()
        for s_folder in sub_folders:
            # folder_name = folder_name+"/"+s_folder.properties["Name"]
            # print("Folder name: {0}".format(folder.properties["Name"]))
            fold_names.append(s_folder.properties["Name"])

        # Files (the caller unpacks two values, so return the file names as well)
        file_names = []
        files = folder.files
        ctx.load(files)
        ctx.execute_query()
        for s_file in files:
            file_names.append(s_file.properties["Name"])

        return fold_names, file_names
    except Exception as e:
        print('Problem printing out library contents: ', e)
        # Return empty lists so the caller can still unpack the result
        return [], []
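In the last function above (print_folder_contents), I cannot work out the recursive logic that keeps adding folders and subfolders, stops when the Nth folder contains no more folders, and then moves on to the next sibling folder.
I am finding this really challenging. Any help?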
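Answer 0 (score: 0)
You can use a generator function that iterates over the dict items and yields each dict key, followed by that key joined with every path produced by the recursive call on its value; if it is given a list, it yields whatever the recursive calls on the list items produce: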
def paths(d):
    def _paths(d):
        if isinstance(d, dict):
            for k, v in d.items():
                yield k + '/'
                for p in _paths(v):
                    yield '/'.join((k, p))
        elif isinstance(d, list):
            for i in d:
                yield from _paths(i)
    return ['/' + p for p in _paths(d)]
So given:
d = {
    "root": [
        {
            "fo1": {
                "fo1": "f1",
                "fo2": ["f1", "f2"]
            },
            "fo2": ["fi1", "fi2"]
        },
        "fi1", "fi2"
    ]
}
paths(d)
returns:
['/root/', '/root/fo1/', '/root/fo1/fo1/', '/root/fo1/fo2/', '/root/fo2/']
Note that your expected output should include '/root/', since the root folder is a valid folder too.
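The same generator idea can also be applied when the tree is not known up front, as in the question, by asking for the children of a folder only when that folder is visited. The sketch below assumes a hypothetical `list_subfolders(path)` callable that returns the names of the immediate subfolders of `path`; against SharePoint it would wrap the REST query, while here it is backed by an in-memory dict (`FAKE_LIBRARY`, also made up for illustration) so the example runs on its own.

```python
from typing import Callable, Iterable, Iterator


def walk_folders(path: str,
                 list_subfolders: Callable[[str], Iterable[str]]) -> Iterator[str]:
    """Yield the path of every folder below `path`, depth first."""
    for name in list_subfolders(path):
        child = path + name + '/'
        yield child
        # Descend into the child before moving on to its next sibling.
        yield from walk_folders(child, list_subfolders)


# Hypothetical stand-in for the remote call: folder path -> subfolder names.
FAKE_LIBRARY = {
    '/root/': ['fo1', 'fo2'],
    '/root/fo1/': ['fo1', 'fo2'],
}


def fake_list_subfolders(path: str) -> list:
    # A folder without an entry has no subfolders, which ends the recursion.
    return FAKE_LIBRARY.get(path, [])


folder_paths = list(walk_folders('/root/', fake_list_subfolders))
print(folder_paths)
# ['/root/fo1/', '/root/fo1/fo1/', '/root/fo1/fo2/', '/root/fo2/']
```

The recursion stops on its own: a folder with no subfolders gives an empty loop, so control returns to the parent, which continues with the next sibling. Unlike `paths(d)`, this sketch does not yield the starting folder itself.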
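Answer 1 (score: 0)
I know this answer is very late to the game, but you can do something like the following to get a flat list of all child SharePoint objects given some parent directory.
This works because we keep extending a single list, rather than using the list.append() method, which would create nested objects as we recurse through the directory tree.
I'm sure there is room to improve the snippet below, but I believe it should help you reach your goal.
Cheers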
rs311
from office365.sharepoint.client_context import ClientContext


def get_items_in_directory(ctx_client: ClientContext,
                           directory_relative_uri: str,
                           recursive: bool = True):
    """
    This function provides a way to get all items in a directory in SharePoint, with
    the option to traverse nested directories to extract all child objects.

    :param ctx_client: office365.sharepoint.client_context.ClientContext object
        SharePoint ClientContext object.
    :param directory_relative_uri: str
        Path to directory in SharePoint.
    :param recursive: bool
        default = True
        Tells function whether or not to perform a recursive call.
    :return: list
        Returns a flattened array of all child file and/or folder objects
        given some parent directory. All items will be of the following types:
            - office365.sharepoint.file.File
            - office365.sharepoint.folder.Folder

    Examples
    ---------
    All examples assume you've already authenticated with SharePoint per
    documentation found here:
        - https://github.com/vgrem/Office365-REST-Python-Client#examples

    Assumed directory structure:
        some_directory/
            my_file.csv
            your_file.xlsx
            sub_directory_one/
                123.docx
                abc.csv
            sub_directory_two/
                xyz.xlsx

    directory = 'some_directory'

    # Non-recursive call
    extracted_child_objects = get_items_in_directory(ctx_client, directory, recursive=False)
    # extracted_child_objects would contain (my_file.csv, your_file.xlsx, sub_directory_one/, sub_directory_two/)

    # Recursive call
    extracted_child_objects = get_items_in_directory(ctx_client, directory, recursive=True)
    # extracted_child_objects would contain (my_file.csv, your_file.xlsx, sub_directory_one/, sub_directory_two/, sub_directory_one/123.docx, sub_directory_one/abc.csv, sub_directory_two/xyz.xlsx)
    """
    contents = list()

    # Load the immediate subfolders of the directory
    folders = ctx_client.web.get_folder_by_server_relative_url(directory_relative_uri).folders
    ctx_client.load(folders)
    ctx_client.execute_query()

    if recursive:
        for folder in folders:
            # extend (not append) keeps the result a single flat list
            contents.extend(
                get_items_in_directory(
                    ctx_client=ctx_client,
                    directory_relative_uri=folder.properties['ServerRelativeUrl'],
                    recursive=recursive)
            )

    contents.extend(folders)

    # Load the files that sit directly in the directory
    files = ctx_client.web.get_folder_by_server_relative_url(directory_relative_uri).files
    ctx_client.load(files)
    ctx_client.execute_query()

    contents.extend(files)
    return contents
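The remark about extending a single list instead of appending can be seen on a small in-memory tree, without any SharePoint calls. This is only an illustrative sketch: the `tree` dict and the two helper functions are made up for the comparison and are not part of the library or of the function above.

```python
def collect_nested(tree: dict) -> list:
    """Append the recursive result: every level becomes a nested list."""
    items = []
    for name, children in tree.items():
        items.append(name)
        if children:
            items.append(collect_nested(children))
    return items


def collect_flat(tree: dict) -> list:
    """Extend with the recursive result: everything ends up in one flat list."""
    items = []
    for name, children in tree.items():
        items.append(name)
        items.extend(collect_flat(children))
    return items


# Hypothetical folder tree: folder name -> dict of its subfolders.
tree = {'sub_directory_one': {'inner': {}}, 'sub_directory_two': {}}

nested = collect_nested(tree)
flat = collect_flat(tree)
print(nested)  # ['sub_directory_one', ['inner'], 'sub_directory_two']
print(flat)    # ['sub_directory_one', 'inner', 'sub_directory_two']
```

With `extend`, the caller never has to flatten the result afterwards, which is what makes the return value of the recursive function directly usable as a list of objects.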