This is a common question.
The scenario is:
folderA____folderA1____folderA1a
       \___folderA2____folderA2a
                   \___folderA2b
...and the question is how to list all the files in all the folders under the root folderA.
Answer 0 (score: 18)
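The first thing to understand is that in Google Drive, a folder is not a folder!
We are all used to the idea of folders (aka directories) in Windows/nix etc. In the real world, a folder is a container into which documents are placed. It is also possible to place smaller folders inside bigger folders, so the big folder can be thought of as containing all of the documents inside its smaller child folders.
In Google Drive, however, a folder is NOT a container, so much so that in the first release of Google Drive they weren't even called folders, they were called Collections. A folder is simply a file with (a) no contents and (b) a special mime type (application/vnd.google-apps.folder). Folders are used in exactly the same way that tags (aka labels) are used.
The best way to understand this is to consider GMail. If you look at the top of an open mail item, you see two icons: a folder with the tooltip "Move to" and a label with the tooltip "Labels". Click on either of them and the same dialogue box appears, and it is all about labels. Your labels are listed down the left-hand side, in a tree display that looks a lot like folders. Importantly, a mail item can have multiple labels, or you could say, a mail item can be in multiple folders. Google Drive's folders work in exactly the same way as GMail labels.
Having established that a folder is simply a label, there is nothing stopping you from organising your labels in a hierarchy that resembles a folder tree; in fact, this is the most common way of doing it.
It should now be clear that a file (let's call it MyFile) in folderA2b is NOT a child or grandchild of folderA. It is simply a file with a label (confusingly called a parent) of "folderA2b".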
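To make the label model concrete, here is a minimal in-memory sketch (no Drive API call is made). The dictionaries mimic the `id`, `name`, `mimeType` and `parents` fields of file metadata; the IDs and the helper `has_parent` are made up for illustration.

```python
FOLDER_MIME = "application/vnd.google-apps.folder"

# Hypothetical metadata records shaped like Drive file resources.
items = [
    {"id": "idA", "name": "folderA", "mimeType": FOLDER_MIME, "parents": ["root"]},
    {"id": "idA2", "name": "folderA2", "mimeType": FOLDER_MIME, "parents": ["idA"]},
    {"id": "idA2b", "name": "folderA2b", "mimeType": FOLDER_MIME, "parents": ["idA2"]},
    {"id": "idFile", "name": "MyFile", "mimeType": "text/plain", "parents": ["idA2b"]},
]


def has_parent(item, folder_id):
    """Mimic a "'<folder-id>' in parents" test: it only looks at direct parents."""
    return folder_id in item.get("parents", [])


my_file = items[-1]
in_a2b = has_parent(my_file, "idA2b")  # MyFile carries the folderA2b label
in_a = has_parent(my_file, "idA")      # but it carries no folderA label
```

The second check is the whole problem in miniature: nothing on MyFile itself says it sits "under" folderA, so that relationship has to be reconstructed from the folders.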
OK, so how do I get all the files "under" folderA?
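Alternative 1. Recursion
The temptation is to list the children of folderA and, for any children that are folders, recursively list their children, rinse, repeat. In a very small number of cases this might be the best approach, but for most it has a significant problem: every subfolder costs its own server round trip, so the time taken grows with the size of the tree.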
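The cost of the recursive approach can be sketched without touching the network. In the example below, `list_children` stands in for one `files.list` request per folder; it, the `TREE` data and the `folder` flag are hypothetical stand-ins, not part of the Drive API.

```python
def list_files_recursively(folder_id, list_children, is_folder):
    """Return the IDs of all non-folder items under folder_id.

    list_children is called once per folder, which is where the
    per-folder round trips come from.
    """
    found = []
    for child in list_children(folder_id):
        if is_folder(child):
            found.extend(list_files_recursively(child["id"], list_children, is_folder))
        else:
            found.append(child["id"])
    return found


# Hypothetical tree matching the scenario in the question.
TREE = {
    "folderA": [{"id": "folderA1", "folder": True}, {"id": "folderA2", "folder": True}],
    "folderA1": [{"id": "folderA1a", "folder": True}, {"id": "file1", "folder": False}],
    "folderA1a": [{"id": "file2", "folder": False}],
    "folderA2": [{"id": "folderA2a", "folder": True}, {"id": "folderA2b", "folder": True}],
    "folderA2a": [],
    "folderA2b": [{"id": "MyFile", "folder": False}],
}

calls = []  # records every simulated request


def fake_list_children(folder_id):
    calls.append(folder_id)
    return TREE[folder_id]


files = list_files_recursively("folderA", fake_list_children, lambda c: c["folder"])
```

For this six-folder tree the stand-in is called six times, once per folder, to find three files.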
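Alternative 2. The common parent
This works best if all of the files are being created by your app (i.e. you are using the drive.file scope). As well as the folder hierarchy above, create a dummy parent folder called, say, "MyAppCommonParent". As you create each file as a child of its particular folder, you also make it a child of MyAppCommonParent. This becomes a lot more intuitive if you remember to think of folders as labels. You can now easily retrieve all the files with the single query 'MyAppCommonParent' in parents (using the ID of that folder).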
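Alternative 3. Folders first
Start by getting all folders. Yes, all of them. Once you have them all in memory, you can crawl through their parents properties and build your tree structure and a list of folder IDs. You can then do a single files.list?q='folderA' in parents or 'folderA1' in parents or 'folderA1a' in parents... Using this technique you can get everything in two http calls.
The pseudo code for option 3 is a bit like...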
// get all folders from Drive
files.list?q=mimeType='application/vnd.google-apps.folder' and trashed=false&fields=files(id,parents,name)
// store in a Map, keyed by ID
// find the entry for folderA and note the ID
// find any entries where the ID is in the parents, note their IDs
// for each such entry, repeat recursively
// use all of the IDs noted above to construct a ...
// files.list?q='folderA-ID' in parents or 'folderA1-ID' in parents or 'folderA1a-ID' in parents...
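The in-memory part of the pseudo code above can be sketched as follows. This is only an illustration under assumptions: `folders` is taken to be the already-downloaded list of folder metadata (each with `id` and `parents`), and the function names and sample IDs are invented for the example.

```python
from collections import defaultdict, deque


def descendant_folder_ids(folders, root_id):
    """Return root_id followed by the IDs of every folder beneath it."""
    # Index the flat folder list by parent ID so each lookup is cheap.
    children = defaultdict(list)
    for folder in folders:
        for parent_id in folder.get("parents", []):
            children[parent_id].append(folder["id"])

    ordered = [root_id]
    seen = {root_id}
    queue = deque([root_id])
    while queue:  # breadth-first walk down the tree
        current = queue.popleft()
        for child_id in children[current]:
            if child_id not in seen:
                seen.add(child_id)
                ordered.append(child_id)
                queue.append(child_id)
    return ordered


def build_parents_query(folder_ids):
    """Join the IDs into "'id1' in parents or 'id2' in parents ..."."""
    return " or ".join(f"'{fid}' in parents" for fid in folder_ids)


# Hypothetical result of the "get all folders" call.
folders = [
    {"id": "A", "parents": ["root"]},
    {"id": "A1", "parents": ["A"]},
    {"id": "A1a", "parents": ["A1"]},
    {"id": "A2", "parents": ["A"]},
    {"id": "A2a", "parents": ["A2"]},
    {"id": "A2b", "parents": ["A2"]},
    {"id": "other", "parents": ["root"]},
]

ids = descendant_folder_ids(folders, "A")
query = build_parents_query(ids)
```

The resulting `query` string is what would go into the `q` parameter of the second `files.list` call; folders outside the subtree, such as `other`, never appear in it.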
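Alternative 2 is the most efficient, but it only works if you have control of file creation. Alternative 3 is generally more efficient than Alternative 1, but there may be certain small tree sizes where 1 is best.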
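Answer 1 (score: 2)
Sharing a Python solution to the excellent Alternative 3 by @pinoyyid above, in case it's useful to anyone. I'm not a developer, so it's probably hopelessly un-Pythonic... but it works, makes only 2 API calls (plus any extra pages or chunks), and is pretty quick.
The file query is built from one '<folder-id>' in parents segment per folder found. Interestingly, Google Drive seems to have a hard limit of 599 '<folder-id>' in parents segments per query, so if the folder to search has more subfolders than that, you need to chunk the list.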
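import google.auth
from googleapiclient.discovery import build

FOLDER_TO_SEARCH = '123456789'  # ID of folder to search
DRIVE_ID = '654321'  # ID of shared drive in which it lives
MAX_PARENTS = 500  # Limit set safely below Google max of 599 parents per query.

# Assumption: the original post does not show how drive_api_ref is created.
# Application Default Credentials are used here as one possible setup.
credentials, _ = google.auth.default(scopes=['https://www.googleapis.com/auth/drive.readonly'])
drive_api_ref = build('drive', 'v3', credentials=credentials)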

def get_all_folders_in_drive():
    """
    Return a dictionary of all the folder IDs in a drive mapped to their parent folder IDs (or to the
    drive itself if a top-level folder). That is, flatten the entire folder structure.
    """
    folders_in_drive_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_folders = "trashed = false and mimeType = 'application/vnd.google-apps.folder'"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_folders).execute()
        folders = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for folder in folders:
            parents = folder.get('parents')
            if parents:  # Skip any folder returned without a parents field
                folders_in_drive_dict[folder['id']] = parents[0]
        if page_token is None:
            break
    return folders_in_drive_dict

def get_subfolders_of_folder(folder_to_search, all_folders):
    """
    Yield subfolders of the folder-to-search, and then subsubfolders etc. Must be called by an iterator.
    :param all_folders: The dictionary returned by :func:`get_all_folders_in_drive`.
    """
    temp_list = [k for k, v in all_folders.items() if v == folder_to_search]  # Get all subfolders
    for sub_folder in temp_list:  # For each subfolder...
        yield sub_folder  # Return it
        yield from get_subfolders_of_folder(sub_folder, all_folders)  # Get subsubfolders etc

def get_relevant_files(relevant_folders):
    """
    Get files under the folder-to-search and all its subfolders.
    """
    relevant_files = {}
    # Split the folder list into chunks that stay below the per-query limit
    chunked_relevant_folders_list = [relevant_folders[i:i + MAX_PARENTS] for i in
                                     range(0, len(relevant_folders), MAX_PARENTS)]
    for folder_list in chunked_relevant_folders_list:
        query_term = ' in parents or '.join("'{0}'".format(f) for f in folder_list) + ' in parents'
        relevant_files.update(get_all_files_in_folders(query_term))
    return relevant_files

def get_all_files_in_folders(parent_folders):
    """
    Return a dictionary of file IDs mapped to file names for the specified parent folders.
    :param parent_folders: A ready-made query fragment of "'<folder-id>' in parents" segments joined by "or".
    """
    files_under_folder_dict = {}
    page_token = None
    max_allowed_page_size = 1000
    just_files = f"mimeType != 'application/vnd.google-apps.folder' and trashed = false and ({parent_folders})"
    while True:
        results = drive_api_ref.files().list(
            pageSize=max_allowed_page_size,
            fields="nextPageToken, files(id, name, mimeType, parents)",
            includeItemsFromAllDrives=True, supportsAllDrives=True,
            corpora='drive',
            driveId=DRIVE_ID,
            pageToken=page_token,
            q=just_files).execute()
        files = results.get('files', [])
        page_token = results.get('nextPageToken', None)
        for file in files:
            files_under_folder_dict[file['id']] = file['name']
        if page_token is None:
            break
    return files_under_folder_dict

if __name__ == "__main__":
    all_folders_dict = get_all_folders_in_drive()  # Flatten folder structure
    relevant_folders_list = [FOLDER_TO_SEARCH]  # Start with the folder-to-search
    for folder in get_subfolders_of_folder(FOLDER_TO_SEARCH, all_folders_dict):
        relevant_folders_list.append(folder)  # Recursively search for subfolders
    relevant_files_dict = get_relevant_files(relevant_folders_list)  # Get the files
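Answer 2 (score: 1)
Sharing a JavaScript solution that uses recursion to build an array of folders, starting with the first-level folder and moving down the hierarchy. The array is composed by recursively cycling through the parent IDs of the file in question.
The extract below makes 3 separate queries to gapi: one for the top-level folders (used to find the ID of the root folder), one for all folders, and one for all files.
The code then iterates through the list of files and creates an array of folder names for each one.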
const { google } = require('googleapis')
const gOAuth = require('./googleOAuth')

// Resolve the promises for getting Drive files and folders
const getGFilePaths = async () => {
  // A single call runs all three queries; getGfiles() combines them with Promise.all()
  const [gFiles, gFolders, gRootFolders] = await getGfiles()
  // Every top-level folder has the root folder as its parent
  const gRootFolder = gRootFolders[0]['parents'][0]
  // Add a `path` key holding the array of folder names to each file and return the files
  return gFiles
    .filter((file) => file.hasOwnProperty('parents'))
    .map((file) => ({...file, path: makePathArray(gFolders, file['parents'][0], gRootFolder)}))
}

// Recursive function to build an array of the folder names in a path, top -> bottom
const makePathArray = (folders, fileParent, rootFolder) => {
  if (fileParent === rootFolder) { return [] }
  const filteredFolders = folders.filter((f) => f.id === fileParent)
  if (filteredFolders.length >= 1 && filteredFolders[0].hasOwnProperty('parents')) {
    // Pass rootFolder along so the recursion stops at the root
    const path = makePathArray(folders, filteredFolders[0]['parents'][0], rootFolder)
    path.push(filteredFolders[0]['name'])
    return path
  }
  return []
}

// Get meta-data lists from Drive, one query each for files, folders and top-level folders
const getGfiles = async () => {
  try {
    const getRootFolder = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(name, parents)',
      q: "'root' in parents and trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
    const getFolders = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(id,name,parents), nextPageToken',
      q: "trashed = false and mimeType = 'application/vnd.google-apps.folder'"})
    const getFiles = getGdriveList({corpora: 'user', includeItemsFromAllDrives: false,
      fields: 'files(id,name,parents, mimeType, fullFileExtension, webContentLink, exportLinks, modifiedTime), nextPageToken',
      q: "trashed = false and mimeType != 'application/vnd.google-apps.folder'"})
    // Awaiting here lets the catch block see rejected requests
    return await Promise.all([getFiles, getFolders, getRootFolder])
  }
  catch (error) {
    throw new Error(`Error in retrieving a file response from Google Drive: ${error}`)
  }
}

// Call Drive to get file meta-data. Results come back in pages and are collected into a single array
const getGdriveList = async (params) => {
  const gKeys = await gOAuth.get()
  const drive = google.drive({version: 'v3', auth: gKeys})
  const request = {...params} // copy so the caller's object is not modified
  const list = []
  let nextPgToken
  do {
    const res = await drive.files.list(request)
    list.push(...res.data.files)
    nextPgToken = res.data.nextPageToken
    request.pageToken = nextPgToken
  }
  while (nextPgToken)
  return list
}
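
For comparison with the Python answer above, the path-building idea can also be sketched in Python as a loop rather than recursion. This is an illustration only: `folders_by_id` is assumed to be a dictionary built from the folder list, and the IDs and the name `build_path` are invented for the example.

```python
def build_path(folders_by_id, parent_id, root_id):
    """Return the folder names from just below the root down to parent_id."""
    path = []
    seen = set()  # guards against a malformed, cyclic parent chain
    while parent_id != root_id and parent_id in folders_by_id and parent_id not in seen:
        seen.add(parent_id)
        folder = folders_by_id[parent_id]
        path.append(folder["name"])
        parents = folder.get("parents")
        if not parents:
            break
        parent_id = parents[0]  # follow the first parent upwards
    path.reverse()  # collected bottom -> top, returned top -> bottom
    return path


# Hypothetical folder metadata keyed by ID.
folders_by_id = {
    "idA": {"name": "folderA", "parents": ["rootId"]},
    "idA2": {"name": "folderA2", "parents": ["idA"]},
    "idA2b": {"name": "folderA2b", "parents": ["idA2"]},
}

path = build_path(folders_by_id, "idA2b", "rootId")
```

Keying the folders by ID replaces the repeated `filter` over the whole folder array with a direct lookup at each level.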
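Answer 3 (score: 0)
The following approach works very well, but it requires an additional call to the API.
Share the root folder of your search (folderA) with any email address. Add this extra term to your query: "'sharedEmailAddress' in readers". This limits the results to everything in that folder and its subfolders.
Example: share folderA with the email address and then search using this query.
"'sharedEmailAddress' in readers and fullText contains 'text to search'"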
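
A small sketch of how such a query string might be assembled. The function names and the example address are hypothetical; the only escaping handled here is the backslash before a single quote that Drive query strings call for, so other special characters are not covered.

```python
def escape_query_value(value):
    """Escape single quotes for use inside a quoted Drive query value."""
    return value.replace("'", "\\'")


def build_shared_search_query(shared_email, text):
    """Combine the 'in readers' restriction with a full-text search term."""
    return (f"'{escape_query_value(shared_email)}' in readers "
            f"and fullText contains '{escape_query_value(text)}'")


query = build_shared_search_query("helper@example.com", "Valentine's Day")
```

The resulting string would be passed as the `q` parameter of `files.list`.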