Question

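I'm building a Python application that uses the Google Drive API. Development is going well, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

1. Check whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
2. Build a visual file explorer. I know Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.

For now I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the contents of a single folder, but it is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.

Here is the function to get the root: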

from googleapiclient import errors
from googleapiclient.discovery import build

import driveHelper  # the asker's own module: authentication and credential storage


def drive_get_root():
    """Retrieve a root list of File resources.

    Returns:
        List of dictionaries.
    """
    # build the service; the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            # add the files to the list
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

And here is the one that gets the files from a folder:

def drive_files_in_folder(folder_id):
    """Return the files belonging to a folder.

    Args:
        folder_id: ID of the folder to get files from.
    """
    # build the service; the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    # code from Google; it works, so I didn't touch it
    page_token = None
    while True:
        try:
            param = {}

            if page_token:
                param['pageToken'] = page_token

            children = drive_service.children().list(folderId=folder_id, **param).execute()

            # one extra request per child to fetch its metadata
            for child in children.get('items', []):
                result.append(drive_get_file(child['id']))

            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result

And now, to check whether a file exists, I'm using this:

def drive_path_exist(file_path, items=None):
    """Recursively check whether the given path exists.

    Returns the file's metadata dictionary if found, otherwise False.
    """
    # if no list is passed in, start from the root of Gdrive
    if items is None:
        items = drive_get_root()

    # split the string to get the first component and check whether it is in the current list
    file_path = file_path.split("/")

    # if there is only one element in the path we are at the actual filename,
    # so if it is in this folder we can return it
    if len(file_path) == 1:
        exist = False
        for elem in items:
            if elem["title"] == file_path[0]:
                # elem is a dictionary with all the file info
                exist = elem
        return exist

    # we are not at the last element, so we have to keep searching
    folder_id = None
    for elem in items:
        # check whether the current path component is in the folder
        if elem["title"] == file_path[0]:
            folder_id = elem["id"]
            break

    if folder_id is None:
        return False

    # recursive call: rejoin the remaining path as a string and pass the
    # contents of the matched folder as the list to search
    return drive_path_exist("/".join(file_path[1:]), drive_files_in_folder(folder_id))

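Any idea how to solve my problem? I found a few discussions here on Stack Overflow, and in some answers people wrote that this is possible, but of course none of them said how!

Thanks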

Answer 1

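Stop thinking of Drive as a tree structure. It isn't. "Folders" are simply labels, e.g. a file can have multiple parents.

In order to build a representation of a tree in your app, you need to do this:

1. Run a Drive list query to retrieve all folders.
2. Iterate over the result array and examine the parents property to build an in-memory hierarchy.
3. Run a second Drive list query to get all non-folders (i.e. files).
4. For each file returned, place it in your in-memory tree.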
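
The four steps above can be sketched roughly as follows. This is only a minimal sketch, not the answerer's code: it assumes the Drive API v2 field names used in the question (`items`, `nextPageToken`, `title`, `parents`), and `list_all`, `build_hierarchy` and `print_tree` are hypothetical helper names. `service` is assumed to be an authorized Drive v2 service object like the one the question builds.

```python
from collections import defaultdict

FOLDER_TYPE = 'application/vnd.google-apps.folder'


def list_all(service, query):
    """Run one files.list query and follow nextPageToken until exhausted."""
    items, page_token = [], None
    while True:
        params = {'q': query}
        if page_token:
            params['pageToken'] = page_token
        page = service.files().list(**params).execute()
        items.extend(page.get('items', []))
        page_token = page.get('nextPageToken')
        if not page_token:
            return items


def build_hierarchy(folders, files):
    """Index items by id and group child ids under each parent id.

    An item with several parents is listed under each of them.
    """
    by_id = {}
    children = defaultdict(list)
    for item in list(folders) + list(files):
        by_id[item['id']] = item
        for parent in item.get('parents', []):
            children[parent['id']].append(item['id'])
    return by_id, children


def print_tree(by_id, children, folder_id, depth=0):
    """Print the titles of everything below folder_id, indented by depth."""
    for child_id in children.get(folder_id, []):
        print('  ' * depth + by_id[child_id]['title'])
        print_tree(by_id, children, child_id, depth + 1)
```

With a real service, the folders would come from `list_all(service, "mimeType = '%s' and trashed = false" % FOLDER_TYPE)` and the files from the same query with `!=`. The number of requests then depends on the number of result pages rather than on the number of folders. Note that in the v2 API the top-level parent id is the real id of the root folder (available, as far as I know, as `rootFolderId` from `about().get()`), not the literal string 'root'.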

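If you simply want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.

If it is unique, just do a Files.list query for title='file-A', then do a Files.get for each of its parents and see whether any of them is called 'folder-B'.

If 'folder-B' can exist under both 'folder-C' and 'folder-D', then it is more complex and you will need to build the in-memory hierarchy from steps 1 and 2 above.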
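
For the unique-folder-name case, the lookup could look something like this. It is a sketch under the same Drive v2 assumptions as the question's code; `file_in_named_folder` is a hypothetical name, and for brevity it only looks at the first page of results.

```python
def file_in_named_folder(service, file_title, folder_title):
    """Return the first file titled file_title that has a parent titled
    folder_title, or None if there is no such file."""
    # escape backslashes and single quotes for the Drive query syntax
    escaped = file_title.replace('\\', '\\\\').replace("'", "\\'")
    query = "title = '%s' and trashed = false" % escaped
    response = service.files().list(q=query).execute()
    for candidate in response.get('items', []):
        for parent in candidate.get('parents', []):
            # one files.get per parent to learn the parent's title
            parent_meta = service.files().get(fileId=parent['id']).execute()
            if parent_meta.get('title') == folder_title:
                return candidate
    return None
```

This costs one list request plus one get request per parent of each candidate, which is usually far fewer requests than walking the folder tree.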

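You don't say whether these files and folders are created by your app or by the user with the Google Drive web app. If your app is the creator of these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have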

MyDrive/app_root/folder-C/folder-B/file-A

You can make app_root a parent of folder-C, folder-B and file-A.

This way you can constrain all your queries to include

and 'app_root_id' in parents
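
As a small illustration of that constraint, a query builder might append the clause automatically. The helper names below are hypothetical, and 'app_root_id' stands for the real id of the app_root folder.

```python
def escape_query_value(value):
    """Escape backslashes and single quotes for a Drive query string."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def scoped_query(app_root_id, condition):
    """Append the app_root constraint to any query condition."""
    return "%s and '%s' in parents" % (condition, app_root_id)


def title_query(app_root_id, title):
    """Build a query for items with the given title under app_root."""
    return scoped_query(app_root_id, "title = '%s'" % escape_query_value(title))
```

For example, `title_query('app_root_id', 'file-A')` yields `title = 'file-A' and 'app_root_id' in parents`, which can be passed as the `q` parameter of a files list call.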

Answer 2

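That will never work unless the tree is very small. You have to rethink your entire algorithm for a cloud app (you have written it like a desktop app where you own the machine), because it will time out easily. You need to mirror the tree beforehand (task queues and datastore), not only to avoid timeouts but also to avoid Drive rate limits, and keep it in sync somehow (register for push notifications, etc.). It is not easy at all. I have written a Drive tree viewer before.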

Answer 3

A simple way to check whether a file exists in a specific path is: drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='a file'").execute()
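
Applied once per path component, the same query can resolve a path such as folder1/folder2/test.txt with one request per component. The following is a sketch only: `find_by_path` and `_quote` are hypothetical names, the field names assume Drive API v2, and because titles are not unique in Drive it simply takes the first match at each level.

```python
FOLDER_TYPE = 'application/vnd.google-apps.folder'


def _quote(value):
    """Escape backslashes and single quotes for a Drive query string."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def find_by_path(service, path, root_id='root'):
    """Return the metadata dict of the item at path, or None if any
    component of the path is missing."""
    parent_id = root_id
    item = None
    parts = [p for p in path.split('/') if p]
    for index, name in enumerate(parts):
        query = "'%s' in parents and title = '%s' and trashed = false" % (
            parent_id, _quote(name))
        # every component except the last one must be a folder
        if index < len(parts) - 1:
            query += " and mimeType = '%s'" % FOLDER_TYPE
        items = service.files().list(q=query).execute().get('items', [])
        if not items:
            return None
        item = items[0]
        parent_id = item['id']
    return item
```

A call like `find_by_path(drive_service, 'folder1/folder2/test.txt')` returns the file's metadata if the whole path exists, which is enough to decide between updating and inserting.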

To walk all folders and files:

import sys, os
import socket

import googleDriveAccess

import logging
logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
  # collect all pages of a files.list query into one result dictionary
  result = None
  npt = ''
  while npt is not None:
    if npt != '':
      kwargs['pageToken'] = npt
    entries = ds.files().list(q=q, **kwargs).execute()
    if result is None:
      result = entries
    else:
      result['items'] += entries['items']
    npt = entries.get('nextPageToken')
  return result

def uenc(u):
  if isinstance(u, unicode): return u.encode('utf-8')
  else: return u

def walk(ds, folderId, folderName, outf, depth):
  spc = ' ' * depth
  outf.write('%s+%s\n%s  %s\n' % (spc, uenc(folderId), spc, uenc(folderName)))
  q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, **{'maxResults': 200})
  for folder in entries['items']:
    walk(ds, folder['id'], folder['title'], outf, depth + 1)
  q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, **{'maxResults': 200})
  for f in entries['items']:
    outf.write('%s -%s\n%s   %s\n' % (spc, uenc(f['id']), spc, uenc(f['title'])))

def main(basedir):
  da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
  f = open(os.path.join(basedir, 'hierarchy.txt'), 'wb')
  walk(da.drive_service, 'root', u'root', f, 0)
  f.close()

if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  try:
    main(os.path.dirname(__file__))
  except socket.gaierror as e:
    sys.stderr.write('socket.gaierror: %s\n' % e)

This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess

Answer 4

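I agree with @pinoyyid: Google Drive is not a typical tree structure.

However, for printing the folder structure, I would still consider using a tree visualization library (e.g. treelib).

Below is a complete solution for printing your Google Drive file system recursively.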

from treelib import Node, Tree

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

### Helper functions ### 
def get_children(root_folder_id):
    query = "'" + root_folder_id + "' in parents and trashed=false"
    file_list = drive.ListFile({'q': query}).GetList()
    return file_list

def get_folder_id(root_folder_id, root_folder_title):
    file_list = get_children(root_folder_id)
    for file in file_list:
        if(file['title'] == root_folder_title):
            return file['id']

def add_children_to_tree(tree, file_list, parent_id):
    for file in file_list:
        tree.create_node(file['title'], file['id'], parent=parent_id)
        print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))

### Recursion over all children ### 
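def populate_tree_recursively(tree,parent_id):
    children = get_children(parent_id)
    add_children_to_tree(tree, children, parent_id)
    for child in children:
        # only folders can have children, so skip the extra request for plain files
        if child['mimeType'] == 'application/vnd.google-apps.folder':
            populate_tree_recursively(tree, child['id'])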


### Create tree and start populating from root ###
def main():
    root_folder_title = "your-root-folder"
    root_folder_id = get_folder_id("root", root_folder_title)

    tree = Tree()
    tree.create_node(root_folder_title, root_folder_id)
    populate_tree_recursively(tree, root_folder_id)
    tree.show()

if __name__ == "__main__":
    main()

Python Google Drive API - list the entire drive file tree
