Python Google Drive API - list the entire drive file tree

Time: 2014-02-28 10:25:57

Tags: python google-api google-drive-api

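I'm building a Python application that uses the Google Drive API. Development has gone well so far, but I have a problem retrieving the entire Google Drive file tree. I need it for two purposes:

  1. Check whether a path exists: if I want to upload test.txt under root/folder1/folder2, I want to check whether the file already exists and, if so, update it.
  2. Build a visual file explorer. I know Google provides its own (I can't remember the name right now, but I know it exists), but I want to restrict the file explorer to specific folders.

For now I have a function that fetches the root of Gdrive, and I can build the tree by recursively calling a function that lists the contents of a single folder. That is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.

Here is the function that gets the root: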

    from googleapiclient import errors
    from googleapiclient.discovery import build

    import driveHelper  # my own module: authentication and credential storage


    def drive_get_root():
        """Retrieve a root list of File resources.

        Returns:
            List of dictionaries.
        """
        # build the service, the driveHelper module will take care of authentication and credential storage
        drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
        # the result will be a list
        result = []
        page_token = None
        while True:
            try:
                param = {}
                if page_token:
                    param['pageToken'] = page_token
                files = drive_service.files().list(**param).execute()
                # add this page of files to the list
                result.extend(files['items'])
                page_token = files.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                # stop paging on error (this break used to sit outside the except block)
                break
        return result
    

Here is the one that gets the files in a folder:

    def drive_files_in_folder(folder_id):
        """Return the files belonging to a folder.

        Args:
            folder_id: ID of the folder to get files from.
        Returns:
            List of dictionaries.
        """
        # build the service, the driveHelper module will take care of authentication and credential storage
        drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
        # the result will be a list
        result = []
        # code from google, is working so I didn't touch it
        page_token = None
        while True:
            try:
                param = {}

                if page_token:
                    param['pageToken'] = page_token

                children = drive_service.children().list(folderId=folder_id, **param).execute()

                for child in children.get('items', []):
                    # drive_get_file is another helper of mine (not shown here)
                    result.append(drive_get_file(child['id']))

                page_token = children.get('nextPageToken')
                if not page_token:
                    break
            except errors.HttpError as _error:
                print('An error occurred: %s' % _error)
                break
        return result
    

To check whether a file exists, I'm currently using this:

    def drive_path_exist(file_path, file_list=None):
        """
        Recursive function to check whether the given path exists.

        Returns the matching file dictionary, or False if the path does not exist.
        """
        # if no list is given, start from the root of Gdrive
        if file_list is None:
            file_list = drive_get_root()

        # split the path so the first component can be looked up in the current list
        parts = file_path.split("/")

        # if there is only one element in the path we are at the actual filename,
        # so if it is in this folder we can return it
        if len(parts) == 1:
            for elem in file_list:
                if elem["title"] == parts[0]:
                    # elem is a dictionary with all the file info
                    return elem
            return False

        # otherwise find the folder named by the first component and keep searching inside it
        for elem in file_list:
            if elem["title"] == parts[0]:
                # recursive call: rejoin the remaining components and pass
                # the folder's contents from drive_files_in_folder as the list
                return drive_path_exist("/".join(parts[1:]), drive_files_in_folder(elem["id"]))
        return False
    

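Any ideas on how to solve my problem? I have seen a few discussions here on Stack Overflow where people write in their answers that this is possible, but of course nobody says how!

Thanks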

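4 answers:

Answer 0 (score: 8)

Stop thinking of Drive as a tree structure. It isn't. "Folders" are simply labels; for example, a file can have multiple parents.

To build a representation of a tree in your app, you need to do this: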

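  1. Run a Drive list query to retrieve all folders.
  2. Iterate over the result array and examine the parents property to build an in-memory hierarchy.
  3. Run a second Drive list query to get all non-folders (i.e. files).
  4. For each file returned, place it in your in-memory tree.

If you simply want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.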

If it is unique, just run a Files List query for title = 'file-A', then do a Files Get for each of its parents and see whether any of them is called 'folder-B'.
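
A minimal sketch of that check, assuming `service` is a Drive v2 service object built as in the question. The function name `find_in_named_folder` is made up for illustration, and only the first page of results is read.

```python
def find_in_named_folder(service, file_title, folder_title):
    """Return the first file titled file_title with a parent titled folder_title, else None."""
    # Escape backslashes and single quotes so the title is a valid query literal.
    escaped = file_title.replace('\\', '\\\\').replace("'", "\\'")
    listing = service.files().list(q="title = '%s'" % escaped).execute()
    for item in listing.get('items', []):
        for parent in item.get('parents', []):
            # One Files Get per parent, only to read the parent's title.
            folder = service.files().get(fileId=parent['id']).execute()
            if folder.get('title') == folder_title:
                return item
    return None
```

The cost is one list request plus one get request per parent of each match, which stays small as long as the file title is reasonably distinctive.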

If 'folder-B' can exist under both 'folder-C' and 'folder-D', it gets more complicated, and you need to build the in-memory hierarchy from steps 1 and 2 above.
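
The in-memory hierarchy can be sketched roughly like this. It works on plain lists of v2 File resources, so the two list queries (`mimeType = 'application/vnd.google-apps.folder'` for folders, `mimeType != ...` for files) are left out. The helper names, the sample data, and the `'ROOT'` ID are placeholders for illustration; in v2 the real root folder ID is exposed as `rootFolderId` on the About resource.

```python
from collections import defaultdict

FOLDER_MIME = 'application/vnd.google-apps.folder'


def build_children_index(items):
    """Map each parent ID to the items that list it in their 'parents' property."""
    children = defaultdict(list)
    for item in items:
        # An item with several parents is indexed under each of them.
        for parent in item.get('parents', []):
            children[parent['id']].append(item)
    return children


def resolve_path(children, root_id, path):
    """Return all items reached from root_id by following the titles in path."""
    current = [{'id': root_id}]
    for title in path.strip('/').split('/'):
        current = [child
                   for node in current
                   for child in children.get(node['id'], [])
                   if child['title'] == title]
        if not current:
            break
    return current


# Hypothetical data standing in for the results of the two list queries.
folders = [
    {'id': 'c', 'title': 'folder-C', 'mimeType': FOLDER_MIME, 'parents': [{'id': 'ROOT'}]},
    {'id': 'b', 'title': 'folder-B', 'mimeType': FOLDER_MIME, 'parents': [{'id': 'c'}]},
]
files = [
    {'id': 'a', 'title': 'file-A', 'mimeType': 'text/plain', 'parents': [{'id': 'b'}]},
]
index = build_children_index(folders + files)
matches = resolve_path(index, 'ROOT', 'folder-C/folder-B/file-A')
print([m['id'] for m in matches])
```

Because the index is keyed by parent ID rather than nested, a file with several parents simply shows up under each of them, and a path lookup costs no further requests once the two listings are in memory.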

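You don't say whether these files and folders are created by your app or by the user with the Google Drive web app. If your app is the creator of these files and folders, there is a trick you can use to restrict your searches to a single root. Say you have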

    MyDrive/app_root/folder-C/folder-B/file-A
    

You can make folder-C, folder-B and file-A all children of app_root.

That way you can constrain all of your queries to include

    and 'app_root_id' in parents
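
As a small illustration, the restriction can be appended to whatever other clauses a query needs. `app_query` is a hypothetical helper name and `'APP_ROOT_ID'` stands for the real ID of app_root; this only works if every item was given app_root as a direct parent, as described above.

```python
def app_query(app_root_id, *clauses):
    """Join the given search clauses with 'and' and add the app_root restriction."""
    parts = list(clauses) + ["'%s' in parents" % app_root_id]
    return ' and '.join(parts)


print(app_query('APP_ROOT_ID', "title = 'file-A'"))
```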
    

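Answer 1 (score: 3)

That will never work except for very small trees. You have to rethink your entire algorithm for a cloud app (you have written it like a desktop app where you own the machine), because it will time out easily. You need to mirror the tree beforehand (task queues and a datastore), not just to avoid timeouts but also to avoid Drive rate limits, and keep it in sync somehow (register for push notifications, etc.). It is not easy at all. I have built a Drive tree viewer before.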
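
One possible shape for the "keep it in sync" part, as a sketch only: it assumes the mirror is a dict of file ID to File resource and that change entries look like the items returned by the v2 changes feed (`fileId`, `deleted`, `file`). Fetching those entries (for example with `changes().list`) and persisting the mirror are left out, and `apply_changes` is a made-up name.

```python
def apply_changes(mirror, change_items):
    """Update mirror (file ID -> File resource) in place from Drive change entries."""
    for change in change_items:
        file_id = change['fileId']
        resource = change.get('file')
        trashed = bool(resource) and resource.get('labels', {}).get('trashed', False)
        if change.get('deleted') or resource is None or trashed:
            # Deleted or trashed items leave the mirror.
            mirror.pop(file_id, None)
        else:
            # New and modified items replace whatever was stored before.
            mirror[file_id] = resource
    return mirror
```

With a mirror maintained this way, path checks and the file explorer read local data, and Drive is contacted only for the changes since the last sync.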

Answer 2 (score: 1)

An easy way to check whether a file exists in a specific path is: drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='file'").execute()
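
Applied once per path component, the same query can resolve a path like folder1/folder2/test.txt without listing whole folders. This is only a sketch: `find_by_path` and `escape_q` are hypothetical helpers, `service` is assumed to be a Drive v2 service object, the `trashed = false` clause is my addition, and only the first result page is read.

```python
def escape_q(value):
    """Escape backslashes and single quotes for use inside a Drive query literal."""
    return value.replace('\\', '\\\\').replace("'", "\\'")


def find_by_path(service, path, parent_id='root'):
    """Return the File resources that match path below parent_id, or [] if none do."""
    matches = [{'id': parent_id}]
    for title in path.strip('/').split('/'):
        found = []
        for parent in matches:
            q = "'%s' in parents and title = '%s' and trashed = false" % (
                escape_q(parent['id']), escape_q(title))
            # Only the first page is read here; follow nextPageToken for large results.
            found.extend(service.files().list(q=q).execute().get('items', []))
        matches = found
        if not matches:
            break
    return matches
```

A list is returned rather than a single file because Drive allows several items with the same title in one folder.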

To walk through all folders and files:

import logging
import os
import socket
import sys

import googleDriveAccess

logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'

def getlist(ds, q, **kwargs):
  """Run files().list for query q and merge all result pages into one response."""
  result = None
  npt = ''
  while npt is not None:
    if npt != '': kwargs['pageToken'] = npt
    entries = ds.files().list(q=q, **kwargs).execute()
    if result is None: result = entries
    else: result['items'] += entries['items']
    npt = entries.get('nextPageToken')
  return result

def walk(ds, folderId, folderName, outf, depth):
  """Write the subtree of folderId to outf: subfolders first (recursively), then files."""
  spc = ' ' * depth
  outf.write('%s+%s\n%s  %s\n' % (spc, folderId, spc, folderName))
  q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, maxResults=200)
  for folder in entries['items']:
    walk(ds, folder['id'], folder['title'], outf, depth + 1)
  q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
  entries = getlist(ds, q, maxResults=200)
  for f in entries['items']:
    outf.write('%s -%s\n%s   %s\n' % (spc, f['id'], spc, f['title']))

def main(basedir):
  da = googleDriveAccess.DAClient(basedir) # clientId=None, script=False
  # a UTF-8 text file replaces the Python 2 only uenc() helper of the original
  with open(os.path.join(basedir, 'hierarchy.txt'), 'w', encoding='utf-8') as f:
    walk(da.drive_service, 'root', u'root', f, 0)

if __name__ == '__main__':
  logging.getLogger().setLevel(logging.INFO)
  try:
    main(os.path.dirname(__file__))
  except socket.gaierror as e:
    sys.stderr.write('socket.gaierror: %s\n' % e)

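This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess

Answer 3 (score: 0)

I agree with @pinoyyid: Google Drive is not a typical tree structure.

However, for printing the folder structure I would still consider using a tree visualization library (for example treelib).

Below is a complete solution for printing your Google Drive file system recursively.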

from treelib import Node, Tree

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

### Helper functions ###
FOLDER_TYPE = 'application/vnd.google-apps.folder'

def get_children(root_folder_id):
    # list the non-trashed items directly inside the given folder
    query = "'%s' in parents and trashed=false" % root_folder_id
    return drive.ListFile({'q': query}).GetList()

def get_folder_id(root_folder_id, root_folder_title):
    # returns None when no child has the given title
    for item in get_children(root_folder_id):
        if item['title'] == root_folder_title:
            return item['id']

def add_children_to_tree(tree, file_list, parent_id):
    for item in file_list:
        tree.create_node(item['title'], item['id'], parent=parent_id)
        print('parent: %s, title: %s, id: %s' % (parent_id, item['title'], item['id']))

### Recursion over all children ###
def populate_tree_recursively(tree, parent_id):
    children = get_children(parent_id)
    add_children_to_tree(tree, children, parent_id)
    for child in children:
        # only folders can have children, so skip the extra request for plain files
        if child['mimeType'] == FOLDER_TYPE:
            populate_tree_recursively(tree, child['id'])


### Create tree and start populating from root ###
def main():
    root_folder_title = "your-root-folder"
    root_folder_id = get_folder_id("root", root_folder_title)

    tree = Tree()
    tree.create_node(root_folder_title, root_folder_id)
    populate_tree_recursively(tree, root_folder_id)
    tree.show()

if __name__ == "__main__":
    main()