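I'm building a Python application that uses the Google Drive API. Development has gone well so far, but I have a problem retrieving the entire Google Drive file tree, which I need for two purposes:
Right now I have a function that gets the root of GDrive, and I can build the tree by recursively calling a function that lists the contents of a single folder. However, this is extremely slow and can potentially make thousands of requests to Google, which is unacceptable.
Here is the function that gets the root: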
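from googleapiclient import errors
from googleapiclient.discovery import build

import driveHelper  # the asker's own module for authentication and credential storage


def drive_get_root():
    """Retrieve a root list of File resources.

    Returns:
        List of dictionaries.
    """
    # build the service, the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    page_token = None
    while True:
        try:
            # Only list items directly under the root folder: without a q
            # parameter, files().list() returns every file the user can access.
            param = {'q': "'root' in parents and trashed = false"}
            if page_token:
                param['pageToken'] = page_token
            files = drive_service.files().list(**param).execute()
            # add the files to the list
            result.extend(files['items'])
            page_token = files.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result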
And here is the one that gets the files in a folder:
def drive_files_in_folder(folder_id):
    """Return the files belonging to a folder.

    Args:
        folder_id: ID of the folder to get files from.
    Returns:
        List of File resource dictionaries.
    """
    # build the service, the driveHelper module takes care of authentication and credential storage
    drive_service = build('drive', 'v2', http=driveHelper.buildHttp())
    # the result will be a list
    result = []
    # code from Google; it works, so I didn't touch it
    page_token = None
    while True:
        try:
            param = {}
            if page_token:
                param['pageToken'] = page_token
            children = drive_service.children().list(folderId=folder_id, **param).execute()
            for child in children.get('items', []):
                # drive_get_file (defined elsewhere) makes one extra request per child
                result.append(drive_get_file(child['id']))
            page_token = children.get('nextPageToken')
            if not page_token:
                break
        except errors.HttpError as _error:
            print('An error occurred: %s' % _error)
            break
    return result
Now, to check whether a file exists, I'm using this:
def drive_path_exist(file_path, items=None):
    """
    Recursively check whether the given path exists.

    Returns the file's metadata dict if it exists, otherwise False.
    """
    # if no list was passed, start from the root of GDrive
    if items is None:
        items = drive_get_root()
    # split the path to get the first component and check whether it is in this folder
    parts = file_path.split("/")
    # if only one component is left we are at the actual file name,
    # so if it is in this folder we can return it
    if len(parts) == 1:
        for elem in items:
            if elem["title"] == parts[0]:
                # return elem itself, because it is a dict with all the file info
                return elem
        return False
    # if we are not at the last component we have to keep searching
    for elem in items:
        # check whether the current component is in this folder
        if elem["title"] == parts[0]:
            # drop the first component and recurse into the matching folder
            return drive_path_exist("/".join(parts[1:]), drive_files_in_folder(elem["id"]))
    return False
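Any ideas how to solve my problem? I've seen some discussions about this here on Stack Overflow, and in some answers people wrote that it is possible, but of course they didn't say how!
Thanks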
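Answer 0 (score: 8)
Stop thinking of Drive as a tree structure. It isn't one. "Folders" are just labels; for example, a file can have multiple parents.
To build a representation of the tree in your app, you need to do this...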
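Here is a minimal sketch of such an in-memory representation. It is not necessarily the exact steps the answer intended. It assumes you have already fetched flat lists of file resources (for example with paginated `files().list` calls) in the Drive v2 shape, where each item has `id`, `title` and a `parents` list of `{'id': ...}` dicts. The helper name `build_children_index` is hypothetical. Because a file can have several parents, the helper records it under each of them.

```python
from collections import defaultdict


def build_children_index(items):
    """Map each parent ID to the list of items that name it as a parent.

    Hypothetical helper: `items` is assumed to be a flat list of Drive v2
    file resources, each with 'id', 'title' and 'parents' ([{'id': ...}]).
    """
    children = defaultdict(list)
    for item in items:
        # A file with several parents is recorded under every one of them,
        # so the result describes a graph rather than a strict tree.
        for parent in item.get('parents', []):
            children[parent['id']].append(item)
    return children


if __name__ == '__main__':
    items = [
        {'id': 'B', 'title': 'folder-B', 'parents': [{'id': 'root'}]},
        {'id': 'A', 'title': 'file-A', 'parents': [{'id': 'B'}, {'id': 'root'}]},
    ]
    index = build_children_index(items)
    print([c['title'] for c in index['root']])  # ['folder-B', 'file-A']
```

Once the index exists, walking it makes no further API calls, so the number of requests depends on the number of result pages rather than the number of folders.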
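If you only want to check whether file-A exists in folder-B, the approach depends on whether the name "folder-B" is guaranteed to be unique.
If it is unique, just run a single Files.list query for title='file-A', then a Files.get for each of its parents to see whether any of them is called 'folder-B'.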
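The two-step check can be sketched against the v2 API used in the question. This assumes `service` is a Drive v2 service object such as the one returned by `build('drive', 'v2', ...)`. The function name is hypothetical, and for brevity the sketch ignores pagination of the first query.

```python
def file_in_named_folder(service, file_title, folder_title):
    """Return True if a file titled `file_title` has a parent titled `folder_title`.

    Hypothetical helper; assumes `service` is a Drive v2 service object and
    that `folder_title` is unique in the user's Drive.
    """
    # Single quotes inside a query string literal must be escaped as \'.
    escaped = file_title.replace("'", "\\'")
    query = "title = '%s' and trashed = false" % escaped
    # Step 1: one Files.list query by title (only the first page is examined).
    files = service.files().list(q=query).execute().get('items', [])
    for f in files:
        # Step 2: one Files.get per parent to read the parent's title.
        for parent in f.get('parents', []):
            folder = service.files().get(fileId=parent['id']).execute()
            if folder.get('title') == folder_title:
                return True
    return False
```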
If 'folder-B' exists under both 'folder-C' and 'folder-D', it gets more complicated, and you'll need to build the in-memory hierarchy from steps 1 and 2 above.
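When folder names are not unique, one way to resolve a path such as `folder-D/folder-B/file-A` is to walk an in-memory index from the root and keep every match at each level. This sketch assumes a dict, built beforehand from complete folder and file listings, that maps each parent ID to its child resources (each with at least `id` and `title`). The name `resolve_path` is hypothetical.

```python
def resolve_path(children_by_parent, path, root_id='root'):
    """Return every item reachable by following `path` from `root_id`.

    Hypothetical helper: `children_by_parent` maps a parent ID to a list of
    child dicts with at least 'id' and 'title'. Several results are possible
    because titles are not unique and items can have several parents.
    """
    current = [root_id]
    matches = []
    for part in [p for p in path.split('/') if p]:
        matches = []
        # Keep every child with the right title under any current candidate.
        for parent_id in current:
            for child in children_by_parent.get(parent_id, []):
                if child['title'] == part:
                    matches.append(child)
        if not matches:
            return []
        current = [m['id'] for m in matches]
    return matches
```

The result is a list rather than a single item, because more than one path with the same titles can exist.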
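You didn't say whether these files and folders are created by your app or by the user through the Google Drive web app. If your app creates these files/folders, there is a trick you can use to restrict your searches to a single root. Say you have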
MyDrive/app_root/folder-C/folder-B/file-A
You can make folder-C, folder-B and file-A all children of app_root. That way you can restrict all your queries by including
and 'app_root_id' in parents
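A small sketch of building such a restricted query string follows. It assumes the app has made every file and folder it creates a child of the app root, and the helper name is hypothetical. You could pass the resulting string as `q` to `files().list()`.

```python
def app_scoped_query(title, app_root_id):
    """Build a Drive v2 query for `title` restricted to children of `app_root_id`.

    Hypothetical helper; assumes every file/folder the app creates has also
    been given `app_root_id` as a parent.
    """
    def quote(value):
        # Drive query string literals are single-quoted; escape embedded quotes.
        return "'" + value.replace("'", "\\'") + "'"

    return "title = %s and %s in parents and trashed = false" % (
        quote(title), quote(app_root_id))


if __name__ == '__main__':
    print(app_scoped_query('file-A', 'APP_ROOT_ID'))
    # title = 'file-A' and 'APP_ROOT_ID' in parents and trashed = false
```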
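Answer 1 (score: 3)
That will never work except for very small trees. You have to rethink your whole algorithm for a cloud app (you've written it as if it were a desktop app where you own the machine), because it will easily time out. You need to mirror the tree beforehand (task queues and datastore), not only to avoid timeouts but also to avoid Drive rate limits, and keep the mirror in sync somehow (register for push notifications, etc.). It's not easy at all. I've built a Drive tree viewer before.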
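The answer gives no code for this. To illustrate one way of coping with rate limits while mirroring, here is a generic retry helper with exponential backoff. This is a swapped-in technique, not the answerer's code. `is_retryable` is a caller-supplied predicate, for example one that checks whether a googleapiclient `HttpError` has status 403 or 429. All names are hypothetical.

```python
import random
import time


def with_backoff(call, is_retryable, max_attempts=5, base_delay=1.0,
                 sleep=time.sleep):
    """Invoke `call()` and retry on retryable errors with exponential backoff.

    Hypothetical helper: the delay before retry n (0-based) is
    base_delay * 2**n plus up to one second of random jitter.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors or once attempts are exhausted.
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.random())
```

Usage might look like `with_backoff(lambda: service.files().list(q=query).execute(), is_rate_limit)`, where `is_rate_limit` is your own predicate. The `sleep` parameter exists only so the delays can be observed in tests.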
Answer 2 (score: 1)
A simple way to check whether a file exists in a specific path is: drive_service.files().list(q="'THE_ID_OF_SPECIFIC_PATH' in parents and title='file'").execute()
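Wrapped as a function, that single query might look like the sketch below. It assumes a Drive v2 service object, and `exists_in_folder` is a hypothetical name. It escapes quotes in the title and asks for at most one result, since only existence matters.

```python
def exists_in_folder(service, folder_id, title):
    """Return True if `folder_id` directly contains an item named `title`.

    Hypothetical helper; assumes `service` is a Drive v2 service object.
    A single files().list call replaces walking the folder's children.
    """
    # Single quotes inside a query string literal must be escaped as \'.
    escaped = title.replace("'", "\\'")
    query = "'%s' in parents and title = '%s' and trashed = false" % (
        folder_id, escaped)
    result = service.files().list(q=query, maxResults=1).execute()
    return bool(result.get('items'))
```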
To traverse all folders and files:
import sys, os
import socket
import googleDriveAccess
import logging

logging.basicConfig()

FOLDER_TYPE = 'application/vnd.google-apps.folder'


def getlist(ds, q, **kwargs):
    """Run a files().list query and merge all result pages into one dict."""
    result = None
    npt = ''
    while npt is not None:
        if npt != '':
            kwargs['pageToken'] = npt
        entries = ds.files().list(q=q, **kwargs).execute()
        if result is None:
            result = entries
        else:
            result['items'] += entries['items']
        npt = entries.get('nextPageToken')
    return result


def uenc(u):
    # Python 2: encode unicode IDs/titles as UTF-8 before writing them
    if isinstance(u, unicode):
        return u.encode('utf-8')
    return u


def walk(ds, folderId, folderName, outf, depth):
    spc = ' ' * depth
    outf.write('%s+%s\n%s %s\n' % (spc, uenc(folderId), spc, uenc(folderName)))
    # recurse into the sub-folders first...
    q = "'%s' in parents and mimeType='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, maxResults=200)
    for folder in entries['items']:
        walk(ds, folder['id'], folder['title'], outf, depth + 1)
    # ...then list the non-folder files in this folder
    q = "'%s' in parents and mimeType!='%s'" % (folderId, FOLDER_TYPE)
    entries = getlist(ds, q, maxResults=200)
    for f in entries['items']:
        outf.write('%s -%s\n%s %s\n' % (spc, uenc(f['id']), spc, uenc(f['title'])))


def main(basedir):
    da = googleDriveAccess.DAClient(basedir)  # clientId=None, script=False
    with open(os.path.join(basedir, 'hierarchy.txt'), 'wb') as f:
        walk(da.drive_service, 'root', u'root', f, 0)


if __name__ == '__main__':
    logging.getLogger().setLevel(logging.INFO)
    try:
        main(os.path.dirname(__file__))
    except socket.gaierror as e:
        sys.stderr.write('socket.gaierror: %s\n' % e)
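This uses googleDriveAccess: github.com/HatsuneMiku/googleDriveAccess
Answer 3 (score: 0)
I agree with @pinoyyid: Google Drive is not a typical tree structure.
However, for printing the folder structure I would still consider using a tree-visualization library (e.g. treelib).
Below is a complete solution for recursively printing the Google Drive file system.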
from treelib import Tree
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

gauth = GoogleAuth()
gauth.LocalWebserverAuth()
drive = GoogleDrive(gauth)

FOLDER_MIME_TYPE = 'application/vnd.google-apps.folder'


### Helper functions ###
def get_children(root_folder_id):
    query = "'" + root_folder_id + "' in parents and trashed=false"
    file_list = drive.ListFile({'q': query}).GetList()
    return file_list


def get_folder_id(root_folder_id, root_folder_title):
    file_list = get_children(root_folder_id)
    for file in file_list:
        if file['title'] == root_folder_title:
            return file['id']
    return None  # no child with that title


def add_children_to_tree(tree, file_list, parent_id):
    for file in file_list:
        # treelib node IDs must be unique, so a file that appears under
        # several parents within this tree would make create_node fail
        tree.create_node(file['title'], file['id'], parent=parent_id)
        print('parent: %s, title: %s, id: %s' % (parent_id, file['title'], file['id']))


### Recursion over all children ###
def populate_tree_recursively(tree, parent_id):
    children = get_children(parent_id)
    add_children_to_tree(tree, children, parent_id)
    for child in children:
        # only folders can have children, so skip the extra request for files
        if child['mimeType'] == FOLDER_MIME_TYPE:
            populate_tree_recursively(tree, child['id'])


### Create tree and start populating from root ###
def main():
    root_folder_title = "your-root-folder"
    root_folder_id = get_folder_id("root", root_folder_title)
    tree = Tree()
    tree.create_node(root_folder_title, root_folder_id)
    populate_tree_recursively(tree, root_folder_id)
    tree.show()


if __name__ == "__main__":
    main()