Google Drive API: list files with no parents

Time: 2012-12-24 20:43:53

Tags: google-api google-drive-api google-api-client

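The files in the Google domain that I administer have gotten into a bad state; there are thousands of files sitting in the root directory. I want to identify these files and move them into a folder under "My Drive".

When I use the API to list the parents of one of these orphaned files, the result is an empty array. To determine whether a file is orphaned, I can iterate over all the files in my domain and request the parents list for each one. If the list is empty, I know the file is orphaned.

But this is very slow.

Is there a way to use the Drive API to search for files that have no parents?

The "parents" field of the q parameter doesn't seem useful for this, since it can only specify that the parents list contains some ID.

Update

I'm trying to find a quick way to locate items that are truly at the root of the document hierarchy. That is, they are siblings of "My Drive", not children of "My Drive".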

6 answers:

Answer 0: (score: 4)

In Java:

// Drive API v2 client; `drive` is an authorized Drive service instance.
Files.List request = drive.files().list();
// 'root' is an alias for the "My Drive" folder.
request.setQ("'root' in parents");

FileList files = request.execute();

for (com.google.api.services.drive.model.File element : files.getItems()) {
    System.out.println(element.getTitle());
}

'root' is the parent folder if the file or folder is located in the root directory.
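
For readers using the Python client, a rough equivalent with pagination might look like the sketch below. It assumes `service` is an authorized Drive v3 service object (for example, one built with `googleapiclient.discovery.build`); the helper name `list_root_children` is made up for illustration. Note that this query returns children of "My Drive", which is not the same thing as files with no parents at all.

```python
def list_root_children(service, page_size=100):
    """Return metadata for all items whose parents include 'root' (My Drive).

    `service` is assumed to be an authorized Drive v3 service object.
    """
    items = []
    page_token = None
    while True:
        response = service.files().list(
            q="'root' in parents",
            pageSize=page_size,
            pageToken=page_token,
            fields="nextPageToken, files(id, name, parents)",
        ).execute()
        items.extend(response.get("files", []))
        # A missing nextPageToken means this was the last page.
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return items
```

Following `nextPageToken` matters here because a single `files().list` call returns only one page of results.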

Answer 1: (score: 1)

Brute force, but simple and it works.

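    // Assumes `request` is a Drive v2 Files.List and `orphans` is a List<File>.
    do {
        try {
            FileList files = request.execute();

            for (File f : files.getItems()) {
                // Treat a missing or empty parents list as "orphaned".
                if (f.getParents() == null || f.getParents().isEmpty()) {
                    System.out.println("Orphan found:\t" + f.getTitle());
                    orphans.add(f);
                }
            }

            request.setPageToken(files.getNextPageToken());
        } catch (IOException e) {
            System.out.println("An error occurred: " + e);
            // Clear the token so the loop stops after a failure.
            request.setPageToken(null);
        }
    } while (request.getPageToken() != null
            && request.getPageToken().length() > 0);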
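
The same check can be written as a small pure function over file metadata. The sketch below is in Python and works on plain dicts shaped like Drive v3 file resources (`id`, `parents`); the function name `split_by_parent` and the `root_id` argument are assumptions for illustration. It separates true orphans from direct children of "My Drive", which is the distinction the question's update draws.

```python
def split_by_parent(files, root_id):
    """Split file metadata into (orphans, root_children).

    files: iterable of dicts with an optional 'parents' list of folder IDs.
    root_id: ID of the "My Drive" root folder (assumed known by the caller).
    """
    orphans, root_children = [], []
    for f in files:
        # A missing key and an empty list both mean "no parent".
        parents = f.get("parents") or []
        if not parents:
            orphans.append(f)
        elif root_id in parents:
            root_children.append(f)
    return orphans, root_children


sample = [
    {"id": "1", "parents": ["ROOT"]},
    {"id": "2"},
    {"id": "3", "parents": []},
    {"id": "4", "parents": ["folderX"]},
]
orphans, root_children = split_by_parent(sample, "ROOT")
print([f["id"] for f in orphans])        # ['2', '3']
print([f["id"] for f in root_children])  # ['1']
```

With the v3 Python client, the root folder's ID can be looked up with something like `service.files().get(fileId="root", fields="id").execute()["id"]`, assuming an authorized `service` object.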

Answer 2: (score: 1)

Try using this in your query:

'root' in parents

Answer 3: (score: 1)

The documentation suggests the following query: is:unorganized owner:me

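Answer 4: (score: 0)

The premise is:

  • List all files.
  • If a file has no "parents" field, it is an orphaned file.
  • The script therefore deletes it.

Before you start, you need:

A ready-to-copy-and-paste demo: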

from __future__ import print_function
import pickle
import os.path
from googleapiclient.discovery import build
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request

# If modifying these scopes, delete the file token.pickle.
SCOPES = ['https://www.googleapis.com/auth/drive']

def callback(request_id, response, exception):
    if exception:
        print("Exception:", exception)

def main():
    """
    Description:
    Shows basic usage of the Drive v3 API to delete orphan files.
    """

    """ --- CHECK CREDENTIALS --- """
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    """ --- OPEN CONNECTION --- """
    service = build('drive', 'v3', credentials=creds)

    page_token = ""
    files = None
    orphans = []
    page_size = 100
    batch_counter = 0

    print("LISTING ORPHAN FILES")
    print("-----------------------------")
    while (True):
        # List
        r = service.files().list(pageToken=page_token,
                                 pageSize=page_size,
                                 # Only request the fields the script uses
                                 fields="nextPageToken, files(id, parents)"
                                 ).execute()
        page_token = r.get('nextPageToken')
        files = r.get('files', [])

        # Filter orphans
        # NOTE: a file with no 'parents' field (or an empty one) is an orphan
        for file in files:
            if file.get('parents'):
                print("File with a parent found.")
            else:
                print("Orphan file found.")
                orphans.append(file['id'])
        # Exit condition
        if page_token is None:
            break

    print("DELETING ORPHAN FILES")
    print("-----------------------------")
    while len(orphans) > 0:
        # Recompute per batch so the last, smaller batch does not overrun the list
        batch_size = min(len(orphans), 100)
        batch = service.new_batch_http_request(callback=callback)
        for i in range(batch_size):
            print("File with id {0} queued for deletion.".format(orphans[0]))
            batch.add(service.files().delete(fileId=orphans[0]))
            del orphans[0]
        batch.execute()
        batch_counter += 1
        print("BATCH {0} DELETED - {1} FILES DELETED".format(batch_counter,
                                                             batch_size))


if __name__ == '__main__':
    main()

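This approach will not delete files in the root directory, because their "parents" field refers to the root folder. If not all of your orphaned files are listed, it means they are already being deleted automatically by Google. That process can take up to 24 hours.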
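
Since the original question asks to move orphaned files into a folder rather than delete them, a variant could re-parent them instead. The sketch below assumes an authorized Drive v3 `service` object, a list of orphan file IDs like the one the script collects, and a destination folder ID that you choose; `move_orphans` and `target_folder_id` are hypothetical names. It relies on the `addParents` parameter of `files().update`.

```python
def move_orphans(service, orphan_ids, target_folder_id):
    """Attach each orphan to target_folder_id; return the IDs that failed."""
    failed = []
    for file_id in orphan_ids:
        try:
            service.files().update(
                fileId=file_id,
                addParents=target_folder_id,
                fields="id, parents",
            ).execute()
        except Exception as exc:  # e.g. googleapiclient.errors.HttpError
            print("Could not move {0}: {1}".format(file_id, exc))
            failed.append(file_id)
    return failed
```

Returning the failed IDs rather than stopping at the first error makes it easy to retry only the files that could not be moved.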

Answer 5: (score: 0)

Adreian Lopez, thank you for your script. It really saved me a lot of manual work. Here are the steps I followed to run your script:

  1. Created the folder c:\temp\pythonscript\

  2. Created an OAuth 2.0 client ID at https://console.cloud.google.com/apis/credentials and downloaded the credentials file to c:\temp\pythonscript\

  3. Renamed the downloaded client_secret_#######-#############.apps.googleusercontent.com.json to credentials.json

  4. Copied Adreian Lopez's Python script and saved it as c:\temp\pythonscript\deleteGoogleDriveOrphanFiles.py

  5. On Windows 10, went to the "Microsoft Store" and installed Python 3.8

  6. Opened a command prompt and typed: cd c:\temp\pythonscript\

  7. Ran pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib

  8. Ran python deleteGoogleDriveOrphanFiles.py and followed the on-screen steps to create the c:\temp\pythonscript\token.pickle file and start deleting orphaned files. This step can take a while.

  9. Verified the result at https://one.google.com/u/1/storage

  10. Re-ran step 8 as needed.