Question

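I am trying to read data from my Google Doc. I am using Python and have set up the Google Docs API. I simply copied and pasted the code provided by Google and made a few modifications, and I can successfully read the data LINE BY LINE, but only TEXT! Now I am trying something new and have inserted an image. This is what it looks like.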

Image of my google doc content

Google Doc Link

Pretty simple, right? It has a bullet point and sub-bullet points containing an image and the text "Hello". Now, when I read the data (it is read line by line) and print out what the API returns, I again get a `dictionary` that contains `dictionaries`. This is what it looks like:

    {'startIndex': 1, 'endIndex': 41, 'paragraph': {'elements': [{'startIndex': 1, 'endIndex': 41, 'textRun': {'content': 'This is the Python Programming Language\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 18, 'unit': 'PT'}, 'indentStart': {'magnitude': 36, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'textStyle': {'underline': False}}}}
    {'startIndex': 41, 'endIndex': 43, 'paragraph': {'elements': [{'startIndex': 41, 'endIndex': 42, 'inlineObjectElement': {'inlineObjectId': 'kix.o4cuh6wash2n', 'textStyle': {}}}, {'startIndex': 42, 'endIndex': 43, 'textRun': {'content': '\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}
    {'startIndex': 43, 'endIndex': 49, 'paragraph': {'elements': [{'startIndex': 43, 'endIndex': 49, 'textRun': {'content': 'Hello\n', 'textStyle': {}}}], 'paragraphStyle': {'namedStyleType': 'NORMAL_TEXT', 'direction': 'LEFT_TO_RIGHT', 'indentFirstLine': {'magnitude': 54, 'unit': 'PT'}, 'indentStart': {'magnitude': 72, 'unit': 'PT'}}, 'bullet': {'listId': 'kix.y7w314ij0ywy', 'nestingLevel': 1, 'textStyle': {'underline': False}}}}

As you can see, there are 3 dictionaries, each containing `key` and `value` pairs. Note that these three correspond to the lines in the document, one per line. You can also see the key `content`, whose `value` is the text in the document.

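If you look at the nested dictionaries, they are these:

    {'content': 'This is the Python Programming Language\n', 'textStyle': {}}
    {'content': '\n', 'textStyle': {}}
    {'content': 'Hello\n', 'textStyle': {}}

Now, I noticed that it returned `\n` for the line where the image is. I was also looking to see whether there might at least be a `key` whose value is a temporary URL of the image, but there does not seem to be one. So my question is: is there a way to read this image (and also EXTRACT IT) using the API I am already using? Maybe I am just missing something... Can someone help me? Any alternative solution would be much appreciated. Thanks!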

Below is the source code provided by Google. I modified the `read_strucutural_elements` function for my own purposes, to see how the data is read, but you can still see how it works: the API returns a dictionary for every line of data. I also noticed that the API really does read and return a `dictionary` line by line.

    import os.path
    import pickle

    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow
    from googleapiclient.discovery import build

    # SCOPES and DOCUMENT_ID are module-level constants, as in Google's quickstart.


    def main():
        """Shows basic usage of the Docs API.
        Prints the title of a sample document.
        """
        creds = None
        # The file token.pickle stores the user's access and refresh tokens, and is
        # created automatically when the authorization flow completes for the first
        # time.
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                creds = pickle.load(token)
        # If there are no (valid) credentials available, let the user log in.
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                creds.refresh(Request())
            else:
                flow = InstalledAppFlow.from_client_secrets_file(
                    'credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
            # Save the credentials for the next run
            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)

        service = build('docs', 'v1', credentials=creds)

        # Retrieve the documents contents from the Docs service.
        document = service.documents().get(documentId=DOCUMENT_ID).execute()

        #print('The title of the document is: {}'.format(document.get('title')))

        data = read_strucutural_elements(document.get("body").get("content"))

In the `read_strucutural_elements` function, I just print out each item in the `elements` parameter, which holds that data line by line.
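The body of that function is not shown here, so the following is only a minimal sketch of what a reader of this kind might look like, not the asker's actual code. The name `read_structural_elements_sketch` and the sample data are hypothetical; the sample is a trimmed-down copy of the three dictionaries printed above.

```python
def read_structural_elements_sketch(elements):
    """Print each structural element and return the concatenated text.

    Hypothetical helper: it only handles paragraphs, which is all the
    example document contains.
    """
    text = ""
    for element in elements:
        print(element)  # one dictionary per paragraph ("line") of the document
        paragraph = element.get("paragraph")
        if paragraph is None:
            continue  # tables, section breaks, etc. are ignored in this sketch
        for part in paragraph.get("elements", []):
            text_run = part.get("textRun")
            if text_run:
                # Only text runs carry a 'content' key; an image shows up as
                # an 'inlineObjectElement' and contributes no text here.
                text += text_run.get("content", "")
    return text


# Trimmed-down version of the output shown in the question.
sample = [
    {"paragraph": {"elements": [
        {"textRun": {"content": "This is the Python Programming Language\n"}},
    ]}},
    {"paragraph": {"elements": [
        {"inlineObjectElement": {"inlineObjectId": "kix.o4cuh6wash2n"}},
        {"textRun": {"content": "\n"}},
    ]}},
    {"paragraph": {"elements": [
        {"textRun": {"content": "Hello\n"}},
    ]}},
]

result = read_structural_elements_sketch(sample)
```

Reading only `textRun` entries is why the image line comes back as a bare `\n`: the image itself sits in the sibling `inlineObjectElement` entry.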

Thank you very much!

Answer 1

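Looking at the dictionary output, the image is an inlineObject with a specific ID (the `inlineObjectId` inside `inlineObjectElement`). You should be able to retrieve the image using its URL. To get the URL, see the related question: How to get the url to Google doc image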
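A minimal sketch of that approach, assuming the document returned by `documents().get()` carries a top-level `inlineObjects` map and that each entry exposes its image URL under `inlineObjectProperties` → `embeddedObject` → `imageProperties` → `contentUri`. That is my reading of the Docs API reference, so check it against your own output. The helper names (`find_inline_object_ids`, `image_uri`, `download_images`) are hypothetical, and images nested inside tables are not handled.

```python
import urllib.request


def find_inline_object_ids(content):
    """Collect inlineObjectId values from the paragraphs in body content."""
    ids = []
    for element in content:
        paragraph = element.get("paragraph", {})
        for part in paragraph.get("elements", []):
            inline = part.get("inlineObjectElement")
            if inline and "inlineObjectId" in inline:
                ids.append(inline["inlineObjectId"])
    return ids


def image_uri(document, object_id):
    """Look up the image URL for one inline object, or None if absent."""
    obj = document.get("inlineObjects", {}).get(object_id, {})
    embedded = obj.get("inlineObjectProperties", {}).get("embeddedObject", {})
    return embedded.get("imageProperties", {}).get("contentUri")


def download_images(document, prefix="image"):
    """Download every inline image found; returns the saved file names.

    The file type is not known up front, so no extension is added.
    """
    saved = []
    ids = find_inline_object_ids(document.get("body", {}).get("content", []))
    for n, object_id in enumerate(ids):
        uri = image_uri(document, object_id)
        if uri:
            filename = f"{prefix}_{n}"
            urllib.request.urlretrieve(uri, filename)
            saved.append(filename)
    return saved
```

With the `document` from the question's `main()`, calling `download_images(document)` would be the place to start. The URL is likely to be short-lived, so it is safer to download the image right after fetching the document than to store the link.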

How to extract or read an image from Google Docs using Python
