How do I read a .docx file in Google Colab?

Time: 2021-04-06 14:08:33

Tags: python python-3.x google-colaboratory google-docs file-read

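I am trying to read a docx file into Google Colab because my main computer, the one with Anaconda installed, is away for maintenance. I am trying to use the python-docx module, but as far as I can tell, I can't pip install python-docx in Google Colab.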

'''

import docx

def getText(filename):
    doc = docx.Document(filename)
    fullText = []
    for para in doc.paragraphs:
        fullText.append(para.text)
    return '\n'.join(fullText)

docxString = getText("week_8_document1.docx")

'''

Any ideas?

1 answer:

Answer 0 (score: 0)

Try the following; hopefully it works:

#Install python-docx
!pip install python-docx #<-- Yes you can directly install in Colab

#Import the tools
import docx
from google.colab import files

uploaded = files.upload() #<-- Select the file you want to upload
file_name = '[whatever your file is called here].docx' #<-- Change filename to your file
doc = docx.Document(file_name)
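
As a hedged aside: `files.upload()` returns a dictionary keyed by the uploaded file names, so the name does not have to be typed in by hand. Below is a minimal sketch of picking the name out of that dictionary. `pick_docx_name` is a hypothetical helper written for this illustration; it is not part of Colab or python-docx.

```python
def pick_docx_name(uploaded):
    """Return the first uploaded file name that ends in .docx.

    `uploaded` is assumed to be the dict returned by files.upload(),
    mapping each file name to that file's bytes.
    """
    for name in uploaded:
        # Compare case-insensitively so "REPORT.DOCX" also matches.
        if name.lower().endswith(".docx"):
            return name
    raise ValueError("no .docx file among the uploaded files")
```

In the snippet above, this would replace the hard-coded name: `file_name = pick_docx_name(uploaded)`, followed by `doc = docx.Document(file_name)` as before.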

Once the document is loaded, you can access its text by paragraph, by table, and so on. Good luck, boss.
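
To make "by paragraph or by table" concrete, here is a rough sketch that collects both from a loaded document. It relies only on the `paragraphs`, `tables`, `rows`, `cells`, and `text` attributes that python-docx exposes; `extract_text` is a hypothetical helper name, and skipping blank paragraphs is an assumption about what you want, not something python-docx does for you.

```python
def extract_text(doc):
    """Collect paragraph text and table cell text from a loaded document.

    `doc` is expected to be the object returned by docx.Document(...).
    Returns (paragraphs, tables): a list of non-blank paragraph strings,
    and one list of rows (each a list of cell strings) per table.
    """
    # Body paragraphs only; text inside tables is gathered separately below.
    paragraphs = [p.text for p in doc.paragraphs if p.text.strip()]

    tables = []
    for table in doc.tables:
        rows = [[cell.text for cell in row.cells] for row in table.rows]
        tables.append(rows)

    return paragraphs, tables
```

With the `doc` from the answer, `paragraphs, tables = extract_text(doc)` gives the body text as a list of strings, and `'\n'.join(paragraphs)` produces roughly the same string as the `getText` function in the question, minus the blank lines.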