Question

I want to get part of a docx document (for example, 10% of all its content) using Python 3. How can I do this? Thanks.

Answer 1

I would try something like this:

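from itertools import islice
from math import floor

def docx(file, percent):
  # Note: this reads the file line by line as plain text. A real .docx is a
  # ZIP archive of XML, so its text has to be extracted first.
  with open(file) as f:
    lines = sum(1 for _ in f)  # count lines; the file is closed afterwards
  no = floor(lines * percent / 100)  # number of lines to keep, rounded down
  # islice stops after `no` lines and yields nothing when no == 0
  with open(file) as f:
    return list(islice(f, no))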

To test it, try:

print(docx('example.docx', 10))
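
One caveat: a .docx file is a ZIP archive of XML parts rather than plain text, so iterating over it line by line may fail or return raw bytes instead of the document's text. Below is a minimal sketch using only the standard library. It assumes the main body lives in word/document.xml inside the archive and that "10% of the content" means the first 10% of paragraphs. The names docx_paragraphs and first_percent are made up for illustration, and headers, footnotes, and other parts are ignored.

```python
import zipfile
import xml.etree.ElementTree as ET
from math import floor

# WordprocessingML namespace used by the w:p (paragraph) and w:t (text) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_paragraphs(path):
    """Return the text of each paragraph in the document body (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more w:t elements
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs

def first_percent(items, percent):
    """Return the first `percent` percent of a list, rounded down."""
    return items[:floor(len(items) * percent / 100)]
```

With these helpers, first_percent(docx_paragraphs('example.docx'), 10) would return the first tenth of the paragraphs as a list of strings.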

Answer 2

A good way to work with .docx files in Python is the docx2txt module.

If you have pip installed, open a terminal and run:

pip install docx2txt

Once the docx2txt module is installed, you can import it:

import docx2txt

You can then extract the document's text and keep only the part you need. The contents of filename.docx are stored as a string in the variable text.

text = docx2txt.process("filename.docx")
print(text)

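The string can now be manipulated with basic built-in functions. The snippet below prints the length of the text using len() and then cuts the string down to about 10% by taking a substring.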

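print(len(text))  # prints 1000 for my sample document

# Slice from the start of the string; the cutoff is computed from the
# length instead of being hard-coded
text = text[:len(text) // 10]
print(text)  # prints the first 10% of the string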
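
The slice above can be generalized to any percentage. The following is a small sketch, not part of docx2txt; the function name take_percent is hypothetical and the percentage is measured in characters.

```python
def take_percent(text, percent):
    """Return the first `percent` percent of `text`, measured in characters."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # int() truncates, so the result never exceeds the requested share
    cutoff = int(len(text) * percent / 100)
    return text[:cutoff]
```

For example, take_percent(text, 10) keeps the first tenth of the string and take_percent(text, 25) the first quarter.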

My complete code is shown below. I hope this helps!

import docx2txt

# Extract the document's text as a single string
text = docx2txt.process("/home/jared/test.docx")
print(text)

print(len(text))  # prints 1000 for my sample document

# Keep the first 10% of the characters
text = text[:len(text) // 10]
print(text)  # prints the first 10% of the string
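
Cutting by character count can split a word in half. If that matters, one possible variant backs up to the previous whitespace before cutting. This is only a sketch; the function name take_percent_words is hypothetical, and it returns an empty string when the first word alone is longer than the requested share.

```python
def take_percent_words(text, percent):
    """Return roughly the first `percent` percent of `text` without splitting a word."""
    cutoff = int(len(text) * percent / 100)
    # If the cut lands inside a word, move it back to the start of that word
    if cutoff < len(text) and not text[cutoff].isspace():
        while cutoff > 0 and not text[cutoff - 1].isspace():
            cutoff -= 1
    # Drop the whitespace left at the end of the kept part
    return text[:cutoff].rstrip()
```

It can be applied to the string returned by docx2txt.process in the same way as the plain slice.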
