How do I get docx2txt to process all docx files in a directory?

Time: 2019-04-17 00:56:00

Tags: python docx

I'm using the docx2txt module in Python 2.7 and I'm trying to get it to process all of the docx files in one directory. Currently I have docx2txt.process("THE NAME OF THE DOCUMENT.docx")

I want to process all docx files in the current working directory, but I'm not sure how to do that.

I have inserted my code below. It prints out the name of the file and the text in the docx file.

import os
import docx2txt

os.chdir('c:/users/Says/desktop')

files = []

path = 'c:/users/Says/desktop'



my_text = docx2txt.process("test.docx")

for files in os.listdir(path):
    if files.endswith('docx'):
        print(files)
        print(my_text)

1 answer:

Answer 0 (score: 1)

You're halfway there.

Create a list to store all the files that you find:

files = []
for file in os.listdir(path):
    if file.endswith('.docx'):
        files.append(file)

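Then you can use a for statement to loop through all the files and process them one at a time. Joining each name with path keeps this working even when path is not the current working directory:

for name in files:
    text = docx2txt.process(os.path.join(path, name))
    # Do something with the text.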

If you want your code to use the current working directory, you can set your path to:

path = os.getcwd()
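
Putting the pieces together, here is a minimal sketch of the whole thing. The helper names find_docx and process_all are made up for illustration and are not part of docx2txt; the only docx2txt call used is docx2txt.process, the same one as in the question. The extract parameter is also an assumption, added so the directory scan can be tried without any real .docx files.

```python
import os


def find_docx(path):
    """Return the full paths of the .docx files directly inside `path`, sorted by name."""
    return sorted(
        os.path.join(path, name)
        for name in os.listdir(path)
        if name.lower().endswith('.docx')  # also matches .DOCX
    )


def process_all(path, extract=None):
    """Return a dict mapping each .docx path in `path` to its extracted text."""
    if extract is None:
        # Imported here so find_docx can be used without docx2txt installed.
        import docx2txt
        extract = docx2txt.process
    return {doc: extract(doc) for doc in find_docx(path)}
```

Calling process_all(os.getcwd()) should then give you one entry per document in the current working directory, and you can loop over the result to print each file name followed by its text.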