PDF to Word Doc in Python

Time: 2015-10-22 00:29:03

Tags: python-2.7 pdf ms-word

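I have read the other Stack Overflow questions on this, but they do not answer my question, so please do not vote to close. I am using Python 2.7.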

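All I want to do is use Python to convert a PDF to a Word document, or at least to text that I can copy and paste into a Word doc.

This is the code I have so far. All it prints is the female gender symbol.

Is my code wrong? Am I approaching this the wrong way? Do some PDFs simply not work with PDFMiner? Other than PyPDF2 or PDFMiner, do you know of any alternatives for converting a PDF to Word?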

from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfpage import PDFPage
from cStringIO import StringIO

def convert_pdf_to_txt(path):
    rsrcmgr = PDFResourceManager()
    retstr = StringIO()
    codec = 'utf-8'
    laparams = LAParams()
    device = TextConverter(rsrcmgr, retstr, codec=codec, laparams=laparams)
    fp = open(path, 'rb')  # open the PDF passed in rather than a hard-coded file
    interpreter = PDFPageInterpreter(rsrcmgr, device)
    password = ""
    maxpages = 0
    caching = True
    pagenos = set()

    for page in PDFPage.get_pages(fp, pagenos, maxpages=maxpages, password=password, caching=caching, check_extractable=True):
        interpreter.process_page(page)

    text = retstr.getvalue()

    fp.close()
    device.close()
    retstr.close()
    return text

print convert_pdf_to_txt('Bottom Dec.pdf')
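
A possible reading of that output: pdfminer's TextConverter writes a form feed character (\x0c) after each page it processes, and a Windows console using code page 437 draws that byte as the female symbol. If that is what is happening here, the function returned page breaks and no text, which can be the case for scanned, image-only PDFs. The sketch below only checks the returned string; `has_extractable_text` and `count_page_breaks` are hypothetical helper names, not part of pdfminer, and the sample string is made up for illustration.

```python
def has_extractable_text(extracted):
    """Return True if the string holds anything besides whitespace and page breaks."""
    # strip() with no arguments also removes form feeds (\x0c).
    return bool(extracted.strip())


def count_page_breaks(extracted):
    """Count form feed characters, assumed here to be one per processed page."""
    return extracted.count('\x0c')


# Hypothetical result for a three-page PDF that yielded no text at all.
sample = '\x0c\x0c\x0c'
print(has_extractable_text(sample))
print(count_page_breaks(sample))
```

If the check reports no text, the extraction code is probably not at fault, and the PDF itself may need OCR before any text-based conversion can work.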

1 answer:

Answer 0: (score: 0)

Another alternative is the Aspose.Words Cloud SDK for Python, which you can install with pip and use to convert PDF to DOC.

import asposewordscloud
import asposewordscloud.models.requests

api_client = asposewordscloud.ApiClient()
api_client.configuration.host = 'https://api.aspose.cloud'
# Get AppKey and AppSID from https://dashboard.aspose.cloud/
api_client.configuration.api_key['api_key'] = 'xxxxxxxxxxxxxxxxxxxxx'  # Put your appKey here
api_client.configuration.api_key['app_sid'] = 'xxxxxxxxx-xxxx-xxxxx-xxxx-xxxxxxxxxx'  # Put your appSid here

words_api = asposewordscloud.WordsApi(api_client)
filename = '02_pages.pdf'
remote_name = 'TestPostDocumentSaveAs.pdf'
dest_name = 'TestPostDocumentSaveAs.doc'

# Upload the PDF file to storage
request_storage = asposewordscloud.models.requests.UploadFileRequest(filename, remote_name)
response = words_api.upload_file(request_storage)

# Convert the PDF to DOC and save the result to storage
save_options = asposewordscloud.SaveOptionsData(save_format='doc', file_name=dest_name)
request = asposewordscloud.models.requests.SaveAsRequest(remote_name, save_options)
result = words_api.save_as(request)
print("Result {}".format(result))

I am a developer evangelist at Aspose.
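
For the fallback the question mentions, getting plain text into a Word document, a dependency-free sketch is also possible, because a .docx file is a zip archive of XML parts. The code below is written for Python 3 rather than the 2.7 used in the question, and it is an illustration under simplifying assumptions: `text_to_docx` is a hypothetical helper, each line of text becomes one unformatted paragraph, and form feeds are treated as line breaks. It does not try to reproduce the PDF's layout, fonts, or images.

```python
import zipfile
from xml.sax.saxutils import escape

# Minimal package parts that mark the archive as a WordprocessingML document.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)
W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'


def text_to_docx(text, out_path):
    """Write each line of `text` as one plain paragraph in a .docx file."""
    paragraphs = []
    # Treat the form feed written after each page as an ordinary line break.
    for line in text.replace('\x0c', '\n').splitlines():
        # Control characters are not allowed in XML 1.0, so replace them with spaces.
        clean = ''.join(ch if ord(ch) >= 32 else ' ' for ch in line)
        paragraphs.append(
            '<w:p><w:r><w:t xml:space="preserve">%s</w:t></w:r></w:p>' % escape(clean)
        )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>'
        % (W_NS, ''.join(paragraphs))
    )
    with zipfile.ZipFile(out_path, 'w', zipfile.ZIP_DEFLATED) as archive:
        archive.writestr('[Content_Types].xml', CONTENT_TYPES)
        archive.writestr('_rels/.rels', RELS)
        archive.writestr('word/document.xml', document.encode('utf-8'))
```

Passing the string returned by an extractor such as the question's `convert_pdf_to_txt` together with an output path would produce a text-only document. A library such as python-docx is likely the more comfortable choice when headings, styles, or tables are needed.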