Parsing a PDF with PyPDF2

Time: 2016-02-15 04:43:58

Tags: python pdf pypdf pdf-parsing

When I parse a PDF file with PyPDF2, hyphenated strings such as mm-dd-yy are read with a line break between each part:

mm

-

dd

-

yy

Here is my code:

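import PyPDF2

def getPDFContent(path):
    # Open in binary mode; open() replaces the Python 2-only file() builtin
    with open(path, "rb") as f:
        pdf = PyPDF2.PdfFileReader(f)
        # Extract the text of the first page only
        content = pdf.getPage(0).extractText() + "\n"
    return content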

How can I overcome this and print them on the same line?
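
One possible workaround is to post-process the extracted string rather than change how PyPDF2 reads the page. The sketch below assumes the output looks like the sample above, with each hyphen on its own line between two word fragments. The helper name join_hyphenated is hypothetical and is not part of PyPDF2.

```python
import re

# Matches a hyphen that sits alone on its own line between two word
# characters, together with the surrounding newlines and whitespace.
_SPLIT_HYPHEN = re.compile(r"(?<=\w)\s*\n\s*-\s*\n\s*(?=\w)")

def join_hyphenated(text):
    """Rejoin fragments such as 'mm', '-', 'dd' that were split across lines.

    Hypothetical helper: it assumes every hyphen standing alone on a line
    between two words belongs to a single hyphenated token.
    """
    return _SPLIT_HYPHEN.sub("-", text)

sample = "mm\n-\ndd\n-\nyy"
print(join_hyphenated(sample))
```

It could then be applied to the returned string, for example join_hyphenated(getPDFContent(path)). Line breaks that do not surround a lone hyphen are left as they are, so text that legitimately uses a dash on its own line between words would also be joined and may need a stricter pattern.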

0 answers:

No answers yet