Reading a table from a PDF with tabula gives a fallback font warning

Time: 2018-09-06 13:33:53

Tags: python pdf web-scraping tabula

I am trying to read a table from a PDF file. It works when I try it with a few PDF files, as well as with a manually created PDF that contains a table.

import tabula
df = tabula.read_pdf("test.pdf", encoding='utf-8', spreadsheet=True)
print(df)

Sep 06, 2018 7:00:34 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO: To get higher rendering speed on JDK8 or later,
Sep 06, 2018 7:00:34 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO:   use the option -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Sep 06, 2018 7:00:34 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO:   or call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")
   Test  Karim
0     1      2

However, when I try to read the actual PDF file that I want to extract the table from, it prints this warning and returns nothing:

Sep 06, 2018 7:01:46 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO: To get higher rendering speed on JDK8 or later,
Sep 06, 2018 7:01:46 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO:   use the option -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Sep 06, 2018 7:01:46 PM org.apache.pdfbox.pdmodel.graphics.color.PDDeviceRGB suggestKCMS
INFO:   or call System.setProperty("sun.java2d.cmm", "sun.java2d.cmm.kcms.KcmsServiceProvider")
Sep 06, 2018 7:01:47 PM org.apache.pdfbox.pdmodel.font.PDTrueTypeFont <init>
WARNING: Using fallback font 'LiberationSerif' for 'TimesNewRomanPSMT'
None

I am using https://github.com/chezou/tabula-py as a reference.
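
A minimal sketch of one way to narrow this down, under assumptions that nothing above confirms: that the fallback font message is only a font substitution notice from PDFBox, and that the None result comes from no table being detected with the chosen extraction mode. The helper names has_table and read_with_fallback are hypothetical. lattice, stream, guess and pages are tabula-py read_pdf options, and the reader parameter exists only so the logic can be exercised without Java or a real PDF.

```python
def has_table(result):
    """Return True if a read_pdf-style result appears to contain a table."""
    if result is None:
        return False
    # Some tabula-py versions return a list of DataFrames instead of one.
    if isinstance(result, (list, tuple)):
        return any(has_table(item) for item in result)
    # A pandas DataFrame exposes .empty; other objects are treated as non-empty.
    return not getattr(result, "empty", False)


def read_with_fallback(path, reader=None):
    """Try several extraction modes in turn and return (result, options).

    Returns (None, None) if no mode finds a table.
    """
    if reader is None:
        # Imported lazily so the helper can be tested with a stand-in reader.
        import tabula
        reader = tabula.read_pdf

    attempts = [
        {"lattice": True},                  # tables drawn with ruling lines
        {"stream": True},                   # tables separated by whitespace
        {"stream": True, "guess": False},   # skip automatic area detection
    ]
    for options in attempts:
        result = reader(path, pages="all", **options)
        if has_table(result):
            return result, options
    return None, None
```

Calling read_with_fallback("test.pdf") would then report which mode, if any, produced a table. If every mode comes back empty, the PDF may hold the table as an image rather than text, in which case tabula would not be expected to find it.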

0 answers:

No answers yet