I have a PDF that repeats the following block many times:
31-10-2018
NATIONAL
Initial Hearing
Imputed: Maynor Steven Sevilla Flores
Crime: murder
Relation of facts: murder at 10 am in the neighborhood cox 20...…
NOTE: xxxxxxxx...
NOTE2:xxxxxxxx...
DATA: xxxxxxx...
01-11-2018
NATIONAL
Initial Hearing
Imputed: James Graden
Crime: murder
Relation of facts: murder at 11 am in the neighborhood bit 45...…
.
.
.
I want to implement this in Python. This is what I have so far:
import PyPDF2
import re
PATH_DOWNLOAD_PDF = '/home/Dev/Freelance/Webscrapping/test/file.pdf'
pdf_file = open(PATH_DOWNLOAD_PDF, 'rb')
read_pdf = PyPDF2.PdfFileReader(pdf_file)
#.
#.
#.
I need to read the PDF and use Python regular expressions to get the following result.
Expected result, a Python list of dictionaries:
[
    {
        "Date": "31-10-2018",
        "Judge": "NATIONAL",
        "Initial Hearing":
        {
            "imputed": "Maynor Steven Sevilla Flores",
            "Crime": "murder",
            "Relation of facts": "murder at 10 am in the neighborhood cox 20..."
        }
    },
    {
        "Date": "01-11-2018",
        "Judge": "NATIONAL",
        "Initial Hearing":
        {
            "imputed": "James Graden",
            "Crime": "murder",
            "Relation of facts": "murder at 11 am in the neighborhood bit 45...…"
        }
    }
]
I have only a little programming experience, so any help is appreciated.
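For reference, here is a minimal sketch of the regex step only. It assumes the PDF text has already been extracted into one string with one field per line, exactly as in the sample above. Real extracted text may be laid out differently, so the pattern would probably need adjusting. The names `RECORD_RE`, `parse_records` and `sample` are hypothetical and used only for illustration.

```python
import re

# Assumed layout: every record starts with a DD-MM-YYYY date line, followed by
# the judge line, the hearing-type line, and then three "Label: value" lines.
# Any further lines (NOTE, NOTE2, DATA) are ignored until the next date line.
RECORD_RE = re.compile(
    r"^(?P<date>\d{2}-\d{2}-\d{4})[ \t]*\n"
    r"(?P<judge>[^\n]+)\n"
    r"(?P<hearing>[^\n]+)\n"
    r"Imputed:[ \t]*(?P<imputed>[^\n]*)\n"
    r"Crime:[ \t]*(?P<crime>[^\n]*)\n"
    r"Relation of facts:[ \t]*(?P<facts>[^\n]*)",
    re.MULTILINE,
)


def parse_records(text):
    """Return one dictionary per record found in the extracted text."""
    records = []
    for match in RECORD_RE.finditer(text):
        records.append({
            "Date": match.group("date"),
            "Judge": match.group("judge").strip(),
            # The hearing type ("Initial Hearing") becomes the nested key.
            match.group("hearing").strip(): {
                "imputed": match.group("imputed").strip(),
                "Crime": match.group("crime").strip(),
                "Relation of facts": match.group("facts").strip(),
            },
        })
    return records


# Sample text copied from the block structure shown in the question.
sample = """31-10-2018
NATIONAL
Initial Hearing
Imputed: Maynor Steven Sevilla Flores
Crime: murder
Relation of facts: murder at 10 am in the neighborhood cox 20...
NOTE: xxxxxxxx...
NOTE2:xxxxxxxx...
DATA: xxxxxxx...
01-11-2018
NATIONAL
Initial Hearing
Imputed: James Graden
Crime: murder
Relation of facts: murder at 11 am in the neighborhood bit 45...
"""

records = parse_records(sample)
print(records)
```

With the older PyPDF2 interface used above (`PdfFileReader`), the input string could be built by joining `read_pdf.getPage(i).extractText()` for each `i` in `range(read_pdf.numPages)`. Whether the extracted text keeps one field per line depends on the PDF, so it is worth printing the raw text first and adapting the pattern to what actually comes out.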