I am using the invoice2data package and have run into an error I cannot resolve.
With the following template for an invoice:
issuer: My Template
keywords:
  - www.webok.com
  - 123 4567 89
fields:
  amount: TOTAL\s+.(\d+\.\d+)
  date: Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
  invoice_number: Reference:\s(\w+)
  operator: Operators:\s(\w+)
options:
  currency: USD
  date_formats:
    - '%d/%m/%Y %G:%i'
  languages:
    - en
  decimal_separator: '.'
lines:
  start: Your Reference:+\s+\w+\n_+
  end: \s+_+\n+\s+TOTAL\s+.(\d+\.\d+)
  line: (?P<description>.+)\s+\((?P<quantity>.+)\)\s+.(?P<price>\d+\.\d+)
I get no errors. However, if I add the following at the end of fields, I get an unhashable type error:
fields:
  ...
  friendly_name:
    parser: static
    value: Amazon
The unhashable type error:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/invoice2data", line 11, in <module>
    load_entry_point('invoice2data==0.3.5', 'console_scripts', 'invoice2data')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 201, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 93, in extract_data
    return t.extract(optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/extract/invoice_template.py", line 174, in extract
    res_find = re.findall(v, optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 237, in _compile
    p, loc = _cache[cachekey]
TypeError: unhashable type: 'OrderedDict'
Can anyone help me? I think the error occurs when the software tries to unpack the options, but I do not know how to fix it.
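The last frames of the traceback are inside re.findall and re._compile, which suggests that the value passed as the pattern was a mapping rather than a string. Below is a minimal sketch of that failure mode using only the standard library. It assumes Python 3, the field_value mapping only imitates the nested vat_lines entry, and try_findall is a hypothetical helper, not part of invoice2data.

```python
import re
from collections import OrderedDict

# Shaped like the nested vat_lines entry: a mapping, not a regex string.
field_value = OrderedDict([
    ("parser", "lines"),
    ("start", r"PAYMENT TYPE\s+AMOUNT\s+_+"),
])

def try_findall(pattern, text):
    """Return (matches, None) on success, or (None, error name) on TypeError."""
    try:
        return re.findall(pattern, text), None
    except TypeError as exc:
        return None, type(exc).__name__

# A plain string pattern works as expected.
ok_matches, ok_error = try_findall(r"Reference:\s(\w+)", "Reference: ABC123")

# A mapping cannot be used as a pattern, so re raises TypeError.
bad_matches, bad_error = try_findall(field_value, "PAYMENT TYPE AMOUNT")

print(ok_matches, ok_error)
print(bad_matches, bad_error)
```

The second call fails before any matching happens, which is consistent with the traceback stopping at the pattern cache lookup.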
Here is what I get in debug mode:
...
DEBUG:invoice2data.extract.invoice_template:field=vat_lines | regexp=OrderedDict([('parser', 'lines'), ('start', 'PAYMENT TYPE\\s+AMOUNT\\s+_+'), ('end', '\\s_+\\s+PLEASE KEEP THIS RECEIPT SAFE'), ('line', '(?P<type_paiment>\\w+)\\s+.(?P<montant>\\d+\\.\\d+)'), ('types', OrderedDict([('montant', 'float')]))])
Thanks for your help!
script.py
import pprint
from invoice2data import extract_data
from invoice2data.extract.loader import read_templates
templates = read_templates('templates/')
result = extract_data("invoice.pdf", templates=templates)
pprint.pprint(result)
with this as the template (in the templates folder):
templates/fr/fr.error.yml
issuer: My Template
keywords:
  - www.webok.com
  - 123 4567 89
fields:
  amount: TOTAL\s+.(\d+\.\d+)
  date: Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
  invoice_number: Reference:\s(\w+)
  operator: Operators:\s(\w+)
  vat_lines:
    parser: lines
    start: PAYMENT TYPE\s+AMOUNT\s+_+
    end: \s_+\s+PLEASE KEEP THIS RECEIPT SAFE
    line: (?P<type_paiment>\w+)\s+.(?P<montant>\d+\.\d+)
    types:
      montant: float
options:
  currency: USD
  date_formats:
    - '%d/%m/%Y %G:%i'
  languages:
    - en
  decimal_separator: '.'
lines:
  start: Your Reference:+\s+\w+\n_+
  end: \s+_+\n+\s+TOTAL\s+.(\d+\.\d+)
  line: (?P<description>.+)\s+\((?P<quantity>.+)\)\s+.(?P<price>\d+\.\d+)
invoice.txt
(needs to be converted to .pdf)
__________________________________________
My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product (1) €12.93
Second product (3) €22.93
Third product (1) €12.95
Last product (1) €12.93
_________
TOTAL €61.74
VAT/CODE NET VAT
_____________________________
20% S €93.27 €18.66
PAYMENT TYPE AMOUNT
_____________________________
CASH €61.74
CARD €0.00
CHANGE GIVEN €3.07
__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________
Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com
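The regular expressions themselves can be checked against the receipt text independently of invoice2data. The sketch below uses only the standard re module and assumes Python 3 string handling. The receipt string is a shortened copy of the text above, and matching each payment row with re.match is an assumption made for illustration, not a description of how the library's lines parser works.

```python
import re

# Patterns copied from the template.
amount_re = r"TOTAL\s+.(\d+\.\d+)"
line_re = r"(?P<type_paiment>\w+)\s+.(?P<montant>\d+\.\d+)"

# Shortened excerpt of invoice.txt.
receipt = "TOTAL €61.74\nPAYMENT TYPE AMOUNT\n_____\nCASH €61.74\nCARD €0.00\n"

# The single dot in amount_re consumes the currency symbol.
amounts = re.findall(amount_re, receipt)

# Apply the line pattern to two payment rows and convert montant to float.
payments = []
for row in ("CASH €61.74", "CARD €0.00"):
    m = re.match(line_re, row)
    if m:
        payments.append((m.group("type_paiment"), float(m.group("montant"))))

print(amounts)   # ['61.74']
print(payments)  # [('CASH', 61.74), ('CARD', 0.0)]
```

Since the patterns match here, the regexes are unlikely to be the cause of the error.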
Finally, the full invoice2data debug output:
DEBUG:invoice2data.main:START pdftotext result ===========================
DEBUG:invoice2data.main:__________________________________________
My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product (1) €12.93
Second product (3) €22.93
Third product (1) €12.95
Last product (1) €12.93
_________
TOTAL €61.74
VAT/CODE NET VAT
_____________________________
20% S €93.27 €18.66
PAYMENT TYPE AMOUNT
_____________________________
CASH €61.74
CARD €0.00
CHANGE GIVEN €3.07
__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________
Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com
DEBUG:invoice2data.main:END pdftotext result =============================
DEBUG:invoice2data.main:Testing 254 template files
DEBUG:invoice2data.extract.invoice_template:Matched template fr.error.yml
DEBUG:invoice2data.extract.invoice_template:START optimized_str ========================
DEBUG:invoice2data.extract.invoice_template:__________________________________________
My Template
__________________________________________
Date: 03/12/2020 11:23
Operators: Me
Reference: ABC123
__________________________________________
First product (1) €12.93
Second product (3) €22.93
Third product (1) €12.95
Last product (1) €12.93
_________
TOTAL €61.74
VAT/CODE NET VAT
_____________________________
20% S €93.27 €18.66
PAYMENT TYPE AMOUNT
_____________________________
CASH €61.74
CARD €0.00
CHANGE GIVEN €3.07
__________________________________________
PLEASE KEEP THIS RECEIPT SAFE
FOR GUARANTEE PURPOSES
__________________________________________
Thanks for shopping with us!
VAT Number : 123 4567 89
www.webok.com
DEBUG:invoice2data.extract.invoice_template:END optimized_str ==========================
DEBUG:invoice2data.extract.invoice_template:Date parsing: languages=['en'] date_formats=['%d/%m/%Y %G:%i']
DEBUG:invoice2data.extract.invoice_template:Float parsing: decimal separator=.
DEBUG:invoice2data.extract.invoice_template:keywords=['www.webok.com', '123 4567 89']
DEBUG:invoice2data.extract.invoice_template:{'date_formats': ['%d/%m/%Y %G:%i'], 'lowercase': False, 'decimal_separator': '.', 'currency': 'USD', 'replace': [], 'languages': ['en'], 'remove_whitespace': False, 'remove_accents': False}
DEBUG:invoice2data.extract.invoice_template:field=amount | regexp=TOTAL\s+.(\d+\.\d+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'61.74']
DEBUG:invoice2data.extract.invoice_template:field=date | regexp=Date:\s+(\d{1,2}\/\d{1,2}\/\d{4}\s+\d{1,2}:\d{1,2})
DEBUG:invoice2data.extract.invoice_template:res_find=[u'03/12/2020 11:23']
DEBUG:invoice2data.extract.invoice_template:result of date parsing=2020-03-12 11:23:00
DEBUG:invoice2data.extract.invoice_template:field=invoice_number | regexp=Reference:\s(\w+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'ABC123']
DEBUG:invoice2data.extract.invoice_template:field=operator | regexp=Operators:\s(\w+)
DEBUG:invoice2data.extract.invoice_template:res_find=[u'Me']
DEBUG:invoice2data.extract.invoice_template:field=vat_lines | regexp=OrderedDict([('parser', 'lines'), ('start', 'PAYMENT TYPE\\s+AMOUNT\\s+_+'), ('end', '\\s_+\\s+PLEASE KEEP THIS RECEIPT SAFE'), ('line', '(?P<type_paiment>\\w+)\\s+.(?P<montant>\\d+\\.\\d+)'), ('types', OrderedDict([('montant', 'float')]))])
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/invoice2data", line 11, in <module>
    load_entry_point('invoice2data==0.3.5', 'console_scripts', 'invoice2data')()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 201, in main
    res = extract_data(f.name, templates=templates, input_module=input_module)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/main.py", line 93, in extract_data
    return t.extract(optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/invoice2data/extract/invoice_template.py", line 174, in extract
    res_find = re.findall(v, optimized_str)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 181, in findall
    return _compile(pattern, flags).findall(string)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 237, in _compile
    p, loc = _cache[cachekey]
TypeError: unhashable type: 'OrderedDict'
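Reading the output above: the last DEBUG line reports field=vat_lines with an OrderedDict as its regexp, and the frame at invoice_template.py line 174 passes that value straight to re.findall. This suggests that the installed release (0.3.5, per the load_entry_point line) treats every entry under fields as a regex string. Whether other releases accept nested parser entries is something to confirm in the project's documentation. The sketch below shows a type check that would surface the problem with a clearer message. find_field and the fields mapping are hypothetical and not part of invoice2data, and Python 3 is assumed.

```python
import re
from collections import OrderedDict

def find_field(name, value, text):
    """Hypothetical helper: run a regex field, rejecting non-string values."""
    if not isinstance(value, str):
        raise ValueError(
            "field %r has a %s value; only regex strings are handled here"
            % (name, type(value).__name__)
        )
    return re.findall(value, text)

# One plain regex field and one nested field, as in the failing template.
fields = OrderedDict([
    ("invoice_number", r"Reference:\s(\w+)"),
    ("vat_lines", OrderedDict([("parser", "lines")])),
])
text = "Reference: ABC123"

results = {}
problems = []
for name, value in fields.items():
    try:
        results[name] = find_field(name, value, text)
    except ValueError as exc:
        problems.append(str(exc))

print(results)
print(problems)
```

With this check the plain field is still extracted, and the nested field is reported by name instead of failing deep inside the re module.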