I want to parse this file. Here is a sample snippet.
'13138' => { 'REFERENCE' => '13138', 'NAME' => 'DRAPER Five 125mm Medium Grade Aluminium Oxide Sanding Discs', 'PRICE' => 108, 'MIN_QUANTITY_ORDERABLE' => 1, 'MAX_QUANTITY_ORDERABLE' => 0, 'OUT_OF_STOCK' => 0, 'DATE_PROMPT' => '', 'OTHER_INFO_PROMPT' => '', 'PRICING_MODEL' => 0, 'TAX_1' => '101=2000.00=0=', 'OPAQUE_SHIPPING_DATA' => '0.054', 'ALT_WEIGHT' => '', 'SHIP_SEPARATELY' => 0, 'SHIP_CATEGORY' => '', 'SHIP_SUPPLEMENT' => 0, 'SHIP_SUPPLEMENT_ONCE' => 0, 'HAND_SUPPLEMENT' => 0, 'HAND_SUPPLEMENT_ONCE' => 0, 'SHIP_QUANTITY' => 1, 'COST_PRICE' => 0, 'EXCLUDE_FROM_SHIP' => 0, 'ASSEMBLY_PRODUCT' => 0, 'STOCK_AISLE' => '', 'STOCK_RACK' => '', 'STOCK_SUB_RACK' => '', 'STOCK_BIN' => '', 'BARCODE' => '', 'REPORT_DESC' => '', 'PRICES' => {
1 => [
[0,108],
],
},
'CUSTOMVARS' =>
{
},
'NO_ORDERLINE' => 0, 'AUTOSHIP' => 0, 'PRODUCT_GROUP' => -1, 'THUMBNAIL' => '', 'IMAGE' => '13138_694.jpg', 'ALSOBOUGHT' => [], 'RELATED' => [], },
'13139' => { 'REFERENCE' => '13139', 'NAME' => 'DRAPER Five 125mm Coarse Grade Aluminium Oxide Sanding Discs', 'PRICE' => 96, 'MIN_QUANTITY_ORDERABLE' => 1, 'MAX_QUANTITY_ORDERABLE' => 0, 'OUT_OF_STOCK' => 0, 'DATE_PROMPT' => '', 'OTHER_INFO_PROMPT' => '', 'PRICING_MODEL' => 0, 'TAX_1' => '101=2000.00=0=', 'OPAQUE_SHIPPING_DATA' => '0.066', 'ALT_WEIGHT' => '', 'SHIP_SEPARATELY' => 0, 'SHIP_CATEGORY' => '', 'SHIP_SUPPLEMENT' => 0, 'SHIP_SUPPLEMENT_ONCE' => 0, 'HAND_SUPPLEMENT' => 0, 'HAND_SUPPLEMENT_ONCE' => 0, 'SHIP_QUANTITY' => 1, 'COST_PRICE' => 0, 'EXCLUDE_FROM_SHIP' => 0, 'ASSEMBLY_PRODUCT' => 0, 'STOCK_AISLE' => '', 'STOCK_RACK' => '', 'STOCK_SUB_RACK' => '', 'STOCK_BIN' => '', 'BARCODE' => '', 'REPORT_DESC' => '', 'PRICES' => {
1 => [
[0,96],
],
},
'CUSTOMVARS' =>
{
},
'NO_ORDERLINE' => 0, 'AUTOSHIP' => 0, 'PRODUCT_GROUP' => -1, 'THUMBNAIL' => '', 'IMAGE' => '13139_694.jpg', 'ALSOBOUGHT' => [], 'RELATED' => [], },
'13140' => { 'REFERENCE' => '13140', 'NAME' => 'DRAPER Five Extra Coarse Grade Aluminium Oxide Sanding Discs', 'PRICE' => 96, 'MIN_QUANTITY_ORDERABLE' => 1, 'MAX_QUANTITY_ORDERABLE' => 0, 'OUT_OF_STOCK' => 0, 'DATE_PROMPT' => '', 'OTHER_INFO_PROMPT' => '', 'PRICING_MODEL' => 0, 'TAX_1' => '101=2000.00=0=', 'OPAQUE_SHIPPING_DATA' => '0.055', 'ALT_WEIGHT' => '', 'SHIP_SEPARATELY' => 0, 'SHIP_CATEGORY' => '', 'SHIP_SUPPLEMENT' => 0, 'SHIP_SUPPLEMENT_ONCE' => 0, 'HAND_SUPPLEMENT' => 0, 'HAND_SUPPLEMENT_ONCE' => 0, 'SHIP_QUANTITY' => 1, 'COST_PRICE' => 0, 'EXCLUDE_FROM_SHIP' => 0, 'ASSEMBLY_PRODUCT' => 0, 'STOCK_AISLE' => '', 'STOCK_RACK' => '', 'STOCK_SUB_RACK' => '', 'STOCK_BIN' => '', 'BARCODE' => '', 'REPORT_DESC' => '', 'PRICES' => {
1 => [
[0,96],
],
},
'CUSTOMVARS' =>
{
},
'NO_ORDERLINE' => 0, 'AUTOSHIP' => 0, 'PRODUCT_GROUP' => -1, 'THUMBNAIL' => '', 'IMAGE' => '13140_694ii.jpg', 'ALSOBOUGHT' => [], 'RELATED' => [], },
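This contains 3 items. Each one starts with a string like '13138' => { 'REFERENCE' and ends just before the next string of the same kind. How can I split these parts?
I tried re.search(r"{ 'REFERENCE'.*?(?={ 'REFERENCE')", catstr), but it does not match.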
Answer 0 (score: 3)
Why not replace => with : so that the data becomes a Python literal, like this:
'CUSTOMVARS' :
{
},
'NO_ORDERLINE' : 0, 'AUTOSHIP' : 0, 'PRODUCT_GROUP' : -1, ...
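Then evaluate the result with ast.literal_eval. It only evaluates literals, not executable code, so no sanitizing is needed (except perhaps guarding against excessively large input). From the documentation: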
ast.literal_eval(node_or_string)
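Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.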
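As a minimal sketch of the idea, here is the replace-then-evaluate step on a shortened, hypothetical record in the same style as the sample data. It is written for Python 3, and the names `perl_style`, `python_style`, `catalog` and `rejected` are illustrative only. It assumes that => never occurs inside a quoted string.

```python
import ast

# Hypothetical, shortened record in the same style as the sample data.
perl_style = (
    "{ '13138' => { 'REFERENCE' => '13138', 'PRICE' => 108, "
    "'PRICES' => { 1 => [ [0,108], ], }, 'RELATED' => [], }, }"
)

# Naive replacement: assumes "=>" never appears inside a quoted string.
python_style = perl_style.replace("=>", ":")

# Trailing commas are legal in Python dict and list literals,
# so the converted text parses without further cleanup.
catalog = ast.literal_eval(python_style)
print(catalog["13138"]["PRICE"])

# Anything that is not a plain literal is rejected instead of executed.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

Each top-level key of `catalog` is then one item, which answers the original "how do I split these parts" question without any hand-written splitting.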
Edit: a working example
#!/usr/bin/env python2
# -*- encoding: utf8 -*-
import urllib2
import ast
import re
from pprint import PrettyPrinter
pp = PrettyPrinter()
resp = urllib2.urlopen("http://pastie.org/pastes/7461356/download")
content = resp.read()
content = re.search(r"\s+=\s+({(?:.|\n)+});", content).group(1)
# Fix following line to handle => inside strings, if needed
content = re.sub(r"=>", r":", content)
parsed = ast.literal_eval(content)
pp.pprint(parsed)
For how to replace => only outside of strings, see this answer:
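One possible way to do that, not necessarily the approach taken in the answer referred to, is to let a single regex match either a whole quoted string or a bare => and rewrite only the latter. This is a sketch for Python 3 that assumes the data uses single-quoted strings with backslash escapes; `TOKEN` and `arrows_to_colons` are hypothetical names.

```python
import ast
import re

# Match either a complete single-quoted string (allowing backslash
# escapes) or a bare "=>". Quoted strings are consumed whole, so any
# "=>" inside them is never seen as a separate token.
TOKEN = re.compile(r"'(?:\\.|[^'\\])*'|=>")


def arrows_to_colons(text):
    """Replace "=>" with ":" everywhere except inside quoted strings."""
    def swap(match):
        token = match.group(0)
        return ":" if token == "=>" else token
    return TOKEN.sub(swap, text)


sample = "{ 'NAME' => 'a => b', 'PRICE' => 96 }"
converted = arrows_to_colons(sample)
print(converted)
print(ast.literal_eval(converted))
```

The alternation order matters: because the quoted-string branch comes first and the scan runs left to right, the regex always swallows a string before it can notice a => inside it.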
Edit
The given file contains other tokens besides the hash itself. The regular expression in the re.search call above strips those extra tokens:
\s+=\s+ # This marks the = before the start of the hash
({ # Capture the first {
(?:.|\n)+ # This matches all characters.
# The (?: is to prevent capture-inside-capture
}) # Capture the last }
; # This is not captured
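The breakdown above can be kept next to the pattern itself by compiling it with re.VERBOSE. The sketch below is for Python 3 and runs against a small hypothetical file body; the `$VAR1 = ...;` wrapper is an assumption about what the extra tokens look like, and `HASH_PATTERN`, `content` and `body` are illustrative names.

```python
import re

HASH_PATTERN = re.compile(
    r"""
    \s+=\s+      # the = sign before the start of the hash
    (\{          # start capturing at the first opening brace
    (?:.|\n)+    # any characters, newlines included (greedy)
    \})          # stop capturing at the last closing brace
    ;            # the trailing semicolon is matched but not captured
    """,
    re.VERBOSE,
)

# Hypothetical file content: an assignment wrapped around the hash.
content = "$VAR1 = {\n'13138' => { 'PRICE' => 108, },\n};\n"

body = HASH_PATTERN.search(content).group(1)
print(body)
```

`(?:.|\n)+` is needed because `.` does not match a newline by default; `.+` compiled with the re.DOTALL flag would behave the same way. The pattern does not trip over the => arrows inside the hash, since `\s+=\s+` requires whitespace directly after the = sign.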