python重新提取大括号内的项目

时间:2014-09-05 16:58:09

标签: python regex

我有一个大型数据集,例如在我的sql中,例如:

("Successfully confirmed payment - {'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['1037-5147-8706-9322'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['1917b2c0e5a51'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['3U4531424V959583R'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['245.40'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['Eligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-82295469MY6979044'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['7507921'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible,UnauthorizedPaymentEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-08-29T09:15:59Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']}", 1L, datetime.datetime(2013, 8, 29, 11, 15, 59))

我使用以下正则表达式从curley括号内的第一个项目列表中提取数据

paypal_meta_re = re.compile(r"""\{(.*)\}""").findall

这可以正常工作,但是当我尝试从字典值中删除方括号时,出现错误。

这是我的代码:

paypal_meta = get_paypal(order_id)
paypal_msg_re = paypal_meta_re(paypal_meta[0])
print type(paypal_msg_re), len(paypal_msg_re)
paypal_str = ''.join(map(str, paypal_msg_re))
print paypal_str, type(paypal_str)
paypal = ast.literal_eval(paypal_str)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = { 'paypal' : paypal_dict}

这是追溯:

Traceback (most recent call last):
  File "users.py", line 383, in <module>
    orders = get_orders(user_id, mongo_user_id, address_book_list)
  File "users.py", line 290, in get_orders
    paypal = ast.literal_eval(paypal_str)
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 49, in literal_eval
    node_or_string = parse(node_or_string, mode='eval')
  File "/usr/local/Cellar/python/2.7.2/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    'PAYMENTINFO_0_TRANSACTIONTYPE': ['expresscheckout'], 'ACK': ['Success'], 'PAYMENTINFO_0_PAYMENTTYPE': ['instant'], 'PAYMENTINFO_0_RECEIPTID': ['2954-8480-1689-8177'], 'PAYMENTINFO_0_REASONCODE': ['None'], 'SHIPPINGOPTIONISDEFAULT': ['false'], 'INSURANCEOPTIONSELECTED': ['false'], 'CORRELATIONID': ['5f22a1dddd174'], 'PAYMENTINFO_0_TAXAMT': ['0.00'], 'PAYMENTINFO_0_TRANSACTIONID': ['36H74806W7716762Y'], 'PAYMENTINFO_0_ACK': ['Success'], 'PAYMENTINFO_0_PENDINGREASON': ['authorization'], 'PAYMENTINFO_0_AMT': ['86.76'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITY': ['PartiallyEligible'], 'PAYMENTINFO_0_ERRORCODE': ['0'], 'TOKEN': ['EC-6B957889FK3149915'], 'VERSION': ['95.0'], 'SUCCESSPAGEREDIRECTREQUESTED': ['true'], 'BUILD': ['6680107'], 'PAYMENTINFO_0_CURRENCYCODE': ['GBP'], 'TIMESTAMP': ['2013-07-02T13:02:50Z'], 'PAYMENTINFO_0_SECUREMERCHANTACCOUNTID': ['XFQALBN3EBE8S'], 'PAYMENTINFO_0_PROTECTIONELIGIBILITYTYPE': ['ItemNotReceivedEligible'], 'PAYMENTINFO_0_ORDERTIME': ['2013-07-02T13:02:49Z'], 'PAYMENTINFO_0_PAYMENTSTATUS': ['Pending']
                                   ^
SyntaxError: invalid syntax

如果我使用

分割代码
msg, paypal_msg = paypal_meta[0].split(' - ')
paypal = ast.literal_eval(paypal_msg)
paypal_dict = {}
for k, v in paypal.items():
    paypal_dict[k] = str(v[0])
if paypal_dict:
    namespace['payment_gateway'] = { 'paypal' : paypal_dict}
insert = orders_dbs.save(namespace)
return insert

这样可行,但我无法使用它,因为返回的某些记录不会分裂且不准确。

基本上,我想取大括号中的项目并从值中删除方括号,然后从中创建一个新的字典。

1 个答案:

答案 0 :(得分:1)

你需要包含花括号,你的代码省略了这些:

r"""({.*})""")

请注意,括号现在围绕 {...}

或者,如果在字典之前总是有消息和一个破折号,您可以使用str.partition()将其拆分:

paypal_msg = paypal_meta[0].partition(' - ')[-1]

或限制您使用str.split()拆分一次:

paypal_msg = paypal_meta[0].split(' - ', 1)[-1]

尽量避免将这样的Python结构放入数据库中;将JSON存储在单独的列中,而不是对象的字符串转储中。