Question

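I'm learning to write a log-parsing script for JSON and getting back into Python. I need to read a log file, find the lines that contain JSON responses, parse them, and print them in tabular form. I want to use finditer to grab each substring, but my code matches the entire concatenated string.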

Code:
import re

for m in re.finditer(r'{"APIResponse".*"Type":"\w+"}}', line, re.I):
    print(m.group(0))

However, the lines that contain JSON strings are sometimes concatenated in this format:

{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}Pament Request Output from Server ....
{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}2018-07-09 10:01:18 DEBUG PaymentGatewayICSClient:981 - ClientRef         = 1587604390003
{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}2018-07-09 10:01:18 DEBUG PaymentGatewayICSClient:981 - ClientRef         = 158760439AX00
{"APIResponse":{"ResponseCode"-1,"ResponseText":"Fail9"},"TxnResp":{"Type":"directdebit"}}{"APIResponse":{"ResponseCode":101,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}Transaction Denied

Answer 1

Your current regular expression uses the greedy quantifier .*, which matches as much text as it can while still letting the rest of the pattern succeed. So if line is

{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}Pament Request Output from Server ....

running

for m in re.finditer(r'{"APIResponse".*"Type":"\w+"}}', line, re.I):
    print(m.group(0))

will give the result

{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}

To make the quantifier lazy (non-greedy), so that it matches the shortest string it can, add a ? after .*:

for m in re.finditer(r'{"APIResponse".*?"Type":"\w+"}}', line, re.I):
    print(m.group(0))

output:

{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}
{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}
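The lazy pattern only isolates each record; the question also asks to parse them. A minimal sketch, assuming every record ends with the "TxnResp":{"Type":"..."}} structure shown in the sample lines: each match is passed to json.loads, and a match that is not valid JSON (such as the first record on the last sample line, where "ResponseCode" is followed by -1 with no colon) raises json.JSONDecodeError and is skipped. The helper name extract_responses is hypothetical.

```python
import json
import re

# Same lazy pattern as above; the escaped braces match the same literal text.
RECORD_RE = re.compile(r'\{"APIResponse".*?"Type":"\w+"\}\}')


def extract_responses(line):
    """Hypothetical helper: return the valid JSON records in one log line as dicts."""
    records = []
    for m in RECORD_RE.finditer(line):
        try:
            records.append(json.loads(m.group(0)))
        except json.JSONDecodeError:
            # Not valid JSON (e.g. "ResponseCode"-1 with no colon): skip it.
            continue
    return records


line = ('{"APIResponse":{"ResponseCode"-1,"ResponseText":"Fail9"},"TxnResp":{"Type":"directdebit"}}'
        '{"APIResponse":{"ResponseCode":101,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}'
        'Transaction Denied')
for rec in extract_responses(line):
    print(rec["APIResponse"]["ResponseCode"], rec["TxnResp"]["Type"])  # prints: 101 directdebit
```

Skipping bad records keeps one malformed entry from aborting the whole log scan; if you would rather see them, log m.group(0) in the except branch instead of silently continuing.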

Regular-expressions.info is an excellent resource for learning more about how regular expressions work.
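The lazy regex still relies on "Type" being the last key of every record. An alternative, swapped in here and not part of the original answer, is to let the json module decide where each object ends: json.JSONDecoder.raw_decode(s, idx) parses one JSON value starting at index idx and returns it together with the index just past it. The sketch below assumes each record starts with {"APIResponse" and prints ResponseCode, ResponseText and Type as a fixed-width table, which is what the question ultimately wants. The names iter_records, rows_from_lines and print_table, and the "app.log" path, are hypothetical.

```python
import json

DECODER = json.JSONDecoder()
MARKER = '{"APIResponse"'  # assumption: every JSON record in the log starts like this


def iter_records(line):
    """Hypothetical helper: yield each parseable JSON record found in one line."""
    pos = line.find(MARKER)
    while pos != -1:
        try:
            # raw_decode returns (object, index just past the object).
            obj, end = DECODER.raw_decode(line, pos)
        except json.JSONDecodeError:
            # Malformed record: step past its opening brace and search again.
            end = pos + 1
        else:
            yield obj
        pos = line.find(MARKER, end)


def rows_from_lines(lines):
    """Hypothetical helper: flatten records into (ResponseCode, ResponseText, Type)."""
    for line in lines:
        for rec in iter_records(line):
            api = rec.get("APIResponse", {})
            txn = rec.get("TxnResp", {})
            yield api.get("ResponseCode"), api.get("ResponseText"), txn.get("Type")


def print_table(rows):
    """Print the rows as left-aligned, fixed-width columns."""
    print(f"{'ResponseCode':<14}{'ResponseText':<14}Type")
    for code, text, kind in rows:
        print(f"{code!s:<14}{text!s:<14}{kind}")


if __name__ == "__main__":
    sample = [
        '{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"internet"}}'
        '{"APIResponse":{"ResponseCode":0,"ResponseText":"Success"},"TxnResp":{"Type":"directdebit"}}'
        'Pament Request Output from Server ....',
    ]
    print_table(rows_from_lines(sample))
    # For a real log file (placeholder path), iterate the open file line by line:
    # with open("app.log", encoding="utf-8") as f:
    #     print_table(rows_from_lines(f))
```

Because raw_decode follows the JSON grammar itself, nested objects and reordered keys are handled without touching the pattern; the trade-off is that records must still begin with the assumed marker.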

Split a JSON string into individual strings in Python
