String parsing in Python with various unique cases

Time: 2012-10-04 13:01:17

Tags: python string parsing

My goal is to convert a string into a dictionary. This is what the string looks like:

[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60

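I have tried a few ways to do this. My first attempt was to split on \n, but the contents of [summary] span several lines and get split as well, so that did not work. My second attempt was to split on =>, but once I split on => there is no way to tell where the value ends and the next key begins, since that boundary is a \n. Basically, the result should end up looking like this: {exploit: 1, hits: 1, completed: 1, ...} and so on.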

Any help is greatly appreciated.

2 answers:

Answer 0 (score: 7)

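You can use re.findall to parse the text. Each value is matched lazily up to the next "\n[" or the end of the string, so the multi-line summary stays in one piece: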

>>> import re
>>> re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S)
[('exploit', '1'), ('hits', '1'), ('completed', '1'), ('is_malware', '1'), ('summary', '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n'), ('severity', '4'), ('engine', '60')]

You can turn these (key, value) pairs into a dictionary by calling dict:

>>> dict(re.findall(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', s, re.S))
{'engine': '60', 'hits': '1', 'severity': '4', 'is_malware': '1', 'summary': '26.0@13965: suspicious.warning: object contains JavaScript\n76.0@14467: suspicious.obfuscation using eval\n76.0@14467: suspicious.obfuscation using String.fromCharCode\n', 'exploit': '1', 'completed': '1'}
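
If the regex feels opaque, roughly the same result can be obtained with a line-by-line pass, which is close to the "split on \n" idea from the question. The sketch below is not part of the original answer: the function name parse_report is hypothetical, and it assumes that a continuation line never itself starts with "[" and contains "] => ". Unlike the regex, it drops blank lines and does not keep a trailing newline on multi-line values.

```python
def parse_report(text):
    """Parse '[key] => value' lines; other non-empty lines continue the previous value."""
    result = {}
    current_key = None
    for line in text.splitlines():
        if line.startswith('[') and '] => ' in line:
            # New entry: everything between '[' and '] => ' is the key.
            key, _, value = line[1:].partition('] => ')
            result[key] = value
            current_key = key
        elif current_key is not None and line.strip():
            # Non-empty line without a key: append it to the previous value.
            result[current_key] += '\n' + line
    return result


sample = """\
[exploit] => 1
[summary] => first line
second line

[severity] => 4
"""

report = parse_report(sample)
print(report)
```

The values are still strings; if integers are wanted, the digit-only values can be converted afterwards with int().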

Answer 1 (score: 0)

total_string = """\
[exploit] => 1
[hits] => 1
[completed] => 1
[is_malware] => 1
[summary] => 26.0@13965: suspicious.warning: object contains JavaScript
76.0@14467: suspicious.obfuscation using eval
76.0@14467: suspicious.obfuscation using String.fromCharCode

[severity] => 4
[engine] => 60
"""

import re

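# Raw string, so the regex backslashes are not treated as string escapes.
# Each match is a (key, value) pair; a value runs until the next "\n[" or the end.
pattern_RE = r'\[([^]]+)\] => (.*?)(?=\n\[|$)'
report_dict = dict(re.findall(pattern_RE, total_string, re.S))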

for k, v in report_dict.items():
    print('[{}]: {}'.format(k, v))

print(report_dict)

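This is the text as you showed it to us, but the real string may contain hidden newlines or carriage returns. On the pasted text, the regex appears to work fine: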

{   'engine': '60', 
    'hits': '1', 
    'severity': '4', 
    'is_malware': '1', 
    'summary': '(all three captured)',
    'exploit': '1', 
    'completed': '1'
}

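So if the regex is not capturing this for you, the repr() of your total_string must differ slightly from what you pasted (perhaps a trailing newline, or something else).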
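
One way to check for such hidden characters is to print repr() of the string and, if carriage returns show up, normalize the line endings before matching. The following is only a sketch under that assumption: the helper name parse_normalized and the sample string raw are made up for illustration, and stripping the values is an extra step that the original answers do not perform.

```python
import re

PATTERN = re.compile(r'\[([^]]+)\] => (.*?)(?=\n\[|$)', re.S)


def parse_normalized(text):
    # Turn Windows (\r\n) and bare (\r) line endings into \n first,
    # so the "\n[" lookahead sees the boundaries it expects.
    text = text.replace('\r\n', '\n').replace('\r', '\n')
    # Strip leftover whitespace (such as a trailing newline) from each value.
    return {key: value.strip() for key, value in PATTERN.findall(text)}


# Hypothetical input with Windows line endings.
raw = '[exploit] => 1\r\n[summary] => line one\r\nline two\r\n\r\n[engine] => 60\r\n'

print(repr(raw))               # makes the \r characters visible
print(parse_normalized(raw))
```

Without the normalization step, the same pattern applied to raw would leave a trailing '\r' on some of the values, which is easy to miss when the dictionary is printed with print() instead of repr().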