I have the following text.
<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->
I need to extract the tags together with their numbers. I have the following in Python.
import re
result = {}  # maps tag name -> count
tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', str(feed))
for tag in tag_list:
    index = tag.index('(')  # position where the count starts
    result[tag[:index]] = int(tag.split("(")[1].rstrip(")"))
print(result)
This prints the following output:
{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}
Now I only want to do this for the applied transformations in the text above. That is, I want the information above only for 'TextTransApplied' or 'TagTransApplied'.
I tried the following:
re.findall(r'TextTransApplied:[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', str(feed))
But this only returns the first value. How can I get all the values for the applied transformations?
Answer 0 (score: 1)
It is better to first grab everything that belongs to TagTransApplied / TextTransApplied, and then extract the parts you need:
import re
feed = """<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->"""
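result = dict()
# First isolate each "...Applied" section, up to the next semicolon
tagged = re.findall(r'T(?:ag|ext)TransApplied[^;]+', str(feed))
for part in tagged:
    # Then pull the NAME(number) pairs out of that section only
    tag_list = re.findall(r'[A-Z]+(?:_[A-Z\d]+)+\(\d+\)', part)
    for tag in tag_list:
        idx = tag.index('(')  # renamed from "id" to avoid shadowing the builtin
        result[tag[:idx]] = int(tag.split("(")[1].rstrip(")"))
print(result)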
Result:
{'RENAME_CSS': 3, 'IMAGE_COMPRESSION': 59, 'MINIFY_JAVASCRIPT': 10, 'RENAME_JAVASCRIPT': 9, 'RENAME_IMAGE': 59, 'EMBED_JAVASCRIPT': 2}
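The attempt in the question returns a single value because the prefix TextTransApplied: occurs only once in the text, so a pattern that starts with it can match only once. The two-stage idea above can also be written with capturing groups, which removes the need for index() and split(). This is only a sketch: the function name applied_counts and the shortened sample string are made up for illustration and are not part of the original post.

```python
import re

# Shortened, hypothetical sample in the same shape as the debug comment above.
feed = ("<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(9), "
        "RENAME_IMAGE(59), (1);"
        "TextTransApplied:RENAME_JAVASCRIPT(9), (1), RENAME_CSS(3);"
        "TagTransAttempted:(73);TagTransApplied:(73); ] -->")


def applied_counts(text):
    """Return {tag: count} taken only from the *Applied sections."""
    counts = {}
    # Stage 1: capture the body of each "...Applied:" section up to the next ';'.
    for section in re.findall(r'T(?:ag|ext)TransApplied:([^;]*)', text):
        # Stage 2: capture name and number separately; unnamed entries
        # such as "(1)" do not match and are skipped.
        for name, number in re.findall(r'([A-Z]+(?:_[A-Z\d]+)+)\((\d+)\)', section):
            counts[name] = int(number)
    return counts


print(applied_counts(feed))
```

With this sample, RENAME_IMAGE appears only in the Attempted section, so it should not show up in the result.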
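Answer 1 (score: 0)
Try capturing everything inside a capture group and then processing that string.
(I modified your existing logic slightly, and I changed RENAME_JAVASCRIPT(9) to RENAME_JAVASCRIPT(19) in the Attempted section, just to show the difference.)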
import re
s = '<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TextTransApplied:RENAME_JAVASCRIPT(9), RENAME_IMAGE(59), MINIFY_JAVASCRIPT(10), (1), EMBED_JAVASCRIPT(2), RENAME_CSS(3), (1), IMAGE_COMPRESSION(59);TagTransAttempted:(73);TagTransApplied:(73); ] -->'
tag_list = re.findall(r'(?:TextTransAttempted|TextTransApplied):\s*((?:(?:[A-Z]+(?:_[A-Z\d]+)+)?\(\d+\)\s*(?:,\s*|;))*)', s)
for tag in tag_list:
    result = {}  # one dict per matched section
    for e in tag.split(","):
        index = e.index('(')
        if e[:index].strip():  # skip unnamed entries such as "(1)"
            result[e[:index].strip()] = e.split("(")[1].rstrip(");")
    print(result)
'''
OUTPUT
>>>
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '19', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
{'RENAME_CSS': '3', 'IMAGE_COMPRESSION': '59', 'MINIFY_JAVASCRIPT': '10', 'RENAME_JAVASCRIPT': '9', 'RENAME_IMAGE': '59', 'EMBED_JAVASCRIPT': '2'}
'''
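If the goal is to keep the sections apart and to get integers instead of strings, one possible variation is to look up each section by name and build one dict per section. This is a sketch under assumptions: counts_by_section, its sections parameter, and the shortened sample string are hypothetical names and data chosen for illustration.

```python
import re

# Shortened, hypothetical sample in the same shape as the debug comment above.
s = ("<!-- FEO DEBUG OUTPUT [TextTransAttempted:RENAME_JAVASCRIPT(19), (1);"
     "TextTransApplied:RENAME_JAVASCRIPT(9), (1), RENAME_CSS(3);"
     "TagTransAttempted:(73);TagTransApplied:(73); ] -->")


def counts_by_section(text, sections=("TextTransApplied", "TagTransApplied")):
    """Return {section: {tag: count}} for each requested section found in text."""
    result = {}
    for section in sections:
        # Capture the section body between "<section>:" and the next ';'.
        match = re.search(re.escape(section) + r':([^;]*);', text)
        if match is None:
            continue  # section not present in this text
        pairs = re.findall(r'([A-Z]+(?:_[A-Z\d]+)+)\((\d+)\)', match.group(1))
        result[section] = {name: int(number) for name, number in pairs}
    return result


print(counts_by_section(s))
```

A section that contains only unnamed entries, such as TagTransApplied:(73), ends up with an empty dict here, because only NAME(number) pairs are collected.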