I need to extract the words that come before a specific word.
My data is:
data="""70MHeAhULOY8KHVLaBwcQHzAAegQICBAF">Similar</a>
</li></ol></div></div></span></div><div class="s"><div>
<span class="st">Mail: Consumer Advisory Service, PO Box
1673, MELBOURNE <em>VIC</em> 3001. Email: Click here to
contact us via email. Any personal information you give
;...kJP70MHeAhULOY8KHVLaBwcQIDAKegQIBxAE">Cached </a>
</li></ol></div></div></span></div><div class="s"><div>
<span class="st">Australia. Consumer Advisory Service
GPO Box
1673. MELBOURNE, <em>VIC</em>, 3001. AUSTRALIA. New Zealand.
Cadbury Freepost 577. PO Box 890. Dunedin ...</span>
I am trying to extract the words that come immediately before 'VIC'.
Since there are two matches in the data, my expected output is ['1673, MELBOURNE','1673. MELBOURNE,']
My code:
re.find_all(r"\*+\s(\*) <em> vic",data)
But it does not work.
Answer 0 (score: -2)
You can use this regular expression to extract the two words before VIC:
\s+([^\s]+?\s+[^\s]+?)\s*<em>VIC<\/em>
Here is Python sample code that uses it:
import re
data='70MHeAhULOY8KHVLaBwcQHzAAegQICBAF">Similar</a></li></ol></div></div></span></div><div class="s"><div> <span class="st">Mail: Consumer Advisory Service, PO Box 1673, MELBOURNE <em>VIC</em> 3001. Email: Click here to contact us via email. Any personal information you give ;...kJP70MHeAhULOY8KHVLaBwcQIDAKegQIBxAE">Cached </a> </li></ol></div></div></span></div><div class="s"><div> <span class="st">Australia. Consumer Advisory Service GPO Box 1673. MELBOURNE, <em>VIC</em>, 3001. AUSTRALIA. New Zealand. Cadbury Freepost 577. PO Box 890. Dunedin ...</span>'
# Capture the two whitespace-separated tokens right before <em>VIC</em>
d = re.findall(r"\s+([^\s]+?\s+[^\s]+?)\s*<em>VIC<\/em>",data)
print(d)
This gives the following output:
['1673, MELBOURNE', '1673. MELBOURNE,']
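As for why the attempt in the question fails: the `re` module has no `find_all` function (the name is `findall`), `\*` matches a literal asterisk rather than "any word", and `<em> vic` contains a space and lowercase letters that never occur in the data. If the keyword or the number of preceding words may vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name `words_before_tag` and its parameters are made up for illustration, and it assumes the keyword is always wrapped in a plain tag with no attributes, as in the sample data.

```python
import re

def words_before_tag(text, keyword, n_words=2, tag="em"):
    """Return the n_words whitespace-separated tokens before <tag>keyword</tag>.

    Hypothetical helper: assumes the tag carries no attributes.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    kw = re.escape(keyword)
    tg = re.escape(tag)
    pattern = (
        r"(?<!\S)"                                     # start at the beginning of a token
        r"(\S+(?:\s+\S+){" + str(n_words - 1) + r"})"  # capture exactly n_words tokens
        r"\s*<" + tg + r">" + kw + r"</" + tg + r">"   # followed by the tagged keyword
    )
    # IGNORECASE lets "vic" match "VIC" as well
    return re.findall(pattern, text, flags=re.IGNORECASE)

# Shortened sample in the same shape as the data in the question
sample = (
    'PO Box\n1673, MELBOURNE <em>VIC</em> 3001. Email: Click here '
    'GPO Box\n1673. MELBOURNE, <em>VIC</em>, 3001. AUSTRALIA.'
)
print(words_before_tag(sample, "VIC"))
```

`re.escape` is used so that a keyword containing regex metacharacters is matched literally. For anything beyond a quick extraction like this, parsing the HTML with a real parser is usually more robust than a regular expression.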