The code below is a Ruby regular expression. I want to convert it to Python code. How should I do that?
add_zzim\(\'(.*?)\',\'(.*?)\',\'(?<param>.*?)\',.*
Source:
<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
Answer 0 (score: 1)
import re
# The ur'' prefix is a syntax error in Python 3; a plain raw string is already Unicode.
p = re.compile(r'add_zzim\(\'(.*?)\',\'(.*?)\',\'(.*?)\',.*')
test_str = u"<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');\" title=\"contents.\">14913089</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');\" title=\"contents.\">14913012</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');\" title=\"contents.\">14913045</li>\n"
for i in re.findall(p, test_str):
    print(i[2])  # third captured group, i.e. the 'param' value
findall returns a list of tuples, one per match; the third element of each tuple (index 2) is the 'param' value.
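If you would rather keep the group name from the Ruby pattern, note that Python's re module spells named groups as (?P<name>...) instead of (?<name>...). A minimal sketch under that assumption (the variable names pattern, html and params are only illustrative):

```python
import re

# Ruby's (?<param>...) becomes (?P<param>...) in Python's re module.
pattern = re.compile(r"add_zzim\('(.*?)','(.*?)','(?P<param>.*?)',.*")

html = """
<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
"""

# '.' does not match a newline by default, so each <li> line yields its own match.
params = [m.group("param") for m in pattern.finditer(html)]
print(params)
```

With finditer you get match objects, so the value can be read by name with m.group("param") rather than by position.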
Answer 1 (score: 0)
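Here is a non-regex approach.
To extract the onclick attribute value we'll use the BeautifulSoup HTML parser, and to extract the add_zzim() argument values we'll use ast.literal_eval().
Complete working example: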
from ast import literal_eval
from bs4 import BeautifulSoup
data = """
<ul>
<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
</ul>
"""
soup = BeautifulSoup(data, "html.parser")
for li in soup.select("li.num"):
    # Drop the function name and the trailing semicolon, leaving a tuple literal.
    args = literal_eval(li["onclick"].replace("add_zzim", "").rstrip(";"))
    print(args)
Prints:
('BD_AD_08', '14913089', 'helloooo', '3586312774', 'test')
('BD_AD_08', '14913012', 'helloooo', '3586312774', 'test')
('BD_AD_08', '14913041', 'helloooo', '3586312774', 'test')
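If installing BeautifulSoup is not an option, the same idea can be sketched with the standard library's html.parser. This is only an illustration, not part of the approach above: the OnclickCollector class and its params attribute are hypothetical names, and it assumes every onclick of interest starts with add_zzim( and contains only quoted string arguments.

```python
from ast import literal_eval
from html.parser import HTMLParser


class OnclickCollector(HTMLParser):
    """Collects the third add_zzim() argument from each <li> onclick attribute."""

    def __init__(self):
        super().__init__()
        self.params = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # HTMLParser lowercases attribute names, so onClick arrives as onclick.
        onclick = dict(attrs).get("onclick")
        if not onclick or not onclick.startswith("add_zzim("):
            return
        # Remove the function name and trailing semicolon to leave a tuple literal.
        args = literal_eval(onclick[len("add_zzim"):].rstrip(";"))
        self.params.append(args[2])


data = """
<ul>
<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
</ul>
"""

collector = OnclickCollector()
collector.feed(data)
print(collector.params)
```

Indexing the tuple with args[2] picks out the same value that the 'param' group captures in the regex version.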