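Folks, I managed to get BeautifulSoup to scrape a page with the following:

from bs4 import BeautifulSoup

# response is the HTTP response object for the page (e.g. returned by urlopen)
html = response.read()
soup = BeautifulSoup(html, "html.parser")
links = soup.find_all('a')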
The page contains several links like these:
<A href="javascript:Set_Variables('foo1','bar1''')"onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')"onmouseover="javascript: return window.status=''">
How can I iterate over these links and extract the foo/bar values?
Thanks
Answer:
You can use a regular expression to extract the variables from the href attribute:
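import re
from bs4 import BeautifulSoup

data = """
<div>
<table>
<A href="javascript:Set_Variables('foo1','bar1''')" onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')" onmouseover="javascript: return window.status=''">
</table>
</div>
"""

# Name the parser explicitly to avoid bs4's "no parser was explicitly specified" warning
soup = BeautifulSoup(data, "html.parser")

# Capture the first two single-quoted arguments of Set_Variables(...)
pattern = re.compile(r"javascript:Set_Variables\('(\w+)','(\w+)'")

# soup('a') is shorthand for soup.find_all('a')
for a in soup('a'):
    # .get() avoids a KeyError on <a> tags without an href
    match = pattern.search(a.get('href', ''))
    if match:
        print(match.groups())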
This prints:
('foo1', 'bar1')
('foo2', 'bar2')
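A variant sketch, assuming bs4 with the built-in "html.parser" backend: BeautifulSoup accepts a compiled regex as an attribute filter (it calls the regex's search() on the attribute value), so find_all can skip non-matching links itself. It also swaps \w+ for [^']*, so values containing spaces, dots or hyphens are still captured. The helper name extract_pairs is hypothetical, chosen only for this illustration.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question.
data = """
<A href="javascript:Set_Variables('foo1','bar1''')" onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')" onmouseover="javascript: return window.status=''">
"""

# Capture the first two single-quoted arguments; [^']* also accepts
# values containing spaces, dots or hyphens, which \w+ would reject.
pattern = re.compile(r"Set_Variables\('([^']*)','([^']*)'")


def extract_pairs(markup):
    """Return a list of (foo, bar) tuples from matching <a> hrefs (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    # Passing a compiled regex as the href filter makes find_all keep
    # only <a> tags whose href matches it (via pattern.search).
    for a in soup.find_all("a", href=pattern):
        match = pattern.search(a["href"])
        pairs.append(match.groups())
    return pairs


print(extract_pairs(data))
```

Because find_all already filtered on the same pattern, every returned tag is guaranteed to match, so the `if match:` check from the loop above is no longer needed.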