Python BeautifulSoup: finding certain content in a table

Date: 2014-07-21 21:10:30

Tags: python html html-parsing beautifulsoup

Folks, I managed to get BeautifulSoup to scrape a page with the following:
from bs4 import BeautifulSoup

html = response.read()
soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
links = soup.find_all('a')

There are several occurrences of:

<A href="javascript:Set_Variables('foo1','bar1''')"onmouseover="javascript: return window.status=''">
<A href="javascript:Set_Variables('foo2','bar2''')"onmouseover="javascript: return window.status=''">

How do I iterate over these and get the foo/bar values?

Thanks

1 answer:

Answer 0 (score: 1)

You can use a regular expression to extract the variables from the href attribute:

import re
from bs4 import BeautifulSoup

data = """
<div>
    <table>
        <A href="javascript:Set_Variables('foo1','bar1''')" onmouseover="javascript: return window.status=''">
        <A href="javascript:Set_Variables('foo2','bar2''')" onmouseover="javascript: return window.status=''">
    </table>
</div>
"""

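soup = BeautifulSoup(data, 'html.parser')

# Capture the first two single-quoted arguments of Set_Variables(...)
pattern = re.compile(r"javascript:Set_Variables\('(\w+)','(\w+)'")
for a in soup('a'):  # soup('a') is shorthand for soup.find_all('a')
    match = pattern.search(a.get('href', ''))  # skip <a> tags without href
    if match:
        print(match.groups())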

Prints:

('foo1', 'bar1')
('foo2', 'bar2')
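Note that `\w+` only matches letters, digits and underscores, so a value such as `foo-2` or `bar 2` would be skipped. Below is a minimal sketch of a slightly more tolerant pattern, assuming each argument is a single-quoted string with no embedded quotes. The names `SET_VARIABLES` and `set_variables_args` are hypothetical helpers, not part of BeautifulSoup; the function works on plain href strings, so you can feed it `a.get('href')` for each tag.

```python
import re

# Hypothetical pattern: assumes each argument is single-quoted and contains
# no quote characters. [^']* (instead of \w+) also accepts spaces, dots, hyphens.
SET_VARIABLES = re.compile(r"Set_Variables\('([^']*)','([^']*)'")


def set_variables_args(href):
    """Return the first two Set_Variables arguments in href as a tuple, or None."""
    match = SET_VARIABLES.search(href or "")  # tolerate href=None
    return match.groups() if match else None


hrefs = [
    "javascript:Set_Variables('foo1','bar1''')",
    "javascript:Set_Variables('foo-2','bar 2''')",
    "http://example.com/not-a-match",
]
# Keep only the hrefs that actually call Set_Variables
pairs = [args for args in map(set_variables_args, hrefs) if args]
print(pairs)  # [('foo1', 'bar1'), ('foo-2', 'bar 2')]
```

With BeautifulSoup you can also pass the compiled pattern as an attribute filter, e.g. `soup.find_all('a', href=SET_VARIABLES)`, which keeps only the tags whose href the regex matches (BeautifulSoup applies `search()` to the attribute value), and then call `set_variables_args(a['href'])` on each result.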