Python - How to extract an href (onclick) with BeautifulSoup, without Selenium

Time: 2018-04-15 03:19:16

Tags: python html beautifulsoup onclick

Here is the element:

<img src="http://dcimg6.dcinside.co.kr/viewimage.php?id=2fbcc323e7d334aa51b1d3a240&amp;no=24b0d769e1d32ca73fef84fa11d028318f52c0eeb141bee560297996d466c894cf2d16427672bba3d66d67f244141456484cb174854719ce631af568a8c297d4e29cc59286bf0d77bcf8d9267e7297e17913fdd84522b3d3" style="cursor:pointer;" onclick="javascript:imgPop('http://image.dcinside.com/viewimagePop.php?id=2fbcc323e7d334aa51b1d3a240&amp;no=24b0d769e1d32ca73fef84fa11d028318f52c0eeb141bee560297996d466c894cf2d16427672bba3d66d67f24450460016bd8e2258cb95ccf2b058e84f237054da6dbbc48af67310bf0bff50c529f1331053edf6','image','fullscreen=yes,scrollbars=yes,resizable=no,menubar=no,toolbar=no,location=no,status=no');">

I want to extract the URL passed to imgPop in the onclick attribute, that is, "http://image.dcinside.com/viewimagePop.php?id=2fbcc323e7d334aa51b1d3a240&no=24b0d769e1d32ca73fef84fa11d028318f52c0eeb141bee560297996d466c894cf2d16427672bba3d66d67f24450460016bd8e2258cb95ccf2b058e84f237054da6dbbc48af67310bf0bff50c529f1331053edf6".

How can I extract it?

1 answer:

Answer 0 (score: 1)

This works with both BeautifulSoup v3 and v4.

content = """
    <img src="http://dcimg6.dcinside.co.kr/viewimage.php?id=2fbcc323e7d334aa51b1d3a240&amp;no=24b0d769e1d32ca73fef84fa11d028318f52c0eeb141bee560297996d466c894cf2d16427672bba3d66d67f244141456484cb174854719ce631af568a8c297d4e29cc59286bf0d77bcf8d9267e7297e17913fdd84522b3d3" style="cursor:pointer;" onclick="javascript:imgPop('http://image.dcinside.com/viewimagePop.php?id=2fbcc323e7d334aa51b1d3a240&amp;no=24b0d769e1d32ca73fef84fa11d028318f52c0eeb141bee560297996d466c894cf2d16427672bba3d66d67f24450460016bd8e2258cb95ccf2b058e84f237054da6dbbc48af67310bf0bff50c529f1331053edf6','image','fullscreen=yes,scrollbars=yes,resizable=no,menubar=no,toolbar=no,location=no,status=no');">
"""

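# bs4 is BeautifulSoup v4; on v3 use "from BeautifulSoup import BeautifulSoup" and drop the parser argument
from bs4 import BeautifulSoup

soup = BeautifulSoup(content, 'html.parser')

# The onclick attribute holds the whole imgPop(...) call; BeautifulSoup decodes &amp; to &
onclick = soup.find('img').get('onclick')

# The first comma-separated piece is the call prefix plus the quoted URL
js_item = onclick.split(',')[0]

# Strip the prefix and the quotes to leave the bare URL
link = js_item.replace('javascript:imgPop(', '').replace("'", "")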
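
Splitting on ',' works for this element, but it would cut the URL short if the URL itself ever contained a comma. As a possible alternative (not part of the original answer), a regular expression can capture the first quoted argument of the JavaScript call directly. This is only a sketch: the names first_js_argument and FIRST_QUOTED_ARG are hypothetical, and the sample string is a shortened stand-in for the real onclick value.

```python
import re

# Captures the first quoted argument of a JavaScript call such as imgPop('...', ...).
FIRST_QUOTED_ARG = re.compile(r"""\(\s*(['"])(.*?)\1""")


def first_js_argument(onclick):
    """Return the first quoted argument in an onclick string, or None if absent."""
    match = FIRST_QUOTED_ARG.search(onclick)
    return match.group(2) if match else None


# Hypothetical, shortened value in the same shape as the onclick attribute above,
# as BeautifulSoup would return it (with &amp; already decoded to &).
onclick = "javascript:imgPop('http://example.com/pop.php?id=1&no=2,3','image','fullscreen=yes');"
print(first_js_argument(onclick))
```

On the real page, the argument would be the string returned by soup.find('img').get('onclick') rather than the sample value.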