Question

Here is the code:

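import requests
from bs4 import BeautifulSoup

url = 'http://linkblend.icu/nAK5f'
req = requests.get(url).text            # raw HTML as served, before any JavaScript runs
soup = BeautifulSoup(req, 'html.parser')
data = soup.find_all('a')
print(data)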

Here is the result:

[<a class="navbar-brand" href="/">Link Blend</a>, <a href="/advertising-rates">Advertising</a>, <a href="/payout-rates">Payout Rates</a>, <a class="btn btn-success btn-lg get-link disabled" href="javascript: void(0)">
            Please wait...        </a>, <a href="/pages/privacy">Privacy Policy</a>, <a href="/pages/terms">Terms of Use</a>, <a href="https://www.facebook.com/#"><i class="fa fa-facebook"></i></a>, <a href="https://twitter.com/#"><i class="fa fa-twitter"></i></a>, <a href="https://plus.google.com/#"><i class="fa fa-google-plus"></i></a>]

Answer 1

If you want to keep using BeautifulSoup to parse the HTML source, you can load the page with Selenium and pass the rendered source to it, as follows:

from bs4 import BeautifulSoup as bs
from selenium import webdriver as wd

# Start the Firefox driver
driver = wd.Firefox()

# Navigate to the desired URL (Selenium needs the full URL, including the scheme)
driver.get('http://linkblend.icu/nAK5f')

# Retrieve the page source as rendered by the browser
html = driver.page_source

# Parse it with BeautifulSoup, naming the parser explicitly
soup = bs(html, 'html.parser')

# Locate the tags
data = soup.find_all('a')
print(data)

# Shut down the browser and end the driver session
driver.quit()
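
The result in the question shows the target anchor still carrying the class "get-link disabled", the text "Please wait..." and the href "javascript: void(0)", so a page source read immediately after loading may still contain only that placeholder. A minimal sketch of waiting for the link first is given below. It assumes that the page removes the "disabled" class and sets a real href once its countdown finishes (not confirmed by the question), and that the selector "a.get-link" matches the button. The names link_is_ready and fetch_final_link, and the 15 second timeout, are hypothetical choices for illustration.

```python
def link_is_ready(class_attr, href):
    """Return True once the anchor no longer looks like the 'Please wait...' placeholder.

    Assumption: the page drops the 'disabled' class and replaces the
    'javascript: void(0)' href when the real link has been generated.
    """
    classes = (class_attr or "").split()
    if "disabled" in classes:
        return False
    return bool(href) and not href.strip().lower().startswith("javascript:")


def fetch_final_link(url, timeout=15):
    """Open url in Firefox and return the href of a.get-link once it is ready."""
    # Imported here so link_is_ready can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)

        def ready_href(drv):
            # Assumed selector, based on the class shown in the question's output.
            link = drv.find_element(By.CSS_SELECTOR, "a.get-link")
            href = link.get_attribute("href")
            return href if link_is_ready(link.get_attribute("class"), href) else False

        # Polls ready_href until it returns a truthy value;
        # raises TimeoutException if the link never becomes ready.
        return WebDriverWait(driver, timeout).until(ready_href)
    finally:
        driver.quit()


if __name__ == "__main__":
    print(fetch_final_link("http://linkblend.icu/nAK5f"))
```

If the page behaves differently (for example, if it replaces the whole button instead of changing its attributes), the condition in link_is_ready would need to be adapted after inspecting the rendered page in the browser's developer tools.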

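How do I scrape the Google link? When I send a request to get the link, the page says the link will be generated after 3 seconds, and I cannot scrape it.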
