I can't scrape all the data

Time: 2020-07-10 22:25:27

Tags: python web-scraping beautifulsoup

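I can't get all of the URL information from the site below. The pictures show that the page contains more data than my code finds. My code is below; I assume this is a case of scraping a page that is built dynamically by JavaScript. For example, I want the quercetin link or name, but I can't reach it.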

import sys

import bs4 as bs
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from PyQt5.QtWidgets import QApplication


class Page(QWebEnginePage):
    """Load a URL in an off-screen page and keep the resulting HTML."""

    def __init__(self, url):
        # A QApplication must exist before any QWebEnginePage is created.
        self.app = QApplication(sys.argv)
        QWebEnginePage.__init__(self)
        self.html = ''
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()  # blocks until self.app.quit() is called

    def _on_load_finished(self):
        # toHtml() is asynchronous and returns None; the HTML arrives in the callback.
        self.toHtml(self.Callable)
        print('Load finished')

    def Callable(self, html_str):
        self.html = html_str
        self.app.quit()


url = "https://foodb.ca/foods/FOOD00001"
page = Page(url)
soup = bs.BeautifulSoup(page.html, 'html.parser')

for a in soup.find_all('a', href=True):
    print("Found the URL:", a['href'])

[Images: "Wanted" and "Found links"]

1 answer:

Answer 0: (score: 0)

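I switched to Selenium and that solved it. The problem is that the data is filled in only after the page itself has finished loading, so you need to sleep for a while and then scrape the data.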
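The answer does not include its code, so the following is only a minimal sketch of the "load, sleep, then scrape" idea, not the answerer's actual script. The helper names `LinkCollector`, `rendered_links` and `main`, the 5-second default pause, and the choice of Chrome are all assumptions. Links are collected with the standard library's `HTMLParser` to keep the sketch free of dependencies other than Selenium; BeautifulSoup, as used in the question, should work just as well on the same `page_source`.

```python
import time
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def rendered_links(driver, url, wait_seconds=5.0):
    """Open url, pause so scripts can fill in the page, return all hrefs."""
    driver.get(url)
    # Fixed pause, as the answer suggests. The 5 s default is a guess,
    # not a value taken from the answer.
    time.sleep(wait_seconds)
    collector = LinkCollector()
    collector.feed(driver.page_source)  # HTML after JavaScript has run
    return collector.links


def main():
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumes Chrome and a matching driver are available
    try:
        for href in rendered_links(driver, "https://foodb.ca/foods/FOOD00001"):
            print("Found the URL:", href)
    finally:
        driver.quit()
```

Calling `main()` runs the sketch against the page from the question. A fixed sleep is the simplest thing that matches the answer, but it either wastes time or is too short on a slow connection; Selenium's `WebDriverWait` (in `selenium.webdriver.support.ui`) can instead wait until a specific element appears, which is usually more reliable.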