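I am trying to get the list of all countries from the page https://www.nexmo.com/products/sms. I can see the list displayed in a dropdown. After inspecting the page, I tried the following code, but I must be doing something wrong. I would appreciate any help.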
import requests
from bs4 import BeautifulSoup
# collect and parse page
page = requests.get('https://www.nexmo.com/products/sms')
soup = BeautifulSoup(page.text, 'html.parser')
# pull all text from the div
name_list = soup.find(class_ ='dropdown-content')
print(name_list)
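Answer 0 (score: 0)
This page renders its HTML with JavaScript, so requests alone only receives the initial markup, without the dropdown entries. You can render the page with Selenium. First, install Selenium: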
sudo pip3 install selenium
Then download the driver from https://sites.google.com/a/chromium.org/chromedriver/downloads (depending on your operating system, you may need to specify the driver's location).
from selenium import webdriver
from bs4 import BeautifulSoup
browser = webdriver.Chrome()
url = ('https://www.nexmo.com/products/sms')
browser.get(url)
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup(html_source, 'html.parser')
for name_list in soup.find_all(class_='dropdown-row'):
    print(name_list.text)
Output:
Afghanistan
Albania
...
Zambia
Zimbabwe
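Reading page_source right after get() can still miss content that the scripts insert a moment later, and the driver location sometimes has to be given explicitly. A minimal sketch of both, assuming Selenium 4 (the Service class and explicit waits); the function name fetch_rendered_html and its parameters are made up for illustration and are not part of the original answer:

```python
def fetch_rendered_html(url, css_class, driver_path=None, timeout=10):
    """Return the page source once an element with css_class is present."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Selenium is not installed yet.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Without an explicit path, Selenium tries to locate chromedriver itself.
    service = Service(executable_path=driver_path) if driver_path else Service()
    browser = webdriver.Chrome(service=service)
    try:
        browser.get(url)
        # Block until the JavaScript has added at least one matching element;
        # raises TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, css_class))
        )
        return browser.page_source
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()


# Example call (starts a real Chrome window, so it is left commented out):
# html_source = fetch_rendered_html('https://www.nexmo.com/products/sms', 'dropdown-row')
```

The returned string can be passed to BeautifulSoup exactly like html_source in the code above.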
Update
Alternatively, use PyQt5.
On Ubuntu:
sudo apt-get install python3-pyqt5
sudo apt-get install python3-pyqt5.qtwebengine
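On other operating systems (with recent PyQt5 releases, QtWebEngine may be a separate package, PyQtWebEngine):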
pip3 install PyQt5
Then run:
from bs4 import BeautifulSoup
import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEngineView
class Render(QWebEngineView):
    def __init__(self, url):
        self.html = None
        self.app = QApplication(sys.argv)
        QWebEngineView.__init__(self)
        self.loadFinished.connect(self._loadFinished)
        self.load(QUrl(url))
        # Run the Qt event loop until callable() stops it
        self.app.exec_()
    def _loadFinished(self, result):
        # toHtml is asynchronous; it passes the HTML to the callback
        self.page().toHtml(self.callable)
    def callable(self, data):
        self.html = data
        self.app.quit()
url = 'https://www.nexmo.com/products/sms'
html_source = Render(url).html
soup = BeautifulSoup(html_source, 'html.parser')
for name_list in soup.find_all(class_='dropdown-row'):
    print(name_list.text)
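To check whether a page needs a browser at all, you can test whether the target class appears in the raw HTML. Below is a rough sketch using only the standard library (html.parser instead of BeautifulSoup, so it runs without extra packages). The two HTML strings are invented stand-ins for what requests and a rendered browser might return, not the real Nexmo markup, and ClassTextCollector / texts_with_class are hypothetical helper names:

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "img", "input", "hr", "meta", "link"}


class ClassTextCollector(HTMLParser):
    """Collect the text of every element whose class list contains target."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0      # > 0 while inside a matching element
        self.parts = []     # text fragments of the current match
        self.results = []   # finished texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1          # nested tag inside a match
        elif self.target in classes:
            self.depth = 1           # a new match starts here
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:          # the matching element just closed
            self.results.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def texts_with_class(html, target):
    parser = ClassTextCollector(target)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up examples: static HTML as fetched by requests vs. rendered HTML.
static_html = '<div id="app"></div><script src="bundle.js"></script>'
rendered_html = (
    '<div id="app">'
    '<div class="dropdown-row">Afghanistan</div>'
    '<div class="dropdown-row">Albania</div>'
    '</div>'
)

print(texts_with_class(static_html, "dropdown-row"))    # []
print(texts_with_class(rendered_html, "dropdown-row"))  # ['Afghanistan', 'Albania']
```

If the first call comes back empty for page.text from requests, the rows are being added by JavaScript and one of the rendering approaches above is needed.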