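I'm having trouble scraping lottery draw results from a website so I can run statistics on them. I've tried several different parsers, but every time the result comes back as None. Here is my code: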
import requests
from bs4 import BeautifulSoup
url = "https://www.opap.gr/lotto-draw-results"
user = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36"}
req = requests.get(url, headers = user)
soup = BeautifulSoup(req.text, "html.parser")
i = 1
while i <= 6:
    for draw_num in soup.findAll("li", {"class": "draw-result-number-{}".format(i)}):
        print(draw_num.content)
    i += 1
The relevant HTML from the website:
<ul class="circles"> <li class="draw-result-number-1">1</li> <li class="draw-result-number-2">2</li> <li class="draw-result-number-3">12</li> <li class="draw-result-number-4">14</li> <li class="draw-result-number-5">20</li> <li class="draw-result-number-6">49</li> <span class="plus_symbol" style="display: inline;">+</span> <li class="highlighted draw-result-number-bonus" style="display: inline-block;">8</li> </ul>
I'd be grateful for any help.
Answer 0 (score: 1)
By the looks of it, the data is not embedded in the HTML but is retrieved through an additional API call:
https://api.opap.gr/draws/v3.0/5103/last-result-and-active?status=results
You can parse its JSON response to get the winning numbers:
import requests
req = requests.get("https://api.opap.gr/draws/v3.0/5103/last-result-and-active?status=results")
data = req.json()
print(data["last"]["winningNumbers"])
The URL path appears to be static. The site's JavaScript builds the URL dynamically, with 5103 denoting the Lotto game; see this file.
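If the response ever lacks the expected keys, indexing with `data["last"]["winningNumbers"]` raises a `KeyError`. A minimal sketch of a more defensive lookup follows. Only the `last` → `winningNumbers` path comes from the answer above; the helper name `winning_numbers` and the inner `list`/`bonus` fields of the sample payload are assumptions made up for illustration, not the documented API schema.

```python
import json

# Hypothetical payload: only the "last" -> "winningNumbers" path is taken from
# the answer; the inner "list"/"bonus" keys and their values are illustrative.
SAMPLE = json.loads("""
{"last": {"winningNumbers": {"list": [1, 2, 12, 14, 20, 49], "bonus": [8]}}}
""")


def winning_numbers(payload):
    """Return payload["last"]["winningNumbers"], or None if either key is missing."""
    last = payload.get("last") or {}
    return last.get("winningNumbers")


print(winning_numbers(SAMPLE))
print(winning_numbers({}))  # missing keys yield None instead of raising KeyError
```

In real use you would pass `req.json()` in place of `SAMPLE`.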
Answer 1 (score: 0)
In your case, how about using selenium:
from bs4 import BeautifulSoup
from selenium import webdriver
import time
options = webdriver.ChromeOptions()
options.add_argument('--headless')
# Return from get() immediately instead of waiting for the load event (Selenium 4 options API)
options.page_load_strategy = 'none'
driver = webdriver.Chrome(options=options)
driver.set_window_size(1440, 900)
driver.get('https://www.opap.gr/lotto-draw-results')
time.sleep(15)  # crude fixed wait for the JavaScript-rendered content to appear
plain_text = driver.page_source
driver.quit()  # release the browser process
soup = BeautifulSoup(plain_text, 'lxml')
All of the rendered elements will then be available in soup.
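Once the rendered markup is in a BeautifulSoup object, the numbers can be pulled out roughly as sketched below. The HTML is the snippet quoted in the question, standing in for the real `page_source`; the variable names are illustrative. Note that `draw_num.content` in the question returns None even when the element is found, because BeautifulSoup treats an unknown attribute as a search for a child tag named `content`; `get_text()` (or `.contents`) is the accessor to use.

```python
import re
from bs4 import BeautifulSoup

# Markup copied from the question; a real page_source would be much larger.
HTML = """
<ul class="circles">
<li class="draw-result-number-1">1</li>
<li class="draw-result-number-2">2</li>
<li class="draw-result-number-3">12</li>
<li class="draw-result-number-4">14</li>
<li class="draw-result-number-5">20</li>
<li class="draw-result-number-6">49</li>
<span class="plus_symbol" style="display: inline;">+</span>
<li class="highlighted draw-result-number-bonus" style="display: inline-block;">8</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Match draw-result-number-1 ... draw-result-number-6, but not the bonus ball.
main_class = re.compile(r"^draw-result-number-\d+$")
main_numbers = [
    int(li.get_text(strip=True))
    for li in soup.find_all("li", class_=main_class)
]

# The bonus ball carries its own class alongside "highlighted".
bonus_li = soup.find("li", class_="draw-result-number-bonus")
bonus = int(bonus_li.get_text(strip=True)) if bonus_li else None

print(main_numbers, bonus)
```

A single regex match replaces the question's `while` loop over the six class names, and the bonus number is handled separately so it is not mixed into the main draw.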