I'll say up front that I have already looked at similar questions, but none of the solutions worked for me.
I am searching an HTML page for a specific class, but I always get a None value. I have seen posts here describing the same problem, but none of the suggested fixes helped. This is my attempt: I am looking for the player tag by name, i.e. 'Chase Young'.
{
"meta": {
"status": 200
},
"response": {
"item": {
"isLive": false,
"eventStartTime": "2020-03-26T00:00:00Z"
}
}
}
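(As an aside: if the data you are after arrives as JSON like the snippet above rather than as rendered HTML, it can be parsed directly instead of scraped. A minimal sketch with the standard library, assuming the snippet above is the response body:)

```python
import json

# The JSON payload shown above, as it might appear in the response body.
payload = '''
{
  "meta": {"status": 200},
  "response": {
    "item": {
      "isLive": false,
      "eventStartTime": "2020-03-26T00:00:00Z"
    }
  }
}
'''

data = json.loads(payload)
print(data["meta"]["status"])              # -> 200
print(data["response"]["item"]["isLive"])  # -> False
```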
I tried another way to find a match, but it still returns None:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
import requests
url = "https://www.nfl.com/draft/tracker/prospects/allPositions?college=allColleges&page=1&status=ALL&year=2020"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'lxml')
match = soup.find('div', class_='css-gu7inl')
print(match)
# Prints None
The HTML file does not seem to contain the whole page, so I tried Selenium, as I saw recommended in similar posts, but still got nothing:
match = soup.find("div", {"class": "css-gu7inl"})  # match is None
What am I doing wrong here?
Answer 0: (score: 2)
The data is rendered by JavaScript, so use WebDriverWait() with visibility_of_all_elements_located() to wait until the elements are visible:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
url='https://www.nfl.com/draft/tracker/prospects/allPositions?college=allColleges&page=1&status=ALL&year=2020'
driver = webdriver.Chrome()
driver.get(url)
WebDriverWait(driver,20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,'.css-gu7inl')))
soup = BeautifulSoup(driver.page_source, 'lxml')
items=soup.select(".css-gu7inl")
Players=[item.select_one('a.css-1fwlqa').text for item in items]
print(Players)
Output:
['chase young', 'jeff okudah', 'derrick brown', 'isaiah simmons', 'joe burrow', "k'lavon chaisson", 'jedrick wills', 'tua tagovailoa', 'ceedee lamb', 'jerry jeudy', "d'andre swift", 'c.j. henderson', 'mekhi becton', 'mekhi becton', 'patrick queen', 'henry ruggs iii', 'henry ruggs iii', 'javon kinlaw', 'laviska shenault jr.', 'yetur gross-matos']
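The same parsing step works on any saved copy of the rendered page, which makes it easy to test without a browser. A minimal offline sketch using the class names from the answer above (the HTML snippet here is a hypothetical stand-in for driver.page_source, and html.parser is used to avoid the lxml dependency):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source after the JavaScript has run.
html = """
<div class="css-gu7inl"><a class="css-1fwlqa">chase young</a></div>
<div class="css-gu7inl"><a class="css-1fwlqa">jeff okudah</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
items = soup.select(".css-gu7inl")
players = [item.select_one("a.css-1fwlqa").text for item in items]
print(players)  # -> ['chase young', 'jeff okudah']
```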
Answer 1: (score: 0)
Code 1 lets you inspect the server's response, which contains the HTML the server sent. Analyze that response (the server's HTML) with a second piece of code and extract the class you need.
===================================================
import requests #CODE1
from requests_toolbelt.utils import dump
resp = requests.get('http://kanoon.ir/')
data = dump.dump_all(resp)
print(data.decode('utf-8'))
===================================================
Code output (the request headers, followed by the HTML):
< GET / HTTP/1.1
< Host: kanoon.ir
< User-Agent: python-requests/2.23.0
< Accept-Encoding: gzip, deflate
< Accept: */*
< Connection: keep-alive
<
...
===================================================
The code you write for the second part (analyzing the HTML and separating out the class you need) is up to your own creativity.
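As one possible shape for that second part, here is a minimal sketch that separates the text of elements carrying a desired class from raw HTML, using only the standard library's html.parser (the class name and HTML snippet are illustrative; nested tags are not handled):

```python
from html.parser import HTMLParser

class ClassExtractor(HTMLParser):
    """Collects the text of every tag carrying a target CSS class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; "class" may hold several names.
        classes = dict(attrs).get("class", "").split()
        if self.target_class in classes:
            self.capturing = True
            self.results.append("")

    def handle_data(self, data):
        if self.capturing and data.strip():
            self.results[-1] += data.strip()

    def handle_endtag(self, tag):
        self.capturing = False

# Illustrative HTML, standing in for the server response captured by Code 1.
html = '<div class="player">chase young</div><div class="other">x</div>'

parser = ClassExtractor("player")
parser.feed(html)
print(parser.results)  # -> ['chase young']
```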