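Two days ago I was trying to parse the data between two elements with the same class, and Keyur helped me a lot with that, but other problems have come up since. :D
Now I want to get the links under a specific class. Here is my code, followed by the result.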
from bs4 import BeautifulSoup
import urllib.request
import datetime
headers = {}  # Headers tell the server about the client: operating system, browser, etc.
headers['User-Agent'] = 'Mozilla/5.0'  # Set a user agent because HLTV otherwise treats the connection as a bot.
hltv = urllib.request.Request('https://www.hltv.org/matches', headers=headers)  # Build the request; urlopen below actually connects
session = urllib.request.urlopen(hltv)
sauce = session.read() # Getting the source of website
soup = BeautifulSoup(sauce, 'lxml')
a = 0
b = 1
# Getting the match pages' links.
for x in soup.find('span', text=datetime.date.today()).parent:
    print(x.find('a'))
Error:
There is no actual error, but the output looks like this:
None
None
None
-1
None
None
-1
After some research I found that find returns None when there is nothing to match. So I tried find_all instead.
Code:
print(x.find_all('a'))
Output:
AttributeError: 'NavigableString' object has no attribute 'find_all'
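This is the element with the class name:
<div class="standard-headline">2018-05-01</div>
I don't want to post the whole page source here, so here is the link, hltv.org/matches/, so you can inspect the classes more easily.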
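Answer 0 (score: 0)
I'm not quite sure which links the OP really wants to grab, so I took a guess. The links sit in elements with the compound class a-reset block upcoming-match standard-box. If you pick the right one, a single class from that list is enough to fetch the data, just as it would be with selectors. Give this a try.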
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
from urllib.parse import urljoin
import datetime
url = 'https://www.hltv.org/matches'
req = Request(url, headers={"User-Agent":"Mozilla/5.0"})
res = urlopen(req).read()
soup = BeautifulSoup(res, 'lxml')
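# Find today's date headline, step up to its container, then collect the match links in it.
today = soup.find(class_="standard-headline", text=str(datetime.date.today()))
# The [:-2] slice drops the last two matches found in that container.
for link in today.find_parent().find_all(class_="upcoming-match")[:-2]:
    print(urljoin(url, link.get('href')))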
Output:
https://www.hltv.org/matches/2322508/yeah-vs-sharks-ggbet-ascenso
https://www.hltv.org/matches/2322633/team-australia-vs-team-uk-showmatch-csgo
https://www.hltv.org/matches/2322638/sydney-saints-vs-control-fe-lil-suzi-winner-esl-womens-sydney-open-finals
https://www.hltv.org/matches/2322426/faze-vs-astralis-iem-sydney-2018
https://www.hltv.org/matches/2322601/max-vs-fierce-tiger-starseries-i-league-season-5-asian-qualifier
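As for why the loop in the question printed None and -1: iterating over a tag yields its direct children, and those include the whitespace text nodes between elements. On a text node, find is the plain str.find, which returns -1, and the AttributeError shows that a text node has no find_all. On an element child, find('a') only searches descendants, so it returns None even when the child is itself the link. Below is a minimal offline sketch of that, assuming a simplified stand-in for the page: SAMPLE_HTML, the match-day wrapper class and the two match URLs are invented for illustration, and the real HLTV markup may differ. It also shows the single-class selector idea mentioned above.

```python
from datetime import date
from urllib.parse import urljoin

from bs4 import BeautifulSoup
from bs4.element import NavigableString, Tag

BASE_URL = "https://www.hltv.org/matches"

# Invented stand-in for one day's block of matches (assumption, not real HLTV markup).
SAMPLE_HTML = """
<div class="match-day">
  <div class="standard-headline">2018-05-01</div>
  <a class="a-reset block upcoming-match standard-box" href="/matches/1/team-a-vs-team-b">A vs B</a>
  <a class="a-reset block upcoming-match standard-box" href="/matches/2/team-c-vs-team-d">C vs D</a>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
day = date(2018, 5, 1)  # fixed date so the sketch runs offline

# Locate the date headline and step up to the container that holds the day's matches.
headline = soup.find(class_="standard-headline", string=str(day))
day_block = headline.parent

# Direct children are a mix of text nodes (whitespace) and element tags.
text_nodes = [c for c in day_block.children if isinstance(c, NavigableString)]
tag_children = [c for c in day_block.children if isinstance(c, Tag)]

# Text nodes are str subclasses, so .find('a') on them is str.find, not a tag search.
print([type(c).__name__ for c in day_block.children])

# On element children, find('a') looks only at descendants, never at the child itself.
print([c.find("a") for c in tag_children])

# One class from the compound class attribute is enough for a CSS selector.
links = [urljoin(BASE_URL, a["href"]) for a in day_block.select("a.upcoming-match")]
for link in links:
    print(link)
```

Searching from the container with select or find_all, instead of looping over its children, sidesteps the text nodes entirely.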