from requests import get
from bs4 import BeautifulSoup
res = get('https://www.ceda.com.au/Events/Upcoming-events')
soup = BeautifulSoup(res.text,"lxml")
event_location = '\n'.join([' '.join(item.find_parent().select("span")[0].text.split()) for item in soup.select(".side-list .icon-map-marker")])
print(event_location)
event_date = '\n'.join([' '.join(item.find_parent().select("span")[0].text.split()) for item in soup.select(".side-list .icon-calendar")])
print(event_date)
event_name = '\n'.join([' '.join(item.find_parent().select("class")[0].text.split()) for item in soup.select(".event-detail-bx .h1")])
print(event_name)
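I'm trying to extract the event date, location and event name from the website. I managed to get the event date, the event hyperlink and the location.
However, I can't extract the event name. Can someone help me extract all the event names and hyperlinks for each event?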
Answer 0 (score: 1)
Try the following code to get the data you need:
event_name = '\n'.join([item.text for item in soup.select(".event-detail-bx h1")])
print(event_name)
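P.S. Note that the CSS selector .event-detail-bx .h1 matches nodes whose class name is "h1" and that are descendants of a node with the class name "event-detail-bx". To get the h1 elements that are descendants of a node with the class name "event-detail-bx", you need .event-detail-bx h1 instead.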
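To see the difference concretely, here is a small self-contained check against a hypothetical snippet (not the real page markup). It uses BeautifulSoup's "html.parser" backend instead of lxml so it runs without an extra parser installed. The same reasoning explains why the question's select("class") finds nothing: a bare word in a CSS selector is a type selector, so it looks for <class> elements.

```python
from bs4 import BeautifulSoup

# Hypothetical markup loosely modelled on one event card; not copied from the live site.
html = """
<div class="event-detail-bx">
  <a href="/Events/Library/Q180124"><h1>2018 Trustee welcome back</h1></a>
  <span class="h1">Styled like a heading</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".h1" is a class selector: any descendant element whose class list contains "h1".
by_class = [el.get_text(strip=True) for el in soup.select(".event-detail-bx .h1")]

# "h1" (no dot) is a type selector: any descendant <h1> element.
by_tag = [el.get_text(strip=True) for el in soup.select(".event-detail-bx h1")]

print(by_class)  # ['Styled like a heading']
print(by_tag)    # ['2018 Trustee welcome back']
```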
Answer 1 (score: 1)
I think you want to get all of the data in an organized way:
import requests
from urllib.parse import urljoin
from bs4 import BeautifulSoup
url = 'https://www.ceda.com.au/Events/Upcoming-events'
res = requests.get(url)
soup = BeautifulSoup(res.text,"lxml")
for items in soup.select(".list-bx"):  # one ".list-bx" container per event
    event_name = ''.join([item.text for item in items.select(".event-detail-bx a h1")])
    # resolve the relative href against the page URL
    event_links = urljoin(url, ''.join([item['href'] for item in items.select(".event-detail-bx a")]))
    speaker_info = items.select(".sub-content-txt h3")[0].next_sibling.strip()  # text node right after the <h3>
    # date and location live in the <span> that shares a parent with the icon element
    event_date = ''.join([' '.join(item.find_parent().select("span")[0].text.split()) for item in items.select(".icon-calendar")])
    event_location = ''.join([' '.join(item.find_parent().select("span")[0].text.split()) for item in items.select(".icon-map-marker")])
    print("Name: {}\nLink: {}\nSpeaker: {}\nDate: {}\nLocation: {}\n".format(event_name, event_links, speaker_info, event_date, event_location))
Partial output:
Name: 2018 Trustee welcome back
Link: https://www.ceda.com.au/Events/Library/Q180124
Speaker: Melinda Cilento, Chief Executive, CEDA
Date: 24/01/2018
Location: Brisbane Convention and Exhibition Centre

Name: NSW Trustee welcome back 2018
Link: https://www.ceda.com.au/Events/Library/N180130
Speaker: Luke Foley MP, NSW Opposition Leader, Parliament of NSW
Date: 30/01/2018
Location: Shangri-La Hotel
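One caveat with the loop above: items.select(...)[0] raises IndexError if a card lacks one of the elements. Below is a minimal sketch, assuming the same class names as the selectors above, that collects each card into a dict and uses select_one (which returns None when nothing matches) so missing fields become None instead of crashing. The HTML is hypothetical sample markup, not the real page, and the helper names icon_text and parse_events are illustrative only. The speaker field is left out for brevity.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = "https://www.ceda.com.au/Events/Upcoming-events"

# Hypothetical markup for two event cards; class names mirror the selectors above,
# but the real page structure may differ or may have changed.
html = """
<div class="list-bx">
  <div class="event-detail-bx"><a href="/Events/Library/Q180124"><h1>2018 Trustee welcome back</h1></a></div>
  <ul class="side-list">
    <li><i class="icon-calendar"></i><span> 24/01/2018 </span></li>
    <li><i class="icon-map-marker"></i><span> Brisbane Convention
        and Exhibition Centre </span></li>
  </ul>
</div>
<div class="list-bx">
  <div class="event-detail-bx"><a href="/Events/Library/N180130"><h1>NSW Trustee welcome back 2018</h1></a></div>
  <ul class="side-list">
    <li><i class="icon-calendar"></i><span>30/01/2018</span></li>
  </ul>
</div>
"""


def icon_text(card, icon_class):
    """Return the whitespace-normalised text of the <span> next to an icon, or None."""
    icon = card.select_one("." + icon_class)
    if icon is None:
        return None
    span = icon.find_parent().select_one("span")
    return " ".join(span.get_text().split()) if span else None


def parse_events(html, base_url):
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for card in soup.select(".list-bx"):  # scope every lookup to one event card
        title = card.select_one(".event-detail-bx a h1")
        link = card.select_one(".event-detail-bx a[href]")
        events.append({
            "name": title.get_text(strip=True) if title else None,
            "link": urljoin(base_url, link["href"]) if link else None,
            "date": icon_text(card, "icon-calendar"),
            "location": icon_text(card, "icon-map-marker"),
        })
    return events


for event in parse_events(html, BASE_URL):
    print(event)
```

Returning a list of dicts rather than printing inside the loop makes it straightforward to hand the results to csv.DictWriter or a DataFrame afterwards.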