import requests
from bs4 import BeautifulSoup
res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")
first_event = event_containers[0]
print(first_event.h3.text)
With this code I'm able to extract the first event name. How do I loop over the page and extract all the event names and dates? I'd also like to extract the location information that only becomes visible after clicking the "read more" link.
Answer 0 (score: 1)
event_containers is a bs4.element.ResultSet object, which is basically a list of Tag objects. Just iterate over event_containers and select h3 for the title, div.date for the date, and a for the URL, e.g.:
for tag in event_containers:
    print(tag.h3.text)
    print(tag.select_one('div.date').text)
    print(tag.a['href'])
Now, for the location information, you have to visit each event's URL and collect the text from the div.event-add element there.

Full code:
import requests
from bs4 import BeautifulSoup
res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
event_containers = soup.find_all('div', class_ = "col-xs-12 col-sm-6 col-md-8")
base_url = 'http://aicd.companydirectors.com.au'
for tag in event_containers:
    link = base_url + tag.a['href']
    soup = BeautifulSoup(requests.get(link).text, "lxml")
    location = ', '.join(list(soup.select_one('div.event-add').stripped_strings)[1:-1])
    print('Title:', tag.h3.text)
    print('Date:', tag.select_one('div.date').text)
    print('Link:', link)
    print('Location:', location)
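Note that this loop will raise an AttributeError if any detail page happens to lack a div.event-add block. A minimal defensive sketch (same selectors as above; the Session reuse and the 'N/A' fallback are just optional tweaks, not part of the site's requirements) could look like:

import requests
from bs4 import BeautifulSoup

base_url = 'http://aicd.companydirectors.com.au'
session = requests.Session()  # reuse one connection for the calendar page and all detail pages
soup = BeautifulSoup(session.get(base_url + '/events/events-calendar').text, "lxml")
for tag in soup.find_all('div', class_="col-xs-12 col-sm-6 col-md-8"):
    link = base_url + tag.a['href']
    detail = BeautifulSoup(session.get(link).text, "lxml")
    add_block = detail.select_one('div.event-add')
    # fall back to 'N/A' when the detail page has no div.event-add element
    location = ', '.join(list(add_block.stripped_strings)[1:-1]) if add_block else 'N/A'
    print(tag.h3.text, '|', tag.select_one('div.date').text, '|', location)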
Answer 1 (score: 1)
Try this to get all the events and dates you are after:
import requests
from bs4 import BeautifulSoup
res = requests.get('http://aicd.companydirectors.com.au/events/events-calendar')
soup = BeautifulSoup(res.text,"lxml")
for item in soup.find_all(class_='lead'):
    date = item.find_previous_sibling().text.split(" |")[0]
    print(item.text, date)
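If you would rather keep the results than just print them, a small variation on the same loop (purely illustrative) collects each event into a list of dicts:

events = []
for item in soup.find_all(class_='lead'):
    # split(" |") keeps only the part of the sibling's text before the separator
    date = item.find_previous_sibling().text.split(" |")[0]
    events.append({'title': item.text.strip(), 'date': date.strip()})
print(events)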