import requests
from bs4 import BeautifulSoup

URL = ""
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='simple-view')
events_elems = results.find_all('ul', class_='searchResults')
for event_elem in events_elems:
    date_elem = event_elem.find('li', class_='date-indicator')
    location_elem = event_elem.find('div', class_='text--labelSecondary')
    e_elem = event_elem.find('a', class_='event')
    if None in (date_elem, location_elem, e_elem):
        continue
    print(date_elem.text)
    print(location_elem.text)
    print(e_elem.text)
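I have just started with web scraping in Python and tried the code above on meetup.com, but it only prints one set of results. Did I do something wrong in the iteration?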
Answer 0 (score: 1)
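The .find_all call you used,
events_elems = results.find_all('ul', class_='searchResults')
does not capture every event row on the page, so the search criteria need to be more specific.
Likewise, event_elem.find('li', class_='date-indicator')
is not sufficient: find returns only the first match, so it does not record the date of every event.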
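To illustrate the difference, here is a minimal sketch run against a small inline HTML string. The markup is hypothetical and only imitates the structure discussed here (it is not the real meetup.com page), and the names SAMPLE_HTML, outer_titles and item_titles are made up for the example. It compares looping over the outer list with looping over the individual items.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only imitates the structure discussed above;
# it is not copied from the real page.
SAMPLE_HTML = """
<ul class="searchResults">
  <li class="event-listing"><a class="event">First event</a></li>
  <li class="event-listing"><a class="event">Second event</a></li>
  <li class="event-listing"><a class="event">Third event</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Looping over the outer list: there is one <ul>, so the loop runs once,
# and find() returns only the first matching <a> inside it.
outer_titles = [
    ul.find("a", class_="event").text
    for ul in soup.find_all("ul", class_="searchResults")
]

# Looping over the individual items: one iteration per event.
item_titles = [
    li.find("a", class_="event").text
    for li in soup.find_all("li", class_="event-listing")
]

print(outer_titles)  # ['First event']
print(item_titles)   # ['First event', 'Second event', 'Third event']
```

If the real page is structured along these lines, iterating over the per-event elements instead of the enclosing list is what yields one result per event.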
See the working code below, which captures the result set through the container of the event list:
import requests
from bs4 import BeautifulSoup

URL = "https://www.meetup.com/find/events/"
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='simple-view')
# Take the first container that holds the event list, then every event in it.
event_container = results.find_all('ul', class_='event-listing-container')[0]
events_elems = event_container.find_all(class_='event-listing')
for event_elem in events_elems:
    location_elem = event_elem.find('div', class_='text--labelSecondary')
    e_elem = event_elem.find('a', class_='event')
    # Build the date from the data-* attributes and the <time> text.
    date = "{}-{}-{} {}".format(
        event_elem.attrs['data-year'],
        event_elem.attrs['data-month'],
        event_elem.attrs['data-day'],
        event_elem.find('time').text.replace('\n', ''),
    )
    print(date)
    print(location_elem.text)
    print(e_elem.text)
    print('-----')
Sample output:
2020-2-17 9:00AM
Architecting for Innovation
Australasian Enterprise Architecture Summer School 2020
-----
2020-2-17 5:00PM
Sydney Indoor Rock Climbers
Monday and Thursday Night Climbing @ St Peters (Beginners Welcome)
-----
2020-2-17 5:30PM
......
......
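If the dates are needed for sorting or filtering rather than just printing, strings in the format shown above (for example 2020-2-17 9:00AM) can be parsed with the standard library. This is only a sketch: parse_event_datetime is a hypothetical helper, not part of the original code, and it assumes the time text always looks like 9:00AM and that the locale recognises English AM/PM markers.

```python
from datetime import datetime


def parse_event_datetime(year, month, day, time_text):
    """Combine the data-* attribute values and the <time> text into a datetime.

    Hypothetical helper; assumes time_text looks like '9:00AM' once
    surrounding whitespace is removed.
    """
    stamp = "{}-{}-{} {}".format(year, month, day, time_text.strip())
    # %I is the 12-hour clock hour and %p the AM/PM marker (locale dependent).
    return datetime.strptime(stamp, "%Y-%m-%d %I:%M%p")


parsed = parse_event_datetime("2020", "2", "17", "\n9:00AM\n")
print(parsed.isoformat())  # 2020-02-17T09:00:00
```

Inside the loop, the same helper could be called with event_elem.attrs['data-year'], event_elem.attrs['data-month'], event_elem.attrs['data-day'] and event_elem.find('time').text.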