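My code opens a web page and scrapes certain data from each row. However, I also want to get the "topic" from each row. For example, in row 1 it is listed above the "Speakers" text as "Presidential Sessions and Community Psychiatry".
My code can currently scrape the Titles and Chairs (output as Role and Name) for each row, but not the topic. How can I get the topic as well?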

from selenium import webdriver
import time
from bs4 import BeautifulSoup
import pandas as pd

# Start a single browser session and load the session search page.
driver = webdriver.Chrome()
driver.get('https://s7.goeshow.com/apa/annual/2021/session_search.cfm?_ga=2.259773066.1015449088.1617295032-97934194.1617037074')
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')
tables = soup.select('#datatable')
for table in tables:
    for title in table.select('tr td.title'):
        print(title.text.strip())
        # The speaker details sit in the row that follows the title row.
        title_row = title.parent
        speaker_row = title_row.next_sibling
        for speaker in speaker_row.select('span.session-speaker'):
            role = speaker.select_one('span.session-speaker-role').text.strip()
            name = speaker.select_one('span.session-speaker-name').text.strip()
            topic = speaker.select_one('span.session-track-label').text.strip()
            print(role, name, topic)
        print()
driver.quit()

Answer 0 (score: 0)
Inside the speaker loop, you are searching under the "span.session-speaker" element, and there is no "span.session-track-label" element underneath it.
Use:
tables = soup.select('#datatable')
for table in tables:
    for title in table.select('tr td.title'):
        print(title.text.strip())
        title_row = title.parent
        speaker_row = title_row.next_sibling
        for speaker in speaker_row.select('span.session-divider-line'):
            role = speaker.select_one('span.session-speaker-role').text.strip()
            name = speaker.select_one('span.session-speaker-name').text.strip()
            topic = speaker.select_one('span.session-track-label').text.strip()
            print(role, name, topic)
        print()
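
To see why the scope of the search matters, here is a minimal sketch that runs without a browser. Only the class names come from the question; the nesting of the markup, the sample values, and the `text_of` helper are assumptions made for illustration and may not match the real page. `select_one` only looks at descendants of the element it is called on and returns `None` when nothing matches, so calling `.text` on that result raises `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names are taken from the question,
# but the nesting and the sample values are assumed for illustration.
HTML = """
<span class="session-divider-line">
  <span class="session-track-label">Community Psychiatry</span>
  <span class="session-speaker">
    <span class="session-speaker-role">Chair</span>
    <span class="session-speaker-name">Jane Doe</span>
  </span>
</span>
"""

soup = BeautifulSoup(HTML, "html.parser")


def text_of(parent, selector, default=""):
    """Return the stripped text of the first match under parent, or default."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else default


speaker = soup.select_one("span.session-speaker")
wrapper = soup.select_one("span.session-divider-line")

# Searching under the speaker span: the track label is a sibling,
# not a descendant, so nothing is found.
inner_topic = text_of(speaker, "span.session-track-label", default=None)

# Searching under the enclosing span finds the label, role, and name.
outer_topic = text_of(wrapper, "span.session-track-label")
role = text_of(wrapper, "span.session-speaker-role")
name = text_of(wrapper, "span.session-speaker-name")

print(inner_topic, outer_topic, role, name)
```

A helper like `text_of` also keeps the loop from crashing on rows that have no topic label at all, at the cost of silently printing an empty value for them.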