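I'm trying to scrape data on food seasonality from the Seasonal Food Guide, but I've run into trouble. The site has a fairly simple URL structure:
https://www.seasonalfoodguide.org/produce_name/state_name
I've been able to use Selenium and Beautiful Soup to scrape the seasonality information from one page successfully, but on subsequent loop iterations the section of text I'm looking for doesn't actually load, so I get AttributeError: 'NoneType' object has no attribute 'text'. I know this happens because months_list_raw comes back empty: the 'wheel-months-list' section of the page has not loaded on the second iteration. The code is below. Any ideas?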
    for ingredient in produce_list:
        for state in state_list:
            # grab page content
            search_url = 'https://www.seasonalfoodguide.org/{}/{}'.format(ingredient, state)
            driver.get(search_url)
            page_soup = soup(driver.page_source, 'lxml')
            # grab list of months
            months_list_raw = page_soup.find('p', {'id': 'wheel-months-list'})
            months_list = months_list_raw.text
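Answer 0 (score: 1)
The page is rendered client-side, which means that when you open it, another request is made to the back-end server to fetch the data for the filters you selected. So the problem is that when you open the page and read the HTML, the content has not finished loading yet. The simplest thing you can do is sleep for a while after opening the page with Selenium, to give it time to load fully. I tested your code with time.sleep(3) inserted after driver.get(search_url), and it works fine.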
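As a rough sketch of the suggestion above, the delay can be wrapped in a small helper. The names get_source_after_delay and get_source_when_ready are hypothetical, and driver is assumed to be any object with a get() method and a page_source attribute, such as a Selenium WebDriver. The second helper is a variation that is not part of the answer: instead of always sleeping for a fixed time, it polls the page source until the 'wheel-months-list' marker from the question appears or a timeout expires. It assumes the marker string is absent from the HTML until that section has rendered, which has not been verified against the site.

```python
import time


def get_source_after_delay(driver, url, delay=3.0):
    """Open url, sleep for a fixed delay, then return the page source."""
    driver.get(url)
    time.sleep(delay)  # give client-side rendering time to finish
    return driver.page_source


def get_source_when_ready(driver, url, marker="wheel-months-list",
                          timeout=10.0, poll=0.5):
    """Open url and poll until marker shows up in the page source.

    Returns the page source, or None if the marker has not appeared
    once timeout seconds have passed.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

In the question's loop, page_soup = soup(get_source_after_delay(driver, search_url), 'lxml') would stand in for the separate driver.get and soup lines. Selenium also ships its own explicit-wait helper, WebDriverWait, which serves the same purpose as the polling loop here.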
Answer 1 (score: 0)
To prevent the error and keep the loop going, you need to check for the case where the find_element_by_id(ICN_Feedback_3400653_125630) element is not
<a class="d2l-imagelink" id="ICN_Feedback_3444653_124440" href="javascript:void(0);" onclick="return false;" title="Edit comments for FIRSTNAME LASTNAME in a new window" aria-label="Edit comments for FIRSTNAME LASTNAME in a new window" role="button">
It seems that some produce pages have no data for certain states, so you will need to handle that case in your program as appropriate.
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    # Note: the find_element_by_* helpers used below were removed in Selenium 4.3;
    # on newer versions use driver.find_element(By.NAME, ...) and similar.
    driver = webdriver.Chrome(chrome_path)
    driver.get(commentsPage)
    assert "****" in driver.title
    user = driver.find_element_by_name("userName")
    user.clear()
    user.send_keys("USERNAME")
    pas = driver.find_element_by_name("password")
    pas.clear()
    pas.send_keys("PASSWORD")
    user.send_keys(Keys.RETURN)
    driver.get(commentsPage)
    for i in toplist:
        # find user by orgid (plain XPath string, without the extra wrapping quotes)
        icnFeedback = "//a[@title='Enter comments for " + i[0] + " in a new window']"
        myElement = driver.find_element_by_xpath(icnFeedback)
        # click the feedback button
        driver.execute_script("arguments[0].click();", myElement)
        time.sleep(2)
        # look for the iframes on the main page
        iframes2 = driver.find_elements_by_tag_name("iframe")
        # switch from the main page to iframe #2
        driver.switch_to.frame(iframes2[1])
        time.sleep(1)
        # look for the iframes inside iframe #2
        iframes3 = driver.find_elements_by_tag_name("iframe")
        # switch from iframe #2 to iframe #3
        driver.switch_to.frame(iframes3[0])
        time.sleep(1)
        # find the text box
        textBox = driver.find_element_by_id('tinymce')
        comments = i[1]
        # clear the previous text
        textBox.clear()
        # send the comments
        textBox.send_keys(comments)
        time.sleep(2)
        # switch out of all iframes
        driver.switch_to.default_content()
        # look for the iframes on the main page
        iframes2 = driver.find_elements_by_tag_name("iframe")
        # switch from the main page to iframe #2
        driver.switch_to.frame(iframes2[1])
        # find the Save button and click it
        driver.find_element(By.XPATH, '//button[text()="Save"]').click()
        time.sleep(1)
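Applied to the loop in the question, the kind of check this answer describes might look like the sketch below. The helper names extract_months and collect_months are hypothetical; page_soup is assumed to be a Beautiful Soup object (or anything with a compatible find method), and the element id is taken from the question's code.

```python
def extract_months(page_soup):
    """Return the text of the months list, or None if it is missing."""
    months_list_raw = page_soup.find('p', {'id': 'wheel-months-list'})
    if months_list_raw is None:
        # Nothing was found: the section did not load, or this
        # produce/state combination has no data.
        return None
    return months_list_raw.text


def collect_months(pages):
    """Map each (ingredient, state) key to its months text.

    pages is an iterable of ((ingredient, state), page_soup) pairs.
    Pages without a months list are skipped instead of raising
    AttributeError.
    """
    results = {}
    for key, page_soup in pages:
        months = extract_months(page_soup)
        if months is None:
            continue  # skip this page and keep looping
        results[key] = months
    return results
```

Skipping silently is only one option. Depending on what the program needs, the missing combinations could instead be logged or retried after a longer wait.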