I am learning Python by trying to solve problems.
After I log in to the site and try to access an element, the same command that works in the shell does not work when it runs from the file below. I also think my approach is wrong, because the element keeps changing its id and the only constant is the text "Show More Results". I tried find_link_by_text, which failed, I assume because the element contains no href, and find_link_by_xpath with contains on the text.
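Since the id keeps changing while the visible text stays constant, matching on the text is a sound strategy. As an offline sketch using BeautifulSoup (which the script below already imports) against the element markup quoted in the script's comments, a text match recovers whatever id the element currently carries; note the markup here is only a sample, the live id may differ:

```python
import re
import bs4

# Sample markup copied from the script's comment; the live id may differ between visits.
html = '''<div id="scroll2" class="fm2 p8 cur m_bt2"
onclick="javascript:displayResultsLogin('scroll2')"> Show More Results </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
# Locate the element by its visible text instead of its changing id.
div = soup.find('div', string=re.compile('Show More Results'))
print(div['id'])  # prints whatever id the element has this time; here: scroll2
```

The same idea carries over to Selenium as an XPath text match, e.g. //*[contains(text(), 'Show More Results')].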
Webscraping:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import requests, bs4, re, csv
chrome_path = r"C:\Users\-----\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers")
driver.maximize_window()
time.sleep(10) #setting a gap for website load
action = webdriver.ActionChains(driver)
driver.find_element_by_id("user_sign_in").click()
inputElement = driver.find_element_by_id('email')
inputElement.send_keys('xxxxxx')
driver.find_element_by_name("Submit3").send_keys(Keys.RETURN)
time.sleep(30)
#The code till above this is working perfectly
# element:
#<div id="scroll2" class="fm2 p8 cur m_bt2"
#onclick="javascript:displayResultsLogin('scroll2')"> Show More Results
# </div>
try:
    driver.find_element_by_id("scroll2").click()
    # The above find_element_* works if I input it in the shell.
except:
    print("Didn't work")
# If I leave it in the file, removing the except, it shows element not found
r = driver.page_source
soup = bs4.BeautifulSoup(r, 'html.parser')
blocks = soup.find_all('div', class_='lst')
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for b in blocks:
        name = b.find(class_='cnm').get_text(strip=True)
        addr = b.find(class_='clg').get_text(strip=True)
        call = b.find(class_='ls_co phn').find(text=re.compile(r'\d+')).strip()
        writer.writerow([name, addr, call])
For some reason, when this last part runs in the file, only a 0 from the element gets written to the file instead of the xxxxxxxx number.
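The "0 instead of the number" symptom matches how find with a text regex behaves: it returns the first text node containing any digit, not necessarily the full phone number. If a placeholder digit sits in the block before the real number (for example while the number is still masked), that placeholder is what gets written. A minimal sketch with hypothetical markup (the inner structure of the ls_co phn element is assumed here, not taken from the live page):

```python
import re
import bs4

# Hypothetical block markup: a placeholder digit appears before the real number.
html = '''<div class="lst">
  <span class="cnm">Some Company</span>
  <span class="clg">Some City</span>
  <div class="ls_co phn"><span>0</span><span>08012345678</span></div>
</div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
block = soup.find('div', class_='lst')
# find(text=...) returns the FIRST matching text node in document order.
call = block.find(class_='ls_co phn').find(text=re.compile(r'\d+')).strip()
print(call)  # prints: 0
```

Inspecting the rendered block in the browser (or printing the whole ls_co phn element) would show whether the real number is present in the HTML at the moment the scrape runs.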
Answer 0 (score: 0)
It works in the shell but not when run as a script, which points to a timing issue. In the shell there is a delay between each command that lets the page load; in the script there is not. The problem can be solved with WebDriverWait and one of the Expected Conditions:
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, "scroll2"))).click()
# or try locating the element by text
# wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(., 'Show More Results')]"))).click()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lst")))
r = driver.page_source
soup = bs4.BeautifulSoup(r, 'html.parser')