I am learning Python by trying to solve problems.
After I log in to the site and try to access an element, the same command that works in the shell does not work when it runs from the file below. I also think my approach is wrong, because the element keeps changing its id and the only constant is the text "Show More Results". I tried find_link_by_text, which failed, I assume because the element contains no href, and find_link_by_xpath with contains on the text.
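Since the id keeps changing while the visible text stays constant, matching on the text is a sound strategy. As an offline sketch using BeautifulSoup (which the script below already imports) against the element markup quoted in the script's comments, a text match recovers whatever id the element currently carries; note the markup here is only a sample, the live id may differ:

```python
import re
import bs4

# Sample markup copied from the script's comment; the live id may differ between visits.
html = '''<div id="scroll2" class="fm2 p8 cur m_bt2"
onclick="javascript:displayResultsLogin('scroll2')"> Show More Results </div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
# Locate the element by its visible text instead of its changing id.
div = soup.find('div', string=re.compile('Show More Results'))
print(div['id'])  # prints whatever id the element has this time; here: scroll2
```

The same idea carries over to Selenium as an XPath text match, e.g. //*[contains(text(), 'Show More Results')].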
Webscraping:
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
import time
import requests, bs4, re, csv
chrome_path = r"C:\Users\-----\Desktop\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://dir.indiamart.com/search.mp?ss=Power+Distribution+Transformers")
driver.maximize_window()
time.sleep(10) #setting a gap for website load
action = webdriver.ActionChains(driver)
driver.find_element_by_id("user_sign_in").click()
inputElement = driver.find_element_by_id('email')
inputElement.send_keys('xxxxxx')
driver.find_element_by_name("Submit3").send_keys(Keys.RETURN)
time.sleep(30)
#The code till above this is working perfectly
# element:
#<div id="scroll2" class="fm2 p8 cur m_bt2"
#onclick="javascript:displayResultsLogin('scroll2')"> Show More Results
# </div>
try:
    driver.find_element_by_id("scroll2").click()
    # The above find_element_* works if I input it in the shell.
except:
    print("Didn't work")
# If I leave it in the file, removing the except, it shows element not found
r = driver.page_source
soup = bs4.BeautifulSoup(r, 'html.parser')
blocks = soup.find_all('div', class_='lst')
with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for b in blocks:
        name = b.find(class_='cnm').get_text(strip=True)
        addr = b.find(class_='clg').get_text(strip=True)
        call = b.find(class_='ls_co phn').find(text=re.compile(r'\d+')).strip()
        writer.writerow([name, addr, call])
For some reason, when this last part runs in the file, only a 0 from the element gets written to the file instead of the xxxxxxxx number.
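The "0 instead of the number" symptom matches how find with a text regex behaves: it returns the first text node containing any digit, not necessarily the full phone number. If a placeholder digit sits in the block before the real number (for example while the number is still masked), that placeholder is what gets written. A minimal sketch with hypothetical markup (the inner structure of the ls_co phn element is assumed here, not taken from the live page):

```python
import re
import bs4

# Hypothetical block markup: a placeholder digit appears before the real number.
html = '''<div class="lst">
  <span class="cnm">Some Company</span>
  <span class="clg">Some City</span>
  <div class="ls_co phn"><span>0</span><span>08012345678</span></div>
</div>'''

soup = bs4.BeautifulSoup(html, 'html.parser')
block = soup.find('div', class_='lst')
# find(text=...) returns the FIRST matching text node in document order.
call = block.find(class_='ls_co phn').find(text=re.compile(r'\d+')).strip()
print(call)  # prints: 0
```

Inspecting the rendered block in the browser (or printing the whole ls_co phn element) would show whether the real number is present in the HTML at the moment the scrape runs.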
Answer 0 (score: 0)
It works in the shell but not when run as a script, which points to a timing issue. In the shell there is a delay between each command that lets the page load; in the script there is not. The problem can be solved with WebDriverWait and one of the Expected Conditions:
wait = WebDriverWait(driver, 10)
wait.until(EC.element_to_be_clickable((By.ID, "scroll2"))).click()
# or try locating the element by text
# wait.until(EC.element_to_be_clickable((By.XPATH, "//*[contains(., 'Show More Results')]"))).click()
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".lst")))
r = driver.page_source
soup = bs4.BeautifulSoup(r, 'html.parser')