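I'm using Selenium with Python to search for a keyword. In the search results I then try to click the first 5 URLs, grab the data from the <p> tags, and go back, so that in the end I have stored the data from those 5 sites. But somehow, after the keyword is searched, the script doesn't click the URLs or get any data. I can't tell what's wrong. Here is the code I wrote. Please help.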
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(executable_path=r"E:\chromedriver\chromedriver.exe")
driver.get("https://www.google.com/")
print(driver.title)
driver.maximize_window()
time.sleep(2)
driver.find_element(By.XPATH, "//input[@name='q']").send_keys('selenium')
driver.find_element(By.XPATH, "//div[@class='FPdoLc tfB0Bf']//input[@name='btnK']").send_keys(Keys.ENTER)
a = driver.find_elements_by_xpath("//div[@class='g']/a[@href]")
links = []
for x in a:
    links.append(x.get_attribute('href'))
link_data = []
for new_url in links:
    print('new url : ', new_url)
    driver.get(new_url)
    link_data.append(driver.page_source)
    b = driver.find_elements(By.TAG_NAME, "p")
    for data in b:
        print(data.text)
    driver.back()
driver.close()
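The loop above already stores each page's driver.page_source in link_data. As a swapped-in alternative (not something the question or the answers do), the <p> text can be pulled out of those stored strings afterwards with Python's standard-library html.parser, without going back to the browser. This is a simplified sketch: ParagraphText and paragraphs_from_source are hypothetical names, the parser does not apply the browser's full HTML error recovery, and unlike Selenium's WebElement.text it also returns text from paragraphs that are hidden on the rendered page.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphText(HTMLParser):
    """Hypothetical helper: collect the text of every <p> element in an HTML string."""

    def __init__(self) -> None:
        super().__init__()
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buf: List[str] = []

    def _flush(self) -> None:
        # Save the current paragraph (whitespace collapsed) if we are inside one.
        if self._in_p:
            text = " ".join("".join(self._buf).split())
            if text:
                self.paragraphs.append(text)
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # HTML allows </p> to be omitted, so a new <p> ends the previous one.
            self._flush()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)

    def close(self) -> None:
        super().close()
        self._flush()  # handle a trailing <p> that was never closed


def paragraphs_from_source(page_source: str) -> List[str]:
    """Hypothetical helper: return the <p> texts found in one stored page_source."""
    parser = ParagraphText()
    parser.feed(page_source)
    parser.close()
    return parser.paragraphs


# Usage with the list built by the question's loop:
#   for source in link_data:
#       for text in paragraphs_from_source(source):
#           print(text)
sample = "<html><body><p>Selenium is a <b>chemical</b> element.</p><div>not a paragraph</div><p>Second one</p></body></html>"
print(paragraphs_from_source(sample))  # ['Selenium is a chemical element.', 'Second one']
```

Parsing the saved source keeps the browser loop short; the trade-off is that the text is taken from the raw HTML rather than from what Selenium reports as visible.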
Answer 0 (score: 0)
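The XPath for your links is wrong. It should be:
"//div[@class='yuRUbf']/a[@href]"
If you look at the relevant part of the page's HTML, you will see that the <a> tag is not a direct child of <div class="g"> (a single / in an XPath step matches direct children only) but of <div class="yuRUbf">: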
<div class="g"><!--m-->
<div class="tF2Cxc" data-hveid="CAkQAA" data-ved="2ahUKEwjphfjOoazuAhUO1VkKHVSkA_oQFSgAMAp6BAgJEAA">
<div class="yuRUbf"><a href="https://www.healthline.com/nutrition/selenium-benefits"
data-ved="2ahUKEwjphfjOoazuAhUO1VkKHVSkA_oQFjAKegQICRAC"
ping="/url?sa=t&source=web&rct=j&url=https://www.healthline.com/nutrition/selenium-benefits&ved=2ahUKEwjphfjOoazuAhUO1VkKHVSkA_oQFjAKegQICRAC"><br>
<h3 class="LC20lb DKV0Md"><span>7 Science-Based Health Benefits of Selenium - Healthline</span></h3>
<div class="TbwUpd NJjxre"><cite class="iUh30 Zu0yb qLRx3b tjvcx">www.healthline.com<span
class="dyjrff qzEoUe"><span> › nutrition › selenium-benefits</span></span></cite></div>
</a>
...
</div>
</div>
</div>
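To check the child-versus-descendant difference without a browser, the snippet above can be trimmed to well-formed XML and queried with Python's xml.etree.ElementTree, which supports a subset of XPath. This is a sketch under a few assumptions: the markup is a simplified copy of the excerpt (the ping and data-ved attributes and the <br> are removed so it parses as XML), each path starts with "." because ElementTree evaluates it relative to the root element, and Google's generated class names such as yuRUbf may change, so the live page can differ.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed excerpt of the search-result markup shown above.
SNIPPET = """
<html><body>
  <div class="g">
    <div class="tF2Cxc">
      <div class="yuRUbf">
        <a href="https://www.healthline.com/nutrition/selenium-benefits">
          <h3 class="LC20lb DKV0Md"><span>7 Science-Based Health Benefits of Selenium - Healthline</span></h3>
        </a>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(SNIPPET.strip())


def hrefs(path):
    """Return the href of every element matched by an ElementTree path."""
    return [el.get("href") for el in root.findall(path)]


# Question's XPath: <a> as a direct child of div.g, so nothing matches.
question = hrefs(".//div[@class='g']/a[@href]")
# Answer 0's XPath: <a> as a direct child of div.yuRUbf.
answer0 = hrefs(".//div[@class='yuRUbf']/a[@href]")
# Answer 1's XPath: div.g, then two nested <div>s, then <a>.
answer1 = hrefs(".//div[@class='g']/div/div/a")
# A descendant step (//) instead of a child step (/) also reaches the <a>.
descendant = hrefs(".//div[@class='g']//a[@href]")

print(question)    # []
print(answer0)     # ['https://www.healthline.com/nutrition/selenium-benefits']
print(answer1)     # ['https://www.healthline.com/nutrition/selenium-benefits']
print(descendant)  # ['https://www.healthline.com/nutrition/selenium-benefits']
```

The same four expressions (without the leading ".") can be passed to driver.find_elements(By.XPATH, ...) on the live page.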
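You can also shorten the search step slightly by pressing Enter in the search box itself, although this won't change the overall result:
driver.find_element(By.XPATH, "//input[@name='q']").send_keys('selenium', Keys.ENTER)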
Answer 1 (score: 0)
If you want all 16 or so result links:
# Assumes the imports and the `driver` created in the question's code.
driver.get("https://www.google.com/")
print(driver.title)
driver.maximize_window()
time.sleep(2)
driver.find_element(By.XPATH, "//input[@name='q']").send_keys('selenium')
driver.find_element(By.XPATH, "//input[@name='btnK']").send_keys(Keys.ENTER)
a = driver.find_elements(By.XPATH, "//div[@class='g']/div/div/a")
links = []
for x in a:
    links.append(x.get_attribute('href'))
link_data = []
for new_url in links:
    print('new url : ', new_url)
    driver.get(new_url)
    link_data.append(driver.page_source)
    b = driver.find_elements(By.TAG_NAME, "p")
    for data in b:
        print(data.text)
    driver.back()
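The question asks for only the first five results, while both loops visit every collected link. A minimal sketch of trimming the list before visiting it follows; first_unique_urls is a hypothetical helper, and dropping duplicates and non-http(s) hrefs is an assumption added in case the same URL or a javascript: link is collected more than once.

```python
from typing import Iterable, List, Optional


def first_unique_urls(hrefs: Iterable[Optional[str]], n: int = 5) -> List[str]:
    """Hypothetical helper: first n distinct http(s) hrefs, in their original order."""
    seen = set()
    result: List[str] = []
    for href in hrefs:
        if len(result) >= n:
            break
        # get_attribute('href') returns None when the element has no href.
        if not href or not href.startswith(("http://", "https://")):
            continue
        if href in seen:
            continue
        seen.add(href)
        result.append(href)
    return result


# Made-up input standing in for the collected hrefs.
sample = [
    "https://www.selenium.dev/",
    None,
    "https://www.selenium.dev/",
    "https://en.wikipedia.org/wiki/Selenium_(software)",
    "javascript:void(0)",
    "https://a.example/",
    "https://b.example/",
    "https://c.example/",
    "https://d.example/",
]
print(first_unique_urls(sample))
```

In the code above it would replace the first loop, for example links = first_unique_urls(x.get_attribute('href') for x in a), before the for new_url in links: loop runs.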