Script fails to parse the title from a redirected URL

Time: 2018-11-02 17:07:26

Tags: python python-3.x selenium selenium-webdriver web-scraping

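I have written a script in Python with Selenium to get the header address from a web page. The URL I use in the script redirects automatically after a few seconds, and that is where the script hits an error. I am pasting part of that error to give you an idea.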

ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Link to that URL, which gets redirected to another page

The script I tried:

from contextlib import closing
from selenium import webdriver
from selenium.webdriver.support import ui

url = "https://www.rightmove.co.uk/propertyMedia/redirect.html?propertyId=30578943&contentId=1625965454&index=1"

with closing(webdriver.Chrome()) as wd:
    wait = ui.WebDriverWait(wd, 10)
    wd.get(url)
    item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title")).text
    print(item)

The output I want from that page is the header address, that is, the text of the h1.header_address__title element.

This is what I see just before that error:

enter image description here

1 answer:

Answer 0 (score: 1)

You might want to replace

item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title")).text

which waits for the element to appear in the DOM and then immediately reads its currently visible text (which may be an empty string)

with

item = wait.until(lambda driver: driver.find_element_by_css_selector("h1.header_address__title").text)

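which waits for the element and returns its visible text only once that text is no longer an empty string, because the wait keeps retrying while the lambda returns a falsy value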
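
The difference between the two variants can be seen without a browser. The following is only a sketch: poll_until and FakeElement are hypothetical stand-ins written for this illustration, not Selenium APIs. They mimic the one behaviour that matters here, namely that the wait retries until the callable returns something truthy.

```python
import time


class FakeElement:
    """Hypothetical stand-in for a WebElement whose text is filled in late."""

    def __init__(self, empty_reads):
        self._reads = 0
        self._empty_reads = empty_reads

    @property
    def text(self):
        # The first `empty_reads` reads return "", later reads return the text.
        self._reads += 1
        return "" if self._reads <= self._empty_reads else "Some Address"


def poll_until(condition, timeout=1.0, interval=0.01):
    """Call condition() until it returns a truthy value, else raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition stayed falsy")
        time.sleep(interval)


# Variant 1: the element object is truthy at once, so .text is read only once.
element = FakeElement(empty_reads=3)
first = poll_until(lambda: element).text

# Variant 2: the empty string is falsy, so polling continues until text exists.
element = FakeElement(empty_reads=3)
second = poll_until(lambda: element.text)

print(repr(first), repr(second))  # prints '' 'Some Address'
```

In the first variant the wait ends as soon as the element exists, so an empty string can slip through. In the second, the empty string itself makes the wait retry.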

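But IMHO you can simply do the following (here driver is your WebDriver instance, which is named wd in your script)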

item = driver.find_element_by_css_selector("h1.header_address__title").get_attribute('textContent')

to get the text value even if that text is not currently displayed on the page
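
Since the page redirects first, it may help to combine the explicit wait with textContent instead of calling find_element directly. This is a minimal sketch under that assumption, not the answer's exact method: header_text is a hypothetical helper name, and it expects the wait object from the question's script. It uses the same find_element_by_css_selector call as the question; newer Selenium releases replace that call with find_element(By.CSS_SELECTOR, ...).

```python
def header_text(wait, selector):
    """Wait for the element matching `selector`, then return its textContent.

    `wait` is assumed to be a WebDriverWait-like object with an `until` method.
    """
    # Retry until the element exists, which covers the time the redirect takes.
    element = wait.until(
        lambda driver: driver.find_element_by_css_selector(selector)
    )
    # textContent is available even when the text is not rendered as visible.
    content = element.get_attribute("textContent")
    return (content or "").strip()


# Inside the question's `with` block this would be called as:
# item = header_text(wait, "h1.header_address__title")
```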

Regarding your "chromedriver has stopped working" issue: try updating both Chrome and chromedriver to the latest versions