Question

I'm trying to print some house prices and I'm having trouble with XPath. Here is my code:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome("my/path/here")

driver.get("https://www.realtor.com/realestateandhomes-search/?pgsz=10")
for house_number in range(1, 11):
    try:
        # Second <div> of the listing, then its first <div> child
        price = driver.find_element_by_xpath('//*[@id="{}"]/div[2]/div[1]'.format(house_number))
        print(price.text)
    except NoSuchElementException:
        print('couldnt find')

I'm on this site, trying to print the prices of the first ten houses.

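In my output, for every house marked "NEW", the "NEW" label is picked up as the price instead of the actual price. But for the bottom two, which don't have the new sticker, the actual price is recorded.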

How can I write my XPath selector so that it selects the number instead of "New"?

Answer 1

You can write it like this, without loading images, which can speed up your scraping:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

# Don't load images (the value 2 blocks them), which speeds up page loads
chrome_opt = webdriver.ChromeOptions()
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_opt.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(options=chrome_opt, executable_path="my/path/here")
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")
for house_number in range(1, 11):
    try:
        # Select the price <div> by its class instead of by position
        price = driver.find_element_by_xpath('//*[@id="{}"]/div[2]/div[@class="srp-item-price"]'.format(house_number))
        print(price.text)
    except NoSuchElementException:
        print('couldnt find')

Answer 2

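You're on the right track; you've just written an XPath that is too brittle. I'd try making it more explicit, without relying on indexes and wildcards.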

Here is your XPath (I've used id="1" for the sake of example):

//*[@id="1"]/div[2]/div[1]

And here is the HTML (some attributes/elements removed for brevity):

<li id="1">
    <div></div>
    <div class="srp-item-body">
        <div>New</div><!-- this is optional! -->
        <div class="srp-item-price">$100,000</div>
    </div>
</li>
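A minimal offline sketch of the problem, assuming Python's standard-library `xml.etree.ElementTree` instead of Selenium and two hypothetical listings modeled on the HTML above (the prices are made up). ElementTree supports only a subset of XPath, so the leading `//` is written as `.//`, but positional steps such as `div[2]` and `div[1]` behave as in XPath:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: listing 1 has the optional "New" badge, listing 2 does not.
HTML = """
<ul>
  <li id="1">
    <div></div>
    <div class="srp-item-body">
      <div>New</div>
      <div class="srp-item-price">$100,000</div>
    </div>
  </li>
  <li id="2">
    <div></div>
    <div class="srp-item-body">
      <div class="srp-item-price">$200,000</div>
    </div>
  </li>
</ul>
"""

root = ET.fromstring(HTML)

def first_text(path):
    """Return the text of the first element matching an ElementTree path, or None."""
    elem = root.find(path)
    return None if elem is None else elem.text

# The question's index-based path: second <div>, then its first <div> child.
results = {n: first_text('.//li[@id="{}"]/div[2]/div[1]'.format(n)) for n in ("1", "2")}
print(results)  # {'1': 'New', '2': '$200,000'}
```

The first child `<div>` is the badge whenever the badge exists, which is exactly the "New" output from the question.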

First, replace the * wildcard with the element you expect to carry id="1". This just helps the XPath "self-document" a little better:

//li[@id="1"]/div[2]/div[1]

Next, you want to target the second <div>, but instead of selecting it by index, try using the element's attributes where possible, such as class:

//li[@id="1"]/div[@class="srp-item-body"]/div[1]

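Finally, you want to target the <div> that holds the price. Because the "New" text sits in its own <div>, your XPath targets the first <div> ("New") instead of the <div> with the price. Your XPath does work, however, when the "New" <div> is not present.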

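We can use the same approach as in the previous step and target the element by attribute. This forces the XPath to always land on the <div> with the price: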

//li[@id="1"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]
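To check that the class-based path no longer depends on the optional badge, here is a sketch that loops over ids the way the question does, again using the standard-library `xml.etree.ElementTree` on hypothetical listings (`listing` is an illustrative helper, and `.//` stands in for the leading `//`):

```python
import xml.etree.ElementTree as ET

def listing(house_id, price, new_badge):
    """Build one hypothetical <li> shaped like the simplified HTML in this answer."""
    badge = "<div>New</div>" if new_badge else ""
    return ('<li id="{}"><div></div><div class="srp-item-body">{}'
            '<div class="srp-item-price">{}</div></div></li>').format(house_id, badge, price)

# Hypothetical data: only some listings carry the optional "New" badge.
root = ET.fromstring("<ul>" + listing(1, "$100,000", True) + listing(2, "$200,000", False)
                     + listing(3, "$300,000", True) + "</ul>")

PRICE_PATH = './/li[@id="{}"]/div[@class="srp-item-body"]/div[@class="srp-item-price"]'

prices = []
for house_number in range(1, 4):
    elem = root.find(PRICE_PATH.format(house_number))
    # Mirror the question's fallback message when nothing matches
    prices.append(elem.text if elem is not None else "couldnt find")
print(prices)  # ['$100,000', '$200,000', '$300,000']
```

Every listing yields its price whether or not the badge is there; with Selenium you would pass the original `//li[...]` form of the path to `find_element_by_xpath`.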

Hope this helps!

So... having said all that, if you're only interested in the prices and nothing else, this would probably work too :)

for price in driver.find_elements_by_class_name('srp-item-price'):
    print(price.text)
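One difference between the two approaches: `find_elements_by_class_name` matches any element whose class list contains `srp-item-price`, while the XPath test `@class="srp-item-price"` compares the whole attribute string, so it would miss an element with `class="srp-item-price featured"` (the usual XPath 1.0 workaround is `contains(concat(' ', normalize-space(@class), ' '), ' srp-item-price ')`). A standard-library sketch of both behaviors on hypothetical markup, where `by_class_token` is an illustrative helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the second price <div> carries an extra class.
HTML = """
<ul>
  <li id="1"><div class="srp-item-body"><div>New</div>
    <div class="srp-item-price">$100,000</div></div></li>
  <li id="2"><div class="srp-item-body">
    <div class="srp-item-price featured">$200,000</div></div></li>
</ul>
"""

def by_class_token(root, class_name):
    """Yield elements whose whitespace-separated class list contains class_name,
    similar to what a CSS '.class_name' selector matches."""
    for elem in root.iter():
        if class_name in elem.get("class", "").split():
            yield elem

root = ET.fromstring(HTML)

exact = [e.text for e in root.findall('.//div[@class="srp-item-price"]')]
token = [e.text for e in by_class_token(root, "srp-item-price")]
print(exact)  # ['$100,000']
print(token)  # ['$100,000', '$200,000']
```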

Answer 3

You can try this code:

from selenium import webdriver
driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.realtor.com/realestateandhomes-search/Bladen-County_NC/sby-6/pg-1?pgsz=10")

prices=driver.find_elements_by_xpath('//*[@class="data-price-display"]')

for price in prices:
    print(price.text)

It prints:

$39,900
$86,500
$39,500
$40,000
$179,000
$31,000
$104,900
$94,900
$54,900
$19,900
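If you then need these strings as numbers (for sorting or comparing), here is a small sketch, assuming every price looks like the values printed above (a leading `$` and `,` thousands separators). `parse_price` is a hypothetical helper, not part of Selenium; it also rejects text such as "New", which is the mix-up described in the question:

```python
def parse_price(text):
    """Convert a display price such as '$39,900' to an int (39900).

    Assumes a leading '$' and ',' thousands separators; raises ValueError otherwise.
    """
    cleaned = text.strip()
    if not cleaned.startswith("$"):
        raise ValueError("not a price: {!r}".format(text))
    return int(cleaned[1:].replace(",", ""))

# A few of the values printed above.
scraped = ["$39,900", "$86,500", "$179,000", "$19,900"]
numbers = [parse_price(s) for s in scraped]
print(numbers)       # [39900, 86500, 179000, 19900]
print(max(numbers))  # 179000
```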

Let me know if you need any other details.

Make XPath more selective? [web-scraping]
