Question

I ran my scraper this morning:

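The scraper runs through the list fine, but it just keeps writing "skipped", as the code's else branch does. I checked a few of the pages and confirmed that the information I need is on the website.

I have pulled my code apart piece by piece but can't find anything that changed. I even went back to a vanilla version of my code, and still no luck.

Could someone run this and see what I'm missing, because I'm going crazy!

Target site: https://www.realestate.com.au/property/12-buckingham-dr-werribee-vic-3030

Code: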

import requests
import csv
from lxml import html

text2search = '''<p class="property-value__title">
      RECENTLY SOLD
    </p>'''

quote_page = ["https://www.realestate.com.au/property/12-buckingham-dr-werribee-vic-3030"]

with open('index333.csv', 'w') as csv_file:
    writer = csv.writer(csv_file)

    for index, url in enumerate(quote_page):
        page = requests.get(url)

        if text2search in page.text:
            tree = html.fromstring(page.content)

            (title,) = (x.text_content() for x in tree.xpath('//title'))
            (price,) = (x.text_content() for x in tree.xpath('//div[@class="property-value__price"]'))
            (sold,) = (x.text_content().strip() for x in tree.xpath('//p[@class="property-value__agent"]'))

            writer.writerow([url, title, price, sold])
        else:
            writer.writerow([url, 'skipped'])

Answer 1

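The site's HTML changed and now contains extra whitespace. Because text2search is compared as an exact string, the check "if text2search in page.text:" no longer matched, so every URL was written out as "skipped".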
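One way to make this kind of check less sensitive to whitespace is to match the marker with a regular expression instead of an exact multi-line string. The following is only a minimal sketch, not the code actually used in the fix: the names SOLD_MARKER and is_recently_sold are hypothetical, and the sample markup strings are invented for illustration rather than copied from the live page.

```python
import re

# Hypothetical pattern: \s* accepts any amount of whitespace
# (spaces, newlines, or none at all) around the label text.
SOLD_MARKER = re.compile(
    r'<p class="property-value__title">\s*RECENTLY SOLD\s*</p>'
)


def is_recently_sold(page_text):
    """Return True if the text contains the 'RECENTLY SOLD' title paragraph."""
    return SOLD_MARKER.search(page_text) is not None


# Invented sample markup (assumption): the layout the question's
# text2search expected, and a variant with extra blank lines and indentation.
old_markup = '<p class="property-value__title">\n      RECENTLY SOLD\n    </p>'
new_markup = '<p class="property-value__title">\n\n        RECENTLY SOLD\n\n    </p>'

print(is_recently_sold(old_markup))
print(is_recently_sold(new_markup))
```

In the scraper, the condition "if text2search in page.text:" could then be replaced by "if is_recently_sold(page.text):". The pattern still depends on the class name and label text staying the same, so it only guards against whitespace changes.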

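Thanks to @MarcinOrlowski for pointing me in the right direction.

Thanks to @MT for the suggestion: the code has been shortened to reduce the chance of this happening again.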
