Question

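This script loops at `while True:` and writes data from multiple pages by clicking the Next button at the bottom of each page, but I can't figure out how to structure the code so that it keeps adding to the HTML as it paginates. Instead, it overwrites the HTML results it wrote previously. Thanks a lot for your help!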

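import time

from selenium.common.exceptions import NoSuchElementException

# Assumes `driver` is an already-created WebDriver showing the first results page.
while True: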
    time.sleep(10)

    golds = driver.find_elements_by_css_selector(".widgetContainer #widgetContent > div.singleCell")
    print("found %d golds" % len(golds))  

    template = """\
        <tr class="border">
            <td class="image"><img src="{0}"></td>\
            <td class="title"><a href="{1}" target="_new">{2}</a></td>\
            <td class="price">{3}</td>
        </tr>"""

    lines = []

    for gold in golds:
        goldInfo = {}

        goldInfo['title'] = gold.find_element_by_css_selector('#dealTitle > span').text
        goldInfo['link'] = gold.find_element_by_css_selector('#dealTitle').get_attribute('href')
        goldInfo['image'] = gold.find_element_by_css_selector('#dealImage img').get_attribute('src')

        try:
            goldInfo['price'] = gold.find_element_by_css_selector('.priceBlock > span').text
        except NoSuchElementException:
            goldInfo['price'] = 'No price display'

        line = template.format(goldInfo['image'], goldInfo['link'], goldInfo['title'], goldInfo['price'])
        lines.append(line)

    try:
        #clicks next button
        driver.find_element_by_link_text("Next→").click()
    except NoSuchElementException:
        break

    time.sleep(10)

    html = """\
        <html>
            <body>
                <table>
                    <tr class='headers'>
                        <td class='image'></td>
                        <td class='title'>Product</td>
                        <td class='price'>Price / Deal</td>
                    </tr>
                </table>
                <table class='data'>
                    {0}
                </table>
            </body>
        </html>\
    """

    f = open('./result.html', 'w')
    f.write(html.format('\n'.join(lines)))
f.close()
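As written, two things in this loop discard earlier pages: `lines = []` runs at the top of every iteration, and `open('./result.html', 'w')` truncates the file every time it is called. In addition, on the last page the `break` runs before the write, so the final page's rows are never written at all. The stand-alone sketch below reproduces that control flow with hypothetical page data (no Selenium involved) so the effect is visible:

```python
import os
import tempfile

# Hypothetical stand-in for the rows scraped from each results page.
pages = [["<tr>page 1</tr>"], ["<tr>page 2</tr>"], ["<tr>page 3</tr>"]]

path = os.path.join(tempfile.mkdtemp(), "result.html")

for i, page_rows in enumerate(pages):
    lines = []                      # reset on every pass, as in the question
    lines.extend(page_rows)
    if i == len(pages) - 1:         # no "Next" link on the last page
        break                       # leaves the loop before the write below
    with open(path, "w") as f:      # "w" truncates the file each time
        f.write("\n".join(lines))

with open(path) as f:
    content = f.read()
print(content)  # prints <tr>page 2</tr>
```

Only the second-to-last page survives: page 1 is overwritten by page 2, and page 3 is skipped by the `break`.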

Answer 1

When you open the file at the end of the script, take a look at the different modes: https://docs.python.org/2/library/functions.html#open

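The most commonly-used values of mode are 'r' for reading, 'w' for writing (truncating the file if it already exists), and 'a' for appending.

And then there's more:

Modes 'r+', 'w+' and 'a+' open the file for updating (reading and writing); note that 'w+' truncates the file. Append 'b' to the mode to open the file in binary mode, on systems that differentiate between binary and text files; on systems that don't have this distinction, adding the 'b' has no effect.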
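The difference between 'w' and 'a' is easiest to see by opening the same file repeatedly inside a loop, which is what the question's script does. A minimal sketch using a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'w' truncates: every open() starts again from an empty file.
for word in ["one", "two", "three"]:
    with open(path, "w") as f:
        f.write(word + "\n")
with open(path) as f:
    after_w = f.read()

# 'a' appends: every write goes after whatever is already in the file.
os.remove(path)
for word in ["one", "two", "three"]:
    with open(path, "a") as f:
        f.write(word + "\n")
with open(path) as f:
    after_a = f.read()

print(repr(after_w))  # 'three\n'
print(repr(after_a))  # 'one\ntwo\nthree\n'
```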

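So you have a couple of options. You can use 'a', since you want to append data to the file.

Or, depending on your needs, you can open the file outside the loop, so that you aren't reopening it on every iteration.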

with open('./result.html', 'w') as f:
    while True:
        # do stuff
        f.write(...)
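A variant of the second option, sketched against the question's code: create `lines = []` once above the loop so rows accumulate across pages, and open the file a single time after scraping finishes, writing one complete document. To keep it runnable without a browser, the Selenium calls are replaced by a hypothetical `pages` list (each item stands in for one results page as `(image, link, title, price)` tuples); `scrape_all` and `write_html` are illustrative names, not from the original, and the header table is omitted for brevity:

```python
import os
import tempfile

ROW_TEMPLATE = """\
        <tr class="border">
            <td class="image"><img src="{0}"></td>
            <td class="title"><a href="{1}" target="_new">{2}</a></td>
            <td class="price">{3}</td>
        </tr>"""

PAGE_TEMPLATE = """\
<html>
    <body>
        <table class='data'>
{0}
        </table>
    </body>
</html>
"""

def scrape_all(pages):
    """Collect the rows from every page before writing anything.

    `pages` is a hypothetical stand-in for the question's while True loop
    that clicks "Next" until NoSuchElementException is raised.
    """
    lines = []                      # created once, outside the loop
    for page in pages:
        for image, link, title, price in page:
            lines.append(ROW_TEMPLATE.format(image, link, title, price))
        # In the Selenium version, the "Next" click and its `break` go here,
        # after the current page's rows have already been added.
    return lines

def write_html(path, lines):
    # The file is opened exactly once, after all pages are collected.
    with open(path, "w") as f:
        f.write(PAGE_TEMPLATE.format("\n".join(lines)))

# Usage with hypothetical data for two pages:
pages = [
    [("a.jpg", "http://example.com/a", "Gold A", "$10")],
    [("b.jpg", "http://example.com/b", "Gold B", "No price display")],
]
path = os.path.join(tempfile.mkdtemp(), "result.html")
write_html(path, scrape_all(pages))
with open(path) as f:
    html = f.read()
```

Writing once at the end means 'w' is safe to use: it clears output from any previous run, and the file always holds exactly one `<html>` document.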

Answer 2

You should open the file in append mode:

f = open('./result.html', 'a')
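Note that if only the mode is changed and the whole `html` template is still written on every pass, the file ends up containing one complete `<html>...</html>` document per page, and rerunning the script appends to the previous run's output. One way to keep append mode and still produce a single document, sketched here with hypothetical helper names and row data, is to write the header once with 'w', append only each page's rows with 'a', and append the closing tags at the end:

```python
import os
import tempfile

HEADER = "<html>\n<body>\n<table class='data'>\n"
FOOTER = "</table>\n</body>\n</html>\n"

def start_file(path):
    # "w" once at the start: clears any output left over from a previous run.
    with open(path, "w") as f:
        f.write(HEADER)

def append_rows(path, lines):
    # "a" for each page: adds this page's rows after what is already there.
    with open(path, "a") as f:
        f.write("\n".join(lines) + "\n")

def finish_file(path):
    # "a" once at the end: closes the table and document.
    with open(path, "a") as f:
        f.write(FOOTER)

path = os.path.join(tempfile.mkdtemp(), "result.html")
start_file(path)
for page_lines in [["<tr><td>page 1</td></tr>"], ["<tr><td>page 2</td></tr>"]]:
    append_rows(path, page_lines)   # hypothetical per-page rows
finish_file(path)

with open(path) as f:
    html = f.read()
```

In the question's loop, the per-page append would need to happen before the `break` on the missing "Next" link; otherwise the last page is still skipped.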

Python loop overwrites the last HTML write
