Saving selenium results/output at runtime in a text file using Python

Time: 2019-07-04 06:35:36

Tags: python-3.x selenium save

I am running a script with Selenium in Python 3. I am getting the output as expected. Now I want to save that output to a text, csv, or json file. When I try to run the script and save the results to a file, I get an error at open('bangkok_vendor.txt','a') as wt:

  

TypeError: 'NoneType' object is not callable

This means the loop in the program runs only once and does not store the data in the file named bangkok_vendor.txt. In an ordinary Python scraper program I have no problem storing data, but this is my first time using selenium. Can you help me solve this?

I am trying to run this script from a terminal command, and I want to save the output in any file format:

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import csv
import requests

contents =[]

filename = 'link_business_filter.csv'

def copy_json():
    with open("bangkok_vendor.text",'w') as wt:
        for x in script2:
            wt.writer(x)
            wt.close()

with open(filename,'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)

for link in contents:
    url_html = requests.get(link)
    print(link)
    browser = webdriver.Chrome('chromedriver')
    open = browser.get(link)
    source = browser.page_source
    data = bs(source,"html.parser")
    body = data.find('body')
    script = body
    x_path = '//*[@id="react-root"]/section/main/div'
    script2 = browser.find_element_by_xpath(x_path)
    script3 = script2.text

    #script2.send_keys(keys.COMMAND + 't')
    browser.close()
    print(script3)

1 answer:

Answer 0: (score: 0)

  • You need to pass script2 as a parameter to the copy_json function and call it whenever you extract the data from a page.
  • Change the write mode from w to a (append), otherwise the file is reset every time you call the copy_json function.
  • Don't shadow built-in functions such as open, otherwise you won't be able to open the file to write data once you reach the second iteration.
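The last bullet is the direct cause of the TypeError: browser.get() returns None, so after open = browser.get(link) the name open no longer refers to the built-in. A minimal stdlib-only sketch of the same failure (fake_get is a hypothetical stand-in for browser.get):

```python
def fake_get(url):
    """Stand-in for browser.get(), which also returns None."""
    return None

open = fake_get("http://example.com")   # rebinds the name 'open' to None

try:
    with open("bangkok_vendor.txt", "a") as wt:   # next use: calling None
        pass
except TypeError as exc:
    print(exc)   # 'NoneType' object is not callable

del open   # removes the shadow; the built-in open works again
```

This is exactly what happens in the question's loop: the first iteration rebinds open, and every later attempt to open a file calls None instead.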

I refactored your code a bit:

import csv

from selenium import webdriver

LINK_CSV = 'link_business_filter.csv'
SAVE_PATH = 'bangkok_vendor.txt'


def read_links():
    links = []
    with open(LINK_CSV) as f:
        reader = csv.reader(f)
        for row in reader:
            links.append(row[0])
    return links


def write_data(data):
    with open(SAVE_PATH, mode='a') as f:
        f.write(data + "\n")


if __name__ == '__main__':
    browser = webdriver.Chrome('chromedriver')

    links = read_links()
    for link in links:
        browser.get(link)

        # You may have to wait a bit here 
        # until the page is loaded completely

        html = browser.page_source

        # Not sure what you're trying to do with body 
        # soup = BeautifulSoup(html, "html.parser")
        # body = soup.find('body')

        x_path = '//*[@id="react-root"]/section/main/div'
        main_div = browser.find_element_by_xpath(x_path)
        text = main_div.text

        write_data(text)

    # close browser after every link is processed
    browser.quit()