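I'm running a script in Python 3 with Selenium, and the output is what I expect. Now I want to save that output to a text, CSV, or JSON file. When I run the script and try to save the results to a file, I get an error at open('bangkok_vendor.txt','a') as wt: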
TypeError: 'NoneType' object is not callable
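This means the loop in my program runs only once and doesn't store the data in the file named bangkok_vendor.txt. In an ordinary Python scraper I've never had trouble storing data, but this is my first time using Selenium. Can you help me fix this?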
I'm running this script from the terminal and want to save the output in any file format:
from selenium import webdriver
from bs4 import BeautifulSoup as bs
import csv
import requests

contents = []
filename = 'link_business_filter.csv'

def copy_json():
    with open("bangkok_vendor.text", 'w') as wt:
        for x in script2:
            wt.writer(x)
        wt.close()

with open(filename, 'rt') as f:
    data = csv.reader(f)
    for row in data:
        links = row[0]
        contents.append(links)

for link in contents:
    url_html = requests.get(link)
    print(link)
    browser = webdriver.Chrome('chromedriver')
    open = browser.get(link)
    source = browser.page_source
    data = bs(source, "html.parser")
    body = data.find('body')
    script = body
    x_path = '//*[@id="react-root"]/section/main/div'
    script2 = browser.find_element_by_xpath(x_path)
    script3 = script2.text
    #script2.send_keys(keys.COMMAND + 't')
    browser.close()
    print(script3)
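For reference, the same TypeError can be reproduced without a browser. This is a minimal sketch: fake_get(), save_pages_buggy(), save_pages_fixed() and reproduce() are hypothetical names, and fake_get() only imitates the one relevant fact about Selenium's WebDriver.get(), namely that it returns None. Binding that return value to the name open hides the builtin, so the next open(...) call becomes None(...).

```python
import os
import tempfile


def fake_get(url):
    """Hypothetical stand-in for WebDriver.get(): loads a page and returns None."""
    return None


def save_pages_buggy(links, path):
    for link in links:
        open = fake_get(link)  # BUG: the name `open` now refers to None
        with open(path, "a") as wt:  # None(...) raises TypeError
            wt.write(link + "\n")


def save_pages_fixed(links, path):
    for link in links:
        fake_get(link)  # don't bind the return value to a builtin's name
        with open(path, "a") as wt:  # `open` is still the builtin here
            wt.write(link + "\n")


def reproduce():
    """Run the buggy version and return the TypeError message (or None)."""
    path = os.path.join(tempfile.mkdtemp(), "bangkok_vendor.txt")
    try:
        save_pages_buggy(["https://example.com"], path)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(reproduce())  # 'NoneType' object is not callable
```

The only difference between the two functions is whether the builtin name open gets reassigned; the fixed version appends one line per link.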
Answer
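1. Pass script2 (or its text) as an argument to the copy_json function, and call that function when you extract the data from the page. Also note that file objects have a write() method, not writer().
2. Change the write mode 'w' to append mode 'a'; otherwise the file is truncated every time you call copy_json.
3. Don't overwrite built-in functions such as open. browser.get() returns None, so after open = browser.get(link) the name open refers to None, and the next open(...) call raises TypeError: 'NoneType' object is not callable. That is why you can't open the file to write data once the loop reaches its second iteration.

I refactored your code a bit: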
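import csv

from selenium import webdriver
from selenium.webdriver.common.by import By

LINK_CSV = 'link_business_filter.csv'
SAVE_PATH = 'bangkok_vendor.txt'


def read_links():
    links = []
    with open(LINK_CSV, newline='') as f:
        reader = csv.reader(f)
        for row in reader:
            if row:  # skip blank lines in the CSV
                links.append(row[0])
    return links


def write_data(data):
    # 'a' (append) keeps earlier results instead of truncating the file on each call;
    # utf-8 avoids platform-dependent encoding errors for non-ASCII page text
    with open(SAVE_PATH, mode='a', encoding='utf-8') as f:
        f.write(data + "\n")


if __name__ == '__main__':
    # One browser for all links. Selenium 4.6+ locates the driver itself;
    # on older versions chromedriver must be on your PATH.
    browser = webdriver.Chrome()
    links = read_links()
    for link in links:
        browser.get(link)
        # You may have to wait a bit here
        # until the page is loaded completely
        html = browser.page_source
        # Not sure what you're trying to do with body
        # soup = BeautifulSoup(html, "html.parser")
        # body = soup.find('body')
        x_path = '//*[@id="react-root"]/section/main/div'
        # find_element(By.XPATH, ...) also works where find_element_by_xpath was removed (Selenium 4.3+)
        main_div = browser.find_element(By.XPATH, x_path)
        text = main_div.text
        write_data(text)
    # close browser after every link is processed
    browser.quit()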
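The question also asks about CSV and JSON output, while write_data() above writes plain text. Here is a sketch of two alternative writers, assuming you want one record per link holding the link and its extracted text; the function names, field names, and file names are hypothetical. JSON Lines (one JSON object per line) is used instead of a single JSON array, because an array cannot simply be appended to on each loop iteration.

```python
import csv
import json


def write_csv_row(path, link, text):
    """Append one (link, text) row; newline='' lets the csv module handle line endings."""
    with open(path, mode="a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([link, text])


def write_json_line(path, link, text):
    """Append one JSON object per line (JSON Lines), so every call is append-safe."""
    with open(path, mode="a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII text (e.g. Thai) readable in the file
        f.write(json.dumps({"link": link, "text": text}, ensure_ascii=False) + "\n")
```

Inside the loop you would call, for example, write_csv_row('bangkok_vendor.csv', link, text) in place of write_data(text). The csv module quotes fields that contain commas or newlines, so multi-line page text stays in a single cell.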