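Starting from a list of keywords, I want to query a web page and extract the tags for each keyword in the list. The results should be saved to a CSV file. However, I get this error: ValueError: dict contains fields not in fieldnames.
My understanding is that the dict built from the query contains more keys than the fieldnames, although the dict has a variable number of keys depending on the list.
The code is as follows: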
import csv
from selenium import webdriver
from time import sleep
from parsel import Selector
from selenium.webdriver.common.keys import Keys
from collections import defaultdict

####### reading from the input file ##########
columns = defaultdict(list)  # each value in each column is appended to a list

# get the list of keywords from the csv file
with open('query.csv', 'r') as csvfile:
    reader = csv.DictReader(csvfile)  # read rows into a dictionary format
    for row in reader:  # read a row as {column1: value1, column2: value2,...}
        for (k, v) in row.items():  # go over each column name and value
            columns[k].append(v)  # append the value into the appropriate list

# the list containing all of the keywords
search_query_list = columns['Keyword']

########## start scraping ###############
rb_results = []

# create a driver and let it open google chrome
driver = webdriver.Chrome("chromedriver")

# get linkedin website
driver.get('https://www.redbubble.com/')
sleep(0.5)

for i in range(len(search_query_list)):
    next_query = search_query_list[i]
    # get RB website
    driver.get('https://www.redbubble.com/')
    # get the search by its id
    search_bar = driver.find_element_by_name("query")
    sleep(0.5)
    # enter the query to the search bar
    search_bar.send_keys(next_query)
    # press enter
    search_bar.send_keys(Keys.RETURN)
    sleep(1)
    # from parsel's selector get the page source
    sel1 = Selector(text=driver.page_source)
    sleep(0.5)
    # first shirt //
    continue_link = driver.find_element_by_class_name('shared-components-ShopSearchSkeleton-ShopSearchSkeleton__composedComponentWrapper--1s_CI').click()
    sleep(1)
    sel2 = Selector(text=driver.page_source)
    sleep(0.5)

    ################## get TAGS ###############
    # Check tags for all products
    try:
        # get the tags for the search query
        tags_rb = driver.find_element_by_class_name("shared-components-Tags-Tags__listContent--oLdDf").extract_first()
        # if number of products is found print it and search for the prime
        # print the number of products found
        if tags_rb == None:
            rb_results.append("0")
        else:
            rb_results = str(tags_rb)
    except:
        rb_results.append("error")

###### writing part ########
with open ("rb_results.csv","w", newline='') as resultFile:
    writer = csv.DictWriter(resultFile, fieldnames=["Rb Results"],delimiter='\t')
    writer.writeheader()
    writer.writerows({'Rb Tags': item} for item in rb_results)
    resultFile.close()
How can I fix the following error message: ValueError: dict contains fields not in fieldnames: 'Rb Tags'?
Thank you very much!
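
For reference, here is a minimal sketch that seems to reproduce the error without Selenium, using only the standard-library csv module. The sample values in rb_results and the helper write_results are hypothetical and exist only for illustration; the writer settings mirror the writing part above. It suggests the cause is not the number of keys: the writer is created with fieldnames=["Rb Results"], while each row dict uses the key 'Rb Tags', and csv.DictWriter rejects any row key that is missing from fieldnames.

```python
import csv
import io

# Hypothetical sample data standing in for the scraped results.
rb_results = ["tag one, tag two", "0", "error"]


def write_results(rows, fieldnames, key):
    """Write one row per item under `key` and return the CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, delimiter="\t")
    writer.writeheader()
    writer.writerows({key: item} for item in rows)
    return buffer.getvalue()


# Mismatch: the header is "Rb Results" but every row uses the key "Rb Tags",
# so DictWriter raises ValueError on the first row.
try:
    write_results(rb_results, ["Rb Results"], "Rb Tags")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Match: the same name appears in fieldnames and in the row dicts,
# so the rows are written without an error.
output = write_results(rb_results, ["Rb Tags"], "Rb Tags")

print(error_message)
print(output)
```

If this reading is right, using the same column name in both places (either "Rb Tags" or "Rb Results") should make the ValueError go away. Separately, the line rb_results = str(tags_rb) in the loop rebinds the list to a string, so rb_results.append(str(tags_rb)) is probably what was intended there.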