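My scraper seems to write only the last page it visits to the CSV. I assume this is because it loops through all the pages and only then writes to the CSV. It does scrape the elements and prints them to the console. Do you have to write each page to the CSV inside the loop, because the data is not kept between pages? I have tried adjusting my code to do that, but I can't get it to work.
Thanks in advance.
I also tried a different approach, but the same thing seems to happen: https://www.pastebin.ca/3863340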
import csv
import time
from random import shuffle

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()
driver.get('https://www.bookmaker.com.au/sports/soccer/')

# Scroll to the bottom repeatedly until the page height stops growing.
SCROLL_PAUSE_TIME = 0.5
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(SCROLL_PAUSE_TIME)
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
time.sleep(1)
# Collect the match links, then visit them in random order.
elements = driver.find_elements(By.CSS_SELECTOR, ".market-match:nth-child(2) .market-group a , .market-match:nth-child(1) .market-group a")
elem_href1 = [element.get_attribute("href") for element in elements]
print(elem_href1)
print(len(elem_href1))
shuffle(elem_href1)
for link in elem_href1:
    driver.get(link)
    ...
    time.sleep(2)
    # link
    elems = driver.find_elements(By.CSS_SELECTOR, "h3 a[Href*='/sports/soccer']")
    elem_href = []
    for elem in elems:
        print(elem.get_attribute("href"))
        elem_href.append(elem.get_attribute("href"))
    # TEAM
    langs = driver.find_elements(By.CSS_SELECTOR, ".row:nth-child(1) td:nth-child(1)")
    langs_text = []
    for lang in langs:
        print(lang.text)
        langs_text.append(lang.text)
    time.sleep(0)
    # odds
    langs1 = driver.find_elements(By.CSS_SELECTOR, "a.odds.quickbet")
    langs1_text = []
    for lang in langs1:
        print(lang.text)
        langs1_text.append(lang.text)
    time.sleep(0)
# Write the collected rows to the CSV.
with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
Answer 0 (score: 2)
The problem is that you overwrite the CSV on every iteration, so only the last record is left when the script finishes.
Change
with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
to
with open('vtg12.csv', 'a+', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
The 'a+' mode opens the file in append mode.
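For reference, here is a small sketch of how the file modes behave when the file is reopened once per page. It uses made-up rows and a temporary directory rather than the scraper itself, and the helper names `write_pages` and `read_rows` are only for illustration. Reopening with 'w' truncates the file each time, while both 'a' and 'a+' keep the earlier rows.

```python
import csv
import os
import tempfile


def write_pages(path, mode, pages):
    """Reopen `path` with `mode` once per page and write that page's rows."""
    for rows in pages:
        with open(path, mode, newline='') as outfile:
            csv.writer(outfile).writerows(rows)


def read_rows(path):
    """Read the whole CSV back as a list of rows."""
    with open(path, newline='') as infile:
        return list(csv.reader(infile))


# Hypothetical stand-in data: two "pages" with one (odds, team) row each.
pages = [[("1.50", "Team A")], [("2.75", "Team B")]]

results = {}
with tempfile.TemporaryDirectory() as tmp:
    for mode in ("w", "a", "a+"):
        path = os.path.join(tmp, "mode_{}.csv".format(mode.replace("+", "plus")))
        write_pages(path, mode, pages)
        results[mode] = read_rows(path)

print(results["w"])   # [['2.75', 'Team B']]  (only the last page survives)
print(results["a"])   # [['1.50', 'Team A'], ['2.75', 'Team B']]
print(results["a+"])  # [['1.50', 'Team A'], ['2.75', 'Team B']]
```

If a file opened with 'a' still ends up holding only the last page, the mode is probably not the cause, and it is worth checking whether the write happens inside the page loop.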
Answer 1 (score: 2)
At the very top of the script:
def append_to_csv(csv_list, output_filename):
    # Append a single row (csv_list) to the CSV file.
    with open(output_filename, 'a', newline='') as fp:
        a = csv.writer(fp)
        data = [csv_list]
        a.writerows(data)
Then replace
with open('vtg12.csv', 'a', newline='') as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs1_text, langs_text, elem_href):
        writer.writerow(row)
with:
for row in zip(langs1_text, langs_text, elem_href):
    append_to_csv(row, 'vtg12.csv')
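As an alternative to reopening the file for every row, the file can be opened once before the page loop, with each page's rows written inside the loop before the lists are rebuilt for the next page. The sketch below is not the scraper itself: `save_all_pages`, `scrape_page`, and `fake_site` are hypothetical names, and the made-up data stands in for the Selenium lookups so the example runs on its own.

```python
import csv
import os
import tempfile


def save_all_pages(links, scrape_page, output_filename):
    """Write the rows of every page to one CSV, opening the file only once.

    `scrape_page` is a hypothetical callable that takes a link and returns
    three lists: odds, team names, and hrefs.
    """
    rows_written = 0
    with open(output_filename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for link in links:
            odds, teams, hrefs = scrape_page(link)
            # Write this page's rows now, before the next iteration
            # replaces the lists with the next page's data.
            for row in zip(odds, teams, hrefs):
                writer.writerow(row)
                rows_written += 1
    return rows_written


# Made-up data standing in for what Selenium would return per page.
fake_site = {
    "page1": (["1.50", "2.60"], ["Team A", "Team B"], ["/m/1", "/m/1"]),
    "page2": (["3.10"], ["Team C"], ["/m/2"]),
}

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "vtg12.csv")
    count = save_all_pages(fake_site, fake_site.get, out_path)
    with open(out_path, newline='') as infile:
        saved = list(csv.reader(infile))

print(count)
print(saved)
```

Because the file is opened with 'w' here, each run of the script starts a fresh CSV. Use 'a' instead if rows from earlier runs should be kept.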