I have Python code that scrapes one page from a football results and odds website:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
import pandas as pd
import copy
import numpy as np
results = []
d = webdriver.Chrome(executable_path = r'C:\chromedriver_win32\chromedriver.exe')
u = 'https://1x2.lucksport.com/result_en.shtml?dt=' + '2019-05-02' + '&cid=156'
d.get(u)
WebDriverWait(d, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#odds_tb tr[class]")))
soup = bs(d.page_source, 'lxml')
rows = soup.select('#odds_tb tr[class]')
headers = ['Comp', 'Time', 'Match' ,'Odds', 'H', 'A', 'Res']
i = 1
for row in rows[1:]:
    cols = [td.text for td in row.select('td')]
    if (i % 2 == 1):
        record = {'Comp' : cols[0],
                  'Time' : cols[1],
                  'Match' : ' v '.join([cols[2], cols[6]]),
                  'Odds' : 'op',
                  'H' : cols[3],
                  'A' : cols[5],
                  'Res' : cols[7]}
    else:
        record['Odds'] = 'cl'
        record['H'] = cols[0]
        record['A'] = cols[2]
    results.append(copy.deepcopy(record))
    i+=1
df = pd.DataFrame(results, columns = headers)
d.quit()
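The op/cl pairing in the loop above can also be expressed without a counter. This is only a sketch, and it assumes (as the code above does) that each match spans exactly two consecutive rows, opening odds first and closing odds second, with the same cell positions. The function name `pair_odds_rows` and the sample cell values are hypothetical. It works on plain lists of cell texts, i.e. what `[td.text for td in row.select('td')]` produces for each row in `rows[1:]`.

```python
from typing import Dict, List


def pair_odds_rows(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn alternating opening/closing rows into 'op' and 'cl' records.

    Assumes every match occupies two consecutive rows, using the same
    column positions as the question's code (hypothetical layout).
    """
    records = []
    # rows[0::2] are the opening-odds rows, rows[1::2] the closing-odds rows
    for first, second in zip(rows[0::2], rows[1::2]):
        opening = {'Comp': first[0],
                   'Time': first[1],
                   'Match': ' v '.join([first[2], first[6]]),
                   'Odds': 'op',
                   'H': first[3],
                   'A': first[5],
                   'Res': first[7]}
        # dict(opening, ...) builds a new dict, so the opening record is not
        # modified when the closing values are filled in
        closing = dict(opening, Odds='cl', H=second[0], A=second[2])
        records.extend([opening, closing])
    return records


# Hypothetical cell texts for one match (opening row, then closing row)
sample_rows = [
    ['CSL', '19:35', 'Home FC', '2.10', '3.20', '3.40', 'Away FC', '1-0'],
    ['1.95', '3.30', '3.60'],
]
records = pair_odds_rows(sample_rows)
```

Because every value is a string, the shallow copy made by `dict(opening, ...)` is enough; `copy.deepcopy` only matters when records hold nested mutable objects. Note that `zip` silently drops a trailing unpaired row. The list can go straight into `pd.DataFrame(records, columns=headers)`.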
I want to loop over and scrape all previous dates in a given range (e.g. the last month), so I created a list of dates to use in the loop:
import datetime
D = datetime.datetime.now().date()
date_list = [D - datetime.timedelta(days=x) for x in range(0, 30)]
dates = []
for i in date_list:
    date = str(i)
    dates.append(date)
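The same list of date strings can be built in one step. In this sketch `previous_dates` is a hypothetical helper; `isoformat()` returns the same `YYYY-MM-DD` text as `str(date)`, and since a Python 3 list comprehension keeps its loop variable local, no outer name such as `i` gets reused.

```python
import datetime
from typing import List


def previous_dates(end: datetime.date, days: int) -> List[str]:
    """Return `days` date strings (YYYY-MM-DD), starting at `end` and going back."""
    return [(end - datetime.timedelta(days=x)).isoformat() for x in range(days)]


dates = previous_dates(datetime.date.today(), 30)
```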
Then I tried to write a loop, hoping it would return a DataFrame with the data for all of those dates:
results = []
for date in dates:
    d = webdriver.Chrome(executable_path = r'C:\chromedriver_win32\chromedriver.exe')
    u = 'https://1x2.lucksport.com/result_en.shtml?dt=' + date + '&cid=156'
    i = 1
    d.get(u)
    WebDriverWait(d, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#odds_tb tr[class]")))
    soup = bs(d.page_source, 'lxml')
    rows = soup.select('#odds_tb tr[class]')
    headers = ['Comp', 'Time', 'Match' ,'Odds', 'H', 'A', 'Res', 'Date']
    for row in rows[1:]:
        cols = [td.text for td in row.select('td')]
        if (i % 2 == 1):
            record = {'Comp' : cols[0],
                      'Time' : cols[1],
                      'Match' : ' v '.join([cols[2], cols[6]]),
                      'Odds' : 'op',
                      'H' : cols[3],
                      'A' : cols[5],
                      'Res' : cols[7],
                      'Date' : date}
        else:
            record['Odds'] = 'cl'
            record['H'] = cols[0]
            record['A'] = cols[2]
        results.append(copy.deepcopy(record))
        i+=1
    d.quit()
df = pd.DataFrame(results, columns = headers)
But it raises an error:
TypeError Traceback (most recent call last)
<ipython-input-6-0668d7389fc6> in <module>
33 cols = [td.text for td in row.select('td')]
34
---> 35 if (i % 2 == 1):
36 record = {'Comp' : cols[0],
37 'Time' : cols[1],
TypeError: unsupported operand type(s) for %: 'datetime.date' and 'int'
Answer (score: 2)
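i is of type datetime.date because of
D = datetime.datetime.now().date()
date_list = [D - datetime.timedelta(days=x) for x in range(0, 30)]
dates = []
for i in date_list:
date_list is a list of datetime.date objects, so i is bound to an element of that type (and after the loop it still holds the last date). You later treat it as an int, hence your error:
if (i % 2 == 1):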
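A minimal reproduction of the name clash, using hypothetical dates: the `for` statement rebinds the existing name `i`, and a Python loop variable keeps its last value after the loop ends, so the later `%` fails just as in the traceback.

```python
import datetime

i = 1  # meant to be an int row counter
date_list = [datetime.date(2019, 5, 2), datetime.date(2019, 5, 1)]  # hypothetical dates

for i in date_list:  # rebinds the same name i to each date in turn
    pass

print(type(i))  # <class 'datetime.date'>: i is now the last date, not 1

error_message = ''
try:
    remainder = i % 2  # date % int is not defined
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```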
Use a different variable for the loop counter, or rename i when iterating over date_list, e.g.:
import datetime
D = datetime.datetime.now().date()
date_list = [D - datetime.timedelta(days=x) for x in range(0, 30)]
dates = []
i = 1
for iDate in date_list:
    if (i % 2 == 1):
        print(i)
    i+=1
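The counter can also come from the built-in `enumerate`, which removes the separate `i` (and its `i+=1`) altogether; the same pattern would apply to the rows loop, e.g. `for i, row in enumerate(rows[1:], start=1):`. This sketch fixes `D` to a hypothetical date so the result is reproducible.

```python
import datetime

D = datetime.date(2019, 5, 2)  # hypothetical fixed "today" for reproducibility
date_list = [D - datetime.timedelta(days=x) for x in range(0, 30)]

odd_positions = []
# enumerate yields (counter, item) pairs; start=1 mirrors the answer's i = 1
for i, iDate in enumerate(date_list, start=1):
    if i % 2 == 1:
        odd_positions.append((i, iDate.isoformat()))
```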
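Side note:
Your d.quit() is inside the for loop and could go after it, while d = webdriver.Chrome(executable_path = r'C:\chromedriver_win32\chromedriver.exe') could go before the loop. That way you use a single browser instance from start to finish instead of launching a new one for every date.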
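One way to structure the single-instance version is sketched below. It is a hypothetical outline, not the answer's code: `scrape_dates`, `make_driver`, and `parse_page` are illustrative names, and the function relies only on `get`, `page_source`, and `quit`, which a Selenium WebDriver provides. The `try`/`finally` makes sure the one browser is closed even if a page fails to parse.

```python
from typing import Callable, List, Protocol

URL_TEMPLATE = 'https://1x2.lucksport.com/result_en.shtml?dt={date}&cid=156'


class Driver(Protocol):
    """The small part of the WebDriver interface this sketch relies on."""
    page_source: str

    def get(self, url: str) -> None: ...

    def quit(self) -> None: ...


def scrape_dates(make_driver: Callable[[], Driver],
                 dates: List[str],
                 parse_page: Callable[[str, str], List[dict]]) -> List[dict]:
    """Visit every date with one browser instance and collect parsed records."""
    driver = make_driver()  # started once, before the loop
    try:
        results: List[dict] = []
        for date in dates:
            driver.get(URL_TEMPLATE.format(date=date))
            # the question's WebDriverWait(...).until(...) call would go here
            results.extend(parse_page(driver.page_source, date))
        return results
    finally:
        driver.quit()  # runs exactly once, even if parsing raises
```

With Selenium this might be called as `scrape_dates(lambda: webdriver.Chrome(executable_path=r'C:\chromedriver_win32\chromedriver.exe'), dates, parse_page)`, where `parse_page` builds the soup and records for one page as the question's loop body does. Newer Selenium 4 releases no longer accept `executable_path` and expect a `Service` object instead.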