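I've written a script in Python with Selenium to parse some dynamic content from a webpage and write it to a CSV file. The script below does this without errors, except for the date.
If you look at the site's content, you can see that the tabular data doesn't mention the year.
However, when I click any cell under the Date column header in the output file, Excel assumes the current year by default, whereas the date should fall in 2004. How can I set the year to 2004, as shown in image 2 below?
The script I'm trying: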
import csv
import datetime
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "http://info.nowgoal.com/en/League/2004-2005/36.html"

def get_information(driver,link):
    driver.get(link)
    for items in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR,'table#Table3 tr')))[2:]:
        try:
            date = items.find_elements_by_css_selector("td")[1].text.split("\n")[0]
            date = datetime.datetime.strptime(date, '%m-%d').strftime('%d-%B')
        except Exception: date = ""
        try:
            match_name = items.find_elements_by_css_selector("td")[2].find_element_by_tag_name("a").text
        except Exception: match_name = ""
        writer.writerow([date,match_name])
        print(date,match_name)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    wait = WebDriverWait(driver,10)
    with open("outputfile.csv","w",newline="") as infile:
        writer = csv.writer(infile)
        writer.writerow(['Date','Match name'])
        try:
            get_information(driver,url)
        finally:
            driver.quit()
This is what you see on that webpage:
Answer 0 (score: 1)
You can add the correct year to the cell as follows:
import datetime
date = "05-15"
date = datetime.datetime.strptime(date, '%m-%d').replace(year=2004).strftime('%d-%B-%Y')
print(date)
This would display:
15-May-2004
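One caveat with parsing first and replacing the year afterwards: strptime with '%m-%d' alone fills in the year 1900, which is not a leap year, so a cell such as "02-29" would raise a ValueError before replace() ever runs. A minimal sketch of a variation that prepends the year to the string before parsing is shown below. The helper name with_year and its default of 2004 are assumptions made for illustration, not part of the original script.

```python
import datetime

def with_year(month_day, year=2004):
    """Turn an 'MM-DD' string into 'DD-Month-YYYY' for the given year.

    Hypothetical helper: the year is put into the string before parsing,
    so the date is validated against that year (2004 is a leap year).
    """
    parsed = datetime.datetime.strptime(f"{year}-{month_day}", "%Y-%m-%d")
    return parsed.strftime("%d-%B-%Y")

print(with_year("05-15"))  # same result as the replace(year=2004) approach
print(with_year("02-29"))  # valid because the parse already knows the year
```

In the question's script, the line that calls strptime could call such a helper instead. The existing except branch would still turn unparseable cells into an empty string.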