I am trying to return the max date from a list scraped with bs4. This is what I have so far.
import requests
from datetime import date, datetime, timedelta
from collections import OrderedDict, defaultdict
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

con = requests.get("https://au.investing.com/currencies/aud-usd-historical-data/",
                   headers={'User-Agent': 'Mozilla/5.0'})
odcon = OrderedDict()
content_page = soup(con.content, 'html.parser')
table = content_page.find('table', {'class': 'genTbl closedTbl historicalTbl'})
cols = [th.text for th in table.select("th")[1:]]
for row in table.select("tbody tr"):
    data = [td.text for td in row.select("td")]
    data[0] = datetime.strptime(data[0], '%b %d, %Y').strftime('%d/%m/%Y')
    print(max(data[0]))
The output of print(data[0]) looks like this:
13/09/2018
12/09/2018
11/09/2018
10/09/2018
09/09/2018
07/09/2018
06/09/2018
05/09/2018
04/09/2018
03/09/2018
02/09/2018
31/08/2018
30/08/2018
29/08/2018
28/08/2018
27/08/2018
26/08/2018
24/08/2018
23/08/2018
22/08/2018
21/08/2018
20/08/2018
19/08/2018
17/08/2018
16/08/2018
15/08/2018
14/08/2018
13/08/2018
I would like to return/print the max date from this list.
This is probably a simple fix, but I can't work it out. Any help would be much appreciated.
Answer 0 (score: 0)
How about modifying the code like this:
dateList = []
for row in table.select("tbody tr"):
    data = [td.text for td in row.select("td")]
    # Parse the text into a date object so the comparison is chronological
    d = datetime.strptime(data[0], '%b %d, %Y').date()
    dateList.append(d)
print(max(dateList))
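For context, max(data[0]) in the question is applied to a single string, so it returns the largest character of that string rather than the latest date. Below is a minimal offline sketch of the difference. It assumes a hard-coded list, `sample_rows`, as a hypothetical stand-in for the first column of the scraped table (it is not real page output), and it assumes an English locale for `%b`.

```python
from datetime import datetime

# Hypothetical stand-in for the first column of the scraped table.
sample_rows = ["Sep 13, 2018", "Sep 12, 2018", "Aug 31, 2018", "Aug 13, 2018"]

# max() on one formatted string compares its characters, not dates.
one_formatted = datetime.strptime(sample_rows[0], "%b %d, %Y").strftime("%d/%m/%Y")
largest_char = max(one_formatted)  # a single character from the string

# Collect date objects first, then take the max over the whole list.
parsed = [datetime.strptime(s, "%b %d, %Y").date() for s in sample_rows]
latest = max(parsed)
latest_text = latest.strftime("%d/%m/%Y")

print(largest_char, latest_text)
```

Formatting with strftime only at the end keeps the comparison on date objects, which order chronologically.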
Answer 1 (score: 0)
Parse the dates before comparing them, then format the max:
import requests
from datetime import datetime
from bs4 import BeautifulSoup as bs4

site = requests.get("https://au.investing.com/currencies/aud-usd-historical-data/", headers={'User-Agent': 'Mozilla/5.0'})
content_page = bs4(site.content, 'html.parser')
table = content_page.find('table', {'class': 'genTbl closedTbl historicalTbl'})
cols = [th.text for th in table.select("th")[1:]]
dates = []
for row in table.select("tbody tr"):
    data = [td.text for td in row.select("td")]
    # Parse to datetime so max() compares chronologically, not alphabetically
    dates.append(datetime.strptime(data[0], '%b %d, %Y'))
max(dates).strftime('%d/%m/%Y')
Output:
'13/09/2018'
By the way, cols is not used here.
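One caveat about comparing the raw strings: text such as 'Sep 13, 2018' is compared character by character, so the result matches chronological order only by coincidence. Here is a small sketch with made-up sample values (`raw_dates` and `parse` are illustrative names, not part of the scraped page) that keeps the original strings but compares them as dates through the `key=` argument of max(). It assumes an English locale for `%b`.

```python
from datetime import datetime

# Made-up sample values in the same format as the table's date column.
raw_dates = ["Sep 13, 2018", "Oct 02, 2018", "Aug 31, 2018"]


def parse(text):
    """Turn a string like 'Sep 13, 2018' into a datetime for comparison."""
    return datetime.strptime(text, "%b %d, %Y")


# Plain string comparison is lexicographic: 'S' sorts after 'O' and 'A'.
string_max = max(raw_dates)

# key=parse compares the parsed datetimes but returns the original string.
date_max = max(raw_dates, key=parse)

print(string_max, date_max)
```

Here the string comparison picks the September entry even though the October one is later, while the keyed version returns the October entry.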