I'm using BeautifulSoup to scrape dates from a website. Here is the HTML:
<div id="listing-details-list">
<h3 class="listing-details-header">
Details:
<span>Posted on: 14th June 2016</span>
</h3>
</div>
The code I use to get the date is:
# date
try:
    date = soup.find("h3", "listing-details-header")
    date_result = str(date.get_text().encode("utf-8").strip()[20:])
    print "\nPublished date: ", date_result
except StandardError as e:
    date_result = "Error was {0}".format(e)
    print date_result
The result I get is the date as a string. Some samples:
23rd June 2016
21st July 2016
20th July 2016
3rd July 2016
Now I want the dates to be of a proper date type, formatted as below, so that I can do calculations on them:
23/6/2016
21/7/2016
20/7/2016
3/7/2016
What is the best way to get the dates in this form in my code?
I'd like to store the date like this:
Month = 6
Day = 23
Year = 2016
I tried the solution marked as the best answer:
from dateutil.parser import parse

try:
    date = soup.find("h3", "listing-details-header")
    date_result = str(date.get_text().encode("utf-8").strip()[20:])
    date_result = parse(date_result)  # added
    month = date_result.month
    day = date_result.day
    year = date_result.year
    print month
    print day
    print year
    print "\nPublished date: ", date_result
except StandardError as e:
    date_result = "Error was {0}".format(e)
    print date_result
Answer (score: 6)
To parse the dates, I would let the dateutil parser do the work:
>>> from dateutil.parser import parse
>>> l = ["23rd June 2016", "21st July 2016", "20th July 2016", "3rd July 2016"]
>>> for item in l:
...     parse(item)
...
datetime.datetime(2016, 6, 23, 0, 0)
datetime.datetime(2016, 7, 21, 0, 0)
datetime.datetime(2016, 7, 20, 0, 0)
datetime.datetime(2016, 7, 3, 0, 0)
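If you would rather not add the dateutil dependency, a standard-library alternative is to strip the ordinal suffix with a regular expression and hand the rest to datetime.strptime. This is a swapped-in technique, not what the answer above uses; the helper name parse_posted_date is hypothetical, and the "%B" month-name directive assumes an English (or C) locale.

```python
import re
from datetime import datetime

# Matches a day number followed by an ordinal suffix: 1st, 2nd, 3rd, 14th, ...
_ORDINAL = re.compile(r"(\d+)(st|nd|rd|th)\b")

def parse_posted_date(text):
    """Hypothetical helper: parse strings like '23rd June 2016' with the stdlib only."""
    cleaned = _ORDINAL.sub(r"\1", text.strip())  # '23rd June 2016' -> '23 June 2016'
    # %d accepts one or two digits; %B expects a full month name (locale-dependent).
    return datetime.strptime(cleaned, "%d %B %Y")

for item in ["23rd June 2016", "21st July 2016", "20th July 2016", "3rd July 2016"]:
    print(repr(parse_posted_date(item)))
# datetime.datetime(2016, 6, 23, 0, 0)
# datetime.datetime(2016, 7, 21, 0, 0)
# datetime.datetime(2016, 7, 20, 0, 0)
# datetime.datetime(2016, 7, 3, 0, 0)
```

It produces the same datetime objects as dateutil for these samples, but it only handles this one "day-suffix month year" layout, whereas dateutil's parser copes with many formats.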
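You can then use the resulting datetime instances for any date- or time-related calculations.
I would also improve how you locate the element on the page and extract the date: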
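As a small illustration of those calculations, the sketch below assumes two of the question's sample dates have already been parsed (the values are hard-coded here for illustration). It shows a date difference, the separate month/day/year values the question asks for, and the unpadded "23/6/2016" layout built from the attributes, which avoids platform-specific strftime flags such as %-d.

```python
from datetime import datetime

# Assumed inputs: two sample dates from the question, already parsed.
first = datetime(2016, 6, 23)
second = datetime(2016, 7, 21)

# Subtracting two datetimes yields a timedelta.
gap = second - first
print(gap.days)  # 28

# The individual components the question wants to store.
month, day, year = first.month, first.day, first.year
print(month, day, year)  # 6 23 2016

# Day/month/year without zero padding, e.g. '23/6/2016'.
formatted = "{0}/{1}/{2}".format(first.day, first.month, first.year)
print(formatted)  # 23/6/2016
```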
from dateutil.parser import parse
from bs4 import BeautifulSoup
data = """
<div id="listing-details-list">
<h3 class="listing-details-header">
Details:
<span>Posted on: 14th June 2016</span>
</h3>
</div>"""
soup = BeautifulSoup(data, "html.parser")
for item in soup.find_all("span", text=lambda text: text and text.startswith("Posted on:")):
    date_string = item.get_text().split(": ")[-1]
    print(parse(date_string))