How do I extract a date value from a string in Python?

Time: 2015-09-25 07:44:30

Tags: python string python-2.7 date python-3.x

I am fetching a value from a URL.

import urllib2
response = urllib2.urlopen('url')    
response.read()

It returns a very long string, so I have only included the part relevant to my question here.

STRING TYPE OUTPUT:

'<p>Dear Customer,</p>
<p>This notice serves as proof of delivery for the shipment listed below.</p>
<dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
<dt><label>Left At:</label></dt>
<dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'

Question:

How can I get the date (09/14/2015 11:07 A.M.) that is listed for Delivered On?

5 answers:

Answer 0: (score: 6)

You could start by using Beautiful Soup or some other HTML parser. It might look something like this:

from bs4 import BeautifulSoup
import urllib2
response = urllib2.urlopen('url')    
html = response.read()
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly so results do not depend on what is installed
datestr = soup.find("label", text="Delivered On:").find_parent("dt").find_next_sibling("dd").string

If you need to, once you have the date string you can convert it to a datetime object with strptime.

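import datetime
# %m and %d are the month and day codes; %p matches "AM"/"PM", so drop the dots from "A.M." first
date = datetime.datetime.strptime(datestr.replace(".", ""), "%m/%d/%Y %I:%M %p")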
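As a minimal sketch of the strptime step on its own (assuming Python 3 and the default English/C locale, where `%p` matches `AM`/`PM`), the conversion could be wrapped in a small helper. The name `parse_delivered_on` is made up for illustration and is not part of the answer above.

```python
import datetime

def parse_delivered_on(datestr):
    """Parse a value such as '09/14/2015 11:07 A.M.' into a datetime (hypothetical helper)."""
    # The page writes 'A.M.'/'P.M.' with dots, but %p expects 'AM'/'PM',
    # so strip whitespace and remove the dots before parsing.
    cleaned = datestr.strip().replace(".", "").upper()
    return datetime.datetime.strptime(cleaned, "%m/%d/%Y %I:%M %p")

delivered = parse_delivered_on("09/14/2015 11:07 A.M.")
print(delivered.isoformat())
```

The helper raises ValueError if the string does not follow this layout, which is usually what you want when a page format changes unexpectedly.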

Remember: you generally don't want to find yourself parsing HTML or XML with regular expressions...
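If installing Beautiful Soup is not an option, a parser-based approach is still possible with the standard library. The following is only a sketch, assuming Python 3; the class name `DefinitionListParser` and the trimmed `sample` string are illustrative, not taken from the answer. It collects each `<dt>` label together with the `<dd>` value that follows it.

```python
from html.parser import HTMLParser

class DefinitionListParser(HTMLParser):
    """Collect <dt> label / <dd> value pairs into a dict (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.fields = {}
        self._current_tag = None  # 'dt' or 'dd' while inside one, else None
        self._label = None        # last <dt> text still waiting for its <dd>
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current_tag:
            return  # skips </label> and stray closing tags such as the extra </dt>
        text = "".join(self._buffer).strip()
        if tag == "dt":
            self._label = text.rstrip(":")
        elif self._label is not None:
            self.fields[self._label] = text
            self._label = None
        self._current_tag = None

# Trimmed copy of the markup from the question.
sample = (
    '<dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>\n'
    '<dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>\n'
    '<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>\n'
    '<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>\n'
    '<dt><label>Left At:</label></dt>\n'
    '<dd>Office</dd></dl>'
)

parser = DefinitionListParser()
parser.feed(sample)
parser.close()
print(parser.fields["Delivered On"])
```

Note that the Weight entry is not captured, because its label is not wrapped in an opening `<dt>` in the sample markup.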

Answer 1: (score: 1)

Try this code:

import re

text = '''<p>Dear Customer,</p>
          <p>This notice serves as proof of delivery for the shipment listed below.</p>
          <dl class="outHozFixed clearfix"><label>Weight:</label></dt>
          <dd>18.00 lbs</dd>
          <dt><label>Shipped&#047;Billed On:</label></dt>
          <dd>09/11/2015</dd>
          <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
          <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
          <dt><label>Left At:</label></dt>
          <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

re.findall(r'<dt><label>Delivered On:<\/label><\/dt><dd>([0-9\.\/\s:APM]+)', text)

Output:

['09/14/2015 11:07 A.M.']
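The pattern above only matches when `</dt>` and `<dd>` are directly adjacent. A slightly more tolerant variant might look like the sketch below, which allows whitespace between the tags and works for any label. The function name `find_field` is an assumption made for illustration, and the usual caveats about regexes on HTML still apply.

```python
import re

def find_field(html, label):
    """Return the <dd> text that follows the given <dt> label, or None (hypothetical helper)."""
    pattern = (
        r"<dt>\s*<label[^>]*>\s*" + re.escape(label) + r"\s*</label>\s*</dt>"
        r"\s*<dd>(.*?)</dd>"  # non-greedy, so the match stops at the first </dd>
    )
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1).strip() if match else None

html = (
    '<dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>\n'
    '<dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>\n'
    '<dt><label>Left At:</label></dt>\n'
    '          <dd>Office</dd></dl>'
)
print(find_field(html, "Delivered On:"))
```

Returning None for a missing label lets the caller decide how to handle pages where the field is absent.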

Answer 2: (score: 1)

Based on that output alone, I would use re.search from the re module. Build a regular expression that matches the date and time, like this:

import re

output = '''<p>Dear Customer,</p>
            <p>This notice serves as proof of delivery for the shipment listed below.</p>
            <dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd>
            <dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd>
            <dt><label>Delivered On:</label></dt><dd>09/14/2015 11:07 A.M.</dd>
            <dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt>
            <dt><label>Left At:</label></dt>
            <dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>'''

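# Raw string so the backslashes reach the regex engine; [AP] instead of [A|P], which would also accept a literal "|"
pattern = r'\d{2}/\d{2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.'

# Search the variable defined above (output); .group(0) is the whole match
result = re.search(pattern, output, re.MULTILINE).group(0)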
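One thing to watch for with this approach: `re.search` returns None when nothing matches, so calling `.group(0)` directly raises AttributeError on a page without a timestamp. A hedged sketch of a safer wrapper follows; the names `DATE_TIME` and `first_timestamp` are made up here, and the month and day are widened to one or two digits as an assumption about how the site might format dates.

```python
import re

# One or two digits for month, day and hour; 'A.M.' or 'P.M.' with literal dots.
DATE_TIME = re.compile(r"\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2} [AP]\.M\.")

def first_timestamp(html):
    """Return the first date-and-time string found in html, or None (hypothetical helper)."""
    match = DATE_TIME.search(html)
    return match.group(0) if match else None

print(first_timestamp("<dd>09/11/2015</dd><dd>09/14/2015 11:07 A.M.</dd>"))
```

Keep in mind this returns the first timestamp anywhere in the text, which is the Delivered On value in the sample only because the Shipped/Billed On entry has no time part.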

Answer 3: (score: 1)

If you don't like regular expressions or third-party libraries, you can always fall back on an old-school hard-coded solution:

start_index = input_text.index("Delivered On:")+len("Delivered On:</label></dt><dd>")
stop_index = start_index + 21
text_date = input_text[start_index:stop_index]

For the one-line case:

{{1}}

After all, any solution to your problem is just a different flavor of hard-coding :(
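The fixed length of 21 characters in the slicing code breaks as soon as the date is written with a single-digit day or month. A small variation, sketched below with the hypothetical name `slice_after_label`, looks for the closing `</dd>` instead of counting characters. It still hard-codes the surrounding tags, so it remains just as tied to this exact markup.

```python
def slice_after_label(input_text, label="Delivered On:"):
    """Return the text between the label's <dd> and the next </dd>, or None (hypothetical helper)."""
    prefix = label + "</label></dt><dd>"
    start = input_text.find(prefix)
    if start == -1:
        return None  # label not present in this exact form
    start += len(prefix)
    stop = input_text.find("</dd>", start)
    if stop == -1:
        return None  # no closing tag after the value
    return input_text[start:stop]

page = "<dt><label>Delivered On:</label></dt><dd>12/4/2015 11:07 A.M.</dd>"
print(slice_after_label(page))
```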

Answer 4: (score: 1)

Try this code:

import re
a = """<p>Dear Customer,</p><p>This notice serves as proof of delivery for the shipment listed below.</p><dl class="outHozFixed clearfix"><label>Weight:</label></dt><dd>18.00 lbs</dd><dt><label>Shipped&#047;Billed On:</label></dt><dd>09/11/2015</dd><dt><label>Delivered On:</label></dt><dd>12/4/2015 11:07 A.M.</dd><dt><label for="">Signed By:</label></dt><dd>Odedra</dd></dt><dt><label>Left At:</label></dt><dd>Office</dd></dl><p>Thank you for giving us this opportunity to serve you.</p>"""
data = re.search('Delivered On:</label></dt><dd>(.*)$',a)
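if data and data.group(1)[:1].isdigit():
    # The first 20 characters of the captured text are the date and time, e.g. "12/4/2015 11:07 A.M."
    print(data.group(1)[:20])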