I have a large string like this:
SEND OK HTTP/1.1 200 OK
Access-Control-Allow- l-Allow-Methods: GET,POST,DELETE
Access-Control-Allow-Headers: X-Requested-With,
Phant-Private-Key Content-Type: text/plain X-Rate-Limit-Limit: 300
X-Rate-Limit-Remaining: 297
X-Rate-Limit-Reset: 1452931335.777
Date: Sat, 16 Jan 2016 07:50:17 GMT
Set-Cookie: SERVERID=; Expires=Thu, 01-Jan-197 0 00:00:01 GMT; path=/ Cache-control: private
Transfer-Encoding: chunked
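It contains a string such as Sat, 16 Jan 2016 07:50:17 GMT. The date can be any time. I want to extract this date string from the whole string. I know this is a very basic question, but how can I do this in Python?
Also, the string does not always contain a substring such as "Date:".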
Answer 0 (score: 1)
Use:
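import re
# Raw string so "\w" and "\d" reach the regex engine unchanged
datepattern = re.compile(r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{3}")
matcher = datepattern.search(string_to_match_against)
if matcher:  # search() returns None when no date is found
    print(matcher.group(0))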
Using your example,
string_to_match_against = """
SEND OK HTTP/1.1 200 OK
Access-Control-Allow- l-Allow-Methods: GET,POST,DELETE
Access-Control-Allow-Headers: X-Requested-With,
Phant-Private-Key Content-Type: text/plain X-Rate-Limit-Limit: 300
X-Rate-Limit-Remaining: 297
X-Rate-Limit-Reset: 1452931335.777
Date: Sat, 16 Jan 2016 07:50:17 GMT
Set-Cookie: SERVERID=; Expires=Thu, 01-Jan-197 0 00:00:01 GMT; path=/ Cache-control: private
Transfer-Encoding: chunked
"""
we would print
Sat, 16 Jan 2016 07:50:17 GMT
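A self-contained sketch of the snippet above, wrapped in a function that returns None instead of raising AttributeError when the text contains no date. The function name extract_first_date and the trimmed copy of the sample are only for illustration.

```python
import re
from typing import Optional

# RFC 1123-style date, e.g. "Sat, 16 Jan 2016 07:50:17 GMT"
DATE_PATTERN = re.compile(r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{3}")


def extract_first_date(text: str) -> Optional[str]:
    """Return the first RFC 1123-style date found in text, or None."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


# Trimmed copy of the sample from the question
sample = """SEND OK HTTP/1.1 200 OK
X-Rate-Limit-Reset: 1452931335.777
Date: Sat, 16 Jan 2016 07:50:17 GMT
Transfer-Encoding: chunked
"""

print(extract_first_date(sample))          # Sat, 16 Jan 2016 07:50:17 GMT
print(extract_first_date("no date here"))  # None
```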
It looks like you are trying to match HTTP headers, and (according to "HTTP Pocket Reference", O'Reilly, 2000) the Date header can carry a date in any of three formats:
import re
pat1123 = r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{3}"
pat1036 = r"\w+?, \d{2}-\w{3}-\d{2} \d{2}:\d{2}:\d{2} \w{3}"
# asctime() pads a one-digit day with an extra space ("Nov  6"), hence " +\d{1,2}"
patc = r"\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"
datepattern = re.compile("(?:%s)|(?:%s)|(?:%s)" % (pat1123, pat1036, patc))
matcher = datepattern.search(string_to_match_against)
if matcher:
    print(matcher.group(0))
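A quick check of a combined three-format pattern against the three example dates given in RFC 2616, section 3.3.1. The day in the asctime() alternative is matched with " +\d{1,2}" because asctime() pads a single-digit day with an extra space; treat this as a sketch, not a full HTTP-date validator (it does not check month or weekday names).

```python
import re

PAT_1123 = r"\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{3}"     # Sun, 06 Nov 1994 08:49:37 GMT
PAT_1036 = r"\w+, \d{2}-\w{3}-\d{2} \d{2}:\d{2}:\d{2} \w{3}"        # Sunday, 06-Nov-94 08:49:37 GMT
PAT_ASCTIME = r"\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}"       # Sun Nov  6 08:49:37 1994

# Each alternative in its own non-capturing group, tried left to right
DATE_PATTERN = re.compile("|".join("(?:%s)" % p for p in (PAT_1123, PAT_1036, PAT_ASCTIME)))

examples = [
    "Date: Sun, 06 Nov 1994 08:49:37 GMT",
    "Date: Sunday, 06-Nov-94 08:49:37 GMT",
    "Date: Sun Nov  6 08:49:37 1994",
]
for header in examples:
    print(DATE_PATTERN.search(header).group(0))
```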
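Note that this approach does not rely on anything being present other than the date to extract (we don't need the "Date:" text). If several such dates occur, the first one is found. If you need all of them, use datepattern.findall.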
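A sketch of the findall variant on a hypothetical string containing two dates. Because every alternative is a non-capturing group, findall() returns the whole matched date for each hit rather than a tuple of groups.

```python
import re

# Same three alternatives (RFC 1123, RFC 1036, asctime), all non-capturing
DATE_PATTERN = re.compile(
    r"(?:\w{3}, \d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2} \w{3})"
    r"|(?:\w+, \d{2}-\w{3}-\d{2} \d{2}:\d{2}:\d{2} \w{3})"
    r"|(?:\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4})"
)

# Hypothetical header text with two dates in it
text = (
    "Date: Sat, 16 Jan 2016 07:50:17 GMT\n"
    "Last-Modified: Fri, 15 Jan 2016 22:10:05 GMT\n"
)

print(DATE_PATTERN.findall(text))
# ['Sat, 16 Jan 2016 07:50:17 GMT', 'Fri, 15 Jan 2016 22:10:05 GMT']
```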
Answer 1 (score: 0)
With the sample you provided, you could handle it like this:
import re
s = """
> SEND OK HTTP/1.1 200 OK
> Access-Control-Allow- l-Allow-Methods: GET,POST,DELETE
> Access-Control-Allow-Headers: X-Requested-With,
> Phant-Private-Key Content-Type: text/plain X-Rate-Limit-Limit: 300
> X-Rate-Limit-Remaining: 297
> X-Rate-Limit-Reset: 1452931335.777
> Date: Sat, 16 Jan 2016 07:50:17 GMT
> Set-Cookie: SERVERID=; Expires=Thu, 01-Jan-197 0 00:00:01 GMT; path=/ Cache-control: private
> Transfer-Encoding: chunked
"""
pat = re.compile(r'Date:([\s\w,:]+)')
print(pat.search(s).group(1).strip())
Output:
Sat, 16 Jan 2016 07:50:17 GMT
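The character class [\s\w,:] stops at the first character outside it, which in this sample is the ">" at the start of the next line. Without that prefix the capture would run on into the next header name (giving "...GMT\nSet"). A line-anchored variant avoids this; the sketch below assumes each header sits on its own line and tolerates an optional "> " prefix (the name date_header is hypothetical).

```python
import re
from typing import Optional

# ^ and $ match at line boundaries because of re.MULTILINE; "." never crosses
# a newline, so the capture ends at the end of the Date line.
DATE_LINE = re.compile(r"^(?:>\s*)?Date:\s*(.+?)\s*$", re.MULTILINE)


def date_header(text: str) -> Optional[str]:
    """Return the value of the first Date: line, or None if there is none."""
    match = DATE_LINE.search(text)
    return match.group(1) if match else None


quoted = "> X-Rate-Limit-Remaining: 297\n> Date: Sat, 16 Jan 2016 07:50:17 GMT\n> Transfer-Encoding: chunked\n"
plain = "X-Rate-Limit-Remaining: 297\nDate: Sat, 16 Jan 2016 07:50:17 GMT\nTransfer-Encoding: chunked\n"

print(date_header(quoted))  # Sat, 16 Jan 2016 07:50:17 GMT
print(date_header(plain))   # Sat, 16 Jan 2016 07:50:17 GMT
```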
Answer 2 (score: 0)
Use the requests module:
import requests
r = requests.get('http://www.google.com')
if r.status_code == 200:
    # r.headers is case-insensitive, so 'date' finds the "Date" header
    print(r.headers['date'])
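If you only have the raw header text (for example, read from a socket) rather than a live response, the standard library can do similar parsing. This sketch assumes a hypothetical, well-formed header block with one "Name: value" per line and the status line already removed; the sample in the question has several headers run together on one line and would need cleaning up first. email.utils.parsedate_to_datetime then turns the string into a datetime.

```python
import io
from email.utils import parsedate_to_datetime
from http.client import parse_headers

# Hypothetical header block, terminated by an empty line
raw = (
    b"X-Rate-Limit-Remaining: 297\r\n"
    b"Date: Sat, 16 Jan 2016 07:50:17 GMT\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)

headers = parse_headers(io.BytesIO(raw))  # returns an http.client.HTTPMessage
date_text = headers["date"]               # lookup is case-insensitive
when = parsedate_to_datetime(date_text)   # "GMT" yields a UTC-aware datetime

print(date_text)         # Sat, 16 Jan 2016 07:50:17 GMT
print(when.isoformat())  # 2016-01-16T07:50:17+00:00
```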
Answer 3 (score: -1)
If every setting/attribute (e.g. X-Rate-Limit-Reset, Date, etc.) is always present in the string and always in the same order, you can call split() a few times:
>>> mystring.split('Date: ')[1].split('>')[0].strip()
'Sat, 16 Jan 2016 07:50:17 GMT'
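The split() chain above depends on a ">" following the date. A regex-free sketch that works line by line instead, assuming each header starts on its own line (an optional "> " prefix is stripped; the name date_from_lines is hypothetical):

```python
from typing import Optional


def date_from_lines(text: str) -> Optional[str]:
    """Return the value of the first 'Date:' header line, or None."""
    for line in text.splitlines():
        # Drop an optional leading "> " quote prefix and surrounding spaces
        line = line.lstrip("> ").strip()
        # partition() splits on the first ":" only, so the time's colons survive
        name, sep, value = line.partition(":")
        # HTTP header names are case-insensitive
        if sep and name.strip().lower() == "date":
            return value.strip()
    return None


mystring = (
    "> X-Rate-Limit-Reset: 1452931335.777\n"
    "> Date: Sat, 16 Jan 2016 07:50:17 GMT\n"
    "> Transfer-Encoding: chunked\n"
)
print(date_from_lines(mystring))  # Sat, 16 Jan 2016 07:50:17 GMT
```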
If not, you can write a simple regular expression to find that specific line:
>>> re.search(r'Date:\s*(.*?)\s*>', mystring).group(1)
'Sat, 16 Jan 2016 07:50:17 GMT'