How do I parse the following in Python to extract the year:
'years since 1250-01-01 0:0:0'
The answer should be 1250
Answer 0: (score: 11)
There are several ways to do this; here are a few options:
dateutil parser in "fuzzy" mode:
In [1]: s = 'years since 1250-01-01 0:0:0'
In [2]: from dateutil.parser import parse
In [3]: parse(s, fuzzy=True).year # resulting year would be an integer
Out[3]: 1250
A regular expression with a capturing group:
In [2]: import re
In [3]: re.search(r"years since (\d{4})", s).group(1)
Out[3]: '1250'
Splitting by "since" and then by a dash:
In [2]: s.split("since", 1)[1].split("-", 1)[0].strip()
Out[2]: '1250'
Or even splitting by the first dash and slicing the first substring:
In [2]: s.split("-", 1)[0][-4:]
Out[2]: '1250'
The last two involve more "moving parts" and may not be applicable, depending on the possible variations of the input string.
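To make the point about input variations concrete, here is a minimal sketch that wraps the regex option in a function. The name extract_year and the choice to return None when nothing matches are assumptions for illustration, not part of the answer above; it also loosens the pattern to anything of the form "since YYYY-".

```python
import re

# Hypothetical helper: the name and the None-on-failure behaviour are assumptions.
_SINCE_YEAR = re.compile(r"since\s+(\d{4})-")

def extract_year(units):
    """Return the four-digit year that follows 'since' as an int, or None."""
    match = _SINCE_YEAR.search(units)
    return int(match.group(1)) if match else None

print(extract_year("years since 1250-01-01 0:0:0"))  # 1250
print(extract_year("days since 1900-01-01"))         # 1900
print(extract_year("no date here"))                  # None
```

Unlike the split-based variants, this does not raise an IndexError or silently return the wrong slice when "since" or the dash is missing.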
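Answer 1: (score: 5)
You can use a regular expression with a capturing group around the four digits, while also making sure there is a particular pattern around them. I would probably look for something that has:
4 digits and a capture: (\d{4})
a hyphen: -
two digits: \d{2}
a hyphen: -
two digits: \d{2}
Giving:
(\d{4})-\d{2}-\d{2}
Demo:
>>> import re
>>> d = re.findall(r'(\d{4})-\d{2}-\d{2}', 'years since 1250-01-01 0:0:0')
>>> d
['1250']
>>> d[0]
'1250'
If you need it as an int, just cast it:
>>> int(d[0])
1250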
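One thing worth knowing about re.findall with a single capturing group is that it returns every match, not just the first. A small sketch, using a made-up input string with two dates (the string is an assumption for illustration):

```python
import re

DATE_YEAR = re.compile(r"(\d{4})-\d{2}-\d{2}")

# Made-up input containing two dates, for illustration only.
text = "from 1250-01-01 to 1300-12-31"

years = [int(y) for y in DATE_YEAR.findall(text)]
print(years)  # [1250, 1300]

# If only the first date matters, re.search stops at the first match.
first = DATE_YEAR.search(text)
print(int(first.group(1)))  # 1250
```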
Answer 2: (score: 2)
The following regex should give you the four-digit year as the first capture group:
^.*(\d{4})-\d{2}-\d{2}.*$
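As a quick check of a fully anchored pattern of this shape, with the capture group written as (\d{4}), the sketch below runs it through re.match. The variable names are assumptions for illustration.

```python
import re

# Anchored pattern: the year is the first (and only) capture group.
pattern = re.compile(r"^.*(\d{4})-\d{2}-\d{2}.*$")

m = pattern.match("years since 1250-01-01 0:0:0")
year = int(m.group(1)) if m else None
print(year)  # 1250
```

Because the leading .* is greedy, a string with several dates would yield the year of the last one rather than the first.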