I have a string in the following format: | birth_date = 22 January 1898 |
I want to write a regular expression that finds birth_date and captures the 4-digit number that follows it, up to the pipe symbol.
Answer 0 (score: 0)
import re
print(re.sub(r'\D', '', "| birth_date = 22 January 1898 |"))
# output => 221898 (every digit, with the day and the year run together)
# if you want only the last 4 digits:
print(re.sub(r'\D', '', "| birth_date = 22 January 1898 |")[-4:])
# output => 1898
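Stripping every non-digit glues the day onto the year (hence 221898), and the [-4:] slice only works while the year is the last number on the line. A minimal sketch that keeps the digit runs apart with re.findall and picks the one that is exactly four digits long, assuming the line contains only one four-digit number:

```python
import re

text = "| birth_date = 22 January 1898 |"

# \d+ returns each run of consecutive digits as a separate string,
# so the day and the year stay distinct instead of being concatenated.
runs = re.findall(r"\d+", text)
print(runs)  # ['22', '1898']

# Take the first run that is exactly four digits long (None if there is none).
year = next((run for run in runs if len(run) == 4), None)
print(year)  # 1898
```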
Answer 1 (score: 0)
Assuming you want the year and all of your strings have the same format, you can avoid regular expressions altogether:
test = '| birth_date = 22 January 1898 |'
year = test.split()[-2]
print(year)
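Index -2 picks the year because of how split() tokenizes the sample string; a quick look at the token list makes that explicit:

```python
test = '| birth_date = 22 January 1898 |'

# str.split() with no argument splits on runs of whitespace,
# so each pipe becomes a token of its own.
tokens = test.split()
print(tokens)      # ['|', 'birth_date', '=', '22', 'January', '1898', '|']
print(tokens[-2])  # 1898 (the token just before the closing pipe)
```

This relies on a space between the year and the closing pipe: for a line ending in "1898|" the last token would be '1898|' and [-2] would return 'January'.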
Wrapped in a function:
def get_year(text):
    '''Returns year as integer, empty string if invalid input.'''
    output = ''
    if 'birth_date' in text:
        output = text.split()[-2]
        try:
            output = int(output)
        except ValueError:
            output = ''
    return output

test = ['| birth_date = 22 January 1898 |',
        '| death_date = 22 January 1898 |',
        '| birth_date = 22 January XXXMLC |',
        '| birth_date = 23 January 1961 |']
for line in test:
    result = get_year(line)
    if not result:
        result = 'Invalid input'
    print(line, result)
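Returning an empty string for "no year" mixes two types in one return value. A variant sketch (the name get_year_or_none is hypothetical) returns None instead, and checks for birth_date as a whole token, so a field such as birth_date_approx would not be accepted by accident:

```python
def get_year_or_none(text):
    """Return the year as an int, or None if the line is not a valid birth_date line."""
    tokens = text.split()
    # Require the exact token 'birth_date' rather than a substring match.
    if 'birth_date' not in tokens or len(tokens) < 2:
        return None
    try:
        # Same assumption as above: the year is the token before the closing pipe.
        return int(tokens[-2])
    except ValueError:
        return None


lines = ['| birth_date = 22 January 1898 |',
         '| death_date = 22 January 1898 |',
         '| birth_date = 22 January XXXMLC |',
         '| birth_date = 23 January 1961 |']
for line in lines:
    year = get_year_or_none(line)
    print(line, 'Invalid input' if year is None else year)
```

Testing with `is None` also keeps a result of 0 from being treated as invalid, which the `if not result` check in the original loop would do.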
Answer 2 (score: 0)
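Suppose you have written an isint function that checks whether a substring is an integer (here a simple str.isdigit() check); then slide a window of the target width across the string:
def isint(s):
    return s.isdigit()
string = '| birth_date = 22 January 1898 |'
width = 4  # length of the number to find (a 4-digit year)
for x in range(len(string) - width + 1):
    if isint(string[x:x + width]):
        print(string[x:x + width])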
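The window loop is effectively a hand-written four-digit search, and it also reports 4-digit slices of longer numbers (for "12345" it prints both "1234" and "2345"). A regex sketch of the same search that uses lookarounds to accept only standalone four-digit runs (like the loop, it does not check for birth_date):

```python
import re

# Four digits with no digit directly before (?<!\d) or directly after (?!\d),
# so digits that are part of a longer number are skipped.
FOUR_DIGITS = re.compile(r'(?<!\d)\d{4}(?!\d)')

years = FOUR_DIGITS.findall('| birth_date = 22 January 1898 |')
print(years)     # ['1898']

too_long = FOUR_DIGITS.findall('| id = 12345 |')
print(too_long)  # []
```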
Answer 3 (score: 0)
The regular expression could look like this:
birth_date\s*=\s*\d{1,2}\s*\w+\s*(\d{4})\s*\|
The year is captured in group 1.
>>> import re
>>> pat = re.compile(r'birth_date\s*=\s*\d{1,2}\s*\w+\s*(\d{4})\s*\|')
>>> print(pat.search('| birth_date = 22 January 1898 |').group(1))
1898
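The pattern above requires a one- or two-digit day followed by a month word. If some lines give only "January 1898" (an assumption about the data, not something the question states), a more lenient sketch can skip any non-pipe text before the year. re.VERBOSE lets each part carry a comment, and a named group replaces the bare group 1; the name birth_year is hypothetical:

```python
import re

BIRTH_YEAR = re.compile(
    r"""
    birth_date \s* = \s*   # the field name and the equals sign
    [^|]*?                 # any non-pipe characters, as few as possible
    (?P<year>\d{4})        # the four-digit year, captured by name
    \s* \|                 # optional spaces, then the closing pipe
    """,
    re.VERBOSE,
)


def birth_year(text):
    """Return the birth year as a string, or None if the pattern does not match."""
    m = BIRTH_YEAR.search(text)
    return m.group("year") if m else None


print(birth_year("| birth_date = 22 January 1898 |"))  # 1898
print(birth_year("| birth_date = January 1898 |"))     # 1898
print(birth_year("| death_date = 22 January 1898 |"))  # None
```

In verbose mode, whitespace outside character classes is ignored, so the spaces in the pattern are only for layout; \s* still matches the actual spaces in the input.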