Python regex to match after a known string

Time: 2015-06-01 04:03:46

Tags: python regex

I have a string in the following format: | birth_date = 22 January 1898 |

I want to write a regular expression that finds birth_date and captures the 4-digit number that follows it, before the pipe symbol.

4 answers:

Answer 0: (score: 0)

import re
# Strip every non-digit character; the day and the year run together.
print(re.sub(r'\D', '', "| birth_date = 22 January 1898 |"))
# output => 221898

# If you want only the last 4 digits:
print(re.sub(r'\D', '', "| birth_date = 22 January 1898 |")[-4:])
# output => 1898
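
Stripping every non-digit only works because the year happens to be the last four digits on the line. A slightly more defensive sketch, assuming the year is the only standalone four-digit run in the string, could use re.findall instead. The helper name find_years is made up for illustration:

```python
import re

def find_years(text):
    """Return every standalone run of exactly four digits in text."""
    # \b keeps the pattern from matching inside a longer digit run.
    return re.findall(r'\b\d{4}\b', text)

print(find_years("| birth_date = 22 January 1898 |"))
```

Note that this still ignores the birth_date key, so it would pick up a year from any field on the line.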

Answer 1: (score: 0)

Assuming you want the year and all strings have the same format, you can avoid regular expressions altogether:

test = '| birth_date = 22 January 1898 |'
year = test.split()[-2]  # second-to-last whitespace-separated token
print(year)

Expanding this into a function:

def get_year(line):
    '''Return the year as an integer, or an empty string if the input is invalid.'''

    output = ''
    if 'birth_date' in line:
        try:
            # The year is the second-to-last whitespace-separated token.
            output = int(line.split()[-2])
        except (ValueError, IndexError):
            output = ''
    return output

test = ['| birth_date = 22 January 1898 |',
        '| death_date = 22 January 1898 |',
        '| birth_date = 22 January XXXMLC |',
        '| birth_date = 23 January 1961 |']

for line in test:
    result = get_year(line)
    if not result:
        result = 'Invalid input'
    print(line, result)

Answer 2: (score: 0)

Assuming you have written an isint function that checks whether a string is an integer:

width = 4  # number of digits to look for
# isint is assumed to accept only strings made up entirely of digits
for x in range(len(string) - width + 1):
    window = string[x:x + width]
    if isint(window):
        print(window)
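
The sliding-window idea can be sketched in a self-contained form as follows. This is only an illustration: the function name digit_windows and the default width of 4 are assumptions, and the built-in str.isdigit stands in for the isint helper mentioned above.

```python
def digit_windows(text, width=4):
    """Return every substring of length width that consists only of digits."""
    found = []
    for start in range(len(text) - width + 1):
        window = text[start:start + width]
        # str.isdigit is True only when every character is a digit.
        if window.isdigit():
            found.append(window)
    return found

print(digit_windows('| birth_date = 22 January 1898 |'))
```

Because the windows overlap, a run of five or more digits yields several results, so this approach fits best when the input is known to contain a single four-digit year.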

Answer 3: (score: 0)

The regular expression could look like this:

birth_date\s*=\s*\d{1,2}\s*\w+\s*(\d{4})\s*\|

The year is in group 1.

>>> import re
>>> pat = re.compile(r'birth_date\s*=\s*\d{1,2}\s*\w+\s*(\d{4})\s*\|')
>>> print(pat.search('| birth_date = 22 January 1898 |').group(1))
1898
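
Keep in mind that search returns None when the pattern does not match, so calling .group(1) directly raises AttributeError on lines without a birth date. A minimal sketch of a wrapper that handles this, using the same pattern (the names BIRTH_YEAR and birth_year are made up for illustration):

```python
import re

# Same pattern as above, compiled once and reused.
BIRTH_YEAR = re.compile(r'birth_date\s*=\s*\d{1,2}\s*\w+\s*(\d{4})\s*\|')

def birth_year(line):
    """Return the birth year as an int, or None if the line does not match."""
    match = BIRTH_YEAR.search(line)
    return int(match.group(1)) if match else None

print(birth_year('| birth_date = 22 January 1898 |'))
print(birth_year('| death_date = 22 January 1898 |'))
```

Returning None rather than an empty string lets callers tell "no match" apart from any valid year with a simple `is None` check.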