Question

I have the following string:

s = '2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 keep this text n0t th1s th0ugh 1 0 2015 2025 2026'

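I want to replace with '' every part of this string that contains a digit, except for the parts that are years between 1950 and 2025. The resulting string should look like this (don't worry about the extraneous spaces):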

'2014          keep this text      2015 2025 '

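So, in effect, I want to brute-force remove anything and everything that is remotely "digit-like", except for standalone tokens (i.e. not part of another string, and exactly 4 characters long, not counting whitespace) that look like a year.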

I know I can use this to remove everything that contains a digit:

re.sub(r'\w*[0-9]\w*', '', s)

But that doesn't return what I want:

'           keep this text        '

I tried replacing anything that does not match the following pattern:

re.sub(r'^([A-Za-z]+|19[5-9]\d|20[0-1]\d|202[0-5])', '*', s)

which returns:

'* 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 keep this text n0t th1s th0ugh 1 0 2015 2025 2026'
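
The output above can be explained with a small sketch. The `^` anchor only allows a match at the very start of the string, and `re.sub` replaces what the pattern matches rather than what it fails to match, so only the leading `2014` is affected. The variable names `pattern` and `matches` are illustrative.

```python
import re

s = ('2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 '
     'keep this text n0t th1s th0ugh 1 0 2015 2025 2026')

# Same pattern as in the attempt above.
pattern = r'^([A-Za-z]+|19[5-9]\d|20[0-1]\d|202[0-5])'

# Without re.MULTILINE, '^' matches only at position 0, so there is
# at most one match in the whole string.
matches = [m.group(0) for m in re.finditer(pattern, s)]
print(matches)  # ['2014']

# re.sub therefore swaps that single match for '*' and leaves the rest alone.
replaced = re.sub(pattern, '*', s)
```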

I have looked here and here, but could not find what I am looking for.

Answer 1

Regular expressions are not good at working with numeric ranges. I would ditch the regex and use a generator expression:

predicate = lambda w: (w.isdigit() and 1950 <= int(w) <= 2025) or not any(char.isdigit() for char in w)
print(' '.join(w for w in s.split() if predicate(w)))
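
For a self-contained version of the same idea, here is a minimal sketch that spells the predicate out as a named function. The names `keep_token`, `LOW` and `HIGH` are illustrative and not part of the original answer; the 1950 to 2025 bounds come from the question. The explicit length check is an added assumption so that zero-padded tokens such as `01950` are rejected, in line with the "length 4" requirement.

```python
s = ('2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 '
     'keep this text n0t th1s th0ugh 1 0 2015 2025 2026')

LOW, HIGH = 1950, 2025  # year bounds taken from the question


def keep_token(w):
    """Return True for 4-digit years in [LOW, HIGH] and for tokens without digits."""
    if w.isdecimal():
        # Purely numeric token: keep it only if it is a 4-digit year in range.
        return len(w) == 4 and LOW <= int(w) <= HIGH
    # Mixed or alphabetic token: keep it only if it contains no digit at all.
    return not any(ch.isdigit() for ch in w)


result = ' '.join(w for w in s.split() if keep_token(w))
print(result)  # 2014 keep this text 2015 2025
```

Because the tokens are split and re-joined, runs of whitespace collapse to single spaces, which the question says is acceptable.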

Answer 2

I would do it like this, because it is readable and easy to fix or improve:

import re

' '.join(
    filter(
        # keep in-range years and purely alphabetic words
        lambda word: (word.isdigit() and 1950 <= int(word) <= 2025)
                     or re.match(r'^[a-zA-Z]+$', word),
        s.split()
    )
)
# '2014 keep this text 2015 2025'

Answer 3

A short solution using the re.findall() function:

import re

s = '2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 keep this text n0t th1s th0ugh 1 0 2015 2025 2026'
result = ''.join(re.findall(r'\b(19[5-9][0-9]|20[01][0-9]|202[0-5]|[a-z]+|[^0-9a-z]+)\b', s, re.I))

print(result)

Output:

2014           keep this text      2015 2025
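
Since the question was framed in terms of `re.sub`, the same result can also be sketched with a replacement callback. This is one possible variant, not part of the answer above: it reuses the question's own `\w*[0-9]\w*` pattern to find every token containing a digit, and the hypothetical helper `blank_unless_year` decides whether each match survives.

```python
import re

s = ('2014 2026 202 20 1000 1949 194 195092 20111a a2011a a2011 '
     'keep this text n0t th1s th0ugh 1 0 2015 2025 2026')

# Years 1950-2025, same alternation as in the findall pattern.
YEAR = re.compile(r'19[5-9][0-9]|20[01][0-9]|202[0-5]')


def blank_unless_year(match):
    """Keep the matched token if it is exactly an in-range year, else drop it."""
    token = match.group(0)
    return token if YEAR.fullmatch(token) else ''


# Tokens without digits never match the pattern, so they pass through untouched.
result = re.sub(r'\w*[0-9]\w*', blank_unless_year, s)
print(repr(result))
```

Whitespace is left exactly where it was in the input, so the kept words stay separated by the gaps left behind by the removed tokens.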

Regex: replace all digits and "digit-like" strings except years within a range
