Question

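I have a few strings that look like this:

test=['S123X_ILL_BE_BACK','BA34_HASTA_LA_VISTA_BABY','JA3841_SARAH','J102_CONNOR']

I am trying to retrieve everything after the first _. In my regex I am trying

import re
[re.sub(".+\_(.+)","\g<1>",gg) for gg in test]

but this truncates each string to its last word. How can I get

 ['ILL_BE_BACK','HASTA_LA_VISTA_BABY','SARAH','CONNOR']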

Answer 1

You have to make the first + non-greedy:

[re.sub(r".+?\_(.+)", r"\g<1>", gg) for gg in test] # note the ? (raw strings avoid invalid-escape warnings)

This returns:

['ILL_BE_BACK', 'HASTA_LA_VISTA_BABY', 'SARAH', 'CONNOR']

The ? after the + makes the + non-greedy, so it consumes only as many characters as it needs to:

re.match('.*',"abcdefgh") # finds 'abcdefgh' (the entire string)
re.match('.*?',"abcdefgh") # finds '' (an empty string)
re.match('.+',"abcdefgh") # finds 'abcdefgh' (the entire string)
re.match('.+?',"abcdefgh") # finds 'a' (only the first character)
re.match('.+?f',"abcdefgh") # finds 'abcdef' (all characters till f)

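This means your regex .+\_(.+) first consumes everything and then backtracks only until the remaining _(.+) can match, so the group captures just the last part. If you change the regex to .+?\_(.+), the .+? consumes only up to the first _ and then stops.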
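As a quick check of the explanation above, here is a minimal sketch that runs both patterns on one sample string from the question and prints what the capture group holds. The variable names are only illustrative.

```python
import re

s = "BA34_HASTA_LA_VISTA_BABY"

# Greedy: .+ grabs the whole string, then backtracks to the LAST underscore.
greedy = re.match(r".+_(.+)", s).group(1)

# Non-greedy: .+? expands one character at a time and stops at the FIRST underscore.
lazy = re.match(r".+?_(.+)", s).group(1)

print(greedy)  # BABY
print(lazy)    # HASTA_LA_VISTA_BABY
```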

Answer 2

You can do it without a regular expression.

['_'.join(gg.split('_')[1:]) for gg in test]

Edit: this version also handles elements that contain no _.

['_'.join(gg.split('_')[('_' in gg):]) for gg in test]
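The slice start in the line above is a bool, which may not be obvious at first glance. The sketch below spells out what it does and, as an alternative that is not part of the original answer, shows str.split with a maxsplit of 1, which gives the same result on this sample data. The short test list here is just for illustration.

```python
test = ['S123X_ILL_BE_BACK', 'JA3841_SARAH', 'SARAH']

# ('_' in gg) is a bool; used as a slice start, True acts as 1 and False as 0,
# so the first piece is dropped only when an underscore is present.
via_bool_slice = ['_'.join(gg.split('_')[('_' in gg):]) for gg in test]

# Alternative: split at most once and keep the last piece.
via_maxsplit = [gg.split('_', 1)[-1] for gg in test]

print(via_bool_slice)  # ['ILL_BE_BACK', 'SARAH', 'SARAH']
print(via_maxsplit)    # ['ILL_BE_BACK', 'SARAH', 'SARAH']
```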

Answer 3

You can use a simpler regex here, or even a non-regex approach:

import re
test=['S123X_ILL_BE_BACK','BA34_HASTA_LA_VISTA_BABY','JA3841_SARAH','J102_CONNOR','SARAH']
print([gg.partition("_")[-1] if "_" in gg else gg for gg in test])

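Here, the partition method splits at the first _, and the last item of the result is what you need. If there is no _, the whole string is returned unchanged.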
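To see why the `if "_" in gg else gg` guard is needed, it may help to look at what str.partition returns in each case. This is a small illustrative sketch; the variable names are made up for the example.

```python
# partition always returns a 3-tuple: (head, separator, tail).
with_sep = "J102_CONNOR".partition("_")
without_sep = "SARAH".partition("_")

print(with_sep)     # ('J102', '_', 'CONNOR')
print(without_sep)  # ('SARAH', '', '')  -> the last item is empty, hence the guard
```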

The regex approach:

print([re.sub(r'^[^_]*_', '', gg) for gg in test])

Here, ^[^_]*_ matches

^ - the start of the string
[^_]* - zero or more characters other than _
_ - an underscore

and the match is removed.
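A short sketch of how this pattern behaves on a few edge cases, including a string with no underscore and one that starts with an underscore. The sample strings and names below are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'^[^_]*_')
samples = ['BA34_HASTA_LA_VISTA_BABY', 'SARAH', '_LEADING']

# sub() leaves a string unchanged when the pattern does not match.
results = [pattern.sub('', s) for s in samples]

print(results)  # ['HASTA_LA_VISTA_BABY', 'SARAH', 'LEADING']
```

Because the pattern is anchored with ^ and [^_]* cannot cross an underscore, only the prefix up to the first _ is ever removed.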

See the regex demo.

See this Python demo.

Partial match with a regular expression?
