Question

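Background

The background of my question: find every occurrence of the mA unit in any upper/lower-case combination. The goal is to show the user as many places as possible, together with the surrounding characters, where the unit was mistyped as ma / Ma / MA, so that the user can search for and locate them easily.

We know that mA is a valid unit for electric current. For simplicity we only use integers, so each line in the text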

case 1, only number and unit: 1mA
case 2, number and unit, space: 1mA current
case 3, number and unit, punctuation: 1mA,
case 4, number and unit, Unicode characters: 1mA电流I

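is a valid expression.

But

case 5, 1mAcurrent

should be an invalid expression, because no English letter may follow the unit without a space.

My regex attempts

So what is the correct regex in this situation? I tried each of the following patterns: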

case 5 is taken as a right one, this is wrong      \d{1,}mA
case 4 is ignored                                  \d{1,}mA\b
case 4 is ignored                                  \d{1,}mA[^a-zA-Z]*\b

As you can see, none of them is correct.
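A note on why the `\b` attempts skip case 4: in Python 3, `\w` is Unicode-aware for `str` patterns, so 电 counts as a word character and there is no word boundary between `A` and 电. The sketch below compares the default behaviour with the `re.ASCII` flag, under which only `[a-zA-Z0-9_]` are word characters. The helper name `has_match` is made up for this illustration, and this variant still rejects a digit or underscore directly after the unit.

```python
import re

def has_match(line, flags=0):
    """Return True if r'\\d+mA\\b' matches somewhere in line (hypothetical helper)."""
    return re.search(r"\d+mA\b", line, flags) is not None

case4 = "case 4, number and unit, Unicode characters: 4mA电流I"
case5 = "case 5, 5mAcurrent"

# Default (Unicode) mode: 电 is a word character, so \b fails after "mA".
print(has_match(case4), has_match(case5))
# ASCII mode: 电 is not a word character, so \b succeeds for case 4 only.
print(has_match(case4, re.ASCII), has_match(case5, re.ASCII))
```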

My cumbersome code

This is the Python code I am using. You will notice that I rely on an extra Python if check:

import re
text = '''
case 1, only number and unit: 1mA
case 2, number and unit, space: 2mA current
case 3, number and unit, punctuation: 3mA,
case 4, number and unit, Unicode characters: 4mA电流I   
case 5, 5mAcurrent
'''
lst = text.split('\n')
lst = [i for i in lst if i]

pattern = r'(?P<QUANTITY>\d{1,}mA)(?P<TAIL>.{0,5})'

for text in lst:
    for match in re.finditer(pattern, text):    
        if not re.match('[a-zA-Z]', match.group('TAIL')): # extra line
            print(match.group('QUANTITY'), ', ', match.group('TAIL'))

Output

1mA ,  
2mA ,   curr
3mA ,  ,
4mA ,  电流I

As you can see, the invalid expression case 5, 5mAcurrent is not picked up.

Asking for help

Is there a simple way to achieve this with a single regex pattern? Thanks

Answer 1

Use a negative lookahead after the unit, which checks that no letter follows it:

pattern = r'(?P<QUANTITY>\d+mA)(?![a-z])(?P<TAIL>.{0,5})'
#                       here __^^^^^^^^^

Code:

pattern = r'(?P<QUANTITY>\d+mA)(?![a-z])(?P<TAIL>.{0,5})'

for text in lst:
    for match in re.finditer(pattern, text):    
        print(match.group('QUANTITY'), match.group('TAIL'))
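One detail worth checking: `(?![a-z])` only rejects lowercase letters, so an input such as `6mACurrent` would still match. The sketch below is a self-contained variant that assumes both ASCII cases should be rejected, following the question's "any English letter" wording. The function name `find_quantities` and the sample strings are made up for this illustration.

```python
import re

# Assumption: uppercase letters after the unit are invalid too, hence [A-Za-z].
PATTERN = re.compile(r"(?P<QUANTITY>\d+mA)(?![A-Za-z])(?P<TAIL>.{0,5})")

def find_quantities(line):
    """Return (quantity, tail) pairs for each mA quantity not followed by a letter."""
    return [(m.group("QUANTITY"), m.group("TAIL")) for m in PATTERN.finditer(line)]

samples = ["1mA", "2mA current", "3mA,", "4mA电流I", "5mAcurrent", "6mACurrent"]
for sample in samples:
    print(sample, "->", find_quantities(sample))
```

The last two samples yield empty lists, while the first four each yield one pair.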

Answer 2

You can try a regex search with the following pattern:

\d+mA(?= |current|电流I|,|$)

This matches, for example, 1mA when it is followed by a space, the word current, the Chinese term 电流I, a comma, or the end of the input.

import re

text = "Here 1mA also 2mAcurrent and 3mA电流I and 4mA, and also 5mA"
matches = re.findall(r'\d+mA(?= |current|电流I|,|$)', text)
print(matches)

This prints:

['1mA', '2mA', '3mA', '4mA', '5mA']
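Because `current` is one of the lookahead alternatives, `2mAcurrent` is accepted here, which is the form the question treats as invalid in case 5. A possible variant, sketched below on the assumption that anything other than an ASCII letter may follow the unit, swaps the explicit list for a negated character class. The names `pattern` and `sample` are illustrative only.

```python
import re

# Positive lookahead: the next character is not an ASCII letter, or the input ends.
pattern = re.compile(r"\d+mA(?=[^A-Za-z]|$)")

sample = "Here 1mA also 2mAcurrent and 3mA电流I and 4mA, and also 5mA"
matches = pattern.findall(sample)
print(matches)  # ['1mA', '3mA', '4mA', '5mA']
```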

Answer 3

import re

pattern = r'(?P<value>\d+)(?P<units>mA)(\S+|)'
text = ['1mA', '1mA电流I', '1mA,', '1mAcurrent']

for j in text:
    # re.match anchors the pattern at the start of each string
    match = re.match(pattern, j)
    if match:
        print("Text " + match[0] + " matches with value:" + match['value'] +
              ' Units:' + match['units'])

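The code above matches all cases and uses named groups so that each part can be retrieved by name. There are 3 groups; I named the first 2 (value and units).

You can extend the units group to any other units of interest by separating them with pipes. \d+ matches any integer value.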
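As a rough sketch of the pipe-separated extension, the units group can be built from a list. The unit list, the `UNITS` name and the `parse` helper are hypothetical and not part of the original answer. Like the pattern above, this variant accepts any trailing non-space text, including the case 5 form.

```python
import re

UNITS = ["mA", "mV", "kHz"]  # hypothetical list of units of interest

# Join the escaped unit names with "|" inside the named group.
pattern = re.compile(
    r"(?P<value>\d+)(?P<units>" + "|".join(map(re.escape, UNITS)) + r")(\S+|)"
)

def parse(s):
    """Return (value, units) if s starts with a known quantity, else None."""
    m = pattern.match(s)
    return (m["value"], m["units"]) if m else None

for s in ["1mA", "10mV", "3kHz,", "7W"]:
    print(s, "->", parse(s))
```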

Answer 4

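If I understand the question correctly, we may simply want to collect the desired numbers that are followed by optional whitespace and mA. This simple expression does that: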

([0-9]+)(\s+)?(?=mA)

I'm not sure about the technicalities, but if we have floating-point numbers, ([0-9]+) would change to ([0-9.]+). Finally, we append mA to all captured output.
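A minimal sketch of how this expression might be used from Python follows. The helper name `quantities` and the sample text are assumptions made for illustration. Note that the pattern does not inspect what follows mA, so the case 5 form is collected as well.

```python
import re

pattern = re.compile(r"([0-9]+)(\s+)?(?=mA)")

def quantities(text):
    """Collect the captured digits and append the unit back by hand."""
    return [m.group(1) + "mA" for m in pattern.finditer(text)]

print(quantities("1mA, 2 mA and 3mAcurrent"))  # ['1mA', '2mA', '3mA']
```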


Answer 5

import re

pattern = r'(?P<value>\d+)(?P<units>mA)(\s[a-z]+|[\s,]|$)'
pattern2 = r'(?P<value>\d+)(?P<units>mA)([^a-z]\S+)'
text = ['1mA', '5mA电流I', '1mA,', '1mAcurrent', '1mA current']

for j in text:
    match = re.match(pattern, j)
    print(j)
    if match:
        print("Text " + match[0] + " matches with value:" + match['value'] +
              ' Units:' + match['units'])
    else:
        # Fall back to the second pattern when the first one does not match
        match = re.match(pattern2, j)
        if match:
            print("Text " + match[0] + " matches with value:" + match['value'] +
                  ' Units:' + match['units'])

This solution ignores case 5. It uses 2 patterns and an else statement for when the first pattern returns no match.

Regex with a conditional boundary?
