How do I make a regex match only whole words without splitting them?

Time: 2019-06-12 14:54:53

Tags: python regex

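I am building an abbreviation table for a document, and I'm using a regular expression to find all abbreviations in a long string (the text of a Word document).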

I'm using the pattern '[A-Z]{2,6}-*[0-9]*'. With it, both "HCFC" and "HCFC-141" are matched.

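Some parts of the document are in all caps, for example "ABSTRACT". The pattern above returns "ABSTRA" and "CT" as two separate matches. I only want to match whole words, so that "ABSTRA" and "CT" do not appear in the list at all. How can I do that?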

PS. I have already tried \b[A-Z]{2,6}-*[0-9]*\b, but it doesn't work. Maybe I'm doing something wrong?

PPS. Python code:

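import re

# text holds the contents of the Word document.
# "-" needs no escaping outside a character class.
pattern = '[A-Z]{2,6}-*[0-9]*'
abbreviation = re.findall(pattern, text)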

Is there a way to handle this with the re library?
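A minimal reproduction of the behavior described above, assuming a made-up sample string in place of the Word document text (the string is not from the question):

```python
import re

# Hypothetical sample text standing in for the Word document contents.
text = "ABSTRACT HCFC HCFC-141"

# The pattern from the question; without anchors it can start and stop mid-word.
pattern = r'[A-Z]{2,6}-*[0-9]*'

abbreviations = re.findall(pattern, text)
# The 8-letter word ABSTRACT is split into a 6-letter match and a 2-letter match.
print(abbreviations)  # ['ABSTRA', 'CT', 'HCFC', 'HCFC-141']
```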

4 answers:

Answer 0 (score: 0)

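My guess is that the hyphen part should be a single optional group (a hyphen followed by digits) and that the pattern needs word boundaries. If so, this expression may work: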

\b[A-Z]{2,6}(-[0-9]+)?\b

or, with the whole abbreviation in a capturing group:

\b([A-Z]{2,6}(-[0-9]+)?)\b


Test:

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"

test_str = ("HCFC\n"
            "HCFC-141\n"
            "aaHCFC-141")

# re.MULTILINE only changes how ^ and $ behave, so it has no effect on this pattern.
matches = re.finditer(regex, test_str, re.MULTILINE)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Groups are numbered from 1; group 0 is the whole match.
    for group_num in range(1, len(match.groups()) + 1):
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num),
            match.group(group_num)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string.
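Because the question uses re.findall rather than re.finditer, it is worth noting that once a pattern contains capturing groups, findall returns the group contents (a tuple per match when there are several groups) instead of the whole match. A short sketch with a hypothetical sample string:

```python
import re

text = "ABSTRACT HCFC HCFC-141"  # hypothetical sample text
grouped = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"

# Two capturing groups: findall returns one tuple of both groups per match,
# using '' for a group that did not participate.
with_groups = re.findall(grouped, text)
print(with_groups)  # [('HCFC', ''), ('HCFC-141', '-141')]

# group(0) from finditer is always the whole match, whatever groups exist.
whole = [m.group(0) for m in re.finditer(grouped, text)]
print(whole)  # ['HCFC', 'HCFC-141']
```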

Answer 1 (score: 0)

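You can keep {2,6} and add word boundaries \b, so that ABSTRACT no longer produces two matches (one for ABSTRA and one for CT). With the boundaries in place, an all-caps word longer than 6 letters, such as ABSTRACT, is not matched at all: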

\b[A-Z]{2,6}(?:-[0-9]+)?\b


In Python:

regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
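A sketch of this pattern applied with re.findall, as in the question. The sample text is hypothetical, and the sorted(set(...)) step for building a de-duplicated abbreviation table is an illustrative addition, not part of the answer:

```python
import re

regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
text = "ABSTRACT HCFC HCFC-141 and HCFC"  # hypothetical sample text

# (?:...) is non-capturing, so findall returns whole matches.
# ABSTRACT is rejected: 8 letters cannot fit between two \b anchors with {2,6}.
abbreviations = re.findall(regex, text)
print(abbreviations)  # ['HCFC', 'HCFC-141', 'HCFC']

# Unique, sorted entries for an abbreviation table.
table = sorted(set(abbreviations))
print(table)  # ['HCFC', 'HCFC-141']
```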

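If the hyphen in -*[0-9]* is not really optional on its own (that is, digits should only be matched after a hyphen), you can turn that part into a single optional non-capturing group: (?:-[0-9]+)?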
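To illustrate the difference, here is a sketch with a hypothetical input containing a doubled hyphen. With -*[0-9]* the hyphens and digits are independent, while the optional group accepts exactly one hyphen followed by at least one digit:

```python
import re

text = "HCFC--141"  # hypothetical input with a doubled hyphen

# Any number of hyphens, then any number of digits.
loose = re.findall(r"\b[A-Z]{2,6}-*[0-9]*\b", text)
# Exactly one hyphen plus one or more digits, or nothing at all.
strict = re.findall(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b", text)

print(loose)   # ['HCFC--141']
print(strict)  # ['HCFC']
```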

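If only whitespace (or the start or end of the string) may appear directly to the left and right of the match, you can use lookarounds instead: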

(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)

Note that -* matches zero or more hyphens, whereas -? matches at most one.
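A sketch comparing the three variants on a hypothetical sample string. \b also accepts punctuation such as parentheses as a boundary, the (?<!\S)/(?!\S) lookarounds require whitespace or the string edges, and -* versus -? decides whether a doubled hyphen is allowed:

```python
import re

text = "(HCFC) HCFC-141 AB--12"  # hypothetical sample text

# \b treats "(" and ")" as boundaries, so HCFC inside parentheses matches.
word_boundary = re.findall(r"\b[A-Z]{2,6}(?:-[0-9]+)?\b", text)
# Only whitespace-delimited tokens, with at most one hyphen.
whitespace_only = re.findall(r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)", text)
# Same lookarounds, but -* lets AB--12 through.
many_hyphens = re.findall(r"(?<!\S)[A-Z]{2,6}-*[0-9]*(?!\S)", text)

print(word_boundary)    # ['HCFC', 'HCFC-141', 'AB']
print(whitespace_only)  # ['HCFC-141']
print(many_hyphens)     # ['HCFC-141', 'AB--12']
```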


Answer 2 (score: 0)

Try using the r prefix on the pattern.

pattern = r'\b[A-Z]{2,6}-*[0-9]*\b'

This does not match ABSTRACT, but it does match HCFC, HCFC-141, etc.
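This is likely why the \b attempt in the question failed: in an ordinary Python string literal, \b is the backspace character, so the regex engine searches for literal backspaces instead of word boundaries. A sketch with a hypothetical sample string:

```python
import re

text = "ABSTRACT HCFC HCFC-141"  # hypothetical sample text

# In a normal string literal "\b" is the backspace character (U+0008).
print("\b" == "\x08")  # True

# The pattern now contains literal backspaces, which the text never has.
no_raw = re.findall("\b[A-Z]{2,6}-*[0-9]*\b", text)

# With the r prefix the backslash reaches the regex engine: \b is a word boundary.
raw = re.findall(r"\b[A-Z]{2,6}-*[0-9]*\b", text)

print(no_raw)  # []
print(raw)     # ['HCFC', 'HCFC-141']
```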
