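I am building a table of abbreviations for a document, and I am using a regular expression to find all abbreviations in a long string (the text of a Word document).
I am using the pattern '[A-Z]{2,6}\-*[0-9]*'. This way, both "HCFC" and "HCFC-141" are matched.
Some parts of the document are written entirely in capital letters, for example "ABSTRACT". The pattern above returns "ABSTRA" and "CT" as two separate words. I only want to match whole words, so that "ABSTRA" and "CT" are dropped from the list entirely. How can I do that?
PS: I have already tried \b[A-Z]{2,6}-*[0-9]*\b, but it does not work. Maybe I am doing something wrong?
PPS: Python code: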
pattern = '[A-Z]{2,6}\-*[0-9]*'
abbreviation = re.findall(pattern,text)
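Can this be handled with the re library?
Answer 0 (score: 0)
My guess is that the problem is either that the hyphen followed by digits should be an optional group, or that we want word boundaries. In that case, this expression might work: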
\b[A-Z]{2,6}(-[0-9]+)?\b
or
\b([A-Z]{2,6}(-[0-9]+)?)\b
###Test
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"
test_str = ("HCFC\n"
"HCFC-141\n"
"aaHCFC-141")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
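One thing to keep in mind when plugging a pattern with capturing groups into re.findall, as in the question: findall then returns the groups instead of the whole match. The sketch below illustrates this; the sample text and the variable names are made up for illustration.

```python
import re

# Sample text is an assumption for illustration only.
text = "HCFC and HCFC-141 appear in the ABSTRACT."

grouped = r"\b([A-Z]{2,6}(-[0-9]+)?)\b"

# With two capturing groups, findall returns one tuple per match;
# an optional group that did not take part shows up as "".
tuples = re.findall(grouped, text)

# finditer plus group(0) yields the whole match regardless of groups.
whole = [m.group(0) for m in re.finditer(grouped, text)]

print(tuples)
print(whole)
```

If only the full abbreviation is needed, reading group(0) or making the inner group non-capturing avoids the tuples.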
Answer 1 (score: 0)
You can use the quantifier {2,6} and make sure to use word boundaries \b, so that you do not get two matches, one for ABSTRA and one for CT:
\b[A-Z]{2,6}(?:-[0-9]+)?\b
In Python:
regex = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
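If the hyphen in the part -*[0-9]* is not meant to be optional (that is, digits should only follow a hyphen), you can turn that part into an optional group: (?:-[0-9]+)?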
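As a rough sketch of how this pattern could feed the abbreviation table from the question: the sample text below is invented, and sorting plus de-duplicating with set is just one possible way to build the list.

```python
import re

# Invented sample text; a real document would be read from the Word file.
text = "ABSTRACT\nHCFC-141 is an HCFC. See also CFC-11 and HCFC-141."

pattern = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"

# The group is non-capturing, so findall returns the full matches.
matches = re.findall(pattern, text)

# De-duplicate and sort to get a stable abbreviation list.
abbreviations = sorted(set(matches))

print(matches)
print(abbreviations)
```

ABSTRACT yields nothing here because no prefix of two to six letters ends at a word boundary inside the eight-letter word.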
If nothing other than whitespace should appear directly to the left or right of the match, you can use:
(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)
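To make the difference between the two kinds of boundary concrete, here is a small comparison on an invented string. With \b, punctuation next to the abbreviation is fine; with the whitespace lookarounds, an abbreviation touching a parenthesis or comma is skipped.

```python
import re

# Invented sample text with punctuation touching some abbreviations.
text = "HCFC-141 (CFC) HFC, CFC-11"

word_bounded = r"\b[A-Z]{2,6}(?:-[0-9]+)?\b"
space_bounded = r"(?<!\S)[A-Z]{2,6}-?[0-9]*(?!\S)"

by_word = re.findall(word_bounded, text)
by_space = re.findall(space_bounded, text)

print(by_word)
print(by_space)
```

Which variant fits depends on whether abbreviations in parentheses or before commas should end up in the table.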
Note that -* matches zero or more hyphens, whereas -? matches a single optional hyphen.
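A quick way to see the effect of the two quantifiers is to run both variants against a string with a doubled hyphen. The input "HCFC--141" is a contrived example chosen only to show the difference.

```python
import re

# Contrived input with two consecutive hyphens.
text = "HCFC--141"

many_hyphens = r"\b[A-Z]{2,6}-*[0-9]*\b"
one_hyphen = r"\b[A-Z]{2,6}-?[0-9]*\b"

# -* consumes both hyphens, so the whole string is one match.
with_star = re.findall(many_hyphens, text)

# -? allows at most one hyphen, so the regex backtracks to the letters only.
with_question = re.findall(one_hyphen, text)

print(with_star)
print(with_question)
```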
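Answer 2 (score: 0)
Try using the r prefix (a raw string) for the pattern.
With it, the pattern does not match ABSTRACT, and it does match HCFC, HCFC-141, and so on.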
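To see why the r prefix matters here: in an ordinary Python string literal, "\b" is the backspace character, so the regex engine never sees a word-boundary token. The sketch below compares the two spellings on an invented sample string.

```python
import re

# Invented sample text.
text = "ABSTRACT HCFC HCFC-141"

# Without the r prefix, "\b" is a single backspace character (\x08).
plain = "\b[A-Z]{2,6}-*[0-9]*\b"

# With the r prefix, the backslash is kept and \b means "word boundary".
raw = r"\b[A-Z]{2,6}-*[0-9]*\b"

plain_matches = re.findall(plain, text)
raw_matches = re.findall(raw, text)

print(len("\b"), len(r"\b"))
print(plain_matches)
print(raw_matches)
```

The non-raw pattern finds nothing because the text contains no backspace characters, which would explain why the \b attempt in the question appeared not to work.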
Answer 3 (score: 0)
With ThisWorkbook.Sheets("Sheet1")
    While (Counter <= 300)
        Pcounter = .Cells(ACBoxCounter, 2)
        If (Pcounter <> "") Then
            ACounter = ACounter + 1
        End If
        ACBCounter = ACBCounter + 30
    Wend
    While (OverallACounter < ACounter)
        Set objStream = CreateObject("ADODB.Stream")
        objStream.Charset = "iso-8859-1"
        objStream.Open
        ExampleString = .Cells(Row2Counter + 22, 3)
        ChooseM = Split(ExampleString, "-")(1)
        If (ChooseM = "8") Then
            M = "II"
            P = 97
            Label = .Cells(Row2Counter, 2)
        ElseIf (ChooseM = "13") Then
            Model = "A II"
            P = 10
            Label = "A6_" & .Cells(Row2Counter, 2)
        ElseIf (ChooseM = "19") Then
            M = "AC1I"
            P = 56
            Label = "A9_" & .Cells(Row2Counter, 2)
        End If
        OverallD = 0
        Overall = 0
        OverallB = 0
        ChooseBoxType = Split(ExampleString, "-")(2)
        If ((StrComp(ChooseB, "1") = 0) Or (StrComp(ChooseB, "1M") = 0)) Then
            BoxInputT = "1 Phase"
        ElseIf ((StrComp(ChooseB, "2") = 0) Or (StrComp(ChooseB, "2M") = 0)) Then
            BoxInput = "2"
        ElseIf ((StrComp(ChooseB, "3") = 0) Or (StrComp(ChooseB, "3M") = 0)) Then
            BoxInput = "3"
        End If
        objStream.WriteText (" <" & .Cells(Row2Counter, 2).Text & ">" & vbLf)
    Wend
End With