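I am trying to write a regex that captures the rest of a line when a specific pattern matches. The strings I want to search for within a line look like this: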
1. 7.2.S.6.4 ANNOTATED DATA
OR
2. 9-2-K-1-4 FILE DATA
OR
3. 2-2.K-4.3 FOLDER DATA
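In each case, I would like to get the output:
I want to write a regex that finds the first pattern, for example "7.2.S.6.4", and then grabs the next word that follows it on the line.
The regex I have tried so far is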
\s*(-?\d+(?:\.\d+)?)
but it does not match the .S. or -K- part of the pattern. Any idea how to fix this?
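To make the failure concrete, here is a minimal check of the attempted pattern against the three sample strings. It assumes Python's `re` module (the question does not name a language) and drops the list numbering; the names `attempt`, `samples` and `consumed` are only illustrative.

```python
import re

# The pattern from the question, unchanged.
attempt = re.compile(r"\s*(-?\d+(?:\.\d+)?)")

# Sample strings from the question, without the "1." / "2." / "3." numbering.
samples = [
    "7.2.S.6.4 ANNOTATED DATA",
    "9-2-K-1-4 FILE DATA",
    "2-2.K-4.3 FOLDER DATA",
]

# re.match anchors at the start of each string; group(1) is what the pattern consumes.
consumed = [attempt.match(s).group(1) for s in samples]
print(consumed)  # ['7.2', '9', '2']
```

The pattern only allows digits and a single optional `.digits` part, so it stops at the first letter segment or hyphen.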
Answer 0: (score: 0)
Your use case is a bit of a mystery to me, but this may work for the first match, although it is not the most ideal solution:
\s*([-.]?\d+(?:\.\d+)?([-.][A-Z])?)[ ](.*)
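A rough sketch of how this pattern behaves, assuming Python's `re` module and `re.search` (neither is specified in this answer). The variable names are illustrative only.

```python
import re

# Pattern from this answer, unchanged. Group 3 is the text after the space.
pattern = re.compile(r"\s*([-.]?\d+(?:\.\d+)?([-.][A-Z])?)[ ](.*)")

samples = [
    "7.2.S.6.4 ANNOTATED DATA",
    "9-2-K-1-4 FILE DATA",
    "2-2.K-4.3 FOLDER DATA",
]

# re.search scans for the first position where the whole pattern fits.
matches = [pattern.search(s) for s in samples]
trailing = [m.group(3) for m in matches]
code_tail = [m.group(1) for m in matches]

print(trailing)   # ['ANNOTATED DATA', 'FILE DATA', 'FOLDER DATA']
print(code_tail)  # ['.6.4', '-4', '-4.3']
```

Group 3 holds the wanted text, but group 1 only contains the tail of the code rather than the full `7.2.S.6.4`, which is one reason this is not the most ideal solution.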
Answer 1: (score: 0)
You can use a regex like this:
^(\d\.) \S+(.*)
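Then take the content from capture groups 1 and 2.
Alternatively, you can use the regex below with $1$2 as the replacement string (in Python's re.sub the same replacement is written \1\2):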
^(\d\.) \S+(.*)|.+
Sample code
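import re

regex = r"^(\d\.) \S+(.*)|.+"
test_str = ("1. 7.2.S.6.4 ANNOTATED DATA \n"
            " OR\n"
            "2. 9-2-K-1-4 FILE DATA\n"
            " OR\n"
            "3. 2-2.K-4.3 FOLDER DATA")
# Python's re module writes backreferences as \1\2, not $1$2.
# On the " OR" lines only the ".+" branch matches, so both groups are empty
# (unmatched groups are substituted as empty strings in Python 3.5+).
subst = r"\1\2"
# count=0 replaces every match; change it to limit the number of replacements.
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE)
if result:
    print(result)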
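The first option, reading capture groups 1 and 2 directly instead of substituting, could be sketched as follows. This assumes Python's `re.finditer` with `re.MULTILINE`; the names `pattern` and `pairs` are illustrative.

```python
import re

# First regex from this answer; MULTILINE lets ^ match at the start of every line.
pattern = re.compile(r"^(\d\.) \S+(.*)", re.MULTILINE)

test_str = ("1. 7.2.S.6.4 ANNOTATED DATA\n"
            " OR\n"
            "2. 9-2-K-1-4 FILE DATA\n"
            " OR\n"
            "3. 2-2.K-4.3 FOLDER DATA")

# Group 1 is the list number, group 2 is everything after the code.
# strip() removes the leading space that group 2 keeps.
pairs = [(m.group(1), m.group(2).strip()) for m in pattern.finditer(test_str)]
print(pairs)
# [('1.', 'ANNOTATED DATA'), ('2.', 'FILE DATA'), ('3.', 'FOLDER DATA')]
```

The " OR" lines never match because they do not start with a digit followed by a dot.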
Answer 2: (score: 0)
These expressions may be useful here:
(?=[0-9]+[.-][0-9]+[.-][A-Z]+[.-][0-9]+[.-][0-9]+).*[0-9]\s(.+)
(?=[0-9]+[.-][0-9]+[.-][A-Z]+[.-][0-9]+[.-][0-9]+).*[0-9]\s+(.+)
This lookahead ensures that we have the right pattern:
(?=[0-9]+[.-][0-9]+[.-][A-Z]+[.-][0-9]+[.-][0-9]+)
Here we capture the desired output:
(.+)
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re

regex = r"(?=[0-9]+[.-][0-9]+[.-][A-Z]+[.-][0-9]+[.-][0-9]+).*[0-9]\s(.+)"
test_str = ("7.2.S.6.4 ANNOTATED DATA\n"
            "9-2-K-1-4 FILE DATA\n"
            "2-2.K-4.3 FOLDER DATA")
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
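If only the trailing text is needed, the second expression above could be wrapped in a small helper. This is a sketch under assumptions: the function name `text_after_code` and the choice to return `None` for non-matching lines are not part of the answer.

```python
import re

# Second expression from this answer (the variant with \s+), unchanged.
CODE_THEN_TEXT = re.compile(
    r"(?=[0-9]+[.-][0-9]+[.-][A-Z]+[.-][0-9]+[.-][0-9]+).*[0-9]\s+(.+)"
)

def text_after_code(line):
    """Return the text following the code, or None if the line has no code."""
    m = CODE_THEN_TEXT.search(line)
    return m.group(1) if m else None

lines = [
    "1. 7.2.S.6.4 ANNOTATED DATA",
    " OR",
    "2. 9-2-K-1-4 FILE DATA",
    "3. 2-2.K-4.3 FOLDER DATA",
]
results = [text_after_code(line) for line in lines]
print(results)  # ['ANNOTATED DATA', None, 'FILE DATA', 'FOLDER DATA']
```

Because `.*` is greedy, the capture starts after the last digit in the line that is followed by whitespace, so a line such as "7.2.S.6.4 DATA 2 SET" would yield only "SET". A stricter pattern would be needed if the trailing text can contain standalone numbers.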