Regular expression to select a substring

Time: 2019-01-23 05:24:31

Tags: python regex awk sed

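I am trying to write a regular expression that selects certain parts of a string and ignores the rest.

I have the text below. I want the regex to extract the string """Text to be selected""" (ignoring whitespace) from every line and ignore the rest of the string.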

"""Text to be selected"""
""" Text to be selected """
""" Text to be selected Text not to be selected"""
Text not to be selected """ Text to be selected Text not to be selected"""

I tried the following regex:

[\s]?"""[\s]|[\S]Text to be selected[\s]|[\S].*"""

But it selects the whole string on every line, because of the .* at the end.

It returns the strings:

"""Text to be selected"""
""" Text to be selected """
""" Text to be selected Text not to be selected"""
Text not to be selected """ Text to be selected Text not to be selected"""

But I need the strings:

"""Text to be selected"""
""" Text to be selected """
""" Text to be selected """
""" Text to be selected """
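Why the attempt over-matches: | has the lowest precedence of any regex operator, so the pattern is three independent alternatives rather than one sequence, and the last alternative, [\S].*""", runs greedily from a non-space character to the last """ on the line. A minimal check with Python's standard re module (the names ATTEMPT and SAMPLE_LINES are only for illustration) shows that on every sample line the matches returned by re.findall, joined back together, reproduce the entire line:

```python
import re

# The pattern from the question, unchanged
ATTEMPT = r'[\s]?"""[\s]|[\S]Text to be selected[\s]|[\S].*"""'

SAMPLE_LINES = [
    '"""Text to be selected"""',
    '""" Text to be selected """',
    '""" Text to be selected Text not to be selected"""',
    'Text not to be selected """ Text to be selected Text not to be selected"""',
]

for line in SAMPLE_LINES:
    # The pattern has no capture groups, so findall returns the full matches,
    # e.g. for the third line: ['""" ', 'Text to be selected Text not to be selected"""']
    pieces = re.findall(ATTEMPT, line)
    # True on every sample line: the matches together cover the whole line
    print(pieces, "".join(pieces) == line)
```

The first alternative grabs the opening """ plus one space, and the greedy third alternative then swallows everything up to the final """, so nothing on the line is ever left out.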

4 answers:

Answer 0 (score: 2)

Using sed:

sed -E 's/[^"]*(""" ?Text to be selected ?)[^"]*(""").*/\1\2/' file

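Explanation:

  • [^"]*: match any run of non-quote characters before the opening """
  • (""" ?Text to be selected ?): capture """ followed by an optional space, the target text, and another optional space
  • [^"]*: match zero or more non-quote characters (the text to be dropped)
  • ("""): capture the closing """
  • .*: consume the rest of the line
  • \1\2: output the captured text followed by the closing """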
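The same substitution can be checked from Python, the question's other tag. The sed -E pattern uses only constructs that Python's re module reads the same way, and count=1 mirrors sed's s/// without the g flag (only the first match on each line is replaced). A minimal sketch, where sed_like is a hypothetical helper name:

```python
import re

# The sed -E expression; the replacement \1\2 keeps only the two captured groups
PATTERN = re.compile(r'[^"]*(""" ?Text to be selected ?)[^"]*(""").*')

def sed_like(line):
    # Replace only the first match, like s/// without the g flag;
    # non-matching lines come back unchanged, just as sed would print them
    return PATTERN.sub(r'\1\2', line, count=1)

lines = [
    '"""Text to be selected"""',
    '""" Text to be selected """',
    '""" Text to be selected Text not to be selected"""',
    'Text not to be selected """ Text to be selected Text not to be selected"""',
]
for line in lines:
    print(sed_like(line))
```

Run on the four sample lines, this prints exactly the four lines the question asks for. Because the leading [^"]* is part of the match, any text before the opening """ is replaced away as well.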

Answer 1 (score: 1)

Please try the following:

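awk '
# Lines containing an opening and a closing """ (not anchored to the start of the
# line, so the last sample line, which begins with other text, is handled too)
/""".*"""/{
  if(match($0,/Text to be selected/)){
    # Print """, the matched text and """; print joins them with OFS (a space by default)
    print "\"\"\"",substr($0,RSTART,RLENGTH),"\"\"\""
  }
}'   Input_file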
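The awk program relies on match(), which sets RSTART (the 1-based start of the match, or 0 if there is none) and RLENGTH (its length, or -1), and on print joining its comma-separated arguments with OFS, a single space by default. A rough Python equivalent of the inner block, where awk_match_substr is a hypothetical name and the 0-based/1-based conversion is written out explicitly:

```python
import re

def awk_match_substr(line):
    # awk: match($0, /Text to be selected/)
    m = re.search(r'Text to be selected', line)
    if m is None:
        # awk would set RSTART = 0 and RLENGTH = -1 and the if would not fire
        return None
    rstart = m.start() + 1          # Python indexes from 0, awk's RSTART from 1
    rlength = m.end() - m.start()   # RLENGTH
    # awk: substr($0, RSTART, RLENGTH)
    middle = line[rstart - 1:rstart - 1 + rlength]
    # awk: print a, b, c separates the arguments with OFS (" " by default)
    return " ".join(['"""', middle, '"""'])
```

Because print inserts OFS between the pieces, every line comes out as """ Text to be selected """ with a space on each side, including the first sample line, which had no spaces.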

Answer 2 (score: 0)

I checked your specific case, and this works:

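import re

def matchme(string):
    # Capture the opening """ plus the target text (with any surrounding whitespace),
    # skip whatever follows, then capture the closing """
    match = re.match(r'.*("""\s*Text to be selected\s*).*(""").*', string)
    if match is not None:
        # Rebuild the string from the two captured pieces
        return match[1] + match[2]
    else:
        return ''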

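The idea is to capture the parts you want, skip everything else, and then rebuild the string from the captured groups. Hopefully this is general enough for you.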
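To make the approach more general, the target text can be passed in as a parameter and escaped with re.escape, so that characters such as . or ( in it are matched literally. This is a sketch of a variant, not the answerer's code: extract is a hypothetical name, it uses re.search instead of a leading .* with re.match, and the lazy .*? stops at the first closing """ after the target (the original greedy .* goes to the last one; on the sample lines these are the same).

```python
import re

def extract(target, line):
    # Same "capture, skip, rebuild" idea, built for an arbitrary target string;
    # re.escape makes regex metacharacters in the target match literally
    pattern = r'("""\s*' + re.escape(target) + r'\s*).*?(""")'
    m = re.search(pattern, line)
    # Rebuild from the captured opening part and the closing """
    return m[1] + m[2] if m else ''

for line in [
    '"""Text to be selected"""',
    'Text not to be selected """ Text to be selected Text not to be selected"""',
]:
    print(extract("Text to be selected", line))
```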

Answer 3 (score: 0)

Try Perl:

$ cat mahajan.txt
"""Text to be selected"""
""" Text to be selected """
""" Text to be selected Text not to be selected"""
Text not to be selected """ Text to be selected Text not to be selected"""

$  perl -lne ' /("""\s*Text to be selected)(.+?)?(""")/ and print "$1$3" ' mahajan.txt
"""Text to be selected"""
""" Text to be selected"""
""" Text to be selected"""
""" Text to be selected"""

$