Regex to remove "by" from a string

Time: 2018-09-10 02:07:20

Tags: python regex

Update 2: https://regex101.com/r/bE5aWW/2

Update: So far, https://regex101.com/r/bE5aWW/1/ is the best I could come up with, but I need help getting rid of the "by".

Case 1

\n                                \n                                   by name name\n                                \n                            

Case 2

\n                                \n                                   name name\n                                \n     

Case 3

by name name

Case 4

name name

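I want to select the name part from the strings above, i.e. name name. The one I came up with, (?:by)? ([\w ]+), does not work when there is no space before by.

Thanks

Code from regex101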

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:by)? ([\w ]+)"

test_str = ("\\n                                \\n                                   by Ally Foster\\n                                \\n                            \n\n"
    "\\n                                \\n                                   Ally Foster\\n                                \\n                            \n\n"
    "by name name\n\n"
    "name name")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
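
To see where the pattern above goes wrong, here is a minimal sketch that runs it on two single-line samples. The sample strings and the names ASKER_RE, padded, and bare are illustrative assumptions, not part of the original post.

```python
import re

# The asker's pattern: optional "by", one mandatory space, then word chars/spaces.
ASKER_RE = re.compile(r"(?:by)? ([\w ]+)")

# Illustrative inputs (assumptions, not copied verbatim from the question).
padded = "   by name name"   # leading spaces before "by", similar to case 1
bare = "name name"           # no leading space at all, as in case 4

padded_group = ASKER_RE.search(padded).group(1)
bare_group = ASKER_RE.search(bare).group(1)

print(repr(padded_group))  # the captured text still contains "by"
print(repr(bare_group))    # only the second word is captured
```

The mandatory space sits outside the capture group. With leading padding, that space is matched by the padding and the greedy [\w ]+ then swallows "by". With no padding, the first space available is the one between the two words, so the first word is lost.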

2 Answers:

Answer 0: (score: 1)

(?:by )?(\b(?!by\b)[\w, ]+\S)

My final version, which also does not select strings that contain only by
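
A minimal sketch of how this pattern might be applied to the four cases from the question. It assumes that the "\n" sequences in cases 1 and 2 are real newline characters; the helper extract_name and the exact amount of padding are hypothetical.

```python
import re

# Pattern from the answer above; group 1 is meant to hold the name.
NAME_RE = re.compile(r"(?:by )?(\b(?!by\b)[\w, ]+\S)")

# Assumption: the "\n" sequences in cases 1 and 2 are real newlines.
cases = [
    "\n        \n           by name name\n        \n    ",  # case 1
    "\n        \n           name name\n        \n    ",     # case 2
    "by name name",                                          # case 3
    "name name",                                             # case 4
]

def extract_name(text):
    """Return the captured name, or None when nothing matches."""
    match = NAME_RE.search(text)
    return match.group(1) if match else None

names = [extract_name(case) for case in cases]
print(names)               # each entry is 'name name'
print(extract_name("by"))  # None: a lone "by" is rejected by the lookahead
```

The trailing \S forces the capture to end on a non-whitespace character, which is what trims the padding after the name.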

Answer 1: (score: 0)

I suggest using

re.findall(r'\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*', s)

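See the regex demo. In Python 2, you need to pass the re.U flag so that all shorthand character classes and word boundaries are Unicode-aware. To match tabs as well as spaces, replace each space in the pattern with [ \t].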
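
The tab variant could look like the sketch below. The tab-separated sample string and the names SPACES_ONLY and SPACES_OR_TABS are assumptions made for illustration. In Python 3, str patterns are Unicode-aware by default, so re.U is not passed here.

```python
import re

# Pattern from the answer: only literal spaces are allowed between words.
SPACES_ONLY = r"\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*"
# Same pattern with every space replaced by [ \t], as suggested above.
SPACES_OR_TABS = r"\b(?!by\b)[^\W\d_]+(?:[ \t]*(?:,[ \t]*)?[^\W\d_]+)*"

sample = "by\tname\tname"  # illustrative tab-separated input (an assumption)

spaces_result = re.findall(SPACES_ONLY, sample)
tabs_result = re.findall(SPACES_OR_TABS, sample)

print(spaces_result)  # ['name', 'name']: the tab splits the name in two
print(tabs_result)    # ['name\tname']: the tab is kept inside one match
```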

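Details

  • \b - a word boundary
  • (?!by\b) - the next word cannot be by
  • [^\W\d_]+ - one or more letters
  • (?: *(?:, *)?[^\W\d_]+)* - a non-capturing group that matches zero or more occurrences of:
    • " *" (a space followed by *) - zero or more spaces
    • (?:, *)? - an optional sequence of a comma and zero or more spaces
    • [^\W\d_]+ - one or more letters.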
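
The breakdown above can be written directly into the pattern with re.VERBOSE, as in this sketch. In verbose mode unescaped whitespace is ignored, so each literal space is written as [ ]. The test strings assume that the "\n" sequences in the question are real newlines, and the comma sample is an added illustration.

```python
import re

NAME_RE = re.compile(r"""
    \b                 # word boundary
    (?!by\b)           # the next word must not be "by"
    [^\W\d_]+          # one or more letters
    (?:                # non-capturing group, repeated zero or more times:
        [ ]*           #   zero or more spaces
        (?:,[ ]*)?     #   optional comma followed by zero or more spaces
        [^\W\d_]+      #   one or more letters
    )*
""", re.VERBOSE)

# Assumption: the "\n" sequences in cases 1 and 2 are real newlines.
cases = [
    "\n        \n           by name name\n        \n    ",  # case 1
    "\n        \n           name name\n        \n    ",     # case 2
    "by name name",                                          # case 3
    "name name",                                             # case 4
]

results = [NAME_RE.findall(case) for case in cases]
comma_result = NAME_RE.findall("by Foster, Ally")  # illustrative comma input

print(results)       # every case yields ['name name']
print(comma_result)  # ['Foster, Ally']
```

Because the pattern has no capturing groups, re.findall returns the full matches, so no group indexing is needed.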