Question

Update: So far, https://regex101.com/r/bE5aWW/1/ is the best I have been able to come up with, but I need help getting rid of the "by".

Case 1

\n                                \n                                   by name name\n                                \n

Case 2

\n                                \n                                   name name\n                                \n

Case 3

by name name

Case 4

name name

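I want to extract the name part from each of the strings above, i.e. name name. The pattern I came up with is (?:by)? ([\w ]+), but it does not work when there is no space in front of the name, i.e. when the "by" is missing.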
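A quick way to see the failure is to run the pattern on Cases 3 and 4 separately. This is a minimal sketch using Python's re.search; the variable names are illustrative. Because the pattern requires a literal space immediately before the capture group, a string that does not start with "by " can only match at the space between the two words.

```python
import re

# The pattern from the question: optional "by", then a REQUIRED space, then the name.
question_pattern = re.compile(r"(?:by)? ([\w ]+)")

case3 = "by name name"
case4 = "name name"

m3 = question_pattern.search(case3)
m4 = question_pattern.search(case4)

# With "by" present, the required space right after it is available.
print(m3.group(1))  # name name

# Without "by", the first usable space sits between the two words,
# so the capture starts there and the first word is lost.
print(m4.group(1))  # name
```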

Thanks

Code from regex101

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"(?:by)? ([\w ]+)"

test_str = ("\\n                                \\n                                   by Ally Foster\\n                                \\n                            \n\n"
    "\\n                                \\n                                   Ally Foster\\n                                \\n                            \n\n"
    "by name name\n\n"
    "name name")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches):
    matchNum = matchNum + 1

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.

Answer 1

(?:by )?(\b(?!by\b)[\w, ]+\S)

My final version; it also does not select a string that contains only "by".
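A minimal sketch checking this pattern against the four cases with re.search. It assumes the \n sequences in Cases 1 and 2 are real line breaks rather than a literal backslash followed by n (on literal backslash-n text, the first match would start at that n). The whitespace runs are shortened for readability.

```python
import re

# Answer 1's pattern: an optional "by " prefix, then a capture that must not
# start with the word "by" and must end on a non-space character.
answer1 = re.compile(r"(?:by )?(\b(?!by\b)[\w, ]+\S)")

# Assumption: the "\n" in Cases 1 and 2 are real line breaks.
cases = [
    "\n    \n       by name name\n    \n",  # Case 1
    "\n    \n       name name\n    \n",     # Case 2
    "by name name",                        # Case 3
    "name name",                           # Case 4
]

for text in cases:
    m = answer1.search(text)
    print(repr(m.group(1)))  # 'name name' for every case

# A string containing only "by" produces no match at all.
print(answer1.search("by"))  # None
```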

Answer 2

I suggest using

re.findall(r'\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*', s)

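See the regex demo. In Python 2, you will need to pass the re.U flag so that all shorthand character classes and word boundaries are Unicode-aware. To match tabs as well as spaces, replace each literal space in the pattern with [ \t].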
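A sketch of this answer in Python 3, again assuming the \n in Cases 1 and 2 are real line breaks. In Python 3, str patterns are Unicode-aware by default, so re.U is not needed. NAME_TABS is an illustrative rewrite that applies the [ \t] substitution described above; the sample strings with accented letters and a tab are made up for the demonstration.

```python
import re

# Answer 2's pattern, unchanged.
NAME = r"\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*"
# Hypothetical variant: every literal space replaced with [ \t].
NAME_TABS = r"\b(?!by\b)[^\W\d_]+(?:[ \t]*(?:,[ \t]*)?[^\W\d_]+)*"

# Assumption: the "\n" in Cases 1 and 2 are real line breaks.
cases = [
    "\n    \n       by name name\n    \n",
    "\n    \n       name name\n    \n",
    "by name name",
    "name name",
]

for text in cases:
    print(re.findall(NAME, text))  # ['name name'] for each case

# Python 3 str patterns are Unicode-aware by default, so no re.U flag here.
print(re.findall(NAME, "by José Núñez"))  # ['José Núñez']

# A tab between the words splits the match unless the [ \t] variant is used.
print(re.findall(NAME, "by name\tname"))       # ['name', 'name']
print(re.findall(NAME_TABS, "by name\tname"))  # ['name\tname']
```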

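Details

\b - a word boundary
(?!by\b) - the next word must not be "by"
[^\W\d_]+ - one or more letters
(?: *(?:, *)?[^\W\d_]+)* - a non-capturing group that matches 0 or more repetitions of:
  - " *" (a space followed by *) - zero or more spaces
  - (?:, *)? - an optional sequence of a comma followed by 0 or more spaces
  - [^\W\d_]+ - one or more letters.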
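The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern, sketched below. In verbose mode, unescaped whitespace outside a character class is ignored, so each literal space has to be written as [ ]. The test string "by Ann, Bob Smith" is invented to exercise the optional comma.

```python
import re

# Compact form, exactly as given in the answer.
COMPACT = r"\b(?!by\b)[^\W\d_]+(?: *(?:, *)?[^\W\d_]+)*"

# Same pattern, one comment per item in the breakdown.
VERBOSE_NAME = re.compile(r"""
    \b              # word boundary
    (?!by\b)        # the next word must not be "by"
    [^\W\d_]+       # one or more letters
    (?:             # non-capturing group, repeated 0 or more times:
        [ ]*        #   zero or more spaces
        (?:,[ ]*)?  #   optional comma followed by 0+ spaces
        [^\W\d_]+   #   one or more letters
    )*
""", re.VERBOSE)

sample = "by Ann, Bob Smith"
print(VERBOSE_NAME.findall(sample))  # ['Ann, Bob Smith']
print(re.findall(COMPACT, sample))   # ['Ann, Bob Smith']
print(VERBOSE_NAME.findall("by"))    # []
```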

Regex: remove "by" from a string
