It is easy to extract each of them individually:
re.findall(r"\((\w+)\)", "It's Jane's cat Jack (male)") #1
re.findall(r"(?<=\()\w+(?=\))", "It's Jane's cat Jack (male)") #2
# ['male']
re.findall(r"\w+(?='s)", "It's Jane's cat Jack (male)")
# ['It', 'Jane']
re.findall(r"\S+", "It's Jane's cat Jack (male)")
# ["It's", "Jane's", 'cat', 'Jack', '(male)']
However, combining them confuses me:
re.findall(r"\((\w+)\)|\w+(?='s)|\S+", "It's Jane's cat Jack (male)") #1
re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|\S+", "It's Jane's cat Jack (male)") #2
# ['It', "'s", 'Jane', "'s", 'cat', 'Jack', '(male)']
It never produces:
# ['It', 'Jane', 'cat', 'Jack', 'male']
By the way, which is better, #1 or #2? They produce the same result.
Thanks for reading & replying.
Answer 0: (score: 2)
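You can try the pattern below. The problem is that \S+ matches one or more non-whitespace characters, so it also matches the leftover "'s". As for the two approaches you gave, use the second one in the combined pattern: because the first contains a capturing group, re.findall returns only the contents of that group, so it would give the string male plus an empty string for every match made by the other alternatives.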
>>> re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|(?<!\S)\w+(?!\S)", "It's Jane's cat Jack (male)")
['It', 'Jane', 'cat', 'Jack', 'male']
Or
>>> [i for i in re.split(r"\s*(?:[()]|'s|\s)\s*", "It's Jane's cat Jack (male)") if i]
['It', 'Jane', 'cat', 'Jack', 'male']
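
The point about the capturing group can be checked directly. The following is a minimal sketch that runs the two combined patterns from the question side by side; the variable names (text, with_group, with_lookaround) are only illustrative and do not come from the original posts.

```python
import re

text = "It's Jane's cat Jack (male)"

# Combined pattern #1: the first alternative contains a capturing group,
# so re.findall returns the group's contents for every match. Matches made
# by the other alternatives leave the group unset and come back as ''.
with_group = re.findall(r"\((\w+)\)|\w+(?='s)|\S+", text)

# Combined pattern #2: lookarounds only, no capturing group,
# so re.findall returns the full text of every match.
with_lookaround = re.findall(r"(?<=\()\w+(?=\))|\w+(?='s)|\S+", text)

print(with_group)       # ['', '', '', '', '', '', 'male']
print(with_lookaround)  # ['It', "'s", 'Jane', "'s", 'cat', 'Jack', '(male)']
```

So the two variants only agree when each is used on its own; once they are joined with other alternatives, #1 hides everything except the parenthesised word.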