Question

I know this is probably a very simple question, but I'm struggling to split a string in Python. My regex has a group separator like this:

myRegex = r"(\W+)"

I want to parse this string into words:

import re
testString = "This is my test string, hopefully I can get the word i need"
testAgain = re.split(r"(\W+)", testString)

The result is:

['This', ' ', 'is', ' ', 'my', ' ', 'test', ' ', 'string', ', ', 'hopefully', ' ', 'I', ' ', 'can', ' ', 'get', ' ', 'the', ' ', 'word', ' ', 'i', ' ', 'need']

This is not what I expected. I want the list to contain:

['This','is','my','test']......etc

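Now I know this has to do with the grouping in my regex, and I can fix the problem by removing the parentheses. But how can I keep the parentheses and still get the result above?

Sorry for the question. I have read the official Python documentation on regex splitting with groups, but I still don't understand why the whitespace ends up in my list.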

Answer 1

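As described in the answer to "How to split but ignore separators in quoted strings, in python?", you can simply slice the list after splitting. This is easy because you want every other element, starting from the first (that is, the 1st, 3rd, 5th, 7th, ... elements, which sit at indices 0, 2, 4, 6, ...).

You can use the [start:end:step] slice notation, as shown below: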

import re
testString = "This is my test string, hopefully I can get the word i need"
testAgain = re.split(r"(\W+)", testString)
# keep the elements at even indices (the words), skip the separators
testAgain = testAgain[0::2]

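Also, I should point out that \W matches any non-word character, including punctuation. If you want to keep the punctuation, you will need to change your regex.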
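As a minimal sketch of the slicing approach, here it is next to a variant that this answer does not mention: a non-capturing group, written `(?:...)`. That variant keeps the parentheses while `re.split` leaves the separators out of the result. The variable names are illustrative only.

```python
import re

test_string = "This is my test string, hopefully I can get the word i need"

# Capturing group: re.split also returns the text matched by the group,
# so words and separators alternate in the result.
with_separators = re.split(r"(\W+)", test_string)

# Taking every second element from index 0 keeps only the words.
words_by_slicing = with_separators[0::2]

# Non-capturing group: the parentheses still group the pattern,
# but the separators are not included in the result.
words_non_capturing = re.split(r"(?:\W+)", test_string)

print(words_by_slicing)
print(words_by_slicing == words_non_capturing)  # True
```

Note that if the string began or ended with a non-word character, both variants would produce an empty string at that end of the list.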

Answer 2

You can simply do:

testAgain = testString.split()  # built-in split on whitespace

Different regex approaches:

import re
testAgain = re.split(r"\s+", testString)   # split on runs of whitespace
testAgain = re.findall(r"\w+", testString) # find all words (punctuation dropped)
testAgain = re.findall(r"\S+", testString) # find all runs of non-whitespace characters
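These alternatives do not all return the same list. The sketch below, using illustrative variable names, shows the difference on the test string from the question: the whitespace-based variants keep the comma attached to "string", while `\w+` drops it.

```python
import re

test_string = "This is my test string, hopefully I can get the word i need"

by_whitespace = test_string.split()                # str.split on whitespace
by_regex_space = re.split(r"\s+", test_string)     # regex split on whitespace
only_words = re.findall(r"\w+", test_string)       # runs of word characters
non_space_runs = re.findall(r"\S+", test_string)   # runs of non-whitespace

# The whitespace-based variants agree for this input.
print(by_whitespace == by_regex_space == non_space_runs)  # True

# Punctuation stays attached unless the pattern matches word characters only.
print(by_whitespace[4])  # string,
print(only_words[4])     # string
```

For input with leading or trailing whitespace, `re.split(r"\s+", ...)` would also return empty strings at the ends, which `str.split()` with no arguments does not.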

How to ignore the group in a regex split in Python?
