Question

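I have a list of words with punctuation attached to the beginning and end of some of them. I need to split the punctuation off using a regular expression, like this: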

sample_input = ["I", "!Go", "I'm", "call.", "exit?!"]

sample_output = ["I", "!", "Go", "I'm", "call", ".", "exit", "?", "!"]

The original string is:

string = "It's a mountainous wonderland decorated with ancient glaciers, breathtaking national parks and sumptuous vineyards, but behind its glossy image New Zealand is failing many of its children."

Does anyone have an idea how to solve this?

Thanks.

Answer 1

You can first tokenize each list item like this:

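import re
words = ["I", "!Go", "I'm", "call.", "exit?!"]
newwords = []
for i in words:
    # [\w']+ keeps a word (including apostrophes) together;
    # [\W] picks up each remaining non-word character on its own
    newwords.append(re.findall(r"[\w']+|[\W]", i))
print(newwords)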

>>>[['I'], ['!', 'Go'], ["I'm"], ['call', '.'], ['exit', '?', '!']]

Then flatten the nested list to get the result:

result = [item for sublist in newwords for item in sublist]
print(result)

>>>['I', '!', 'Go', "I'm", 'call', '.', 'exit', '?', '!']

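The pattern splits each string into two kinds of tokens: runs of word characters and apostrophes, matched by [\w']+, and single non-word characters, matched by [\W]. Flattening the per-word matches then gives the final list in the desired form. You can adapt this approach to the requirements of your own code.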
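The two steps above can also be combined into one pass. The following is only a sketch: the names TOKEN_RE and split_punctuation are made up for illustration, and the pattern swaps [\W] for [^\w\s], on the assumption that whitespace should be skipped so the same pattern can be run on the original sentence directly.

```python
import re

# Assumed variant of the answer's pattern: a run of word characters
# and apostrophes, or one character that is neither a word character
# nor whitespace (i.e. a punctuation mark).
TOKEN_RE = re.compile(r"[\w']+|[^\w\s]")


def split_punctuation(words):
    """Return one flat list of words and punctuation marks."""
    return [token for word in words for token in TOKEN_RE.findall(word)]


words = ["I", "!Go", "I'm", "call.", "exit?!"]
print(split_punctuation(words))

# Because whitespace is skipped, a raw sentence can be tokenized
# without splitting it into words first.
print(TOKEN_RE.findall("ancient glaciers, breathtaking national parks"))
```

Note that an apostrophe at the edge of a word (for example a quoted 'word') stays attached under this pattern, just as in the original one, so input like that would need extra handling.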
