How can I separate leading and trailing punctuation from words using Python?

Time: 2018-10-17 18:00:31

Tags: python regex

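I have a list of words, some of which have punctuation at the beginning or end. I need to split the punctuation off with a regular expression, like this: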

sample_input = ["I", "!Go", "I'm", "call.", "exit?!"]

sample_output = ["I", "!", "Go", "I'm", "call", ".", "exit", "?", "!"]

The original string is as follows:

string = "It's a mountainous wonderland decorated with ancient glaciers, breathtaking national parks and sumptuous vineyards, but behind its glossy image New Zealand is failing many of its children."

Does anyone have an idea how to solve this?

Thanks.

1 answer:

Answer 0: (score: 0)

You can first tokenize each list item like this:

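import re

words = ["I", "!Go", "I'm", "call.", "exit?!"]
newwords = []
for i in words:
    # [\w']+ matches a run of word characters or apostrophes;
    # [\W] matches any single remaining non-word character.
    newwords.append(re.findall(r"[\w']+|[\W]", i))
print(newwords)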

>>>[['I'], ['!', 'Go'], ["I'm"], ['call', '.'], ['exit', '?', '!']]

Then flatten the nested lists to get the result:

result = [item for sublist in newwords for item in sublist]
print(result)

>>>['I', '!', 'Go', "I'm", 'call', '.', 'exit', '?', '!']
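The two steps above can also be folded into a single pass. The following is a minimal sketch for Python 3 that uses the same pattern as the answer. The names `TOKEN_RE` and `split_punctuation` are hypothetical and chosen only for illustration.

```python
import re

# Same pattern as in the answer: a run of word characters/apostrophes,
# or any single non-word character. The names here are illustrative only.
TOKEN_RE = re.compile(r"[\w']+|\W")


def split_punctuation(words):
    """Return one flat list with punctuation split off from each word."""
    return [token for word in words for token in TOKEN_RE.findall(word)]


words = ["I", "!Go", "I'm", "call.", "exit?!"]
print(split_punctuation(words))
# ['I', '!', 'Go', "I'm", 'call', '.', 'exit', '?', '!']
```

Precompiling the pattern is optional here; it mainly avoids repeating the regex string when the helper is called many times.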

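To get the final list in the desired form, each string has to be broken into runs matched by [\w']+ (word characters and apostrophes) and single characters matched by \W (everything else). You can adapt this approach to the requirements of your own code.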
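If the goal is to tokenize the original sentence directly instead of a pre-split word list, note that \W also matches spaces, so the pattern would return each space as a token. A possible variation, shown here only as a sketch, replaces \W with [^\w\s] so that whitespace is skipped. The function name `tokenize` and the short sample sentence are assumptions made for illustration.

```python
import re


def tokenize(text):
    """Split text into words (keeping inner apostrophes) and punctuation marks.

    [^\\w\\s] matches one character that is neither a word character nor
    whitespace, so spaces are dropped instead of being returned as tokens.
    """
    return re.findall(r"[\w']+|[^\w\s]", text)


sample = "It's a test, really."
print(tokenize(sample))
# ["It's", 'a', 'test', ',', 'really', '.']
```

One caveat with this pattern: because the apostrophe belongs to the word class, a word wrapped in single quotes such as 'hello' stays one token with its quotes attached. Whether that matters depends on the input.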