I'm not sure the title describes this well enough, so basically, here is what I want to do. Say I have this string:
a = "Hello, world! This...is my string."
I want to split it into a list of words, but I want each punctuation mark to count as a separate word. So, something like this:
["Hello", ",", " ", "world", "!", " "....
and so on...
Note that each space is also a separate word.
Here is the code I tried:
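The desired output can be stated as a one-line regular expression, which also makes a handy reference to compare a hand-written loop against. This is a minimal sketch that assumes "punctuation" means any character that is not a letter, digit, or underscore, and that a run like `...` should come out as three separate `"."` tokens; neither assumption is confirmed by the question.

```python
import re

a = "Hello, world! This...is my string."

# \w+ matches a run of word characters (letters, digits, underscore);
# \W matches exactly one non-word character, so every space and every
# punctuation mark becomes its own token.
tokens = re.findall(r"\w+|\W", a)
print(tokens)
```

Because `\W` covers every non-word character, apostrophes and hyphens would also be split out, which may or may not be wanted.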
arr = "hello this, is my, st.ring! I, will split it."
final = []
buffer = ''
delims = list(',.! ')
print(delims)
for i in arr.split():
    for a in i:
        buffer = buffer + a
        try:
            if a in delims:
                final.append(buffer)
                buffer = ''
        except IndexError:
            pass
    # if no punctuation in the word
    final.append(buffer)
    buffer = ''
for i in final:
    print(i)
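Basically, this code iterates over the string and keeps a buffer of the characters it has checked so far. When it finds a character that is in delims, it adds the buffer's contents to the list and then clears the buffer.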
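The buffer approach described above can work with two changes: loop over the characters of the original string rather than over `arr.split()` (which discards the spaces before the loop ever sees them), and flush the buffer *before* appending the delimiter, so the delimiter becomes its own item instead of being glued onto the end of the word. A minimal sketch, keeping the same `arr` and delimiter set as the attempt above:

```python
arr = "hello this, is my, st.ring! I, will split it."
delims = set(',.! ')  # a set gives fast membership tests

final = []
buffer = ''
for ch in arr:  # walk the raw string so spaces are seen too
    if ch in delims:
        if buffer:  # emit the word collected so far, if any
            final.append(buffer)
            buffer = ''
        final.append(ch)  # the delimiter is a separate item
    else:
        buffer += ch
if buffer:  # the string may end with a word and no delimiter
    final.append(buffer)

for token in final:
    print(repr(token))
```

The `if buffer:` checks also stop empty strings from being added when two delimiters are adjacent, as in `", "`.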
But it doesn't work, and I'm not sure why. Is there a way to do this? Maybe a built-in function or something?
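For a built-in route, the standard-library `re.split` keeps the separators in its result when the pattern contains a capturing group. A short sketch, assuming the same four delimiters (`,`, `.`, `!` and space) used in the attempt above:

```python
import re

arr = "hello this, is my, st.ring! I, will split it."

# Wrapping the delimiter class in a capturing group (...) makes re.split
# include each matched delimiter in the returned list.
parts = re.split(r"([,.! ])", arr)

# Adjacent delimiters (e.g. ", ") and a delimiter at the very end leave
# empty strings in the list; drop them.
final = [p for p in parts if p]
print(final)
```

Unlike a `\W`-based pattern, this splits only on the characters listed in the brackets, so something like the apostrophe in "don't" stays inside its word.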