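I need to split a string into a list of words, separating on whitespace and deleting all special characters except for the apostrophe ('). For example: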
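page = "They're going up to the Stark's castle [More:...]"
needs to become the list
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
Right now I can only either remove all special characters (the apostrophe included) with
re.sub(r"[^\w]", " ", page).split()
or just split, keeping all special characters, with
page.split()
Is there a way to specify which characters to remove and which to keep?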
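A minimal sketch of one way to make the kept characters a parameter, assuming word characters (\w) and whitespace should always survive. split_keeping is a hypothetical helper name, not a library function:

```python
import re

def split_keeping(text, keep="'"):
    """Split text on whitespace after removing every character that is not
    a word character, whitespace, or one of the characters listed in `keep`."""
    # re.escape makes characters such as ']' or '-' safe inside the character class
    disallowed = r"[^\w\s" + re.escape(keep) + "]"
    # Replace with a space (not ''), so words separated only by punctuation stay apart
    return re.sub(disallowed, " ", text).split()

page = "They're going up to the Stark's castle [More:...]"
print(split_keeping(page))
# ["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
```

To keep hyphens as well, pass them in: split_keeping(text, keep="'-").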
Answer 0 (score: 2)
Use str.split as usual, then filter the unwanted characters out of each word:
>>> page = "They're going up to the Stark's castle [More:...]"
>>> result = [''.join(c for c in word if c.isalpha() or c=="'") for word in page.split()]
>>> result
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
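One caveat with this per-character filter: a token made only of special characters (such as a standalone "--") turns into an empty string, and c.isalpha() also drops digits. A small variation, shown with a hypothetical input string, removes those empty strings:

```python
page = "Winter is coming -- and it's cold"  # hypothetical input with a punctuation-only token
# Keep only letters and apostrophes in each whitespace-separated token
cleaned = (''.join(c for c in word if c.isalpha() or c == "'") for word in page.split())
# Drop tokens that consisted entirely of removed characters (e.g. '--' became '')
result = [word for word in cleaned if word]
print(result)
# ['Winter', 'is', 'coming', 'and', "it's", 'cold']
```

Swapping c.isalpha() for c.isalnum() would keep digits as well.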
Answer 1 (score: 0)
In my opinion, ''.join() with a nested list comprehension is a simpler option:
>>> page = "They're going up to the Stark's castle [More:...]"
>>> [''.join([c for c in w if c.isalpha() or c == "'"]) for w in page.split()]
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
>>>
Answer 2 (score: 0)
import re
page = "They're going up to the Stark's castle [More:...]"
# Delete every character that is not a word character, an apostrophe, or a space
s = re.sub(r"[^\w' ]", "", page).split()
print(s)
Out:
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
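[\w' ] matches the characters you want to keep (word characters, apostrophes and spaces). Putting ^ at the start of the class negates it, so the pattern matches every other character, and each match is replaced with '' (nothing). The cleaned string is then split on whitespace.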
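The same character class can also be used the other way round: instead of deleting the unwanted characters, re.findall can collect runs of the wanted ones directly. This is a swapped-in alternative, not the answerer's code. One difference is that words separated only by punctuation (e.g. "end.Start") come out as two words here, whereas deleting with '' would glue them into one:

```python
import re

page = "They're going up to the Stark's castle [More:...]"
# Collect every run of word characters and apostrophes; everything else acts as a separator
words = re.findall(r"[\w']+", page)
print(words)
# ["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
```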
Answer 3 (score: 0)
Here is a solution.
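import re
page = "They're going up to the Stark's castle [More:...]"
# Collapse each run of characters other than letters, digits and apostrophes into one space,
# then trim both ends so split(' ') does not produce empty strings
page = re.sub("[^0-9a-zA-Z']+", ' ', page).strip()
print(page)
p = page.split(' ')
print(p)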
Here is the output:
They're going up to the Stark's castle More
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
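If regular expressions are not required, a similar result can be sketched with str.translate. This assumes the special characters to remove are the ASCII ones listed in string.punctuation (which also includes the underscore); other symbols, such as curly quotes, would pass through unchanged:

```python
import string

page = "They're going up to the Stark's castle [More:...]"
# Translation table: every ASCII punctuation character except the apostrophe becomes a space
table = str.maketrans({c: " " for c in string.punctuation if c != "'"})
words = page.translate(table).split()
print(words)
# ["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
```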