Question

I need to split a string into a list of words, separating on whitespace and removing all special characters except for the apostrophe (').

For example:

page = "They're going up to the Stark's castle [More:...]"

needs to be turned into the list

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

Right now I can only remove all special characters, using

re.sub(r"[^\w]", " ", page).split()

or just split, keeping all special characters, using

page.split()

Is there a way to specify which characters to remove and which to keep?

Answer 1

Use str.split as usual, then filter the unwanted characters out of each word:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> result = [''.join(c for c in word if c.isalpha() or c=="'") for word in page.split()]
>>> result
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
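
If you want the set of kept characters to be explicit, the same idea can be wrapped in a small function. This is only a sketch: the name `split_keep` and its `keep` parameter are made up for illustration, and it also drops words that end up empty after filtering, which the one-liner above does not do.

```python
def split_keep(text, keep="'"):
    """Split on whitespace, keeping only letters and the characters in `keep`."""
    words = []
    for word in text.split():
        cleaned = ''.join(c for c in word if c.isalpha() or c in keep)
        if cleaned:  # skip tokens made up entirely of removed characters
            words.append(cleaned)
    return words

page = "They're going up to the Stark's castle [More:...]"
print(split_keep(page))
# Keep hyphens as well; the lone "..." token disappears entirely
print(split_keep("well-known ... fact", keep="'-"))
```

Passing a different `keep` string is one way to answer the "which characters to keep" part of the question without touching the loop.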

Answer 2

In my opinion, using ''.join() with a nested list comprehension would be a simpler option:

>>> page = "They're going up to the Stark's castle [More:...]"
>>> [''.join([c for c in w if c.isalpha() or c == "'"]) for w in page.split()]
["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
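
One thing to watch with this approach: `str.isalpha` is false for digits, so numbers inside words are removed too. A rough sketch of the difference follows, assuming you may want to keep digits; `split_words` and `keep_digits` are hypothetical names used only for this example.

```python
def split_words(text, keep_digits=False):
    # Hypothetical helper: pick the per-character test, then filter each word
    is_wanted = str.isalnum if keep_digits else str.isalpha
    return [''.join([c for c in w if is_wanted(c) or c == "'"]) for w in text.split()]

sample = "Room 101's door [locked]"
print(split_words(sample))                    # digits are dropped
print(split_words(sample, keep_digits=True))  # digits are kept
```

With the default, "101's" is reduced to "'s"; switching the test to `str.isalnum` keeps it whole.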

Answer 3

import re

page = "They're going up to the Stark's castle [More:...]"
# Raw string so \w reaches the regex engine unchanged
s = re.sub(r"[^\w' ]", "", page).split()
print(s)

Output:

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']

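First use [\w' ] to match the characters you want to keep, then add ^ to negate the class so it matches everything else, and replace those matches with '' (nothing).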
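
Two details of that character class are worth checking against your own data. The sketch below assumes nothing beyond the standard `re` module; `clean_split` is a hypothetical wrapper added here so the two patterns can be compared side by side.

```python
import re

def clean_split(text, pattern=r"[^\w' ]"):
    # Delete every character matched by the negated class, then split on whitespace
    return re.sub(pattern, "", text).split()

# \w also matches digits and the underscore, so these survive
print(clean_split("file_name v2 (draft)"))

# Only the literal space is kept, so a tab is deleted and the words run together
print(clean_split("up\tto"))

# Putting \s in the class keeps every kind of whitespace instead
print(clean_split("up\tto", pattern=r"[^\w'\s]"))
```

If underscores or digits should go as well, spelling the class out (for example `[^a-zA-Z' ]`) is one option.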

Answer 4

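Here is a solution.

Replace every run of characters other than alphanumerics and single quotes with a SPACE, and remove any trailing whitespace.
Then split the string using SPACE as the delimiter.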

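import re

page = "They're going up to the Stark's castle   [More:...]"
# Collapse each run of unwanted characters into one space, then trim the end
page = re.sub(r"[^0-9a-zA-Z']+", ' ', page).rstrip()
print(page)
# Split on single spaces
p = page.split(' ')
print(p)

Output of the final print: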

["They're", 'going', 'up', 'to', 'the', "Stark's", 'castle', 'More']
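
Note that `rstrip()` only trims the right-hand side. If the text can start with a special character, `split(' ')` returns an empty first element. A possible variant is sketched below; the function names are hypothetical and exist only to compare the two behaviours.

```python
import re

def words_rstrip(text):
    # Same steps as above: substitute, trim the right end, split on ' '
    return re.sub(r"[^0-9a-zA-Z']+", ' ', text).rstrip().split(' ')

def words_split(text):
    # split() with no argument ignores leading, trailing and repeated whitespace
    return re.sub(r"[^0-9a-zA-Z']+", ' ', text).split()

text = "[More:...] They're here"
print(words_rstrip(text))  # has an empty string at the front
print(words_split(text))   # no empty strings
```

Calling `split()` without an argument makes the explicit `rstrip()` unnecessary.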

Python: Split a string into a list, taking out all special characters except for '
