Question

I have a list in which every element is a chapter title. Each title has the format '[Series Name] [Chapter Number] : [Chapter Title]', so an excerpt of my list would be:

chapter_title = ['One Piece 1 : Romance Dawn', 'One Piece 2 : They Call Him Strawhat Luffy', 'One Piece 3 : Pirate Hunter Zoro Enters']

I want to remove the space between the chapter number and the colon. My current code is:

import re

no_space_regex = re.compile(r'\s:')
for i in chapter_title:
    no_space_regex.sub(':',i)

However, nothing gets replaced. I also know the compiled pattern works, because if I use re.findall it finds every space followed by a colon.

I sort of solved it with:

no_space_regex = re.compile(r'\s:')
def_chapter=[] #list of chapter titles with no space before :
for i in chapter_title:
    i = no_space_regex.sub(':',i)
    def_chapter.append(i)

But I would like to know why re.sub didn't replace it, as I expected it to.

Answer 1

re.sub cannot change a string in place, because strings are immutable. All it can do is return a new string.
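A minimal sketch of that behaviour, using the first title from the question: the compiled pattern's sub returns a new string and leaves the original untouched, so the result must be captured. The variable names title and fixed are illustrative only.

```python
import re

no_space_regex = re.compile(r'\s:')
title = 'One Piece 1 : Romance Dawn'

# sub() builds and returns a new string; it never modifies `title` itself.
fixed = no_space_regex.sub(':', title)

print(title)  # One Piece 1 : Romance Dawn
print(fixed)  # One Piece 1: Romance Dawn

# Strings are immutable, so even direct item assignment is rejected.
try:
    title[0] = 'o'
except TypeError as exc:
    print('TypeError:', exc)
```

Calling sub without using its return value, as in the original loop, therefore computes the fixed string and immediately throws it away.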

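Your options are: a) build a new list, as you did, or b) if for some reason you really need to preserve the identity of chapter_title, assign to a full slice of the old list.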

>>> import re
>>> 
>>> chapter_title = ['One Piece 1 : Romance Dawn', 'One Piece 2 : They Call Him Strawhat Luffy', 'One Piece 3 : Pirate Hunter Zoro Enters']
>>> no_space_regex = re.compile(r'\s:')
>>> 
>>> id(chapter_title)
139706643715336
>>> chapter_title[:] = (no_space_regex.sub(':', s) for s in chapter_title)
>>> chapter_title
['One Piece 1: Romance Dawn', 'One Piece 2: They Call Him Strawhat Luffy', 'One Piece 3: Pirate Hunter Zoro Enters']
>>> id(chapter_title)
139706643715336

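Note that the second approach still builds new strings; it just additionally mutates chapter_title in place. In almost all cases I would expect your original approach to be fine, and a one-liner that reassigns chapter_title looks like this: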

chapter_title = [no_space_regex.sub(':', s) for s in chapter_title]
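The difference between rebinding and full-slice assignment only becomes visible when some other name refers to the same list. A small sketch, where alias is a hypothetical second reference added purely for illustration:

```python
import re

no_space_regex = re.compile(r'\s:')

chapter_title = ['One Piece 1 : Romance Dawn', 'One Piece 2 : They Call Him Strawhat Luffy']
alias = chapter_title  # hypothetical second name bound to the same list object

# Rebinding: chapter_title now names a brand-new list; alias still sees the old one.
chapter_title = [no_space_regex.sub(':', s) for s in chapter_title]
print(alias[0])                # One Piece 1 : Romance Dawn
print(alias is chapter_title)  # False

# Full-slice assignment: the existing list object is mutated, so every name sees the change.
chapter_title = alias
chapter_title[:] = (no_space_regex.sub(':', s) for s in chapter_title)
print(alias[0])                # One Piece 1: Romance Dawn
print(alias is chapter_title)  # True
```

If nothing else holds a reference to the list, both forms end with the same visible result, which is why the plain list comprehension is usually enough.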

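Edit: changed the right-hand side of the full-slice assignment to a generator expression to improve memory efficiency. (In CPython, slice assignment still materializes the iterable into a temporary sequence before replacing the slice, so the saving is modest.)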

Unable to replace with re.sub in Python
