How to remove punctuation from the values of a dictionary

Asked: 2018-10-21 03:48:52

Tags: string python-3.x punctuation

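I have a dictionary whose keys are strings and whose values are lists of strings. I tried to remove the punctuation from those strings using string.punctuation from the string module (import string).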

>>> dat = {'2008':['what!','@cool','#fog','@dddong'],'2010':['hey','@cute']}
>>> 

>>> def remove_punct(data):
...     import string
...     punct = string.punctuation
...     rpunct = punct.replace('@',"") # withhold @
...     for k,v in data.items():
...         for word in data[k]:
...             word = word.strip(rpunct)
...     return data
... 
>>> remove_punct(dat)
{'2008': ['what!', '@cool', '#fog', '@dddong'], '2010': ['hey', '@cute']}

Why are # and ! not removed?

After word.strip(rpunct), do I have to define the dictionary again?

2 answers:

Answer 0: (score: 0)

I would use a regular-expression substitution instead to remove the punctuation.

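  • \w matches alphanumeric characters and the underscore
  • [^\w] matches anything that is not an alphanumeric character or an underscore

You don't even need to wrap this in a function; you can update the dictionary directly with the following code: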

import re

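# Replace every character that is not a letter, digit, or underscore
# with a space. Note that this also replaces '@'.
for key in dat.keys():
    dat[key] = [re.sub(r'[^\w]', ' ', i) for i in dat[key]]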
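If the goal is the one stated in the question (keep @, drop other punctuation, leave no stray spaces), the same regular-expression idea can be adapted. The following is only a sketch: the helper name strip_punct_keep_at is made up for illustration, and it assumes that removing punctuation anywhere inside a word, not just at its ends, is acceptable.

```python
import re

# Hypothetical helper, not part of the original answer.
# [^\w@] matches any character that is neither a word character
# (letter, digit, underscore) nor '@'.
_NOT_WORD_OR_AT = re.compile(r'[^\w@]')

def strip_punct_keep_at(data):
    """Return a new dict with punctuation other than '@' removed from every word."""
    return {key: [_NOT_WORD_OR_AT.sub('', word) for word in words]
            for key, words in data.items()}

dat = {'2008': ['what!', '@cool', '#fog', '@dddong'], '2010': ['hey', '@cute']}
print(strip_punct_keep_at(dat))
# {'2008': ['what', '@cool', 'fog', '@dddong'], '2010': ['hey', '@cute']}
```

Because the underscore counts as a word character, it is kept as well, even though it appears in string.punctuation.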

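Answer 1: (score: 0)

You are not actually modifying data. str.strip returns a new string, and assigning it to the loop variable word only rebinds that local name; the list inside the dictionary still holds the original strings. You need to either modify data directly or create a new dictionary and fill it with the new data: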

>>> dat = {'2008':['what!','@cool','#fog','@dddong'],'2010':['hey','@cute']}
>>> 
>>> def remove_punct(data):
...     import string
...     new_data = {} # the data we will return
...     punct = string.punctuation
...     rpunct = punct.replace('@',"") # withhold @
...     for k,v in data.items():
...         new_data[k] = []
...         for word in data[k]:
...             new_data[k].append(word.strip(rpunct))
...     return new_data
... 
>>> remove_punct(dat)
{'2008': ['what', '@cool', 'fog', '@dddong'], '2010': ['hey', '@cute']}

Or in fewer lines:

>>> from string import punctuation
>>> rpunct = punctuation.replace('@',"") # withhold @
>>> new_data = {k: [word.strip(rpunct) for word in dat[k]] for k in dat}
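This answer also mentions modifying data directly as the other option. A minimal sketch of that approach might look like the following; the function name remove_punct_in_place is an assumption chosen for illustration, and slice assignment is only one of several ways to update the lists in place.

```python
from string import punctuation

# Every punctuation character except '@', as in the answer above.
rpunct = punctuation.replace('@', '')

def remove_punct_in_place(data):
    """Strip leading/trailing punctuation (except '@') from each word, mutating data."""
    for words in data.values():
        # Slice assignment replaces the contents of the existing list object,
        # so the dictionary's values change without rebinding any keys.
        words[:] = [word.strip(rpunct) for word in words]
    return data

dat = {'2008': ['what!', '@cool', '#fog', '@dddong'], '2010': ['hey', '@cute']}
remove_punct_in_place(dat)
print(dat)
# {'2008': ['what', '@cool', 'fog', '@dddong'], '2010': ['hey', '@cute']}
```

Note that str.strip only trims characters from the two ends of each word, so punctuation in the middle of a word (for example the dot in 'a.b') is left untouched.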