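I want to remove punctuation from a series of strings.
I am using Python 3.6 with the maketrans() and translate() functions to do this, but it does not give me the result I want.
Here are two sentences before running the code: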
Baking cake of straw-bana-choco will take longer than expcted
Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-
Here is my code:
remove_punc = str.maketrans(' ', ' ', string.punctuation)
df.Summary = df.Summary.str.translate(remove_punc)
df.Description = df.Description.str.translate(remove_punc)
The sentences after running the code:
baking cake of strawbanachoco will take longer than expcted
please include as much of the following data that is available cake type flavors decoration type icing
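So I would like to know why the result is strawbanachoco rather than straw bana choco. It looks like the code did not replace - with a space, whereas in the second case the punctuation does seem to have been replaced with spaces.
I did not include it in the snippet above, but I also lowercase all the sentences.
Any suggestions as to why this happens?
Thanks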
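Answer 0: (score: 1)
Nothing was replaced with spaces in the second sentence. The original string already has spaces between the punctuation characters, and those spaces are simply preserved. With three arguments, str.maketrans() deletes every character listed in the third argument instead of replacing it.
For details on how this works, see https://docs.python.org/3/library/stdtypes.html#str.maketrans.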
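The difference can be checked with a small sketch on plain strings. The variable names and the shortened second sentence are illustrative assumptions, not part of the original post:

```python
import string

# Three-argument form: map ' ' to ' ' (a no-op) and DELETE
# every character in string.punctuation.
delete_punc = str.maketrans(' ', ' ', string.punctuation)

summary = "Baking cake of straw-bana-choco will take longer than expcted"
# Shortened excerpt of the second sentence (assumption, for brevity)
description = "data that is available.< >< >- Cake Type:< >- Flavors:"

deleted_summary = summary.translate(delete_punc)
deleted_description = description.translate(delete_punc)

print(deleted_summary)            # hyphens are removed, nothing is inserted
print(repr(deleted_description))  # spaces that were already there remain
```

The hyphens in the first sentence have no neighboring spaces, so the words run together. In the second sentence the spaces between the punctuation marks survive, which makes it look as if a replacement took place.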
Answer 1: (score: 0)
If you want to replace every punctuation character with a space:
import string

s = """
Baking cake of straw-bana-choco will take longer than expcted
Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-
"""
remove_punc = str.maketrans(dict.fromkeys(string.punctuation, ' '))
print(s.translate(remove_punc))
Output (shown with consecutive spaces collapsed):
Baking cake of straw bana choco will take longer than expcted
Please include as much of the following data that is available Cake Type Flavors Decoration Type Icing
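Because each punctuation character becomes its own space, the raw result contains runs of several spaces. One possible way to get single spaces, sketched here on a shortened sample string (the string and variable names are assumptions for illustration), is to split on whitespace and rejoin:

```python
import string

# Single-dict form: map every punctuation character to a space.
to_space = str.maketrans(dict.fromkeys(string.punctuation, ' '))

# Shortened sample combining both cases (assumption, for brevity)
text = "straw-bana-choco: available.< >< >- Cake Type:"
replaced = text.translate(to_space)

# split() with no arguments splits on any run of whitespace and
# drops empty pieces, so joining with ' ' leaves single spaces.
collapsed = " ".join(replaced.split())
print(collapsed)
```

This also trims leading and trailing whitespace, which may or may not be wanted depending on the data.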
A good overview of other approaches is also available here: Fast punctuation removal with pandas