Using maketrans / translate

Posted: 2019-03-05 18:51:48

Tags: python string pandas series

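I want to remove punctuation from a Series of strings.

I'm using Python 3.6 and the maketrans() and translate() functions to do this. However, it doesn't give me the result I want.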

Here are two sentences before running the code:

Baking cake of straw-bana-choco will take longer than expcted


Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-

Here is my code:

import string

# str.maketrans(x, y, z): the third argument lists the characters to delete
remove_punc = str.maketrans(' ', ' ', string.punctuation)
df.Summary = df.Summary.str.translate(remove_punc)
df.Description = df.Description.str.translate(remove_punc)

The sentences after running the code:

baking cake of strawbanachoco will take longer than expcted


please include as much of the following data that is available   cake type flavors decoration type icing

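So I'm wondering why I get strawbanachoco rather than straw bana choco; it seems the code does not replace - with a space. Yet in the second case, the punctuation does seem to have been replaced with spaces.

I didn't include it in the snippet above, but I also lowercased all the sentences.

Any ideas why this happens?

Thanks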

2 Answers:

Answer 0 (score: 1)

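Nothing is replaced with spaces in the second sentence either. With three arguments, str.maketrans(x, y, z) maps each character of x to the character at the same position in y and maps every character of z to None, which means it is deleted. So translate only deletes the punctuation: in straw-bana-choco the hyphens disappear and the words run together, while the second sentence already contains spaces between the punctuation marks (for example inside "< >" and after each "-"), and those spaces are simply kept.

For details on how it works, see https://docs.python.org/3/library/stdtypes.html#str.maketrans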
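A small check, using the two sentences from the question, makes this concrete. It is only a sketch of the three-argument table from the question: the hyphens are deleted outright, and the spaces that survive in the second sentence were already present in the input.

```python
import string

# Three-argument form: map ' ' -> ' ' (a no-op) and delete every
# character in string.punctuation. Nothing is ever turned into a space.
remove_punc = str.maketrans(' ', ' ', string.punctuation)

first = "Baking cake of straw-bana-choco will take longer than expcted"
second = ("Please include as much of the following data that is available."
          "< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-")

# repr() makes the surviving spaces visible
print(repr(first.translate(remove_punc)))
# 'Baking cake of strawbanachoco will take longer than expcted'
print(repr(second.translate(remove_punc)))
# 'Please include as much of the following data that is available   Cake Type  Flavors  Decoration Type  Icing '
```

The runs of two or three spaces in the second result are exactly the spaces that sat between "<" and ">" and after each "-" in the original text.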

Answer 1 (score: 0)

If you want to replace each punctuation character with a space:

s = """
Baking cake of straw-bana-choco will take longer than expcted
Please include as much of the following data that is available.< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-
"""

remove_punc = str.maketrans(dict.fromkeys(string.punctuation, ' '))
print(str.translate(s, remove_punc))

Output:

Baking cake of straw bana choco will take longer than expcted
Please include as much of the following data that is available         Cake Type      Flavors      Decoration Type      Icing     
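The answer above works on a plain string, while the question works on pandas Series. A minimal sketch of the same table applied column-wise, assuming a DataFrame with the question's Summary and Description columns (the frame below is a hypothetical stand-in for the asker's df). The split/join step that collapses the extra spaces is an optional addition, not part of the answer.

```python
import string

import pandas as pd

# Same idea as the answer: every punctuation character becomes a space.
to_space = str.maketrans(dict.fromkeys(string.punctuation, ' '))

# Hypothetical stand-in for the question's df (column names taken from the question).
df = pd.DataFrame({
    "Summary": ["Baking cake of straw-bana-choco will take longer than expcted"],
    "Description": ["Please include as much of the following data that is available."
                    "< >< >- Cake Type:< >- Flavors:< >- Decoration Type:< >- Icing:< >-"],
})

for col in ["Summary", "Description"]:
    df[col] = (
        df[col]
        .str.lower()              # the question also lowercases the text
        .str.translate(to_space)  # punctuation -> ' '
        .str.split()              # split on runs of whitespace...
        .str.join(' ')            # ...and rejoin with single spaces
    )

print(df.loc[0, "Summary"])
print(df.loc[0, "Description"])
```

Dropping the last two steps gives the same spacing as the plain-string output shown above.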

Other approaches are also nicely summarized here: Fast punctuation removal with pandas
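For comparison, one regex-based alternative (a sketch, not necessarily one of the methods in the linked post) builds a character class from string.punctuation, replaces each match with a space, then collapses whitespace. The helper name punct_to_space is hypothetical.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
# re.escape keeps characters such as ']', '\\', '^' and '-' literal inside the brackets.
PUNCT = re.compile(f"[{re.escape(string.punctuation)}]")


def punct_to_space(text: str) -> str:
    """Hypothetical helper: punctuation -> space, then collapse runs of whitespace."""
    return " ".join(PUNCT.sub(" ", text).split())


print(punct_to_space("Baking cake of straw-bana-choco will take longer than expcted"))
```

With pandas, the same pattern could be passed to Series.str.replace(..., regex=True); translate is usually the faster choice for single-character replacements.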