Removing punctuation with a Python regular expression

Time: 2015-06-07 14:01:55

Tags: python

import re
def removePunctuation(text):
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces.

    Note:
        Only spaces, letters, and numbers should be retained.  Other characters should be
        eliminated (e.g. it's becomes its).  Leading and trailing spaces should be removed after
        punctuation is removed.

    Args:
        text (str): A string.

    Returns:
        str: The cleaned up string.
    """
    a=0
    while(a==0):
        if(text[0]==' '):
            text=text[1:]
        else:
            a=1
    while(a==1):
        if(text[-1]==' '):
            text=text[0:-1]
        else:
            a=0
    text=re.sub('[A-Z]', '[a-z]', text)
    return re.sub('[^0-9a-zA-Z ]', '', text)
print removePunctuation('Hi, you!')
print removePunctuation(' No under_score!')

Result:

azi you
azo underscore

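First, I strip the spaces at the start and end of the string. Then I convert the string to lowercase. Finally, I remove every character that is not a letter (a-z) or a digit.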

The expected result is:

hi you
no underscore

I don't understand why I get "az" at the start of the string, with the first character missing...

1 answer:

Answer 0 (score: 2)

This is your problem:

text=re.sub('[A-Z]', '[a-z]', text)
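
To see why, it may help to print the intermediate string. The second argument of re.sub is replacement text, not a pattern, so each capital letter is swapped for the literal five characters [a-z]. The sketch below only traces the question's first input; the names step1 and step2 are mine, for illustration.

```python
import re

text = 'Hi, you!'

# The replacement argument is inserted literally; it is not a character class.
step1 = re.sub('[A-Z]', '[a-z]', text)
print(step1)   # [a-z]i, you!

# The second substitution then removes '[', '-', ']' along with the other
# punctuation, leaving only 'a' and 'z' where 'H' used to be.
step2 = re.sub('[^0-9a-zA-Z ]', '', step1)
print(step2)   # azi you
```

So the first character is not lost by the stripping loops; it is overwritten by the replacement text.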

Change it to

text=text.lower()
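
Putting the fix together, a minimal sketch of the whole function could look like the following. This is one possible cleanup, not the asker's code: the name remove_punctuation is hypothetical, str.strip() replaces the two manual loops (which raise IndexError on an empty or all-space string), and stripping is done last because the docstring asks for spaces to be removed after punctuation.

```python
import re

def remove_punctuation(text):
    """Lowercase text, drop everything except letters, digits and spaces, then strip."""
    text = text.lower()
    # Keep only digits, lowercase ASCII letters and spaces.
    text = re.sub('[^0-9a-z ]', '', text)
    # Strip last, so spaces left at the ends after removing punctuation go too.
    return text.strip(' ')

print(remove_punctuation('Hi, you!'))          # hi you
print(remove_punctuation(' No under_score!'))  # no underscore
```

The print calls are written with parentheses so the snippet runs on Python 3 as well as the Python 2 used in the question.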