import re

def removePunctuation(text):
    """Removes punctuation, changes to lower case, and strips leading and trailing spaces.

    Note:
        Only spaces, letters, and numbers should be retained. Other characters should be
        eliminated (e.g. it's becomes its). Leading and trailing spaces should be removed after
        punctuation is removed.

    Args:
        text (str): A string.

    Returns:
        str: The cleaned up string.
    """
    a=0
    while(a==0):
        if(text[0]==' '):
            text=text[1:]
        else:
            a=1
    while(a==1):
        if(text[-1]==' '):
            text=text[0:-1]
        else:
            a=0
    text=re.sub('[A-Z]', '[a-z]', text)
    return re.sub('[^0-9a-zA-Z ]', '', text)

print removePunctuation('Hi, you!')
print removePunctuation(' No under_score!')
Output:
azi you
azo underscore
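First, I strip the leading and trailing spaces from the string. Then I convert the string to lower case. Finally, I remove every character that is not a letter, a digit, or a space.
The expected output is: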
hi you
no underscore
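I don't understand why I get "az" at the front of the string while the first character goes missing...
Answer 0 (score: 2)
This line is your problem. The second argument of re.sub is a literal replacement string, not a pattern, so every uppercase letter is replaced by the text [a-z]; the following re.sub then strips the brackets and the hyphen, leaving az: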
text=re.sub('[A-Z]', '[a-z]', text)
Change it to:
text=text.lower()
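For reference, here is a minimal sketch of the whole function with that change applied. It is one possible cleanup, not the only one: it assumes that replacing the two manual while loops with str.strip is acceptable, and it strips after removing punctuation, as the docstring asks. It uses print() with a single argument so that it runs on both Python 2 and Python 3.

```python
import re

def removePunctuation(text):
    """Lowercase, keep only letters, digits, and spaces, then strip outer spaces."""
    # str.lower() does the case conversion; re.sub cannot do it,
    # because its second argument is a literal replacement string.
    text = text.lower()
    # Drop every character that is not a digit, a lowercase letter, or a space.
    text = re.sub(r'[^0-9a-z ]', '', text)
    # Strip leading and trailing spaces last (assumption: replaces the while loops).
    return text.strip(' ')

# What the original line actually produced: a literal "[a-z]" per capital letter.
print(re.sub('[A-Z]', '[a-z]', 'Hi, you!'))   # [a-z]i, you!

print(removePunctuation('Hi, you!'))          # hi you
print(removePunctuation(' No under_score!'))  # no underscore
```

Unlike the original loops, which index text[0] and text[-1], this version also does not raise an IndexError when the input is empty or consists only of spaces.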