Python: cleaning the words in a sentence

Time: 2013-02-02 15:32:56

Tags: python string python-2.7

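I am trying to write a function that accepts a string (a sentence), cleans it, and returns only the letters, digits, and a hyphen. But the code seems to be wrong. Can anyone tell me what I am doing wrong here?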

Example: Blake D'souza is an !d!0t
Should return: Blake D'souza is an d0t

Python:

def remove_unw2anted(str):
    str = ''.join([c for c in str if c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\''])
    return str

def clean_sentence(s):
    lst = [word for word in s.split()]
    #print lst
    for items in lst:
        cleaned = remove_unw2anted(items)
    return cleaned

s = 'Blake D\'souza is an !d!0t'
print clean_sentence(s)

2 answers:

Answer 0 (score: 5)

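You are only returning the last cleaned word! The loop overwrites `cleaned` on every pass, so only the final word survives.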

It should be:

def clean_sentence(s):
    lst = [word for word in s.split()]

    lst_cleaned = []
    for items in lst:
        lst_cleaned.append(remove_unw2anted(items))
    return ' '.join(lst_cleaned)
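
To see why the original loop lost everything but the last word, it may help to record what `cleaned` holds on each pass. This is only a sketch, written for Python 3 (the question uses Python 2.7). `ALLOWED`, `remove_unwanted` and `trace_cleaning` are illustrative names, not part of the question's code.

```python
import string

# Characters to keep: ASCII letters, digits, and the apostrophe,
# mirroring the whitelist in the question's code.
ALLOWED = set(string.ascii_letters + string.digits + "'")

def remove_unwanted(word):
    # Keep only the whitelisted characters of a single word.
    return "".join(c for c in word if c in ALLOWED)

def trace_cleaning(sentence):
    """Return the value of `cleaned` after each loop iteration."""
    history = []
    for word in sentence.split():
        cleaned = remove_unwanted(word)  # overwritten on every pass
        history.append(cleaned)
    return history

s = "Blake D'souza is an !d!0t"
history = trace_cleaning(s)
print(history[-1])        # d0t  (all the original function returned)
print(" ".join(history))  # Blake D'souza is an d0t
```

Collecting each cleaned word and joining them at the end is what the corrected `clean_sentence` above does.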

A shorter way might be:

def is_ok(c):
    return c.isalnum() or c in " '"

def clean_sentence(s):
    return filter(is_ok, s)

s = "Blake D'souza is an !d!0t"
print clean_sentence(s)
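
The `filter` version relies on Python 2 behaviour, where filtering a `str` returns a `str`. In Python 3, `filter` returns a lazy iterator, so a portable variant would presumably need to join the kept characters back together. A minimal sketch, assuming Python 3:

```python
def is_ok(c):
    # Keep letters, digits, spaces, and apostrophes.
    # Note: str.isalnum() also accepts non-ASCII letters and digits.
    return c.isalnum() or c in " '"

def clean_sentence(s):
    # filter() yields the characters that pass is_ok; join them into a string.
    return "".join(filter(is_ok, s))

s = "Blake D'souza is an !d!0t"
print(clean_sentence(s))  # Blake D'souza is an d0t
```

Because this filters the whole sentence rather than word by word, the original spacing is preserved without any `split` and `join` of words.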

Answer 1 (score: 1)

A variant using string.translate, which has the advantages of being easy to extend and of being part of the string module.

import string

allchars = string.maketrans('','')

tokeep = string.letters + string.digits + '-'

toremove = allchars.translate(None, tokeep)

s = "Blake D'souza is an !d!0t"

print s.translate(None, toremove)

Output:

BlakeDsouzaisand0t

The OP said to keep only letters, digits, and the hyphen. Perhaps they also meant to keep spaces?
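
The code in this answer is Python 2 only: `string.maketrans('', '')`, `string.letters`, and the two-argument `translate(None, ...)` are not available in Python 3. A rough Python 3 equivalent might look like the sketch below. The function name `keep_only` and its `extra` parameter are hypothetical, introduced here only to show how the set of kept characters could be extended to include spaces or apostrophes.

```python
import string

def keep_only(text, extra="-"):
    """Drop every character that is not an ASCII letter, a digit, or in `extra`."""
    keep = set(string.ascii_letters + string.digits + extra)
    # str.translate deletes any character whose code point maps to None,
    # so build a table covering the unwanted characters present in `text`.
    table = {ord(ch): None for ch in set(text) if ch not in keep}
    return text.translate(table)

s = "Blake D'souza is an !d!0t"
print(keep_only(s))         # BlakeDsouzaisand0t
print(keep_only(s, "-' "))  # Blake D'souza is an d0t
```

The first call reproduces the output shown above. The second also keeps apostrophes and spaces, which matches the result the question asked for.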