Question

def wordlist (l: list) -> list:
    '''Returns a wordlist without white spaces and punctuation'''
    result = []
    table = str.maketrans('!()-[]:;"?.,', '            ')
    for x in l:
        n = x.translate(table)
        n = x.strip()
        n = x.split()
        if n != []:
            result.extend(n)
    return result

The function should work like this:

print(wordlist(['  Testing', '????', 'function!!']))

It should produce:

['Testing', 'function']

But my code above produces:

['Testing', '????', 'function!!']

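So I assume I got the punctuation-removal part of the code wrong. Where should I fix it? Any other suggestions for simplifying the code would also be appreciated (it feels a bit long-winded to me).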

Answer 1

Did you mean to chain the translate(table), strip() and split() calls?

Then

n = x.translate(table)
n = x.strip()
n = x.split()

should be

n = x.translate(table)
n = n.strip() # change x to n
n = n.split() # same here

or

n = x.translate(table).split()

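There is no need for the intermediate strip(): split() with no arguments already ignores leading and trailing whitespace.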
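To see why strip() is redundant here, a quick check like the following may help. The sample string `text` is just an illustration and not part of the question:

```python
text = '  Testing  function  '

# With no arguments, split() breaks on runs of whitespace and
# ignores leading and trailing whitespace.
print(text.split())          # ['Testing', 'function']
print(text.strip().split())  # ['Testing', 'function']

# A string that is only whitespace splits into an empty list.
print('    '.split())        # []
```

That last case is what happens to '????' once the punctuation has been translated to spaces.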

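As for further simplification, you don't need to check whether n is empty. That looks like premature optimization to me, since extend() with an empty list simply does nothing: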

if n != []: # you can remove this line
    result.extend(n)

Result:

def wordlist (l: list) -> list:
    '''Returns a wordlist without white spaces and punctuation'''
    result = []
    table = str.maketrans('!()-[]:;"?.,', '            ')
    for x in l:
        result.extend(x.translate(table).split())
    return result

You could even replace that loop with a list comprehension.
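A minimal sketch of the list-comprehension version, assuming the same punctuation set as in the question. The parameter name `lines` and the `' ' * 12` shorthand for the twelve replacement spaces are my own choices:

```python
def wordlist(lines: list) -> list:
    '''Returns a wordlist without white spaces and punctuation'''
    # Map each of the 12 punctuation characters to a space.
    table = str.maketrans('!()-[]:;"?.,', ' ' * 12)
    # Outer loop over the input strings, inner loop over the words in each.
    return [word for line in lines for word in line.translate(table).split()]

print(wordlist(['  Testing', '????', 'function!!']))
# ['Testing', 'function']
```

Whether this reads better than the explicit loop is a matter of taste; the nested comprehension is shorter but a little denser.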

Answer 2

Using re.sub might be cleaner here:

import re
clean = re.compile(r'[!()\-\[\]:;"?.,\s]')

words = ['  Testing', '????', 'function!!']
result = list(filter(bool, (clean.sub('', w) for w in words)))
print(result)
# ['Testing', 'function']
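One caveat: because the pattern also deletes whitespace, a string holding several words (say 'two words') would collapse into 'twowords' instead of being split the way the original function does. If that matters, a possible variant is to split on the same character class rather than substitute it. This is only a sketch; the names `separators` and `wordlist` and the extra 'two words' input are my own additions:

```python
import re

# Same punctuation set as above, plus whitespace, treated as separators.
separators = re.compile(r'[!()\-\[\]:;"?.,\s]+')

def wordlist(lines):
    # re.split leaves empty strings at the edges, so filter them out.
    return [word for line in lines for word in separators.split(line) if word]

print(wordlist(['  Testing', '????', 'function!!', 'two words']))
# ['Testing', 'function', 'two', 'words']
```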
