Question

I'm trying to break a string up into words.

    import re

    def breakup(text):
        temp = []
        # Split on runs of non-word characters.
        temp = re.split(r'\W+', text.rstrip())
        return [e.lower() for e in temp]

Sample string:

What's yellow, white, green and bumpy? A pickle wearing a tuxedo

Result:

['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']

But when I pass a string like

How is a locksmith like a typewritter? They both have a lot of keys!

I get:

['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys', '']

I want to parse it without getting empty strings in the list.

The strings passed in will contain punctuation and so on. Any ideas?
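For reference, the empty string comes from how re.split behaves at the edges of the text: when the text ends (or starts) with a separator match, the result ends (or starts) with an empty string. The text.rstrip() in breakup only removes trailing whitespace, not the "!". A minimal illustration using a shortened version of the second sample:

```python
import re

text = "They both have a lot of keys!"

# re.split() returns the pieces *between* separator matches. Because this
# text ends with a separator ("!"), the final piece is an empty string.
trailing = re.split(r"\W+", text)
print(trailing)  # ['They', 'both', 'have', 'a', 'lot', 'of', 'keys', '']

# A separator at the very start produces an empty first piece as well.
leading = re.split(r"\W+", "...keys")
print(leading)  # ['', 'keys']
```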

Answer 1

How about searching for what you want instead:

    import re

    [s.lower() for s in
     re.findall(r'\w+',
                "How is a locksmith like a typewritter? They both have a lot of keys!")]

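Or, to build only the one final list (re.finditer yields match objects one at a time instead of returning an intermediate list of strings):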

    [m.group().lower() for m in
     re.finditer(r'\w+',
                 "How is a locksmith like a typewritter? They both have a lot of keys!")]
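A quick check that the findall and finditer forms give the same list, with no empty strings in either, since \w+ only matches runs of at least one word character. The variable names are illustrative:

```python
import re

text = "How is a locksmith like a typewritter? They both have a lot of keys!"

# findall() returns a list of the matched strings.
via_findall = [s.lower() for s in re.findall(r"\w+", text)]

# finditer() yields match objects lazily; .group() returns the matched text.
via_finditer = [m.group().lower() for m in re.finditer(r"\w+", text)]

print(via_findall == via_finditer)  # True
print("" in via_findall)            # False
```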

Answer 2

Just change

    return [e.lower() for e in temp]

to

    return [e.lower() for e in temp if e]

Also, the line

    temp = []

isn't needed, since you never use the empty list you assign to temp.
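Putting both changes together, the question's breakup() might look like this. It is a sketch that keeps the original re.split approach and only adds the `if e` filter and drops the unused assignment:

```python
import re

def breakup(text):
    # Split on runs of non-word characters (the question's original approach).
    temp = re.split(r"\W+", text.rstrip())
    # "" is falsy, so `if e` drops the empty pieces re.split() can produce.
    return [e.lower() for e in temp if e]

words = breakup("How is a locksmith like a typewritter? They both have a lot of keys!")
print(words)
```

The trailing "!" no longer leaves an empty string at the end of words.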

Answer 3

This works:

    import re

    txt = '''\
    What's yellow, white, green and bumpy? A pickle wearing a tuxedo
    How is a locksmith like a typewritter? They both have a lot of keys!'''

    for line in txt.splitlines():
        # \w+ only matches non-empty runs of word characters,
        # so no empty strings can appear in the result.
        print([word.lower() for word in re.findall(r'\w+', line)])

Prints:

['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']
['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys']

Answer 4

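Why not check for it in the list comprehension:

    return [e.lower() for e in temp if len(e) > 0]

or, to be pedantic about it, rely on the fact that an empty string is falsy:

    return [e.lower() for e in temp if e]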
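The two filters behave identically on strings, because the only string with length 0 is "", and "" is the only falsy string. A small check on a hypothetical split result:

```python
# Hypothetical output of re.split() containing empty strings.
pieces = ["how", "", "keys", ""]

by_length = [e for e in pieces if len(e) > 0]
by_truthiness = [e for e in pieces if e]  # "" is falsy, so it is skipped

print(by_length)      # ['how', 'keys']
print(by_truthiness)  # ['how', 'keys']
```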

Answer 5

You could do it like this:

    'How is a locksmith <blah> a lot of keys!'.rstrip('!?.').split()
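Applied to the full second sample, str.rstrip only removes the listed characters from the right end of the whole string, so punctuation in the middle (the "?") stays attached to a word, and nothing is lowercased. The per-word variation below is an assumption, not part of the answer above, and PUNCT is a hypothetical character set:

```python
text = "How is a locksmith like a typewritter? They both have a lot of keys!"

# rstrip() only strips the right end of the whole string.
words = text.rstrip("!?.").split()
print(words)  # 'typewritter?' still carries its question mark

# Variation: strip each whitespace-separated word individually and skip
# tokens that were nothing but punctuation.
PUNCT = "!?.,;:"  # assumed set of characters to remove; extend as needed
per_word = [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]
print(per_word)
```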

Answer 6

In your specific case, it would be:

    import re

    def breakup(text):
        temp = re.split(r'\W+', text.rstrip())
        # Drop the empty strings that re.split() leaves at the edges.
        return [e.lower() for e in temp if e]

A more general solution is:

    >>> import re
    >>> re.findall(r'\w+', 'How is a locksmith like a typewritter? They both have a lot of keys!')
    ['How', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'They', 'both', 'have', 'a', 'lot', 'of', 'keys']
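One side effect of \w+, visible in the first sample, is that the apostrophe counts as a separator, so "What's" becomes "what" and "s". If contractions should stay whole, one possible tweak (an assumed pattern, not from the thread) is to allow apostrophes inside a token. Note that it would also match a stray quote character on its own:

```python
import re

text = "What's yellow, white, green and bumpy? A pickle wearing a tuxedo"

# \w+ treats the apostrophe as a separator, so "What's" becomes two tokens.
plain = [w.lower() for w in re.findall(r"\w+", text)]
print(plain[:2])  # ['what', 's']

# Assumed variation: also allow apostrophes inside a token.
with_apostrophes = [w.lower() for w in re.findall(r"[\w']+", text)]
print(with_apostrophes[:2])  # ["what's", 'yellow']
```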

Using a regex to parse text into a list gives empty strings in the result
