I'm trying to break a string up into words.
import re

def breakup(text):
    temp = []
    temp = re.split(r'\W+', text.rstrip())
    return [e.lower() for e in temp]
Example string:
What's yellow, white, green and bumpy? A pickle wearing a tuxedo
Result:
['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']
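But when I pass a string like
How is a locksmith like a typewritter? They both have a lot of keys!
I get:
['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys', '']
I want to parse it so that the list doesn't contain the empty string.
The strings passed in will contain punctuation and so on. Any ideas?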
Answer 0 (score: 5)
How about searching for what you do want instead:
[ s.lower() for s in
re.findall(r'\w+',
"How is a locksmith like a typewritter? They both have a lot of keys!") ]
Or, to build only one list:
[ s.group().lower() for s in
re.finditer(r'\w+',
"How is a locksmith like a typewritter? They both have a lot of keys!") ]
Answer 1 (score: 4)
Simply change
return [e.lower() for e in temp]
to
return [e.lower() for e in temp if e]
Also, the line
temp = []
isn't needed, because you never use the empty list assigned to temp.
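As a minimal sketch of why the `if e` filter works: an empty string is falsy in Python, so the condition drops it. The sample sentence below is made up for illustration, and `.rstrip()` is left out because the filter already discards whatever trailing whitespace or punctuation would produce.

```python
import re

def breakup(text):
    # An empty string is falsy, so "if e" removes it from the result.
    return [e.lower() for e in re.split(r'\W+', text) if e]

# Leading and trailing punctuation both produce empty strings in re.split.
print(re.split(r'\W+', '"Hello," she said.'))   # ['', 'Hello', 'she', 'said', '']
print(breakup('"Hello," she said.'))            # ['hello', 'she', 'said']
```

Note that the filter handles empty strings at either end of the list, not just the trailing one from the question.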
Answer 2 (score: 2)
This works:
import re

txt = '''\
What's yellow, white, green and bumpy? A pickle wearing a tuxedo
How is a locksmith like a typewritter? They both have a lot of keys!'''

for line in txt.splitlines():
    # \w+ matches only runs of word characters, so no empty strings appear
    print([word.lower() for word in re.findall(r'\w+', line) if word.strip()])
Prints:
['what', 's', 'yellow', 'white', 'green', 'and', 'bumpy', 'a', 'pickle', 'wearing', 'a', 'tuxedo']
['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys']
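To see where the empty string in the question comes from, here is a small side-by-side sketch (the variable names are illustrative). `re.split` treats the final `!` as a separator with nothing after it, which leaves an empty string at the end; `re.findall` returns only the matches themselves.

```python
import re

sentence = "How is a locksmith like a typewritter? They both have a lot of keys!"

# Splitting on runs of non-word characters: the final "!" is a separator
# with nothing after it, so the result ends with an empty string.
split_words = re.split(r'\W+', sentence)

# Matching runs of word characters: only the words themselves come back.
found_words = re.findall(r'\w+', sentence)

print(split_words[-2:])   # ['keys', '']
print(found_words[-2:])   # ['of', 'keys']
```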
Answer 3 (score: 1)
Why not check for this in the list comprehension?
return [e.lower() for e in temp if len(e) > 0]
or, for the pedantic out there:
return [e.lower() for e in temp if e]
Answer 4 (score: 1)
'How is a locksmith <blah> a lot of keys!'.rstrip('!?.').split()
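One caveat worth checking before relying on this: `str.rstrip` only removes characters from the end of the string, and `str.split()` splits on whitespace alone. A quick sketch using the full sentence from the question suggests that punctuation in the middle of the text stays attached to its word:

```python
sentence = "How is a locksmith like a typewritter? They both have a lot of keys!"

# rstrip('!?.') strips only trailing characters; split() splits on whitespace.
words = sentence.rstrip('!?.').split()

print(words[6])    # typewritter?
print(words[-1])   # keys
```

So this approach fits input with a single trailing punctuation mark, while the regex-based answers cover punctuation anywhere in the string.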
Answer 5 (score: 0)
In your particular case, it would be:
def breakup(text):
    temp = []
    temp = re.split(r'\W+', text.rstrip())
    return [e.lower() for e in temp if e]
A more general solution is:
>>> import re
>>> re.findall(r'\w+', 'How is a locksmith like a typewritter? They both have a lot of keys!')
['How', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'They', 'both', 'have', 'a', 'lot', 'of', 'keys']
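The `findall` approach could also be wrapped in the `breakup` function from the question. This is a minimal sketch rather than code from any of the answers; the lowercasing step is carried over from the original function.

```python
import re

def breakup(text):
    # Collect runs of word characters and lowercase each one.
    return [word.lower() for word in re.findall(r'\w+', text)]

print(breakup("How is a locksmith like a typewritter? They both have a lot of keys!"))
# ['how', 'is', 'a', 'locksmith', 'like', 'a', 'typewritter', 'they', 'both', 'have', 'a', 'lot', 'of', 'keys']
```

Because nothing is split, no filtering step is needed afterwards.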