For example, I have an arbitrary string:
var = 'I have a string I want GE and APPLES but nothing else'
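What is the best way to split this string in Python so that I get only 'GE' and 'APPLES'? In Java I would split on whitespace, then check each array element for two or more consecutive uppercase letters and keep the ones that match.
Is there a better way to do this in Python? I am not particularly well versed in Python's regular expressions.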
Answer 0 (score: 3)
Use str.isupper, str.split, and a list comprehension:
>>> var = 'I have a string I want GE and APPLES but nothing else'
>>> [x for x in var.split() if x.isupper() and len(x) > 1 ]
['GE', 'APPLES']
Using a regular expression:
>>> import re
>>> re.findall(r'\b[A-Z]{2,}\b', var)
['GE', 'APPLES']
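The two approaches agree on the string above, but they are not equivalent in general. As a rough sketch (the helper names and the sample string below are made up for illustration), the split-based version keeps punctuation attached to a word and accepts tokens containing digits, because str.isupper only looks at cased characters, while the regex matches only runs of two or more ASCII capitals:

```python
import re

def upper_words_split(text):
    # Whitespace split, then keep tokens whose cased characters are all uppercase.
    return [w for w in text.split() if w.isupper() and len(w) > 1]

def upper_words_regex(text):
    # Runs of two or more ASCII capitals delimited by word boundaries.
    return re.findall(r'\b[A-Z]{2,}\b', text)

# Hypothetical input with punctuation and a digit, not from the question.
sample = 'I want GE, APPLES and A1 but nothing else'

print(upper_words_split(sample))  # ['GE,', 'APPLES', 'A1']
print(upper_words_regex(sample))  # ['GE', 'APPLES']
```

Which behaviour is preferable depends on the input: if the text may contain punctuation, the regex version is probably closer to what the question asks for.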
Timing comparison:
>>> var = 'I have a string I want GE and APPLES but nothing else'*10**5
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 773 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 491 ms per loop
# Input with huge words:
>>> var = ' '.join(['FOO'*1000, 'bar'*1000, 'SPAM'*1000]*1000)
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 224 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 483 ms per loop