Looking for a good way to split a string on all-uppercase words

Time: 2013-12-06 22:23:34

Tags: python regex string

For example, I have an arbitrary string:

var = 'I have a string I want GE and APPLES but nothing else'

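What is the best way to split this string in Python so that I get 'GE' and 'APPLES'? In Java, I would split on spaces, then check each array element for two or more consecutive uppercase letters and grab those.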

Is there a better way to do this in Python? I'm not particularly proficient with regular expressions in Python.

1 answer:

Answer 0: (score: 3)

Using str.isupper, str.split, and a list comprehension:

>>> var = 'I have a string I want GE and APPLES but nothing else'
>>> [x for x in var.split() if x.isupper() and len(x) > 1 ]
['GE', 'APPLES']

Using a regular expression:

>>> import re
>>> re.findall(r'\b[A-Z]{2,}\b', var)
['GE', 'APPLES']
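
The two approaches agree on the sample string, but they are not equivalent in general. The sketch below wraps each one in a helper (the names upper_words_split and upper_words_regex, and the sample sentence, are made up for illustration) and feeds them a string containing punctuation and digits. str.isupper only looks at the cased characters in a token, so tokens such as 'GE,' or 'R2D2' pass the first filter, while the regex only accepts runs of ASCII capitals between word boundaries:

```python
import re


def upper_words_split(text):
    # Whitespace-delimited tokens whose cased characters are all uppercase
    # and that are longer than one character.
    return [w for w in text.split() if w.isupper() and len(w) > 1]


def upper_words_regex(text):
    # Runs of two or more ASCII capitals delimited by word boundaries.
    return re.findall(r'\b[A-Z]{2,}\b', text)


sample = 'I want GE, APPLES and R2D2 but nothing else'
print(upper_words_split(sample))  # ['GE,', 'APPLES', 'R2D2']
print(upper_words_regex(sample))  # ['GE', 'APPLES']
```

Which behavior is preferable depends on the input: if the text contains punctuation attached to words, the regex version is probably closer to what the question asks for.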

Timing comparison:

>>> var = 'I have a string I want GE and APPLES but nothing else'*10**5
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 773 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 491 ms per loop

Input with huge words:

>>> var = ' '.join(['FOO'*1000, 'bar'*1000, 'SPAM'*1000]*1000)
>>> %timeit [x for x in var.split() if x.isupper() and len(x) > 1 ]
1 loops, best of 3: 224 ms per loop
>>> %timeit re.findall(r'\b[A-Z]{2,}\b', var)
1 loops, best of 3: 483 ms per loop
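
The %timeit lines above only work inside IPython. A rough way to repeat the comparison in a plain Python script is sketched below using the standard timeit module. The helper names (with_split, with_regex, best_time) are invented for this sketch, the inputs are smaller than the ones above so that it finishes quickly, and a trailing space is added to the repeated sentence so that consecutive copies do not fuse into 'elseI'. The numbers it prints will vary by machine and Python version, so they are not expected to match the figures above:

```python
import re
import timeit

# Precompiled pattern; same expression as in the answer.
PATTERN = re.compile(r'\b[A-Z]{2,}\b')


def with_split(text):
    return [x for x in text.split() if x.isupper() and len(x) > 1]


def with_regex(text):
    return PATTERN.findall(text)


def best_time(func, text, repeat=3, number=1):
    # Best time in seconds over `repeat` runs of `number` calls each.
    return min(timeit.repeat(lambda: func(text), repeat=repeat, number=number))


# Assumed, scaled-down versions of the two inputs used above.
many_short = 'I have a string I want GE and APPLES but nothing else ' * 10**4
few_huge = ' '.join(['FOO' * 1000, 'bar' * 1000, 'SPAM' * 1000] * 100)

for label, text in [('many short words', many_short), ('few huge words', few_huge)]:
    print(label,
          'split: %.4f s' % best_time(with_split, text),
          'regex: %.4f s' % best_time(with_regex, text))
```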