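Given a string and a list of substrings, I want the first position at which any of the substrings occurs in the string. If none of the substrings occurs, return 0. I want to ignore case.
Is there something more Pythonic than: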
given = 'Iamfoothegreat'
targets = ['foo', 'bar', 'grea', 'other']
res = len(given)
for t in targets:
    i = given.lower().find(t)
    if i > -1 and i < res:
        res = i
if res == len(given):
    result = 0
else:
    result = res
The code works, but it seems inefficient.
Answer 0 (score: 2)
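I would not return 0, because it could be a valid starting index; use -1, None, or some other impossible value instead. You can simply use try/except and return the index: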
def get_ind(s, targ):
    s = s.lower()
    for t in targ:
        try:
            return s.index(t.lower())
        except ValueError:
            pass
    return None  # or -1, False, ...
If you also want to ignore the case of the input string, set s = s.lower() before the loop.
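To illustrate the point about not returning 0, here is a minimal sketch. The function names are hypothetical and only exist for this comparison; the first one mimics the "return 0 when nothing is found" convention from the question, the second uses None instead.

```python
def first_pos_zero_sentinel(given, targets):
    """Leftmost match position, or 0 when nothing matches (ambiguous)."""
    given = given.lower()
    hits = [i for i in (given.find(t.lower()) for t in targets) if i != -1]
    return min(hits) if hits else 0


def first_pos_none_sentinel(given, targets):
    """Leftmost match position, or None when nothing matches."""
    given = given.lower()
    hits = [i for i in (given.find(t.lower()) for t in targets) if i != -1]
    return min(hits) if hits else None


# A real match at index 0 and "no match" look identical with the 0 convention.
print(first_pos_zero_sentinel("FooBar", ["foo"]))  # 0
print(first_pos_zero_sentinel("FooBar", ["xyz"]))  # 0

# With None as the sentinel the two cases can be told apart.
print(first_pos_none_sentinel("FooBar", ["foo"]))  # 0
print(first_pos_none_sentinel("FooBar", ["xyz"]))  # None
```

Callers of the None variant should test with `is None` rather than truthiness, since a match at index 0 is falsy.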
You can also do something like this:
def get_ind_next(s, targ):
    s = s.lower()
    return next((s.index(t) for t in map(str.lower, targ) if t in s), None)
However, that does at worst two lookups per substring, rather than one with try/except. It will at least short-circuit on the first match, too.
If you actually want the minimum over all matches, change it to:
def get_ind(s, targ):
    s = s.lower()
    mn = float("inf")  # stays float("inf") if nothing matches
    for t in targ:
        try:
            i = s.index(t.lower())
            if i < mn:
                mn = i
        except ValueError:
            pass
    return mn
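The difference between "first target that matches" and "minimum over all matches" only shows up when the targets are not listed in order of appearance. A small sketch, using the hypothetical names `first_listed` and `leftmost`:

```python
def first_listed(s, targets):
    """Index of the first target, in list order, that occurs in s; else None."""
    s = s.lower()
    for t in targets:
        i = s.find(t.lower())
        if i != -1:
            return i  # short-circuits on the first target that matches
    return None


def leftmost(s, targets):
    """Smallest index at which any target occurs in s; else None."""
    s = s.lower()
    best = None
    for t in targets:
        i = s.find(t.lower())
        if i != -1 and (best is None or i < best):
            best = i
    return best


targets = ['grea', 'foo']  # 'grea' is listed first but occurs later
print(first_listed('Iamfoothegreat', targets))  # 9
print(leftmost('Iamfoothegreat', targets))      # 3
```

Only the second function answers the question as asked; the first is cheaper because it can stop early.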
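Using the default argument of min, you can also write:
def get_ind_next(s, targ):
    s = s.lower()
    return min((s.index(t) for t in map(str.lower, targ) if t in s), default=None)
default=None only works in Python >= 3.4, so if you are using Python 2 you will have to change the logic slightly.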
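One possible way to change the logic for interpreters without the `default` argument is to collect the hits in a list first and test it for emptiness. This is only a sketch; `get_ind_min_compat` is a hypothetical name.

```python
def get_ind_min_compat(s, targ):
    """Leftmost match position without relying on min(..., default=...)."""
    s = s.lower()
    lowered = [t.lower() for t in targ]
    hits = [s.index(t) for t in lowered if t in s]
    # min() raises ValueError on an empty sequence, so guard explicitly.
    return min(hits) if hits else None


print(get_ind_min_compat('Iamfoothegreat', ['bar', 'grea', 'other', 'foo']))  # 3
print(get_ind_min_compat('no match here', ['bar', 'grea']))                  # None
```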
Timings, Python 3:
In [29]: s = "hello world" * 5000
In [30]: s += "grea" + s
In [25]: %%timeit
....: targ = [re.escape(x) for x in targets]
....: pattern = r"%(pattern)s" % {'pattern' : "|".join(targ)}
....: firstMatch = next(re.finditer(pattern, s, re.IGNORECASE),None)
....: if firstMatch:
....:     pass
....:
100 loops, best of 3: 5.11 ms per loop
In [18]: timeit get_ind_next(s, targets)
1000 loops, best of 3: 691 µs per loop
In [19]: timeit get_ind(s, targets)
1000 loops, best of 3: 627 µs per loop
In [20]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
1000 loops, best of 3: 1.03 ms per loop
In [21]: s = 'Iamfoothegreat'
In [22]: targets = ['bar', 'grea', 'other','foo']
In [23]: get_ind_next(s, targets) == get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
Out[24]: True
Python 2:
In [13]: s = "hello world" * 5000
In [14]: s += "grea" + s
In [15]: targets = ['foo', 'bar', 'grea', 'other']
In [16]: timeit get_ind(s, targets)
1000 loops, best of 3: 322 µs per loop
In [17]: timeit min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
1000 loops, best of 3: 710 µs per loop
In [18]: get_ind(s, targets) == min([s.lower().find(x.lower()) for x in targets if x.lower() in s.lower()] or [0])
Out[18]: True
You can also combine the first version with min, which does the same job; it is just a bit nicer and maybe slightly faster:
def get_ind(s, targ):
    s, mn = s.lower(), None
    for t in targ:
        try:
            mn = s.index(t.lower())
            yield mn
        except ValueError:
            pass
    yield mn
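How the generator is meant to be consumed is not shown, so the following is an assumption: passing it straight to min() appears to give the leftmost index, or None when nothing matches. The trailing `yield mn` repeats the last index found (harmless for min) or yields a lone None when there were no hits.

```python
def get_ind(s, targ):
    """Yield the index of every target found in s, then the last one again
    (or None if nothing was found)."""
    s, mn = s.lower(), None
    for t in targ:
        try:
            mn = s.index(t.lower())
            yield mn
        except ValueError:
            pass
    yield mn


# Assumed usage: take the minimum over everything the generator yields.
print(min(get_ind('Iamfoothegreat', ['bar', 'grea', 'other', 'foo'])))  # 3
print(min(get_ind('no match here', ['bar', 'grea'])))                   # None
```

In the no-match case min() receives a single None and returns it without comparing anything, so this also works on Python 3.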
Answer 1 (score: 2)
Another option is to just use a regex, because I thought the Python regex implementation was super fast. My regex version is:
import re
given = 'IamFoothegreat'
targets = ['foo', 'bar', 'grea', 'other']
# Escape each target so regex metacharacters are matched literally
targets = [re.escape(x) for x in targets]
pattern = "|".join(targets)
# First (leftmost) case-insensitive match, or None if there is none
firstMatch = next(re.finditer(pattern, given, re.IGNORECASE), None)
if firstMatch:
    print(firstMatch.start())
    print(firstMatch.group())
Output is
3
Foo
If nothing is found, nothing is printed. In that case firstMatch is None, so checking for "not found" is straightforward.
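The same idea can be wrapped in a function that returns a value instead of printing. This is a sketch under my own naming (`first_regex_match` is hypothetical); it uses `re.search`, which returns the leftmost match just like the first item of `re.finditer`.

```python
import re


def first_regex_match(given, targets):
    """Return (start, matched_text) for the leftmost match, or None."""
    if not targets:
        return None  # an empty alternation would match the empty string at 0
    pattern = re.compile("|".join(re.escape(t) for t in targets), re.IGNORECASE)
    m = pattern.search(given)
    return (m.start(), m.group()) if m else None


print(first_regex_match('IamFoothegreat', ['foo', 'bar', 'grea', 'other']))  # (3, 'Foo')
print(first_regex_match('nothing here', ['foo', 'bar']))                     # None
```

Note that `m.group()` returns the text as it appears in the input, so the case of the original string is preserved.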
A plain (non-regex) approach that gives you the matched string, too:
given = 'Iamfoothegreat'
targets = ['foo', 'bar', 'grea', 'other']
dct = {'pos': -1, 'string': None}
given = given.lower()
for t in targets:
    i = given.find(t)
    # Keep the smallest position seen so far (-1 means "nothing found yet")
    if i > -1 and (i < dct['pos'] or dct['pos'] == -1):
        dct['pos'] = i
        dct['string'] = t
print(dct)
Output is:
{'pos': 3, 'string': 'foo'}
If element is not found:
{'pos': -1, 'string': None}
Timings with this string and these targets:
given = "hello world" * 5000
given += "grea" + given
targets = ['foo', 'bar', 'grea', 'other']
1000 loops with timeit:
regex approach: 4.08629107475 sec for 1000
normal approach: 1.80048894882 sec for 1000
Now 10 loops with a much bigger target list (targets * 1000):
normal approach: 4.06895017624 sec for 10
regex approach: 34.8153910637 sec for 10
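A comparison like this could be reproduced with a harness along the following lines. It is a sketch, not the script that produced the numbers above; the function names `regex_approach` and `normal_approach` and the loop count are my own choices, and absolute timings will differ by machine and Python version.

```python
import re
import timeit

given = "hello world" * 5000
given += "grea" + given
targets = ['foo', 'bar', 'grea', 'other']


def regex_approach():
    # One alternation over all (escaped) targets, matched case-insensitively
    pattern = "|".join(re.escape(t) for t in targets)
    m = re.search(pattern, given, re.IGNORECASE)
    return m.start() if m else -1


def normal_approach():
    # Lowercase once, then take the smallest non-negative find() result
    lowered = given.lower()
    hits = [i for i in (lowered.find(t) for t in targets) if i > -1]
    return min(hits) if hits else -1


# Both approaches should agree before their speed is compared.
assert regex_approach() == normal_approach()

for fn in (regex_approach, normal_approach):
    print(fn.__name__, timeit.timeit(fn, number=20), "sec for 20 loops")
```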
Answer 2 (score: 1)
You can use the following:
answer = min([given.lower().find(x.lower()) for x in targets
              if x.lower() in given.lower()] or [0])
Demo 1
given = 'Iamfoothegreat'
targets = ['foo', 'bar', 'grea', 'other']
answer = min([given.lower().find(x.lower()) for x in targets
              if x.lower() in given.lower()] or [0])
print(answer)
Output
3
Demo 2
given = 'this is a different string'
targets = ['foo', 'bar', 'grea', 'other']
answer = min([given.lower().find(x.lower()) for x in targets
              if x.lower() in given.lower()] or [0])
print(answer)
Output
0
I also find the following solution quite readable:
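given = 'the string'
targets = ('foo', 'bar', 'grea', 'other')  # must be a tuple for str.startswith
given = given.lower()
for i in range(len(given)):
    # startswith accepts a tuple of prefixes and a start offset
    if given.startswith(targets, i):
        print(i)
        break
else:
    # the loop finished without break: no target occurs anywhere
    print(-1)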
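The loop above relies on two documented features of `str.startswith`: it accepts a tuple of prefixes, and it takes an optional start offset. Wrapped as a function it might look like this; `first_pos` is a hypothetical name, and lowercasing the targets is my addition so that mixed-case targets also work.

```python
def first_pos(given, targets):
    """Leftmost index at which any target starts, ignoring case; -1 if none."""
    given = given.lower()
    lowered = tuple(t.lower() for t in targets)  # startswith needs a tuple
    for i in range(len(given)):
        if given.startswith(lowered, i):
            return i
    return -1


print(first_pos('Iamfoothegreat', ['foo', 'bar', 'grea', 'other']))  # 3
print(first_pos('the string', ['foo', 'bar', 'grea', 'other']))      # -1
```

This scans positions left to right, so it stops at the first hit, but it checks every target at every position and may be slower than `find` on long strings with many targets.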
Answer 3 (score: 1)
Your code is fairly good, but you can make it a little more efficient by moving the .lower conversion out of the loop: there's no need to repeat it for each target substring. The code can be condensed a little using list comprehensions, although that doesn't necessarily make it faster. I use a nested list comp to avoid calling given.find(t) twice for each t.
I've wrapped my code in a function for easier testing.
def min_match(given, targets):
    given = given.lower()
    a = [i for i in [given.find(t) for t in targets] if i > -1]
    return min(a) if a else None
targets = ['foo', 'bar', 'grea', 'othe']
data = (
    'Iamfoothegreat',
    'IAMFOOTHEGREAT',
    'Iamfothgrease',
    'Iamfothgret',
)
for given in data:
    print(given, min_match(given, targets))
output
Iamfoothegreat 3
IAMFOOTHEGREAT 3
Iamfothgrease 7
Iamfothgret None
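Answer 4 (score: 0)
Try this:
def getFirst(given, targets):
    # Note: as written, this comparison is case-sensitive
    try:
        return min([i for x in targets for i in [given.find(x)] if i != -1])
    except ValueError:
        # min() raises ValueError on an empty list, i.e. nothing matched
        return 0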