I need to find out whether a name starts with any of the prefixes in a list, and if so remove that prefix, for example:
if name[:2] in ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]:
    name = name[2:]
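The above only works for prefixes of length 2. I need the same functionality for prefixes of variable length.
How can this be done efficiently (little code and good performance)?
A for loop that iterates over each prefix, checks name.startswith(prefix) and then slices the name according to the prefix's length would work, but it is a lot of code, probably inefficient, and "non-Pythonic".
Does anyone have a good solution?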
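Answer 0 (score: 40)
str.startswith(prefix[, start[, end]])
Return True if the string starts with the prefix, otherwise return False. prefix can also be a tuple of prefixes to look for. With optional start, test the string beginning at that position. With optional end, stop comparing the string at that position.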
$ ipython
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.4.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")
In [2]: 'test'.startswith(prefixes)
Out[2]: False
In [3]: 'i_'.startswith(prefixes)
Out[3]: True
In [4]: 'd_a'.startswith(prefixes)
Out[4]: True
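startswith with a tuple answers "does any prefix match?" but not "which one?", which is what the slicing step needs. A minimal sketch that combines the two; strip_prefix is a hypothetical helper name, and it removes the first matching prefix in tuple order:

```python
prefixes = ("i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_")

def strip_prefix(name, prefixes):
    """Hypothetical helper: remove the first prefix in `prefixes` that matches."""
    prefixes = tuple(prefixes)  # startswith() needs a tuple, not a list
    # Cheap rejection test: one C-level call checks every prefix...
    if not name.startswith(prefixes):
        return name
    # ...but it does not report which prefix matched, so find it explicitly.
    matched = next(p for p in prefixes if name.startswith(p))
    return name[len(matched):]

print(strip_prefix("d_a", prefixes))   # a
print(strip_prefix("test", prefixes))  # test
```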
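Answer 1 (score: 13)
A bit hard to read, but this works (in Python 3 filter returns an iterator, so next() takes the first match; the trailing '' always matches, so an unprefixed name is left unchanged):
name = name[len(next(filter(name.startswith, list(prefixes) + ['']))):]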
Answer 2 (score: 5)
for prefix in prefixes:
    if name.startswith(prefix):
        name = name[len(prefix):]
        break
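Because the loop stops at the first hit, the result depends on list order when one prefix is itself a prefix of another (for example "a" and "ab_"). A sketch of one way to make it pick the longest match, assuming that is the behaviour you want; strip_longest_prefix is a hypothetical name:

```python
def strip_longest_prefix(name, prefixes):
    """Hypothetical helper: like the loop above, but longest prefix first."""
    # With ["a", "ab_"] in that order, the plain loop turns "ab_cd" into
    # "b_cd". Sorting longest-first makes the first hit the longest match.
    for prefix in sorted(prefixes, key=len, reverse=True):
        if name.startswith(prefix):
            return name[len(prefix):]
    return name

print(strip_longest_prefix("ab_cd", ["a", "ab_"]))  # cd
```

If you call this for many names, sort the prefixes once outside the function instead of on every call.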
Answer 3 (score: 2)
If you define the prefix as the characters before the underscore, you can check:
if name.partition("_")[0] in ["i", "c", "m", "l", "d", "t", "e", "b", "foo"] and name.partition("_")[1] == "_":
    name = name.partition("_")[2]
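The check above calls partition three times. A sketch of the same idea with a single call and a set for the membership test; VALID_PREFIXES and strip_underscore_prefix are hypothetical names:

```python
# Hypothetical set of allowed prefixes (the text before the first "_").
VALID_PREFIXES = {"i", "c", "m", "l", "d", "t", "e", "b", "foo"}

def strip_underscore_prefix(name):
    # Call partition() once and unpack the (head, sep, tail) triple.
    head, sep, tail = name.partition("_")
    # sep is "" when name has no underscore; set lookup is O(1) on average.
    if sep and head in VALID_PREFIXES:
        return tail
    return name

print(strip_underscore_prefix("foo_bar"))  # bar
print(strip_underscore_prefix("x_bar"))    # x_bar
```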
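Answer 4 (score: 2)
How about using filter? (Here name is a list of names, and every name that starts with one of the prefixes is dropped from the list.)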
prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
name = list(filter(lambda item: not any(item.startswith(prefix) for prefix in prefs), name))
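Note that comparing each list item against the prefixes effectively stops at the first match: the any function returns as soon as it finds a True value, for example: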
def gen():
    print("yielding False")
    yield False
    print("yielding True")
    yield True
    print("yielding False again")
    yield False
>>> any(gen()) # last two lines of gen() are not performed
yielding False
yielding True
True
Alternatively, use re.match instead of startswith:
import re
patt = '|'.join(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
name = list(filter(lambda item: not re.match(patt, item), name))
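The filter versions above drop prefixed names, whereas the question asks to strip the prefix and keep the rest. A sketch that strips instead, assuming that reading of the question; prefix_re is a hypothetical name, and the ^ anchor ensures only a leading prefix is removed:

```python
import re

prefs = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
# Escape each prefix and anchor with ^ so a match in the middle is ignored.
prefix_re = re.compile("^(?:%s)" % "|".join(map(re.escape, prefs)))

names = ["i_count", "value", "b_flag", "tab_b_x"]
stripped = [prefix_re.sub("", n) for n in names]
print(stripped)  # ['count', 'value', 'flag', 'tab_b_x']
```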
Answer 5 (score: 2)
A regular expression will probably give you the best speed:
import re
prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
re_prefixes = "|".join(re.escape(p) for p in prefixes)
m = re.match(re_prefixes, my_string)
if m:
    # re.match anchors at position 0, so m.end() is the prefix length.
    my_string = my_string[m.end():]
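Whether the regex is actually faster than a startswith loop depends on the number of prefixes and on your data, so it is worth measuring. A rough timeit harness, assuming the sample names below are representative (they are made up for illustration); no timings are shown because they vary by machine and Python version:

```python
import re
import timeit

prefixes = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_", "also_longer_"]
prefix_re = re.compile("|".join(map(re.escape, prefixes)))  # compiled once

def strip_loop(s):
    for p in prefixes:
        if s.startswith(p):
            return s[len(p):]
    return s

def strip_regex(s):
    m = prefix_re.match(s)  # match() only tries position 0
    return s[m.end():] if m else s

# Hypothetical sample data; replace with real names.
samples = ["i_count", "also_longer_name", "plain_name"] * 1000

for fn in (strip_loop, strip_regex):
    t = timeit.timeit(lambda: [fn(s) for s in samples], number=20)
    print(f"{fn.__name__}: {t:.3f}s")
```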
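Answer 6 (score: 1)
When it comes to searching and efficiency, always consider indexing techniques to improve the algorithm. If you have a long list of prefixes, you can build an in-memory index by simply grouping the prefixes in a dict keyed by their first character.
This solution only pays off if you have a long list of prefixes and performance becomes a problem.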
pref = ["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"]
# Index the prefixes in a dict by first character. Do this only once.
d = dict()
for x in pref:
    if x[0] not in d:
        d[x[0]] = list()
    d[x[0]].append(x)
name = "c_abcdf"
# Look up in d so only prefixes with the same first character are checked.
result = filter(lambda x: name.startswith(x),
                [] if name[0] not in d else d[name[0]])
print(list(result))
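The snippet above lists the matching prefixes but does not strip one. A sketch that uses the same first-character index to strip the first match, assuming all prefixes are non-empty (p[0] would fail on ""); build_index and strip_indexed are hypothetical names:

```python
from collections import defaultdict

def build_index(prefixes):
    """Hypothetical helper: group non-empty prefixes by first character."""
    index = defaultdict(list)
    for p in prefixes:
        index[p[0]].append(p)
    return dict(index)  # plain dict, so lookups never create empty buckets

def strip_indexed(name, index):
    """Strip the first matching prefix, checking only one bucket."""
    if not name:
        return name
    for p in index.get(name[0], ()):
        if name.startswith(p):
            return name[len(p):]
    return name

index = build_index(["i_", "c_", "m_", "l_", "d_", "t_", "e_", "b_"])
print(strip_indexed("c_abcdf", index))  # abcdf
```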
Answer 7 (score: 1)
Regex, tested:
import re
def make_multi_prefix_matcher(prefixes):
    regex_text = "|".join(re.escape(p) for p in prefixes)
    print(repr(regex_text))
    return re.compile(regex_text).match
pfxs = "x ya foobar foo a|b z.".split()
names = "xenon yadda yeti food foob foobarre foo a|b a b z.yx zebra".split()
matcher = make_multi_prefix_matcher(pfxs)
for name in names:
    m = matcher(name)
    if not m:
        print(repr(name), "no match")
        continue
    n = m.end()
    print(repr(name), n, repr(name[n:]))
Output:
'x|ya|foobar|foo|a\\|b|z\\.'
'xenon' 1 'enon'
'yadda' 2 'dda'
'yeti' no match
'food' 3 'd'
'foob' 3 'b'
'foobarre' 6 're'
'foo' 3 ''
'a|b' 3 ''
'a' no match
'b' no match
'z.yx' 2 'yx'
'zebra' no match
Answer 8 (score: 0)
This edits the list in place, removing the prefixes. As soon as a prefix matches an item, break skips the remaining prefixes.
items = ['this', 'that', 'i_blah', 'joe_cool', 'what_this']
prefixes = ['i_', 'c_', 'a_', 'joe_', 'mark_']
for i, item in enumerate(items):
    for p in prefixes:
        if item.startswith(p):
            items[i] = item[len(p):]
            break
print(items)
['this', 'that', 'blah', 'cool', 'what_this']
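Answer 9 (score: 0)
A simple regular expression can be used:
import re
prefixes = ("i_", "c_", "longer_")
name = re.sub(r'^(%s)' % '|'.join(map(re.escape, prefixes)), '', name)
Or, if anything before the underscore counts as a valid prefix:
name.split('_', 1)[-1]
This removes everything up to and including the first underscore, however many characters come before it; a name without an underscore is returned unchanged.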
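A small worked example of the split variant, wrapped in a hypothetical helper name, including the no-underscore case where it differs from using partition()[2]:

```python
def after_first_underscore(name):
    # maxsplit=1 splits only at the first "_"; with no "_" the result is
    # [name], so [-1] returns the name unchanged.
    return name.split("_", 1)[-1]

print(after_first_underscore("joe_cool_x"))  # cool_x
print(after_first_underscore("plain"))       # plain
# partition() differs when the separator is missing: the tail is empty.
print(repr("plain".partition("_")[2]))       # ''
```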
Answer 10 (score: -1)
import re
def make_multi_prefix_replacer(prefixes):
    if isinstance(prefixes, str):
        prefixes = prefixes.split()
    # Longest first, so that e.g. "yaku" is tried before "ya".
    prefixes = sorted(prefixes, key=len, reverse=True)
    pat = r'\b(%s)' % "|".join(map(re.escape, prefixes))
    print('regex pattern:', repr(pat), '\n')
    regex = re.compile(pat)
    def suber(x):
        return regex.sub('', x)
    return suber
pfxs = "x ya foobar yaku foo a|b z."
replacer = make_multi_prefix_replacer(pfxs)
names = "xenon yadda yeti yakute food foob foobarre foo a|b a b z.yx zebra".split()
for name in names:
    print(repr(name), '\n', repr(replacer(name)), '\n')
ss = 'the yakute xenon is a|bcdf in the barfoobaratu foobarii'
print('\n', repr(ss), '\n', repr(replacer(ss)), '\n')
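Note that \b matches at the start of every word, so this replacer strips prefixes from each word in a longer string, not only from the start of the string. A sketch contrasting \b with ^ on the same alternation (prefix list shortened for illustration):

```python
import re

prefixes = ["x", "ya", "foo"]
alternation = "|".join(map(re.escape, sorted(prefixes, key=len, reverse=True)))

word_start = re.compile(r"\b(?:%s)" % alternation)   # start of every word
string_start = re.compile(r"^(?:%s)" % alternation)  # start of string only

text = "xenon yadda food"
print(word_start.sub("", text))    # enon dda d
print(string_start.sub("", text))  # enon yadda food
```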