从审美角度和绩效角度来看,根据条件将项目列表拆分为多个列表的最佳方法是什么?相当于:
good = [x for x in mylist if x in goodvals]
bad = [x for x in mylist if x not in goodvals]
有更优雅的方法吗?
更新:这是实际的用例,以便更好地解释我正在尝试做的事情:
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
答案 0 :(得分:174)
good, bad = [], []
for x in mylist:
(bad, good)[x in goodvals].append(x)
答案 1 :(得分:98)
这是懒惰的迭代器方法:
from itertools import tee
def split_on_condition(seq, condition):
l1, l2 = tee((condition(item), item) for item in seq)
return (i for p, i in l1 if p), (i for p, i in l2 if not p)
它每个项目评估一次条件并返回两个生成器,首先从条件为真的序列中产生值,另一个产生假值。
因为它很懒,你可以在任何迭代器上使用它,甚至是无限的迭代器:
from itertools import count, islice
def is_prime(n):
return n > 1 and all(n % i for i in xrange(2, n))
primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))
通常,非惰性列表返回方法更好:
def split_on_condition(seq, condition):
a, b = [], []
for item in seq:
(a if condition(item) else b).append(item)
return a, b
编辑:对于您通过某个键将项目拆分到不同列表的更具体的用法,继承了一个通用函数:
DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
"""Split a sequence into lists based on a key function.
seq - input sequence
resultmapping - a dictionary that maps from target lists to keys that go to that list
keyfunc - function to calculate the key of an input value
default - the target where items that don't have a corresponding key go, by default they are dropped
"""
result_lists = dict((key, []) for key in resultmapping)
appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)
if default is not DROP_VALUE:
result_lists.setdefault(default, [])
default_action = result_lists[default].append
else:
default_action = DROP_VALUE
for item in seq:
appenders.get(keyfunc(item), default_action)(item)
return result_lists
用法:
def file_extension(f):
return f[2].lower()
split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']
答案 2 :(得分:93)
good = [x for x in mylist if x in goodvals] bad = [x for x in mylist if x not in goodvals]
有更优雅的方法吗?
该代码完全可读,非常清晰!
# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims = [f for f in files if f[2].lower() not in IMAGE_TYPES]
再次,这是罚款!
使用集合可能会有轻微的性能提升,但这是一个微不足道的差异,我发现列表理解更容易阅读,你不必担心订单混乱,重复删除等等
事实上,我可能会向后退一步,只需使用一个简单的for循环:
images, anims = [], []
for f in files:
if f.lower() in IMAGE_TYPES:
images.append(f)
else:
anims.append(f)
列表理解或使用set()
是正常的,直到你需要添加一些其他检查或其他逻辑 - 比如你要删除所有0字节jpeg,你只需要添加类似的东西...... / p>
if f[1] == 0:
continue
答案 3 :(得分:24)
所有提出的解决方案的问题在于它将扫描并应用过滤功能两次。我会做一个像这样的简单小函数:
def SplitIntoTwoLists(l, f):
a = []
b = []
for i in l:
if f(i):
a.append(i)
else:
b.append(i)
return (a,b)
这样你就不会处理任何两次,也不会重复代码。
答案 4 :(得分:19)
我接受它。我提出了一个懒惰的单遍partition
函数,
它保留了输出子序列中的相对顺序。
我认为要求是:
i
)filter
或groupby
)split
库我的partition
功能(下面介绍)和其他类似的功能
把它变成了一个小型图书馆:
它可以通过PyPI正常安装:
pip install --user split
要根据条件拆分列表,请使用partition
函数:
>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]
partition
功能解释在内部,我们需要同时构建两个子序列,因此非常消耗
只有一个输出序列会强制计算另一个输出序列
太。我们需要在用户请求之间保持状态(存储已处理
但尚未要求的元素)。为了保持状态,我使用两个双端
队列(deques
):
from collections import deque
SplitSeq
上课照顾家务:
class SplitSeq:
def __init__(self, condition, sequence):
self.cond = condition
self.goods = deque([])
self.bads = deque([])
self.seq = iter(sequence)
魔术以其.getNext()
方式发生。它几乎就像.next()
迭代器,但允许指定我们想要的元素类型
这次。在幕后,它不会丢弃被拒绝的元素,
但是把它们放在两个队列中的一个:
def getNext(self, getGood=True):
if getGood:
these, those, cond = self.goods, self.bads, self.cond
else:
these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
if these:
return these.popleft()
else:
while 1: # exit on StopIteration
n = self.seq.next()
if cond(n):
return n
else:
those.append(n)
最终用户应该使用partition
功能。需要一个
条件函数和序列(就像map
或filter
),和
返回两个发电机。第一个生成器构建了一个子序列
条件成立的元素,第二个构建的元素
补充子序列。迭代器和生成器允许惰性
分裂甚至长或无限的序列。
def partition(condition, sequence):
cond = condition if condition else bool # evaluate as bool if condition == None
ss = SplitSeq(cond, sequence)
def goods():
while 1:
yield ss.getNext(getGood=True)
def bads():
while 1:
yield ss.getNext(getGood=False)
return goods(), bads()
我选择测试函数作为第一个方便的参数
将来部分应用(类似于map
和filter
的方式
将测试函数作为第一个参数)。
答案 5 :(得分:13)
先行(OP前编辑):使用套数:
mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]
myset = set(mylist)
goodset = set(goodvals)
print list(myset.intersection(goodset)) # [1, 3, 7]
print list(myset.difference(goodset)) # [2, 4, 5, 6]
这对可读性(恕我直言)和性能都有好处。
第二次(OP后编辑):
创建好的扩展列表:
IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])
这将提高性能。否则,你对我的看法很好。
答案 6 :(得分:13)
我基本上喜欢安德斯的方法,因为它很一般。这是一个将分类程序放在第一位(匹配过滤器语法)并使用defaultdict(假定已导入)的版本。
def categorize(func, seq):
"""Return mapping from categories to lists
of categorized items.
"""
d = defaultdict(list)
for item in seq:
d[func(item)].append(item)
return d
答案 7 :(得分:9)
itertools.groupby几乎可以执行您想要的操作,但它需要对项目进行排序以确保您获得单个连续范围,因此您需要先按键排序(否则您将获得多个交错组)对于每种类型)。例如
def is_good(f):
return f[2].lower() in IMAGE_TYPES
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]
for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
print key, list(group)
给出:
False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]
与其他解决方案类似,可以将关键功能定义为分成任意数量的组。
答案 8 :(得分:5)
就个人而言,我喜欢你引用的版本,假设你已经有一个goodvals
的列表。如果没有,比如:
good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)
当然,这与使用像你最初的列表理解非常相似,但是使用函数而不是查找:
good = [x for x in mylist if is_good(x)]
bad = [x for x in mylist if not is_good(x)]
总的来说,我发现列表理解的美学非常令人愉悦。当然,如果您实际上不需要保留排序并且不需要重复,那么在集合上使用intersection
和difference
方法也会很有效。
答案 9 :(得分:4)
如果你想用FP风格制作:
good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
for y in mylist)) ]
不是最易读的解决方案,但至少只通过mylist迭代一次。
答案 10 :(得分:3)
def partition(pred, iterable):
'Use a predicate to partition entries into false entries and true entries'
# partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9
t1, t2 = tee(iterable)
return filterfalse(pred, t1), filter(pred, t2)
检查this
答案 11 :(得分:3)
good.append(x) if x in goodvals else bad.append(x)
@dansalmo给出的这个简洁明了的答案掩盖在评论中,所以我只是在这里重新张贴它作为答案,这样它才能得到应有的重视,尤其是对于新读者而言。
完整示例:
good, bad = [], []
for x in my_list:
good.append(x) if x in goodvals else bad.append(x)
答案 12 :(得分:3)
有时,列表理解看起来不是最好用的!
我根据人们对此主题的回答进行了一点测试,并在随机生成的列表中进行了测试。这是列表的生成(可能有更好的方法,但不是重点):
good_list = ('.jpg','.jpeg','.gif','.bmp','.png')
import random
import string
my_origin_list = []
for i in xrange(10000):
fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(good_list)
else:
fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
我们走了
# Parand
def f1():
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2():
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def f3():
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# Ants Aasma
def f4():
l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5():
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6():
return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)
使用cmpthese函数,最好的结果是dbr答案:
f1 204/s -- -5% -14% -15% -20% -26%
f6 215/s 6% -- -9% -11% -16% -22%
f3 237/s 16% 10% -- -2% -7% -14%
f4 240/s 18% 12% 2% -- -6% -13%
f5 255/s 25% 18% 8% 6% -- -8%
f2 277/s 36% 29% 17% 15% 9% --
答案 13 :(得分:2)
这个问题的又一个解决方案。我需要一个尽可能快的解决方案。这意味着列表上只有一次迭代,最好是O(1),用于将数据添加到结果列表之一。这与 sastanin 提供的解决方案非常相似,只是更短:
from collections import deque
def split(iterable, function):
dq_true = deque()
dq_false = deque()
# deque - the fastest way to consume an iterator and append items
deque((
(dq_true if function(item) else dq_false).append(item) for item in iterable
), maxlen=0)
return dq_true, dq_false
然后,您可以按以下方式使用该功能:
lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)
selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})
如果您对生成的deque
对象不满意,您可以轻松将其转换为list
,set
,无论您喜欢什么(例如list(lower)
)。转换速度要快得多,直接构建列表。
此方法保持项目的顺序以及任何重复项。
答案 14 :(得分:1)
有时你不需要列表的另一半。 例如:
import sys
from itertools import ifilter
trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]
myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)
print '%s is %smy friend.' % (newName, newName not in myFriends 'not ' or '')
答案 15 :(得分:1)
要获得性能,请尝试itertools
。
itertools模块标准化了一组核心快速,内存有效的工具,这些工具本身或组合使用。它们共同组成了一个“迭代器代数”,可以在纯Python中简洁有效地构建专用工具。
请参阅itertools.ifilter或imap。
itertools.ifilter(谓词,可迭代)
创建一个迭代器,用于过滤来自iterable的元素,仅返回谓词为True的元素
答案 16 :(得分:1)
受@ gnibbler great (but terse!) answer的启发,我们可以应用该方法映射到多个分区:
from collections import defaultdict
def splitter(l, mapper):
"""Split an iterable into multiple partitions generated by a callable mapper."""
results = defaultdict(list)
for x in l:
results[mapper(x)] += [x]
return results
然后可以按如下方式使用splitter
:
>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0) # partition l into odds and evens
>>> split.items()
>>> [(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]
这适用于两个以上具有更复杂映射的分区(以及迭代器):
>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
(1, [2]),
(2, [3]),
(3, [4, 5, 6]),
(4, [7, 8, 9]),
(5, [10, 11, 12, 13, 14, 15]),
(6, [16, 17, 18, 19, 20, 21, 22])]
或使用字典进行映射:
>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]
答案 17 :(得分:1)
bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]
append返回None,因此可以正常工作。
答案 18 :(得分:0)
这是最快的方法。
它使用if else
(类似于dbr的答案),但首先创建一个集合。一组将操作次数从O(m * n)减少到O(log m)+ O(n),从而使速度提高了45%以上。
good_list_set = set(good_list) # 45% faster than a tuple.
good, bad = [], []
for item in my_origin_list:
if item in good_list_set:
good.append(item)
else:
bad.append(item)
短一点:
good_list_set = set(good_list) # 45% faster than a tuple.
good, bad = [], []
for item in my_origin_list:
out = good if item in good_list_set else bad
out.append(item)
基准测试结果
filter_BJHomer 80/s -- -3265% -5312% -5900% -6262% -7273% -7363% -8051% -8162% -8244%
zip_Funky 118/s 4848% -- -3040% -3913% -4450% -5951% -6085% -7106% -7271% -7393%
two_lst_tuple_JohnLaRoy 170/s 11332% 4367% -- -1254% -2026% -4182% -4375% -5842% -6079% -6254%
if_else_DBR 195/s 14392% 6428% 1434% -- -882% -3348% -3568% -5246% -5516% -5717%
two_lst_compr_Parand 213/s 16750% 8016% 2540% 967% -- -2705% -2946% -4786% -5083% -5303%
if_else_1_line_DanSalmo 292/s 26668% 14696% 7189% 5033% 3707% -- -331% -2853% -3260% -3562%
tuple_if_else 302/s 27923% 15542% 7778% 5548% 4177% 343% -- -2609% -3029% -3341%
set_1_line 409/s 41308% 24556% 14053% 11035% 9181% 3993% 3529% -- -569% -991%
set_shorter 434/s 44401% 26640% 15503% 12303% 10337% 4836% 4345% 603% -- -448%
set_if_else 454/s 46952% 28358% 16699% 13349% 11290% 5532% 5018% 1100% 469% --
Python 3.7的完整基准代码(从FunkySayu修改):
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def two_lst_compr_Parand(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def if_else_DBR(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def zip_Funky(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def filter_BJHomer(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def set_1_line(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
# ChaimG set and if else list.
def set_shorter(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
out = good if e[2] in good_list_set else bad
out.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def set_if_else(*_):
good_list_set = set(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_set:
good.append(e)
else:
bad.append(e)
return good, bad
# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
good_list_tuple = tuple(good_list)
good, bad = [], []
for e in my_origin_list:
if e[2] in good_list_tuple:
good.append(e)
else:
bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append("--")
else:
diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
diffs = " ".join(f'{x:>8s}' for x in diff)
print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))
答案 19 :(得分:0)
const onClick = alert("hello");
console.log(onClick);
条件较长时很好,例如您的示例。读者不必弄清楚负面条件以及它是否能捕获所有其他情况。
答案 20 :(得分:0)
还有一个答案,简短但“邪恶”(用于列表理解副作用)。
digits = list(range(10))
odd = [x.pop(i) for i, x in enumerate(digits) if x % 2]
>>> odd
[1, 3, 5, 7, 9]
>>> digits
[0, 2, 4, 6, 8]
答案 21 :(得分:0)
使用布尔逻辑将数据分配给两个数组
>>> images, anims = [[i for i in files if t ^ (i[2].lower() in IMAGE_TYPES) ] for t in (0, 1)]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> anims
[('file2.avi', 999, '.avi')]
答案 22 :(得分:0)
以前的答案似乎无法满足我所有的四个强迫性打tick:
我的解决方案不是很好,我认为我不建议您使用它,但是这里是:
def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]:
deque_predicate_true = deque()
deque_predicate_false = deque()
# define a generator function to consume the input iterable
# the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded
def shared_generator(definitely_an_iterator):
for item in definitely_an_iterator:
print("Evaluate predicate.")
if predicate(item):
deque_predicate_true.appendleft(item)
yield True
else:
deque_predicate_false.appendleft(item)
yield False
# consume input iterable only once,
# converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
shared_gen = shared_generator(
iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
)
# define a generator function for each predicate outcome and queue
def iter_for(predicate_value, hold_queue):
def consume_shared_generator_until_hold_queue_contains_something():
if not hold_queue:
try:
while next(shared_gen) != predicate_value:
pass
except:
pass
consume_shared_generator_until_hold_queue_contains_something()
while hold_queue:
print("Yield where predicate is "+str(predicate_value))
yield hold_queue.pop()
consume_shared_generator_until_hold_queue_contains_something()
# return a tuple of two generators
return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)
通过以下测试,我们从打印语句中获得以下输出:
t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]
答案 23 :(得分:0)
基于生成器的版本,如果您可以忍受原始列表的一次或两次颠倒。
设置...
random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]
还有分裂...
(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])
答案 24 :(得分:0)
我求助于 numpy
来解决这个问题,以限制行数并使其成为一个简单的函数。
我能够满足条件,将列表分成两部分,使用 np.where
分离出一个列表。这适用于数字,但我相信可以使用字符串和列表进行扩展。
这是……
from numpy import where as wh, array as arr
midz = lambda a, mid: (a[wh(a > mid)], a[wh((a =< mid))])
p_ = arr([i for i in [75, 50, 403, 453, 0, 25, 428] if i])
high,low = midz(p_, p_.mean())
答案 25 :(得分:0)
这个列表理解简单易读。正是 OP 所要求的。
set_good_vals = set(good_vals) # Speed boost.
good = [x for x in my_list if x in set_good_vals]
bad = [x for x in my_list if x not in set_good_vals]
我更喜欢单个列表理解而不是两个,但与发布的许多答案(其中一些非常巧妙)不同,它可读且清晰。它也是页面上最快的答案之一。
唯一[稍微]快的答案是:
set_good_vals = set(good_vals)
good, bad = [], []
for item in my_list:
_ = good.append(item) if item in set_good_vals else bad.append(item)
...以及它的变体。 (见我的另一个答案)。但我发现第一种方式更优雅,而且速度几乎一样快。
答案 26 :(得分:0)
如果列表由组和间歇性分隔符组成,则可以使用:
def split(items, p):
groups = [[]]
for i in items:
if p(i):
groups.append([])
groups[-1].append(i)
return groups
用法:
split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
答案 27 :(得分:0)
例如,按偶数和奇数分割列表
arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))
或一般而言:
def split(predicate, iterable):
return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))
优势:
缺点
答案 28 :(得分:0)
不确定这是否是一种好方法,但也可以通过这种方式完成
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))
答案 29 :(得分:0)
如果您不介意使用外部库,我知道其中有两个本能地执行此操作:
>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
iteration_utilities.partition
:
>>> from iteration_utilities import partition
>>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
>>> notimages
[('file2.avi', 999, '.avi')]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> from more_itertools import partition
>>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
>>> list(notimages) # returns a generator so you need to explicitly convert to list.
[('file2.avi', 999, '.avi')]
>>> list(images)
[('file1.jpg', 33, '.jpg')]
答案 30 :(得分:0)
[even, odd] = separate(
lambda x: bool(x % 2),
[1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])
{{1}}
答案 31 :(得分:0)
我采用2遍方法,将谓词的评估与过滤列表分开:
def partition(pred, iterable):
xs = list(zip(map(pred, iterable), iterable))
return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]
有什么好处,性能方面(除了在pred
的每个成员上只评估iterable
一次)之外,是它将大量逻辑移出解释器并进入高度 - 优化的迭代和映射代码。这可以加速迭代长迭代,如in this answer所述。
表达方面,它利用了理解和映射这样的表达习语。
答案 32 :(得分:0)
这里已经有很多解决方案,但另一种方法是 -
anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
只在列表上迭代一次,看起来更加pythonic,因此对我来说是可读的。
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>
答案 33 :(得分:0)
如果您只关心上述某些方法(甚至是您自己的方法)在单个函数中使用语义只需要两行代码,那么:
def part_with_predicate(l, pred):
return [i for i in l if pred(i)], [i for i in l if not pred(i)]
它不是一种惰性eval方法,它在列表中迭代两次,但它允许您在一行代码中对列表进行分区。
答案 34 :(得分:0)
如果你坚持聪明,你可以采取Winden的解决方案,只是有点虚假的聪明:
def splay(l, f, d=None):
d = d or {}
for x in l: d.setdefault(f(x), []).append(x)
return d
答案 35 :(得分:-1)
您可以使用Python进行惰性函数编程,如下所示:
partition = lambda l, c: map(
lambda iii: (i for ii in iii for i in ii),
zip(*(([], [e]) if c(e) else ([e], []) for e in l)))
函数式编程很优雅,但Python却不然。如果您知道列表中没有None
值,也请参见以下示例:
partition = lambda l, c: map(
filter(lambda x: x is not None, l),
zip(*((None, e) if c(e) else (e, None) for e in l)))
答案 36 :(得分:-1)
def partition(pred, seq):
return reduce( lambda (yes, no), x: (yes+[x], no) if pred(x) else (yes, no+[x]), seq, ([], []) )
答案 37 :(得分:-2)
我最喜欢的食谱是:
goodvals = set(goodvals) # Turbocharges the performance by 55%!
good, bad = [], []
_ = [good.append(x) if x in goodvals else bad.append(x) for x in mylist]
简单,快速,可读; Python本来就是这样的。
goodvals
变为set
(使用哈希表)而不是
tuple
,我们获得超快速查找。 mylist
中的每个项目
一旦。这有助于加快速度。_ =
是一种Pythonic方式,用于声明我们有意丢弃列表理解的结果。这不是一个错误。(基于dansalmo对this回答的评论,因为它似乎应该是它自己的答案。)
编辑:
在我的基准测试中将goodvals
转换为设定的涡轮增压性能55%。使用tuple
是O(n * m),而将其转换为set
是O(log n + m)。
此外,goodvals
(即n)只有五个项目。 mylist
,(即m),可以有数百个项目。此外,创建一个集合可能在C语言代码的基础上进行了高度优化。
这是我使用的基准代码。它基于从this回答中获取的代码,并已修改为与在Windows 7上运行的Python v3.7.0一起使用。
good_list = ['.jpg','.jpeg','.gif','.bmp','.png']
import random
import string
my_origin_list = []
for i in range(10000):
fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
if random.getrandbits(1):
fext = random.choice(list(good_list))
else:
fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))
my_origin_list.append((fname + fext, random.randrange(1000), fext))
# Parand
def f1(*_):
return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]
# dbr
def f2(*_):
a, b = list(), list()
for e in my_origin_list:
if e[2] in good_list:
a.append(e)
else:
b.append(e)
return a, b
# John La Rooy
def f3(*_):
a, b = list(), list()
for e in my_origin_list:
(b, a)[e[2] in good_list].append(e)
return a, b
# # Ants Aasma
# def f4():
# l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
# return [i for p, i in l1 if p], [i for p, i in l2 if not p]
# My personal way to do
def f5(*_):
a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
return list(filter(None, a)), list(filter(None, b))
# BJ Homer
def f6(*_):
return list(filter(lambda e: e[2] in good_list, my_origin_list)), list(filter(lambda e: not e[2] in good_list, my_origin_list))
# ChaimG's answer; as a list.
def f7(*_):
good, bad = [], []
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list else bad.append(e)
return good, bad
# ChaimG's answer; as a set.
def f8(*_):
good, bad = [], []
good_list_set = set(good_list)
for e in my_origin_list:
_ = good.append(e) if e[2] in good_list_set else bad.append(e)
return good, bad
def cmpthese(n=0, functions=None):
results = {}
for func_name in functions:
args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
t = Timer(*args)
results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec
functions_sorted = sorted(functions, key=results.__getitem__)
for f in functions_sorted:
diff = []
for func in functions_sorted:
if func == f:
diff.append(" --")
else:
diff.append("%5.0f%%" % (results[f]/results[func]*100 - 100))
diffs = " ".join(diff)
print("%s\t%6d/s %s" % (f, results[f], diffs))
if __name__=='__main__':
from timeit import Timer
cmpthese(1000, 'f1 f2 f3 f5 f6 f7 f8'.split(" "))