Python: Split a list based on a condition?

Time: 2009-06-04 07:37:18

Tags: python

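What is the best way, both aesthetically and from a performance perspective, to split a list of items into multiple lists based on a condition? The equivalent of: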

good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]

Is there a more elegant way to do this?

Update: here is the actual use case, to better explain what I'm trying to do:

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

38 answers:

Answer 0 (score: 174)

good, bad = [], []
for x in mylist:
    (bad, good)[x in goodvals].append(x)
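
The indexing trick works because bool is a subclass of int in Python, so False selects index 0 and True selects index 1 of the tuple. A minimal sketch with made-up sample data (the values of mylist and goodvals here are illustrative, not taken from the question):

```python
# Sample data (illustrative only).
mylist = [1, 2, 3, 4, 5, 6, 7]
goodvals = {1, 3, 7, 8, 9}

good, bad = [], []
for x in mylist:
    # `x in goodvals` is False (0) or True (1), so it picks bad or good.
    (bad, good)[x in goodvals].append(x)

print(good)  # [1, 3, 7]
print(bad)   # [2, 4, 5, 6]
```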

Answer 1 (score: 98)

Here is the lazy iterator approach:

from itertools import tee

def split_on_condition(seq, condition):
    l1, l2 = tee((condition(item), item) for item in seq)
    return (i for p, i in l1 if p), (i for p, i in l2 if not p)

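It evaluates the condition once per item and returns two generators: the first yields the values from the sequence for which the condition is true, the other yields those for which it is false.

Because it's lazy, you can use it on any iterator, even an infinite one: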

from itertools import count, islice

def is_prime(n):
    return n > 1 and all(n % i for i in xrange(2, n))

primes, not_primes = split_on_condition(count(), is_prime)
print("First 10 primes", list(islice(primes, 10)))
print("First 10 non-primes", list(islice(not_primes, 10)))

Usually, though, the non-lazy, list-returning approach is better:

def split_on_condition(seq, condition):
    a, b = [], []
    for item in seq:
        (a if condition(item) else b).append(item)
    return a, b

Edit: For your more specific use case of splitting items into different lists by some key, here's a generic function that does that:

DROP_VALUE = lambda _:_
def split_by_key(seq, resultmapping, keyfunc, default=DROP_VALUE):
    """Split a sequence into lists based on a key function.

        seq - input sequence
        resultmapping - a dictionary that maps from target lists to keys that go to that list
        keyfunc - function to calculate the key of an input value
        default - the target where items that don't have a corresponding key go, by default they are dropped
    """
    result_lists = dict((key, []) for key in resultmapping)
    appenders = dict((key, result_lists[target].append) for target, keys in resultmapping.items() for key in keys)

    if default is not DROP_VALUE:
        result_lists.setdefault(default, [])
        default_action = result_lists[default].append
    else:
        default_action = DROP_VALUE

    for item in seq:
        appenders.get(keyfunc(item), default_action)(item)

    return result_lists

Usage:

def file_extension(f):
    return f[2].lower()

split_files = split_by_key(files, {'images': IMAGE_TYPES}, keyfunc=file_extension, default='anims')
print split_files['images']
print split_files['anims']

Answer 2 (score: 93)

good = [x for x in mylist if x in goodvals]
bad  = [x for x in mylist if x not in goodvals]
     

Is there a more elegant way to do this?

That code is perfectly readable and extremely clear!

# files looks like: [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ... ]
IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
images = [f for f in files if f[2].lower() in IMAGE_TYPES]
anims  = [f for f in files if f[2].lower() not in IMAGE_TYPES]

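Again, this is fine!

There might be a slight performance increase from using sets, but it's a trivial difference, and I find the list comprehension far easier to read. You also don't have to worry about the order being messed up, duplicates being removed, and so on.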
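
To make the trade-off concrete, here is a small sketch (sample data invented for illustration). Pure set operations drop duplicates and do not guarantee order, whereas a comprehension that merely uses a set for the membership test keeps both:

```python
mylist = [3, 1, 3, 2, 7, 2]        # illustrative data with duplicates
goodvals = [1, 3, 7, 8, 9]

# Pure set operations: duplicates collapse and ordering is not guaranteed.
good_set = set(mylist) & set(goodvals)
bad_set = set(mylist) - set(goodvals)

# Comprehensions with a set used only for lookup: order and duplicates survive.
lookup = set(goodvals)
good = [x for x in mylist if x in lookup]
bad = [x for x in mylist if x not in lookup]

print(sorted(good_set), sorted(bad_set))  # [1, 3, 7] [2]
print(good, bad)                          # [3, 1, 3, 7] [2, 2]
```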

In fact, I may go another step "backward" and just use a simple for loop:

images, anims = [], []

for f in files:
    if f[2].lower() in IMAGE_TYPES:
        images.append(f)
    else:
        anims.append(f)

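A list comprehension or using set() is fine until you need to add some other check or another bit of logic. Say you want to remove all 0-byte JPEGs: you just add something like...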

if f[1] == 0:
    continue

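Answer 3 (score: 24)

The problem with all the proposed solutions is that they scan the list and apply the filtering function twice. I'd make a simple small function like this: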

def SplitIntoTwoLists(l, f):
  a = []
  b = []
  for i in l:
    if f(i):
      a.append(i)
    else:
      b.append(i)
  return (a,b)

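That way you don't process anything twice, and you don't repeat code either.

Answer 4 (score: 19)

My take on it. I propose a lazy, single-pass partition function that preserves the relative order in the output subsequences.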

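1. Requirements

I assume that the requirements are:

  • maintain the elements' relative order (hence, no sets and dictionaries)
  • evaluate the condition only once for every element (hence not using (i)filter or groupby)
  • allow for lazy consumption of either sequence (if we can afford to precompute them, then the naive implementation is likely to be acceptable too)

2. split library

My partition function (introduced below) and other similar functions have made it into a small library:

It's installable normally via PyPI: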

pip install --user split

To split a list based on a condition, use the partition function:

>>> from split import partition
>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi') ]
>>> image_types = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> images, other = partition(lambda f: f[-1] in image_types, files)
>>> list(images)
[('file1.jpg', 33L, '.jpg')]
>>> list(other)
[('file2.avi', 999L, '.avi')]

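3. partition function explained

Internally we need to build two subsequences at once, so consuming only one output sequence will force the other one to be computed too. We also need to keep state between user requests (store processed but not yet requested elements). To keep state, I use two double-ended queues (deques):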

from collections import deque

The SplitSeq class takes care of the housekeeping:

class SplitSeq:
    def __init__(self, condition, sequence):
        self.cond = condition
        self.goods = deque([])
        self.bads = deque([])
        self.seq = iter(sequence)

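The magic happens in its .getNext() method. It is almost like the .next() of an iterator, but lets you specify which kind of element you want this time. Behind the scenes it doesn't discard the rejected elements, but instead puts them in one of the two queues: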

    def getNext(self, getGood=True):
        if getGood:
            these, those, cond = self.goods, self.bads, self.cond
        else:
            these, those, cond = self.bads, self.goods, lambda x: not self.cond(x)
        if these:
            return these.popleft()
        else:
            while 1: # exit on StopIteration
                n = self.seq.next()
                if cond(n):
                    return n
                else:
                    those.append(n)

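The end user is supposed to use the partition function. It takes a condition function and a sequence (just like map or filter), and returns two generators. The first generator builds the subsequence of elements for which the condition holds; the second one builds the complementary subsequence. Iterators and generators allow for lazy splitting of even long or infinite sequences.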

def partition(condition, sequence):
    cond = condition if condition else bool  # evaluate as bool if condition == None
    ss = SplitSeq(cond, sequence)
    def goods():
        while 1:
            yield ss.getNext(getGood=True)
    def bads():
        while 1:
            yield ss.getNext(getGood=False)
    return goods(), bads()

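I chose the test function to be the first argument to facilitate partial application in the future (similar to how map and filter take the test function as the first argument).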
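
The code above targets Python 2 (self.seq.next(), 33L literals). On Python 3.7+ a StopIteration escaping a generator body is turned into a RuntimeError (PEP 479), so the generators need to return explicitly. A minimal Python 3 sketch of the same idea follows; the name lazy_partition and its internals are my own illustration, not the code of the split library:

```python
from collections import deque


def lazy_partition(condition, sequence):
    """Lazily split `sequence` into (matching, non_matching) generators.

    Each element is tested once; elements that belong to the side not
    currently being consumed are buffered in a deque until requested.
    """
    cond = condition if condition is not None else bool
    source = iter(sequence)
    buffers = {True: deque(), False: deque()}

    def side(wanted):
        while True:
            if buffers[wanted]:
                yield buffers[wanted].popleft()
                continue
            for item in source:
                if bool(cond(item)) == wanted:
                    yield item
                    break
                buffers[not wanted].append(item)  # keep it for the other side
            else:
                return  # source exhausted: end this generator cleanly

    return side(True), side(False)


evens, odds = lazy_partition(lambda n: n % 2 == 0, range(10))
print(next(odds))    # 1 (0 is buffered for `evens`)
print(list(evens))   # [0, 2, 4, 6, 8]
print(list(odds))    # [3, 5, 7, 9]
```

A dictionary keyed by True/False replaces the two named deques here purely to keep the sketch short.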

Answer 5 (score: 13)

First go (pre-OP-edit): use sets:

mylist = [1,2,3,4,5,6,7]
goodvals = [1,3,7,8,9]

myset = set(mylist)
goodset = set(goodvals)

print list(myset.intersection(goodset))  # [1, 3, 7]
print list(myset.difference(goodset))    # [2, 4, 5, 6]

That's good for both readability (IMHO) and performance.

Second go (post-OP-edit):

Create your list of good extensions as a set:

IMAGE_TYPES = set(['.jpg','.jpeg','.gif','.bmp','.png'])

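That will increase performance. Otherwise, what you have looks fine to me.

Answer 6 (score: 13)

I basically like Anders' approach, as it is very general. Here's a version that puts the categorizer first (to match filter syntax) and uses a defaultdict (assumed imported).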

def categorize(func, seq):
    """Return mapping from categories to lists
    of categorized items.
    """
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d
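
For completeness, a self-contained sketch of how categorize might be called, including the defaultdict import the answer assumes. The files data is a small illustrative sample modeled on the question:

```python
from collections import defaultdict


def categorize(func, seq):
    """Return a mapping from categories to lists of categorized items."""
    d = defaultdict(list)
    for item in seq:
        d[func(item)].append(item)
    return d


# Illustrative data modeled on the question's `files` list.
IMAGE_TYPES = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
files = [('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi'),
         ('file3.gif', 123, '.gif')]

by_kind = categorize(lambda f: f[2].lower() in IMAGE_TYPES, files)
images, anims = by_kind[True], by_kind[False]
print(images)  # [('file1.jpg', 33, '.jpg'), ('file3.gif', 123, '.gif')]
print(anims)   # [('file2.avi', 999, '.avi')]
```

Because the result is a defaultdict, looking up a category that never occurred simply yields an empty list.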

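Answer 7 (score: 9)

itertools.groupby almost does what you want, except that it requires the items to be sorted to ensure that you get a single contiguous range, so you need to sort by your key first (otherwise you'll get multiple interleaved groups for each type). For example: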

def is_good(f):
    return f[2].lower() in IMAGE_TYPES

files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file3.gif', 123L, '.gif')]

for key, group in itertools.groupby(sorted(files, key=is_good), key=is_good):
    print key, list(group)

gives:

False [('file2.avi', 999L, '.avi')]
True [('file1.jpg', 33L, '.jpg'), ('file3.gif', 123L, '.gif')]

Similar to the other solutions, the key function can be defined to divide into any number of groups you want.
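
The need for sorting can be seen in a short Python 3 sketch (the values list and is_even are illustrative). Without sorting, groupby starts a new group every time the key changes; with a stable sort on the key, each key yields a single group and items keep their relative order:

```python
from itertools import groupby

values = [1, 2, 3, 4, 5, 6]  # illustrative data


def is_even(n):
    return n % 2 == 0


# Without sorting, every run of equal keys becomes its own group.
unsorted_groups = [(k, list(g)) for k, g in groupby(values, key=is_even)]
print(len(unsorted_groups))  # 6

# sorted() is stable, so items keep their relative order within each group.
sorted_groups = {k: list(g)
                 for k, g in groupby(sorted(values, key=is_even), key=is_even)}
print(sorted_groups)  # {False: [1, 3, 5], True: [2, 4, 6]}
```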

Answer 8 (score: 5)

Personally, I like the version you cited, assuming you already have a list of goodvals hanging around. If not, something like:

good = filter(lambda x: is_good(x), mylist)
bad = filter(lambda x: not is_good(x), mylist)

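Of course, that's really very similar to using a list comprehension like you originally did, but with a function instead of a lookup: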

good = [x for x in mylist if is_good(x)]
bad  = [x for x in mylist if not is_good(x)]

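In general, I find the aesthetics of list comprehensions very pleasing. Of course, if you don't actually need to preserve ordering and don't need duplicates, using the intersection and difference methods on sets would work well too.

Answer 9 (score: 4)

If you want to do it in FP style: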

good, bad = [ sum(x, []) for x in zip(*(([y], []) if y in goodvals else ([], [y])
                                        for y in mylist)) ]

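Not the most readable solution, but at least it iterates through mylist only once.

Answer 10 (score: 3)

from itertools import filterfalse, tee

def partition(pred, iterable):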
    'Use a predicate to partition entries into false entries and true entries'
    # partition(is_odd, range(10)) --> 0 2 4 6 8   and  1 3 5 7 9
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)

Check this.
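
A self-contained Python 3 sketch of using this function follows (is_odd and the call counter are illustrative additions). It returns the false entries first, and because each half filters its own tee copy, the predicate runs twice per element:

```python
from itertools import filterfalse, tee


def partition(pred, iterable):
    """Split into (false entries, true entries), as in the snippet above."""
    t1, t2 = tee(iterable)
    return filterfalse(pred, t1), filter(pred, t2)


calls = []


def is_odd(n):
    calls.append(n)   # record every predicate call
    return n % 2 == 1


evens, odds = partition(is_odd, range(10))
print(list(evens))  # [0, 2, 4, 6, 8]
print(list(odds))   # [1, 3, 5, 7, 9]
print(len(calls))   # 20: the predicate ran twice per element
```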

Answer 11 (score: 3)

good.append(x) if x in goodvals else bad.append(x)

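This succinct and clear answer by @dansalmo was buried in the comments, so I'm just reposting it here as an answer so that it gets the prominence it deserves, especially for new readers.

Complete example: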

good, bad = [], []
for x in my_list:
    good.append(x) if x in goodvals else bad.append(x)

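Answer 12 (score: 3)

Sometimes a list comprehension doesn't look like the best thing to use!

I made a little test based on the answers people gave to this topic, run on a randomly generated list. Here is the generation of the list (there's probably a better way to do it, but that's not the point):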

good_list = ('.jpg','.jpeg','.gif','.bmp','.png')

import random
import string
my_origin_list = []
for i in xrange(10000):
    fname = ''.join(random.choice(string.lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(good_list)
    else:
        fext = "." + ''.join(random.choice(string.lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

And here we go:

# Parand
def f1():
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def f2():
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def f3():
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# Ants Aasma
def f4():
    l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
    return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def f5():
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def f6():
    return filter(lambda e: e[2] in good_list, my_origin_list), filter(lambda e: not e[2] in good_list, my_origin_list)

Using the cmpthese function, the best result is dbr's answer:

f1     204/s  --    -5%   -14%   -15%   -20%   -26%
f6     215/s     6%  --    -9%   -11%   -16%   -22%
f3     237/s    16%    10%  --    -2%    -7%   -14%
f4     240/s    18%    12%     2%  --    -6%   -13%
f5     255/s    25%    18%     8%     6%  --    -8%
f2     277/s    36%    29%    17%    15%     9%  --

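Answer 13 (score: 2)

Yet another solution to this problem. I needed a solution that is as fast as possible. That means only one iteration over the list, and preferably O(1) for adding data to one of the resulting lists. This is very similar to the solution provided by sastanin, except much shorter: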

from collections import deque

def split(iterable, function):
    dq_true = deque()
    dq_false = deque()

    # deque - the fastest way to consume an iterator and append items
    deque((
      (dq_true if function(item) else dq_false).append(item) for item in iterable
    ), maxlen=0)

    return dq_true, dq_false

Then, you can use the function in the following way:

lower, higher = split([0,1,2,3,4,5,6,7,8,9], lambda x: x < 5)

selected, other = split([0,1,2,3,4,5,6,7,8,9], lambda x: x in {0,4,9})

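If you're not happy with the resulting deque objects, you can easily convert them to a list, a set, or whatever you like (for example list(lower)). The conversion is much faster than constructing the lists directly.

This method keeps the order of the items, as well as any duplicates.

Answer 14 (score: 1)

Sometimes you won't need the other half of the list. For example: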

import sys
from itertools import ifilter

trustedPeople = sys.argv[1].split(',')
newName = sys.argv[2]

myFriends = ifilter(lambda x: x.startswith('Shi'), trustedPeople)

print '%s is %smy friend.' % (newName, newName not in myFriends and 'not ' or '')

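Answer 15 (score: 1)

For performance, try itertools.

The itertools module standardizes a core set of fast, memory-efficient tools that are useful by themselves or in combination. Together, they form an "iterator algebra" making it possible to construct specialized tools succinctly and efficiently in pure Python.

See itertools.ifilter or imap.

itertools.ifilter(predicate, iterable)

Make an iterator that filters elements from iterable, returning only those for which the predicate is True.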
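
itertools.ifilter and imap exist only in Python 2. In Python 3 the built-in filter and map are already lazy, and itertools.filterfalse yields the complementary items. A brief Python 3 sketch with invented sample data:

```python
from itertools import filterfalse

names = ['Shira', 'Bob', 'Shimon', 'Alice']  # illustrative data


def starts_with_shi(name):
    return name.startswith('Shi')


friends = filter(starts_with_shi, names)        # lazy iterator
others = filterfalse(starts_with_shi, names)    # lazy complement

print(list(friends))  # ['Shira', 'Shimon']
print(list(others))   # ['Bob', 'Alice']
```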

Answer 16 (score: 1)

Inspired by @gnibbler's great (but terse!) answer, we can apply that approach to map to multiple partitions:

from collections import defaultdict

def splitter(l, mapper):
    """Split an iterable into multiple partitions generated by a callable mapper."""

    results = defaultdict(list)

    for x in l:
        results[mapper(x)] += [x]

    return results

splitter can then be used as follows:

>>> l = [1, 2, 3, 4, 2, 3, 4, 5, 6, 4, 3, 2, 3]
>>> split = splitter(l, lambda x: x % 2 == 0)  # partition l into odds and evens
>>> split.items()
[(False, [1, 3, 3, 5, 3, 3]), (True, [2, 4, 2, 4, 6, 4, 2])]

This works for more than two partitions with a more complicated mapping (and on iterators, too):

>>> import math
>>> l = xrange(1, 23)
>>> split = splitter(l, lambda x: int(math.log10(x) * 5))
>>> split.items()
[(0, [1]),
 (1, [2]),
 (2, [3]),
 (3, [4, 5, 6]),
 (4, [7, 8, 9]),
 (5, [10, 11, 12, 13, 14, 15]),
 (6, [16, 17, 18, 19, 20, 21, 22])]

Or using a dictionary to map:

>>> map = {'A': 1, 'X': 2, 'B': 3, 'Y': 1, 'C': 2, 'Z': 3}
>>> l = ['A', 'B', 'C', 'C', 'X', 'Y', 'Z', 'A', 'Z']
>>> split = splitter(l, map.get)
>>> split.items()
[(1, ['A', 'Y', 'A']), (2, ['C', 'C', 'X']), (3, ['B', 'Z', 'Z'])]

Answer 17 (score: 1)

bad = []
good = [x for x in mylist if x in goodvals or bad.append(x)]

append returns None, so it works.
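
To spell out why: when x in goodvals is true, or short-circuits and the item is kept. Otherwise bad.append(x) runs as a side effect and returns None, which is falsy, so the item is excluded from good. A small sketch with illustrative data:

```python
mylist = [1, 2, 3, 4, 5, 6, 7]   # illustrative data
goodvals = {1, 3, 7, 8, 9}

bad = []
# Kept in `good` when the membership test is true; otherwise appended to
# `bad`, and the None returned by append() filters the item out of `good`.
good = [x for x in mylist if x in goodvals or bad.append(x)]

print(good)  # [1, 3, 7]
print(bad)   # [2, 4, 5, 6]
print([].append(0) is None)  # True
```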

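Answer 18 (score: 0)

This is the fastest way.

It uses if else (like dbr's answer), but creates a set first. A set reduces the number of operations from O(m * n) to O(log m) + O(n), resulting in a speed boost of 45% or more.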

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    if item in good_list_set:
        good.append(item)
    else:
        bad.append(item)

A little shorter:

good_list_set = set(good_list)  # 45% faster than a tuple.

good, bad = [], []
for item in my_origin_list:
    out = good if item in good_list_set else bad
    out.append(item)

Benchmark results:

filter_BJHomer                  80/s       --   -3265%   -5312%   -5900%   -6262%   -7273%   -7363%   -8051%   -8162%   -8244%
zip_Funky                       118/s    4848%       --   -3040%   -3913%   -4450%   -5951%   -6085%   -7106%   -7271%   -7393%
two_lst_tuple_JohnLaRoy         170/s   11332%    4367%       --   -1254%   -2026%   -4182%   -4375%   -5842%   -6079%   -6254%
if_else_DBR                     195/s   14392%    6428%    1434%       --    -882%   -3348%   -3568%   -5246%   -5516%   -5717%
two_lst_compr_Parand            213/s   16750%    8016%    2540%     967%       --   -2705%   -2946%   -4786%   -5083%   -5303%
if_else_1_line_DanSalmo         292/s   26668%   14696%    7189%    5033%    3707%       --    -331%   -2853%   -3260%   -3562%
tuple_if_else                   302/s   27923%   15542%    7778%    5548%    4177%     343%       --   -2609%   -3029%   -3341%
set_1_line                      409/s   41308%   24556%   14053%   11035%    9181%    3993%    3529%       --    -569%    -991%
set_shorter                     434/s   44401%   26640%   15503%   12303%   10337%    4836%    4345%     603%       --    -448%
set_if_else                     454/s   46952%   28358%   16699%   13349%   11290%    5532%    5018%    1100%     469%       --

The full benchmark code for Python 3.7 (modified from FunkySayu):

good_list = ['.jpg','.jpeg','.gif','.bmp','.png']

import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

# Parand
def two_lst_compr_Parand(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def if_else_DBR(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def two_lst_tuple_JohnLaRoy(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def zip_Funky(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def filter_BJHomer(*_):
    return (list(filter(lambda e: e[2] in good_list, my_origin_list)),
            list(filter(lambda e: not e[2] in good_list, my_origin_list)))

# ChaimG's answer; as a list.
def if_else_1_line_DanSalmo(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad

# ChaimG's answer; as a set.
def set_1_line(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad

# ChaimG set and if else list.
def set_shorter(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        out = good if e[2] in good_list_set else bad
        out.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def set_if_else(*_):
    good_list_set = set(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_set:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

# ChaimG's best answer; if else as a set.
def tuple_if_else(*_):
    good_list_tuple = tuple(good_list)
    good, bad = [], []
    for e in my_origin_list:
        if e[2] in good_list_tuple:
            good.append(e)
        else:
            bad.append(e)
    return good, bad

def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec

    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("--")
            else:
                diff.append(f"{results[f]/results[func]*100 - 100:5.0%}")
        diffs = " ".join(f'{x:>8s}' for x in diff)

        print(f"{f:27s} \t{results[f]:,.0f}/s {diffs}")


if __name__=='__main__':
    from timeit import Timer
    cmpthese(1000, 'two_lst_compr_Parand if_else_DBR two_lst_tuple_JohnLaRoy zip_Funky filter_BJHomer if_else_1_line_DanSalmo set_1_line set_if_else tuple_if_else set_shorter'.split(" "))

Answer 19 (score: 0)

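Nice when the condition is longer, such as in your example. The reader doesn't have to figure out the negative condition and whether it captures all other cases.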

Answer 20 (score: 0)

Yet another answer, short but "evil" (because of the list-comprehension side effects).

digits = list(range(10))
odd = [digits.pop(i) for i, x in enumerate(digits) if x % 2]

>>> odd
[1, 3, 5, 7, 9]

>>> digits
[0, 2, 4, 6, 8]

Answer 21 (score: 0)

Use Boolean logic to assign data to two arrays:

>>> images, anims = [[i for i in files if t ^ (i[2].lower() in IMAGE_TYPES) ] for t in (0, 1)]
>>> images
[('file1.jpg', 33, '.jpg')]
>>> anims
[('file2.avi', 999, '.avi')]

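Answer 22 (score: 0)

The previous answers don't seem to satisfy all four of my obsessive-compulsive ticks:

  1. Be as lazy as possible,
  2. Evaluate the original Iterable only once,
  3. Evaluate the predicate only once per item,
  4. Provide nice type annotations (for Python 3.7).

My solution isn't pretty, and I don't think I can recommend using it, but here it is: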

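import collections.abc
from collections import deque
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")

def iter_split_on_predicate(predicate: Callable[[T], bool], iterable: Iterable[T]) -> Tuple[Iterator[T], Iterator[T]]: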
    deque_predicate_true = deque()
    deque_predicate_false = deque()
    
    # define a generator function to consume the input iterable
    # the Predicate is evaluated once per item, added to the appropriate deque, and the predicate result it yielded 
    def shared_generator(definitely_an_iterator):
        for item in definitely_an_iterator:
            print("Evaluate predicate.")
            if predicate(item):
                deque_predicate_true.appendleft(item)
                yield True
            else:
                deque_predicate_false.appendleft(item)
                yield False
    
    # consume input iterable only once,
    # converting to an iterator with the iter() function if necessary. Probably this conversion is unnecessary
    shared_gen = shared_generator(
        iterable if isinstance(iterable, collections.abc.Iterator) else iter(iterable)
    )
    
    # define a generator function for each predicate outcome and queue
    def iter_for(predicate_value, hold_queue):
        def consume_shared_generator_until_hold_queue_contains_something():
            if not hold_queue:
                try:
                    while next(shared_gen) != predicate_value:
                        pass
                except StopIteration:
                    pass
        
        consume_shared_generator_until_hold_queue_contains_something()
        while hold_queue:
            print("Yield where predicate is "+str(predicate_value))
            yield hold_queue.pop()
            consume_shared_generator_until_hold_queue_contains_something()
    
    # return a tuple of two generators  
    return iter_for(predicate_value=True, hold_queue=deque_predicate_true), iter_for(predicate_value=False, hold_queue=deque_predicate_false)

With the following test, we get the following output from the print statements:

t,f = iter_split_on_predicate(lambda item:item>=10,[1,2,3,10,11,12,4,5,6,13,14,15])
print(list(zip(t,f)))
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# Evaluate predicate.
# Yield where predicate is True
# Yield where predicate is False
# [(10, 1), (11, 2), (12, 3), (13, 4), (14, 5), (15, 6)]

Answer 23 (score: 0)

A generator-based version, if you can put up with one or maybe two reversals of the original list.

A setup...

import random
random.seed(1234)
a = list(range(10))
random.shuffle(a)
a
[2, 8, 3, 5, 6, 4, 9, 0, 1, 7]

And the split...

(list((a.pop(j) for j, y in [(len(a)-i-1, x) for i,x in enumerate(a[::-1])] if y%2 == 0))[::-1], a)
([2, 8, 6, 4, 0], [3, 5, 9, 1, 7])
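
  1. Another list of tuples of positions and elements is built in reverse order.
  2. In the generator wrapped around that, each element is tested against the predicate (here, a test for even), and if True the element is popped using the previously computed position. We are working backwards along the list, so popping elements does not change positions closer to the beginning of the list.
  3. The wrapping list() evaluates the generator, and the final reversal [::-1] puts the elements back in the right order.
  4. The original list "a" now only contains the elements for which the predicate is False.

Answer 24 (score: 0)

I turned to numpy to solve this, to limit the number of lines and make it a simple function.

I was able to get a condition met, splitting a list in two, by using np.where to separate out a list. This works for numbers, but I believe it could be extended to strings and lists.

Here it is...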

from numpy import where as wh, array as arr

midz = lambda a, mid: (a[wh(a > mid)], a[wh(a <= mid)])
p_ = arr([i for i in [75, 50, 403, 453, 0, 25, 428] if i])
high,low = midz(p_, p_.mean())

Answer 25 (score: 0)

Clear and fast

This list comprehension is simple to read and understand. It is exactly what the OP asked for.

set_good_vals = set(good_vals)    # Speed boost.
good = [x for x in my_list if x in set_good_vals]
bad = [x for x in my_list if x not in set_good_vals]

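I would prefer a single list comprehension rather than two, but unlike many of the answers posted (some of which are quite ingenious), it is readable and clear. It is also one of the fastest answers on the page.

The only answer that is [slightly] faster is: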

set_good_vals = set(good_vals)
good, bad = [], []
for item in my_list:
    _ = good.append(item) if item in set_good_vals else bad.append(item)
    

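...and variations of it (see my other answer). But I find the first way more elegant, and it's almost as fast.

Answer 26 (score: 0)

If the list is made up of groups and intermittent separators, you can use: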

def split(items, p):
    groups = [[]]
    for i in items:
        if p(i):
            groups.append([])
        groups[-1].append(i)
    return groups

Usage:

split(range(1,11), lambda x: x % 3 == 0)
# gives [[1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

Answer 27 (score: 0)

For example, splitting a list by even and odd:

arr = range(20)
even, odd = reduce(lambda res, next: res[next % 2].append(next) or res, arr, ([], []))

Or in general:

def split(predicate, iterable):
    return reduce(lambda res, e: res[predicate(e)].append(e) or res, iterable, ([], []))

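Advantages:

  • Most functional way possible
  • The predicate is applied only once to each element

Disadvantages:

  • Requires knowledge of the functional programming paradigm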
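
On Python 3, reduce is no longer a builtin and has to be imported from functools. A sketch of the same function with that import follows (the sample call is illustrative). Note that the predicate's result is used as a tuple index, so items for which it is false land in the first list and items for which it is true land in the second:

```python
from functools import reduce


def split(predicate, iterable):
    # res is a pair of lists; bool(predicate(e)) picks index 0 or 1.
    return reduce(
        lambda res, e: res[bool(predicate(e))].append(e) or res,
        iterable,
        ([], []),
    )


even, odd = split(lambda n: n % 2 == 1, range(10))
print(even)  # [0, 2, 4, 6, 8]
print(odd)   # [1, 3, 5, 7, 9]
```

The explicit bool() is a small defensive addition of mine so that predicates returning other truthy values still index the pair correctly.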

Answer 28 (score: 0)

Not sure if this is a good approach, but it can be done this way as well:

IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi')]
images, anims = reduce(lambda (i, a), f: (i + [f], a) if f[2] in IMAGE_TYPES else (i, a + [f]), files, ([], []))

Answer 29 (score: 0)

If you don't mind using an external library, there are two that I know of that natively implement this operation:

>>> files = [ ('file1.jpg', 33, '.jpg'), ('file2.avi', 999, '.avi')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
  • iteration_utilities.partition

    >>> from iteration_utilities import partition
    >>> notimages, images = partition(files, lambda x: x[2].lower() in IMAGE_TYPES)
    >>> notimages
    [('file2.avi', 999, '.avi')]
    >>> images
    [('file1.jpg', 33, '.jpg')]
    
  • more_itertools.partition

    >>> from more_itertools import partition
    >>> notimages, images = partition(lambda x: x[2].lower() in IMAGE_TYPES, files)
    >>> list(notimages)  # returns a generator so you need to explicitly convert to list.
    [('file2.avi', 999, '.avi')]
    >>> list(images)
    [('file1.jpg', 33, '.jpg')]
    

Answer 30 (score: 0)

Solution

[even, odd] = separate(
    lambda x: bool(x % 2),
    [1, 2, 3, 4, 5])
print(list(even) == [2, 4])
print(list(odd) == [1, 3, 5])

Test

{{1}}

Answer 31 (score: 0)

I'd take a 2-pass approach, separating the evaluation of the predicate from filtering the list:

def partition(pred, iterable):
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]

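What's nice about this, performance-wise (in addition to evaluating pred only once on each member of iterable), is that it moves a lot of logic out of the interpreter and into highly optimized iteration and mapping code. This can speed up iteration over long iterables, as described in this answer.

Expressiveness-wise, it takes advantage of expressive idioms like comprehensions and mapping.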
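
A small sketch to check the "pred is evaluated once per member" claim. The is_image predicate, its call log, and the sample names are illustrative assumptions:

```python
def partition(pred, iterable):
    # First pass: evaluate pred once per item; second pass: filter on the result.
    xs = list(zip(map(pred, iterable), iterable))
    return [x[1] for x in xs if x[0]], [x[1] for x in xs if not x[0]]


calls = []


def is_image(name):
    calls.append(name)  # record each predicate call
    return name.lower().endswith(('.jpg', '.png'))


names = ['a.jpg', 'b.avi', 'c.PNG']
images, others = partition(is_image, names)
print(images, others)  # ['a.jpg', 'c.PNG'] ['b.avi']
print(len(calls))      # 3
```

One caveat worth keeping in mind: the function iterates over iterable twice (once inside map, once inside zip), so it should be given a re-iterable object such as a list or tuple. With a one-shot iterator, the two would compete for the same items and the pairs would be misaligned.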

Answer 32 (score: 0)

There are already quite a few solutions here, but yet another way of doing it would be:

anims = []
images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]

It iterates over the list only once and looks a bit more Pythonic, and hence more readable to me.

>>> files = [ ('file1.jpg', 33L, '.jpg'), ('file2.avi', 999L, '.avi'), ('file1.bmp', 33L, '.bmp')]
>>> IMAGE_TYPES = ('.jpg','.jpeg','.gif','.bmp','.png')
>>> anims = []
>>> images = [f for f in files if (lambda t: True if f[2].lower() in IMAGE_TYPES else anims.append(t) and False)(f)]
>>> print '\n'.join([str(anims), str(images)])
[('file2.avi', 999L, '.avi')]
[('file1.jpg', 33L, '.jpg'), ('file1.bmp', 33L, '.bmp')]
>>>

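Answer 33 (score: 0)

If your concern is just not to use two lines of code for an operation whose semantics need only one, you can wrap one of the approaches above (even your own) in a single function: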

def part_with_predicate(l, pred):
    return [i for i in l if pred(i)], [i for i in l if not pred(i)]

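It is not a lazy-eval approach and it does iterate twice through the list, but it allows you to partition the list in one line of code.

Answer 34 (score: 0)

If you insist on clever, you could take Winden's solution and add just a bit of spurious cleverness: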

def splay(l, f, d=None):
  d = d or {}
  for x in l: d.setdefault(f(x), []).append(x)
  return d

Answer 35 (score: -1)

You can do lazy functional programming in Python, like this:

partition = lambda l, c: map(
  lambda iii: (i for ii in iii for i in ii),
  zip(*(([], [e]) if c(e) else ([e], []) for e in l)))

Functional programming is elegant, but not in Python. See also this example in case you know there are no None values in your list:

partition = lambda l, c: map(
  lambda xs: filter(lambda x: x is not None, xs),
  zip(*((None, e) if c(e) else (e, None) for e in l)))

Answer 36 (score: -1)

def partition(pred, seq):
  return reduce( lambda (yes, no), x: (yes+[x], no) if pred(x) else (yes, no+[x]), seq, ([], []) )
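
Tuple parameters in a lambda (lambda (yes, no), x: ...) are Python 2 only; they were removed in Python 3 (PEP 3113), where reduce also moved to functools. A possible Python 3 rendering of the same idea, as a sketch (the helper name step and the sample call are mine):

```python
from functools import reduce


def partition(pred, seq):
    def step(acc, x):
        yes, no = acc  # unpack explicitly instead of in the signature
        return (yes + [x], no) if pred(x) else (yes, no + [x])
    return reduce(step, seq, ([], []))


small, large = partition(lambda n: n < 5, [7, 1, 9, 3])
print(small)  # [1, 3]
print(large)  # [7, 9]
```

As in the original, every step builds a new list through concatenation, so this is better suited to short inputs than to large ones.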

Answer 37 (score: -2)

My favorite recipe for this is:

goodvals = set(goodvals)    # Turbocharges the performance by 55%!  
good, bad = [], []
_ = [good.append(x) if x in goodvals else bad.append(x) for x in mylist]

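Simple, fast and readable; the way Python was meant to be.

  • By making goodvals into a set (which uses a hash table) instead of a tuple, we gain super fast lookups.
  • Each item in mylist is only examined once. This helps make it faster.
  • _ = is a Pythonic way to declare that we are intentionally discarding the result of the list comprehension. It's not a bug.

(Based on dansalmo's comment to this answer, because it seemed that it should be its own answer.)

Edit:

Converting goodvals into a set turbocharged performance by 55% in my benchmark. Using a tuple is O(n * m), whereas converting it to a set is O(log n + m).

In addition, goodvals (i.e. n) is only five items long. mylist (i.e. m) can have hundreds of items. Also, the creation of a set is likely highly optimized under the hood in C language code.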
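
A much smaller way to compare the two containers on your own machine is sketched below. The workload and the helper name split_with are illustrative assumptions, and the timings will vary by interpreter and hardware, so no particular speed-up is implied:

```python
from timeit import timeit

good_tuple = ('.jpg', '.jpeg', '.gif', '.bmp', '.png')
good_set = set(good_tuple)
extensions = ['.png', '.avi', '.jpg', '.txt'] * 1000  # illustrative workload


def split_with(container):
    good, bad = [], []
    for ext in extensions:
        (good if ext in container else bad).append(ext)
    return good, bad


# Both containers must give the same partition; only the lookup cost differs.
assert split_with(good_tuple) == split_with(good_set)

print('tuple:', timeit(lambda: split_with(good_tuple), number=100))
print('set:  ', timeit(lambda: split_with(good_set), number=100))
```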

Here is the benchmark code that I used. It is based on code taken from this answer and has been modified to work with Python v3.7.0 running on Windows 7.

good_list = ['.jpg','.jpeg','.gif','.bmp','.png']

import random
import string
my_origin_list = []
for i in range(10000):
    fname = ''.join(random.choice(string.ascii_lowercase) for i in range(random.randrange(10)))
    if random.getrandbits(1):
        fext = random.choice(list(good_list))
    else:
        fext = "." + ''.join(random.choice(string.ascii_lowercase) for i in range(3))

    my_origin_list.append((fname + fext, random.randrange(1000), fext))

# Parand
def f1(*_):
    return [e for e in my_origin_list if e[2] in good_list], [e for e in my_origin_list if not e[2] in good_list]

# dbr
def f2(*_):
    a, b = list(), list()
    for e in my_origin_list:
        if e[2] in good_list:
            a.append(e)
        else:
            b.append(e)
    return a, b

# John La Rooy
def f3(*_):
    a, b = list(), list()
    for e in my_origin_list:
        (b, a)[e[2] in good_list].append(e)
    return a, b

# # Ants Aasma
# def f4():
#     l1, l2 = tee((e[2] in good_list, e) for e in my_origin_list)
#     return [i for p, i in l1 if p], [i for p, i in l2 if not p]

# My personal way to do
def f5(*_):
    a, b = zip(*[(e, None) if e[2] in good_list else (None, e) for e in my_origin_list])
    return list(filter(None, a)), list(filter(None, b))

# BJ Homer
def f6(*_):
    return (list(filter(lambda e: e[2] in good_list, my_origin_list)),
            list(filter(lambda e: not e[2] in good_list, my_origin_list)))

# ChaimG's answer; as a list.
def f7(*_):
    good, bad = [], []
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list else bad.append(e)
    return good, bad

# ChaimG's answer; as a set.
def f8(*_):
    good, bad = [], []
    good_list_set = set(good_list)
    for e in my_origin_list:
        _ = good.append(e) if e[2] in good_list_set else bad.append(e)
    return good, bad

def cmpthese(n=0, functions=None):
    results = {}
    for func_name in functions:
        args = ['%s(range(256))' % func_name, 'from __main__ import %s' % func_name]
        t = Timer(*args)
        results[func_name] = 1 / (t.timeit(number=n) / n) # passes/sec

    functions_sorted = sorted(functions, key=results.__getitem__)
    for f in functions_sorted:
        diff = []
        for func in functions_sorted:
            if func == f:
                diff.append("    --")
            else:
                diff.append("%5.0f%%" % (results[f]/results[func]*100 - 100))
        diffs = " ".join(diff)

        print("%s\t%6d/s %s" % (f, results[f], diffs))


if __name__=='__main__':
    from timeit import Timer
    cmpthese(1000, 'f1 f2 f3 f5 f6 f7 f8'.split(" "))