Question

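I am trying to build a list of permutations in which order matters between the first two variables and the last two variables, but not within each pair. I have been able to get something like this: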

>>> import itertools
>>> [p for p in itertools.product([1, 2], [1, 2], [5, 6], [5, 6])]

   [(1, 1, 5, 5), (1, 1, 5, 6), (1, 1, 6, 5), (1, 1, 6, 6), (1, 2, 5, 5),
    (1, 2, 5, 6), (1, 2, 6, 5), (1, 2, 6, 6), (2, 1, 5, 5), (2, 1, 5, 6), 
    (2, 1, 6, 5), (2, 1, 6, 6), (2, 2, 5, 5), (2, 2, 5, 6), (2, 2, 6, 5), 
    (2, 2, 6, 6)]

What I want has no duplicates that differ only by swapped order, but still keeps repeated values, like this:

   [(1, 1, 5, 5), (1, 1, 5, 6), (1, 1, 6, 6), (1, 2, 5, 5), (1, 2, 5, 6),
    (1, 2, 6, 6), (2, 2, 5, 5), (2, 2, 5, 6), (2, 2, 6, 6)]

It seems like there should be a simple way to do this; I just haven't figured it out yet (I mainly code in R and have only just started exploring Python 3).

Answer 1

Rather than using set to uniquify the results of product, you can combine product with itertools.combinations_with_replacement to produce the results you care about directly:

from itertools import product, combinations_with_replacement as comb_repl

[p1 + p2 for p1, p2 in product(comb_repl([1, 2], 2), comb_repl([5, 6], 2))]

This produces exactly the desired output, with no duplicates (so no separate deduplication step is needed).
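As a quick check, the following self-contained sketch runs the expression above and compares it with the list requested in the question. The variable names result and expected are introduced here purely for illustration.

```python
from itertools import product, combinations_with_replacement as comb_repl

# Each group contributes its unordered pairs (repeats allowed);
# product then joins one pair from the first group with one from the second.
result = [p1 + p2 for p1, p2 in product(comb_repl([1, 2], 2), comb_repl([5, 6], 2))]

# The output asked for in the question (illustrative variable name).
expected = [(1, 1, 5, 5), (1, 1, 5, 6), (1, 1, 6, 6),
            (1, 2, 5, 5), (1, 2, 5, 6), (1, 2, 6, 6),
            (2, 2, 5, 5), (2, 2, 5, 6), (2, 2, 6, 6)]

print(result == expected)  # True
```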

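Note that product caches its input iterables (it fully consumes them into tuples before yielding anything), so here the complete output of comb_repl([5, 6], 2) ends up stored in memory in addition to the 4-tuples you actually want. In this case that is fine, but if the set of combinations were much larger, you might prefer to recompute the combinations each time, so that you pay memory only for the final result and not for the complete set of combinations_with_replacement outputs from the second iterator. Because product caches, you have to avoid it and instead use a multi-for list comprehension that recreates the second combinations_with_replacement iterator on each pass: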

# Also switched argument to second comb_repl to a tuple, so argument is not repeatedly rebuilt;
# slightly less readable due to profusion of parens, but equivalent behavior
[p1 + p2 for p1 in comb_repl([1, 2], 2) for p2 in comb_repl((5, 6), 2)]

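In testing, when memory is not a concern, the nested-loop listcomp is slightly slower than using product. product pushes more of the work to the C layer and creates each combinations_with_replacement iterator only once, rather than creating one for the first group plus one more for every output of the first iterator; on the second and subsequent passes it iterates over a cached tuple of the combinations_with_replacement outputs, which is about as fast as Python gets. So if you know the arguments will not be huge, product gives the best performance.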
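A rough way to reproduce this comparison is sketched below using timeit. The input sizes and the function names with_product and with_nested_loops are assumptions made for this example, and the timings will vary by machine and Python version, so no particular numbers are claimed.

```python
from itertools import product, combinations_with_replacement as comb_repl
from timeit import timeit

# Hypothetical inputs, large enough to measure but small enough to fit in memory.
first = range(20)
second = tuple(range(20))

def with_product():
    # product caches the pairs of each group once and reuses them.
    return [p1 + p2 for p1, p2 in product(comb_repl(first, 2), comb_repl(second, 2))]

def with_nested_loops():
    # The inner comb_repl iterator is rebuilt for every pair of the outer one.
    return [p1 + p2 for p1 in comb_repl(first, 2) for p2 in comb_repl(second, 2)]

# Both approaches yield the same tuples in the same order.
assert with_product() == with_nested_loops()

print("product:     ", timeit(with_product, number=5))
print("nested loops:", timeit(with_nested_loops, number=5))
```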

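In all cases, unless you really need a realized list, it is better to use a generator expression (genexpr), since a genexpr produces results only as they are requested, without storing them all in memory; you can loop over it only once before it is exhausted, but in many cases one pass is all you need. With a genexpr, usage would look like:

for p in (p1 + p2 for p1 in comb_repl([1, 2], 2) for p2 in comb_repl((5, 6), 2)):
    print(p)  # placeholder for whatever you do with each tuple

which is no more complex, and for much larger combinations it won't run out of memory.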
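To make the laziness concrete, here is a small sketch that pulls only the first few results from a generator over deliberately large inputs. The ranges and the names big_first, big_second, lazy and first_three are hypothetical, chosen only for this illustration; itertools.islice is used to stop early.

```python
from itertools import islice, combinations_with_replacement as comb_repl

# Hypothetical large inputs: the full result would hold roughly 250 billion
# tuples, far too many to materialize as a list.
big_first = range(1000)
big_second = range(1000)

# Nothing is computed here beyond setting up the generator.
lazy = (p1 + p2 for p1 in comb_repl(big_first, 2) for p2 in comb_repl(big_second, 2))

# Only the three requested tuples are ever built.
first_three = list(islice(lazy, 3))
print(first_three)  # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 0, 2)]
```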

Answer 2

To remove duplicates, you can use the built-in set.

>>> set([(1, 1, 5, 6), (1, 2, 6, 5), (1, 1, 5, 6)])
{(1, 1, 5, 6), (1, 2, 6, 5)}

Then you can use the built-in sorted.

>>> sorted([1,2,3,1,0])
[0, 1, 1, 2, 3]

sorted also accepts an optional key keyword argument, so you can use something like

sorted(set(cross_products), key=lambda item: (item[:2], item[-2:]))

This sorts the items by the first two and the last two elements of each tuple.
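One caveat is worth spelling out: set removes only exact duplicates, so applied to the raw product output from the question it still leaves all 16 tuples. The sketch below assumes that cross_products (a name used above but never defined in this answer) is that product output. Sorting each half before deduplicating is an extra step added here, not something the answer states.

```python
from itertools import product

# Assumed definition of cross_products: the raw output from the question.
cross_products = list(product([1, 2], [1, 2], [5, 6], [5, 6]))
print(len(set(cross_products)))  # 16, since (1, 2, 5, 5) != (2, 1, 5, 5)

# Put each half into a canonical order so swapped pairs compare equal.
normalized = {tuple(sorted(t[:2])) + tuple(sorted(t[2:])) for t in cross_products}

# Then order the survivors with the key suggested above.
result = sorted(normalized, key=lambda item: (item[:2], item[-2:]))
print(len(result))  # 9
```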

Answer 3

set([tuple(sorted(p)) for p in itertools.product([1, 2],[1, 2],[5, 6], [5, 6])])
{(1, 1, 5, 5),
 (1, 1, 5, 6),
 (1, 1, 6, 6),
 (1, 2, 5, 5),
 (1, 2, 5, 6),
 (1, 2, 6, 6),
 (2, 2, 5, 5),
 (2, 2, 5, 6),
 (2, 2, 6, 6)}

You just need to wrap everything in list() to get a list back.
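Sorting the whole tuple works for the question's data because every value in the first group is smaller than every value in the second. The sketch below shows what happens otherwise. It uses the same values [1, 2] for both groups, an input invented for this example, and compares against the combinations_with_replacement approach from Answer 1.

```python
from itertools import product, combinations_with_replacement as comb_repl

values = [1, 2]  # hypothetical: both groups draw from the same values

# Answer 3's approach: sort all four positions together.
whole_sorted = {tuple(sorted(p)) for p in product(values, values, values, values)}

# Per-group approach: order is ignored only within each pair.
grouped = {p1 + p2 for p1, p2 in product(comb_repl(values, 2), comb_repl(values, 2))}

print(len(whole_sorted), len(grouped))  # 5 9
print((1, 2, 1, 1) in grouped, (1, 2, 1, 1) in whole_sorted)  # True False
```

When the groups overlap like this, sorting across all positions merges tuples that differ only in which group a value came from, so sorting each pair separately (or generating pairs with combinations_with_replacement) is the safer choice.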
