Question

Newbie question here.

I'm trying to generate every combination of various string elements and then put them into a list.

import itertools 

mydata = [ ]

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
      ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]

def twofunc(one, two):
    for a, b in itertools.product(main[one], main[two]):
        print a, b

def threefunc(one, two, three):
    for a, b, c in itertools.product(main[one], main[two], main[three]):
        print a, b, c

twofunc(2, 0)  #extremely inefficient to just run these functions over and over. alternative?
twofunc(3, 0)
twofunc(0, 4)
twofunc(0, 5)
twofunc(0, 6)
threefunc(2, 0, 4)
threefunc(3, 0, 4)
threefunc(2, 0, 5)
threefunc(3, 0, 5)
threefunc(2, 0, 6)
threefunc(3, 0, 6)

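The code above only prints each permutation; it doesn't append the values to a list. I've tried several variations of the append method, but still no luck.

Can anyone help me get these values into the mydata list? I assume each string has to be its own list, so it would end up as a list of lists. It should look like the following, but I also need some way to include "tags" later on, or just a value indicating whether the string contains a given word.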

[["cheap car insurance"],
["cheap auto insurance"],
["budget car insurance"],
["budget auto insurance"],
["low cost car insurance"],
["low cost auto insurance"],
...

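So eventually it would end up like this, where 1 means the string contains the word car / cheap and 0 means it doesn't. The only reason I mention this is to ask whether a list is the right data structure for this task.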

                         car        cheap 
cheap car insurance       1            1
cheap auto insurance      0            1 
budget car insurance      1            0
budget auto insurance     0            0

Any help would be appreciated.

I've already done this in R, which is well suited to the task; I just want to reproduce it in Python.

Answer 1

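To turn the results of twofunc and threefunc into a list, have each function return a list instead of printing, then add the result to mydata. Note that a return placed inside the for loop would exit after the first pair, so the list is built first and returned once. Here is an example for twofunc:

def twofunc(one, two):
    # Build the full list of pairs; a return inside the loop would stop at the first pair
    return [[a, b] for a, b in itertools.product(main[one], main[two])]

# extend adds each pair to mydata; append would nest the whole list as one element
mydata.extend(twofunc(2, 0))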

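That said, I'm not familiar with R, so I don't know what data structure you might be using there. For your stated goal, keeping everything in lists could get complicated. However, creating a simple class for this shouldn't be too hard. You could then have a list made up of instances of that class.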
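As a rough illustration of the class idea, here is a minimal sketch assuming Python 3.7+ (for dataclasses). The names Phrase and has_word are hypothetical and do not come from the question; the whole-word matching rule is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Phrase:
    """One generated keyword phrase (hypothetical helper, not from the question)."""
    text: str

    def has_word(self, word):
        # 1 if the word appears as a whole word in the phrase, else 0
        return int(word in self.text.split())


mydata = [Phrase("cheap car insurance"), Phrase("budget auto insurance")]

# One row per phrase: (text, contains "car", contains "cheap")
flags = [(p.text, p.has_word("car"), p.has_word("cheap")) for p in mydata]
print(flags)
```

Each row of the 0/1 table from the question then becomes one tuple, and further tags can be added as extra method calls.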

Answer 2

import itertools
from functools import reduce  # reduce is not a builtin in Python 3

main = [["car", "auto"], ["insurance"],["cheap", "budget"]]

reduce(lambda x,y: [e+' '+f for e,f in itertools.product(x,y)],main)

or

reduce(lambda x,y: [' '.join([e,f]) for e,f in itertools.product(x,y)],main)

The latter should be faster. The result is:

['car insurance cheap', 'car insurance budget', 'auto insurance cheap', 'auto insurance budget']
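As an alternative sketch (not part of the original answer), the same list can probably be produced without reduce by passing all the groups to itertools.product at once and joining each resulting tuple:

```python
import itertools

main = [["car", "auto"], ["insurance"], ["cheap", "budget"]]

# product(*main) yields one tuple per combination, e.g. ("car", "insurance", "cheap");
# joining each tuple with a space gives the phrase
phrases = [" ".join(combo) for combo in itertools.product(*main)]
print(phrases)
```

This should give the same four strings in the same order as the reduce versions, while avoiding the intermediate lists built at each reduce step.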

Answer 3

First of all, you don't need to write separate twofunc and threefunc functions:

def nfunc(strings, indices):
    """call as nfunc(main,(2,0)) or nfunc(main, (2,0,4))"""
    selected_strings=map(lambda index:strings[index], indices)
    return itertools.product(*selected_strings)

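Note that this only returns an iterable, so it is evaluated lazily.

Now we can pull the indices out of the individual calls and into a list, just as was done with the strings, so you don't have to rewrite any function calls to change the indices: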
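To make the lazy behaviour concrete, here is a small sketch. It re-declares nfunc with a list comprehension (an equivalent rewrite, assumed for readability) and uses a shortened main list:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"]]


def nfunc(strings, indices):
    # Same idea as above: pick the sub-lists by index, then take their product
    selected_strings = [strings[index] for index in indices]
    return itertools.product(*selected_strings)


lazy = nfunc(main, (2, 0))
first = next(lazy)   # produces only the first combination
rest = list(lazy)    # consumes the remaining combinations
again = list(lazy)   # the iterator is now exhausted, so this is empty
print(first, rest, again)
```

The practical consequence is that the returned object can be iterated only once; wrap it in list() if the results are needed more than once.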

index_list=[(2, 0),(3, 0),...,(2, 0, 4),(3, 0, 4),...,(3, 0, 6)]

To get a list of the lazy results, you can now write:

lazy_results=[nfunc(main, indices) for indices in index_list]
-> [itertools.product at 0xblahblah, itertools.product at 0xblahblah, ...]

Force evaluation of the results:

eager_results=[list(lazy) for lazy in lazy_results]
-> [[("cheap", "car insurance"), ("cheap", "auto insurance"), ...

Join the tuples into strings:

str_results=[[' '.join(rtuple) for rtuple in result] for result in eager_results]
-> [["cheap car insurance", "cheap auto insurance", ...
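Putting the steps so far together, the following is a minimal end-to-end sketch that fills mydata with flat strings. The index_list here is deliberately shortened to two of the question's index tuples, and flattening everything into a single list is an assumption about what the asker wants:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]


def nfunc(strings, indices):
    # Product of the sub-lists selected by the given indices
    return itertools.product(*[strings[i] for i in indices])


index_list = [(2, 0), (3, 0)]  # shortened for illustration

# One flat list of phrases across all index tuples
mydata = [" ".join(combo)
          for indices in index_list
          for combo in nfunc(main, indices)]
print(mydata)
```

If a list of single-element lists is really required, as in the question, each phrase can be wrapped as [" ".join(combo)] instead.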

If you need them, you can add arbitrary tags:

tag_index_list=[(tag_1,(2,0)), ... (tag_n, (3, 0, 6))]
tag_results=[(tag, nfunc(main, indices)) for (tag, indices) in tag_index_list]
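For a concrete picture of the tagging step, here is a small sketch. The tag names "price" and "price+quote" are made up for illustration, and storing the results in a dict keyed by tag is an assumption, not something the answer prescribes:

```python
import itertools

main = [["car insurance", "auto insurance"], ["insurance"], ["cheap", "budget"],
        ["low cost"], ["quote", "quotes"], ["rate", "rates"], ["comparison"]]


def nfunc(strings, indices):
    # Product of the sub-lists selected by the given indices
    return itertools.product(*[strings[i] for i in indices])


# Hypothetical tags paired with index tuples taken from the question
tag_index_list = [("price", (2, 0)), ("price+quote", (2, 0, 4))]

# Map each tag to the list of phrases generated for its index tuple
tagged = {tag: [" ".join(combo) for combo in nfunc(main, indices)]
          for tag, indices in tag_index_list}
print(tagged["price"])
```

A dict only works here if the tags are unique; with repeated tags, a list of (tag, phrases) tuples as in the answer would be the safer choice.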

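Depending on your needs, it may be better to write a class for your result set; if you step through the code above, you'll see how the nested lists and tuples are used.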

Appending strings to a list
