Itertools - merge two lists to get all possible combinations

Time: 2019-09-10 19:05:53

Tags: python combinations permutation itertools

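I have two lists: a and b.

a is a list containing three or more strings, and b is a list of separators.

I need to generate all possible combinations (orderings) of a and then "merge" the result with all possible combinations of b (see the example for a better understanding).

I ended up using the following code: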

from itertools import permutations, combinations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

output = []

for com in combinations(b, len(a) - 1):
    for per in product(com, repeat=len(a) - 1):
        for ear_per in permutations(a):
            out = ''.join(map(''.join, zip(list(ear_per[:-1]), per))) + list(ear_per)[-1]
            output.append(out)

# For some reason the algorithm is generating duplicates
output = list(dict.fromkeys(output))

for o in output:
    print(o)
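The duplicates come from the two outer loops, not from permutations(): a separator tuple such as ("_", "_") is produced by product(com, repeat=2) for every combination com that contains "_". A minimal sketch, assuming the same a and b as above, that rebuilds the separator tuples exactly as those loops do and counts them:

```python
from collections import Counter
from itertools import combinations, permutations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
k = len(a) - 1  # number of gaps between words

# Separator tuples exactly as the question's two outer loops generate them
sep_tuples = [per for com in combinations(b, k) for per in product(com, repeat=k)]
counts = Counter(sep_tuples)

print(len(sep_tuples))     # 24 tuples generated
print(len(counts))         # only 16 distinct
print(counts[("_", "_")])  # 3: once for each combination that contains "_"
print(counts[("_", "-")])  # 1

# Each separator tuple is then combined with all 3! = 6 word orders
strings = [
    "".join(w + s for w, s in zip(words, per)) + words[-1]
    for per in sep_tuples
    for words in permutations(a)
]
print(len(strings), len(set(strings)))  # 144 96
```

So a third of the generated strings are repeats, which is why the dict.fromkeys() pass is needed.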

Here is a sample of the output (it is correct, and in this case exactly what I need):

timestamp.customfilename
filenamecustom.timestamp
custom_filenametimestamp
timestamp_custom_filename
timestamp-filename.custom
custom_filename-timestamp
filename.timestamp-custom
. . .
filename.custom.timestamp
filename-customtimestamp
custom-timestamp_filename
filename_custom-timestamp
filename.timestampcustom
timestampcustom-filename
custom-timestamp.filename
filenamecustom_timestamp
timestamp.custom_filename
custom.timestampfilename
timestampfilename.custom
customfilename_timestamp
filenametimestamp-custom
custom-filenametimestamp
timestampfilename-custom
timestamp-custom-filename
custom.filenametimestamp
customfilenametimestamp
timestampfilename_custom
custom_filename.timestamp
custom-timestamp-filename
custom-timestampfilename
filename_timestamp.custom
. . .
filename.custom-timestamp
timestamp_filenamecustom
custom_timestampfilename
timestamp.custom.filename
timestamp.filename-custom
filename-custom-timestamp
customfilename.timestamp
filename_timestamp_custom
timestamp_filename.custom
customtimestampfilename
filenamecustomtimestamp
custom.timestamp_filename
filename_customtimestamp
. . .
timestamp-customfilename
filename_custom.timestamp

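This algorithm has two main problems:

  1. It generates some duplicate lines, so I always have to remove them afterwards (which is slow on larger datasets).

  2. If len(a) > len(b) + 1, the script produces no output at all. In that case I need the separators to repeat so that they cover the len(a) - 1 gaps between the words in a.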
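Problem 2 follows from how combinations() works: combinations(b, r) picks r distinct elements, so it yields nothing once r = len(a) - 1 exceeds len(b). itertools.product(b, repeat=r) lets the same separator fill several gaps and always yields len(b) ** r tuples. A small check, using a hypothetical six-word list:

```python
from itertools import combinations, product

b = ["_", "-", ".", ""]
words = ["w1", "w2", "w3", "w4", "w5", "w6"]  # hypothetical list with len(words) > len(b) + 1
r = len(words) - 1  # 5 gaps, but only 4 separators

print(len(list(combinations(b, r))))    # 0: the question's outer loop never runs
print(len(list(product(b, repeat=r))))  # 1024 == 4 ** 5
```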

2 Answers:

Answer 0 (score: 1)

This might be a solution. It interleaves the permutations of a (3! = 6) with the product of b (2 at a time here, 4*4 == 16), for a total of 6 * 16 == 96 results.
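The permutations-times-product approach can be sketched as follows. This is a minimal version assuming the question's a and b; the function name all_joins is hypothetical. Each word order is paired with each separator tuple exactly once, so no duplicates are produced, and product(..., repeat=...) also works when there are more gaps than separators.

```python
from itertools import permutations, product

def all_joins(words, seps):
    """Yield every word order joined with every choice of separators (hypothetical helper)."""
    gaps = len(words) - 1
    for order in permutations(words):              # len(words)! word orders
        for chosen in product(seps, repeat=gaps):  # len(seps) ** gaps separator tuples
            # Interleave: order[0] chosen[0] order[1] chosen[1] ... order[-1]
            yield "".join(w + s for w, s in zip(order, chosen)) + order[-1]

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
results = list(all_joins(a, b))
print(len(results))  # 96
```

Because all_joins is a generator, the strings can also be consumed one at a time instead of being collected in a list, which matters for the larger datasets mentioned in the question.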

Answer 1 (score: 0)

You are probably looking for this:

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
count = 0

def print_sequence(sol_words, sol_seps):
  # Interleave the chosen words with the chosen separators and print the result
  global count
  print("".join([sol_words[i] + sep for (i, sep) in enumerate(sol_seps)] + [sol_words[-1]]))
  count += 1

def backtrack_seps(sol_words, seps, sol_seps, i):
  # Fill separator slot i; separators are never disabled, so each one can be reused
  for sep in seps:
    sol_seps[i] = sep

    if i == len(sol_words) - 2:
      # All len(sol_words) - 1 slots are filled
      print_sequence(sol_words, sol_seps)
    else:
      backtrack_seps(sol_words, seps, sol_seps, i + 1)

def bt_for_sep(sol_words, seps):
  backtrack_seps(sol_words, seps, [''] * (len(sol_words) - 1), 0)

def backtrack_words(active, words, seps, sol_words, i):
  # Place a not-yet-used word at position i (builds one permutation of the words)
  for (wi, word) in enumerate(words):
    if active[wi]:
      sol_words[i] = word
      active[wi] = False  # mark the word as used

      if i == len(words) - 1:
        # A full permutation is ready: now enumerate the separators for it
        bt_for_sep(sol_words, seps)
      else:
        backtrack_words(active, words, seps, sol_words, i + 1)

      active[wi] = True  # undo the choice (backtrack)

backtrack_words([True] * len(a), set(a), set(b), [''] * len(a), 0)
print(count) #96

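In general, when you need to enumerate all the possibilities for a given set of values, you can use backtracking. The backtracking scheme is always the same: first it is applied to permute the words, then it is repeated for the separators.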
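The shared scheme can be written once as a generator. In this sketch the hypothetical helper backtrack() takes a reuse flag: with reuse=False a chosen item is disabled until the recursion returns (the word permutations), with reuse=True it stays available (the separators). It yields results instead of updating a global counter, and the last two lines cross-check it against itertools.

```python
from itertools import permutations, product  # used only for the cross-check at the end

def backtrack(pool, length, reuse, prefix=None, used=None):
    """Yield every length-`length` tuple built from `pool` (hypothetical helper)."""
    if prefix is None:
        prefix, used = [], [False] * len(pool)
    if len(prefix) == length:
        yield tuple(prefix)
        return
    for i, item in enumerate(pool):
        if not reuse and used[i]:
            continue                 # item already placed in this partial solution
        prefix.append(item)          # choose
        if not reuse:
            used[i] = True
        yield from backtrack(pool, length, reuse, prefix, used)  # explore
        if not reuse:
            used[i] = False
        prefix.pop()                 # un-choose (backtrack)

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]
results = [
    "".join(w + s for w, s in zip(words, seps)) + words[-1]
    for words in backtrack(a, len(a), reuse=False)
    for seps in backtrack(b, len(a) - 1, reuse=True)
]
print(len(results))  # 96
print(list(backtrack(a, len(a), reuse=False)) == list(permutations(a)))  # True
print(list(backtrack(b, 2, reuse=True)) == list(product(b, repeat=2)))   # True
```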


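Edit

The second part of the problem, described as finding combinations of the separators, is really the problem of finding all dispositions with repetition (ordered selections in which the same separator may be picked more than once). This turned out to be simpler than I thought: after choosing a separator from seps, instead of removing it (here, disabling it, as is done with the words), you simply leave it available.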
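Counting both stages gives the totals quoted in both answers: the n = len(a) words contribute n! orders, and the n - 1 gaps are filled by dispositions with repetition of the m = len(b) separators, of which there are m^(n-1):

$$
N = n! \cdot m^{\,n-1}, \qquad n = 3,\ m = 4:\quad N = 3! \cdot 4^{2} = 6 \cdot 16 = 96.
$$

Unlike combinations without repetition, which require n - 1 ≤ m, this count is nonzero for any m ≥ 1; longer word lists only make the output grow (for n = 6, m = 4: 720 · 1024 = 737,280 strings).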