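I have two lists: a and b.
a is a list of three or more strings, and b is a list of separators.
I need to generate every possible ordering of a and then "merge" each result with every possible combination of the separators in b (see the example below for a better idea).
I ended up with the following code: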
from itertools import permutations, combinations, product

a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

output = []
for com in combinations(b, len(a) - 1):
    for per in product(com, repeat=len(a) - 1):
        for ear_per in permutations(a):
            # Pair each word except the last with a separator, then append the last word
            out = ''.join(map(''.join, zip(ear_per[:-1], per))) + ear_per[-1]
            output.append(out)

# For some reason the algorithm is generating duplicates
output = list(dict.fromkeys(output))

for o in output:
    print(o)
Here is a sample of the output (it is correct and, in this case, exactly what I need):
timestamp.customfilename
filenamecustom.timestamp
custom_filenametimestamp
timestamp_custom_filename
timestamp-filename.custom
custom_filename-timestamp
filename.timestamp-custom
. . .
filename.custom.timestamp
filename-customtimestamp
custom-timestamp_filename
filename_custom-timestamp
filename.timestampcustom
timestampcustom-filename
custom-timestamp.filename
filenamecustom_timestamp
timestamp.custom_filename
custom.timestampfilename
timestampfilename.custom
customfilename_timestamp
filenametimestamp-custom
custom-filenametimestamp
timestampfilename-custom
timestamp-custom-filename
custom.filenametimestamp
customfilenametimestamp
timestampfilename_custom
custom_filename.timestamp
custom-timestamp-filename
custom-timestampfilename
filename_timestamp.custom
. . .
filename.custom-timestamp
timestamp_filenamecustom
custom_timestampfilename
timestamp.custom.filename
timestamp.filename-custom
filename-custom-timestamp
customfilename.timestamp
filename_timestamp_custom
timestamp_filename.custom
customtimestampfilename
filenamecustomtimestamp
custom.timestamp_filename
filename_customtimestamp
. . .
timestamp-customfilename
filename_custom.timestamp
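This algorithm has two main problems:
It generates some duplicate lines, so I always have to remove them afterwards (which is slow on larger datasets).
If len(a) > len(b) + 2, the script fails to run. In that case I need to repeat separators to fill the len(a) - 1 gaps between the words in a.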
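Both symptoms appear to come from the outer `combinations(b, len(a) - 1)` loop. The following is a minimal sketch, not part of the original code; the names `n`, `with_combinations` and `direct` are illustrative assumptions. It counts the separator tuples produced by the nested loops and compares them with a plain `product` over the full separator list.

```python
from itertools import combinations, product

b = ["_", "-", ".", ""]
n = 2  # number of gaps between three words (illustrative value)

# Approach from the question: pick a subset of separators, then take its product.
with_combinations = [
    per
    for com in combinations(b, n)
    for per in product(com, repeat=n)
]

# Alternative: take the product over the full separator list directly.
direct = list(product(b, repeat=n))

# Tuples such as ("_", "_") are produced once for every subset containing "_".
print(len(with_combinations), len(set(with_combinations)), len(direct))

# combinations() yields nothing when more items are requested than exist,
# so in that case the loop body in the question never runs.
print(list(combinations(b, len(b) + 1)))
```

For these inputs the nested loops produce 24 tuples of which only 16 are distinct, which is the same set that `product(b, repeat=n)` yields without duplicates.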
Answer 0 (score: 1)
This might be one solution. The idea is to take the permutations of a (3*2 == 6) interleaved with a product of the separators (2 at a time here, 4*4 == 16), for a total of 6 * 16 == 96 results.
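The approach described above could be sketched as follows. This is an illustration rather than the answerer's own code, and the helper name `interleave` is hypothetical; it assumes `words` contains at least one string.

```python
from itertools import permutations, product


def interleave(words, seps):
    """Yield every ordering of words joined by every choice of separators.

    Hypothetical helper; assumes words is non-empty.
    """
    gaps = len(words) - 1
    for perm in permutations(words):
        # product(..., repeat=gaps) lets the same separator fill several gaps.
        for chosen in product(seps, repeat=gaps):
            # zip stops at the shorter input, so the last word is appended separately.
            yield "".join(w + s for w, s in zip(perm, chosen)) + perm[-1]


a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

results = list(interleave(a, b))
print(len(results))  # 96
```

Because `product` is taken over the whole separator list, no deduplication step is needed for these inputs, and the number of separators no longer limits the number of words.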
Answer 1 (score: 0)
You may be looking for this:
a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

count = 0

def print_sequence(sol_words, sol_seps):
    # Follow each word with its separator, then append the last word
    global count
    print("".join([sol_words[i] + sep for (i, sep) in enumerate(sol_seps)] + [sol_words[-1]]))
    count += 1

def backtrack_seps(sol_words, seps, sol_seps, i):
    # Separators may repeat, so every separator is tried at every position
    for (si, sep) in enumerate(seps):
        sol_seps[i] = sep
        if i == len(sol_words) - 2:
            print_sequence(sol_words, sol_seps)
        else:
            backtrack_seps(sol_words, seps, sol_seps, i + 1)

def bt_for_sep(sol_words, seps):
    backtrack_seps(sol_words, seps, [''] * (len(sol_words) - 1), 0)

def backtrack_words(active, words, seps, sol_words, i):
    # Words must not repeat, so a word is disabled while it is in use
    for (wi, word) in enumerate(words):
        if active[wi]:
            sol_words[i] = word
            active[wi] = False
            if i == len(words) - 1:
                bt_for_sep(sol_words, seps)
            else:
                backtrack_words(active, words, seps, sol_words, i + 1)
            active[wi] = True

backtrack_words([True] * len(a), set(a), set(b), [''] * len(a), 0)
print(count)  # 96
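In general, when you need to enumerate all the possibilities over a given set of values, you can use backtracking. The backtracking scheme is always the same: it is applied first to permute the words, and then repeated for the separators.
Edit
The second part of the problem, described as finding combinations of separators, is actually the problem of finding all dispositions with repetition. This turned out to be simpler than I thought: after choosing a separator from seps, you do not have to remove it (or, in this case, disable it); you simply leave it available.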
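The difference between the two cases can be shown in a single backtracking routine. The sketch below is an illustration under assumptions, not the answer's code: the function name `arrangements` and the `reuse` flag are hypothetical, and results are yielded from a generator instead of being printed.

```python
def arrangements(items, length, reuse):
    """Yield tuples of `length` items; `reuse` allows an item to be picked again."""
    items = list(items)
    used = [False] * len(items)
    current = []

    def backtrack():
        if len(current) == length:
            yield tuple(current)
            return
        for idx, item in enumerate(items):
            if not reuse:
                if used[idx]:
                    continue  # without repetition: skip items already placed
                used[idx] = True
            current.append(item)
            yield from backtrack()
            current.pop()
            if not reuse:
                used[idx] = False  # release the item for other branches

    yield from backtrack()


a = ["filename", "timestamp", "custom"]
b = ["_", "-", ".", ""]

word_orders = list(arrangements(a, len(a), reuse=False))     # words: no repetition
sep_choices = list(arrangements(b, len(a) - 1, reuse=True))  # separators: repetition allowed
print(len(word_orders), len(sep_choices))  # 6 16
```

With `reuse=False` the routine disables an item while it is in use, as the word step does; with `reuse=True` it leaves every item available, as the separator step does.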