If I have a list of letters, for example:
word = ['W','I','N','E']
and I need to get every possible sequence of substrings of length 3 or less, for example:
W I N E, WI N E, WI NE, W IN E, WIN E
and so on.
What is the most efficient way to do this?
Right now, I have:
word = ['W','I','N','E']
for idx,phon in enumerate(word):
    phon_seq = ""
    for p_len in range(3):
        if idx-p_len >= 0:
            phon_seq = " ".join(word[idx-(p_len):idx+1])
            print(phon_seq)
This only gives me the following, not the sub-sequences:
W
I
W I
N
I N
W I N
E
N E
I N E
I can't figure out how to create all the possible sequences.
Answer 0 (score: 2)
Try this recursive algorithm:
def segment(word):
    def sub(w):
        # An empty remainder has exactly one segmentation: the empty one.
        if len(w) == 0:
            yield []
        # Take a first chunk of 1 to 3 letters, then segment the rest.
        for i in range(1, min(4, len(w) + 1)):
            for s in sub(w[i:]):
                yield [''.join(w[:i])] + s
    return list(sub(word))

# And if you want a list of strings:
def str_segment(word):
    return [' '.join(w) for w in segment(word)]
Output:
>>> segment(word)
[['W', 'I', 'N', 'E'], ['W', 'I', 'NE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'N', 'E'], ['WI', 'NE'], ['WIN', 'E']]
>>> str_segment(word)
['W I N E', 'W I NE', 'W IN E', 'W INE', 'WI N E', 'WI NE', 'WIN E']
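The recursion above hard-codes the maximum chunk length. A minimal sketch of the same idea with the limit exposed as a parameter might look like this; the name `segmentations` and the `max_len` argument are illustrative assumptions, not part of the answer:

```python
def segmentations(letters, max_len=3):
    """Yield every split of `letters` into consecutive chunks of at most `max_len` items."""
    if not letters:
        # Nothing left to split: the only segmentation is the empty one.
        yield []
        return
    for size in range(1, min(max_len, len(letters)) + 1):
        head = ''.join(letters[:size])
        # Combine this first chunk with every segmentation of the remainder.
        for tail in segmentations(letters[size:], max_len):
            yield [head] + tail


for parts in segmentations(['W', 'I', 'N', 'E']):
    print(' '.join(parts))
```

Because it is a generator, results can be consumed one at a time instead of building the whole list first, which may matter for longer words since the number of segmentations grows quickly.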
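Answer 1 (score: 2)
Since each of the three positions (after W, after I, and after N) can either hold a space or not, you can think of this as analogous to the bits, 1 or 0, in the binary representation of a number from 1 to 2^3 - 1.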
input_word = "WINE"
for variation_number in range(1, 2 ** (len(input_word) - 1)):
    output = ''
    for position, letter in enumerate(input_word):
        output += letter
        # Bit `position` set means a space goes after this letter.
        if variation_number >> position & 1:
            output += ' '
    print(output)
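Edit: To include only the variations whose sequences are 3 characters or fewer (in the general case where input_word may be longer than 4 characters), we can exclude the cases where the binary representation contains 3 consecutive zeros. (We also start the range at a higher number, to exclude the cases that would have 000 at the start.)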
# Assumes input_word is at least 4 characters long.
for variation_number in range(2 ** (len(input_word) - 4), 2 ** (len(input_word) - 1)):
    if '000' not in bin(variation_number):
        output = ''
        for position, letter in enumerate(input_word):
            output += letter
            if variation_number >> position & 1:
                output += ' '
        print(output)
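The bitmask idea can also be written so that it checks chunk lengths directly instead of searching the binary string for 000, which also covers words shorter than 4 letters. This is only a sketch under that assumption; `spaced_variations` and `max_len` are hypothetical names introduced here:

```python
def spaced_variations(word, max_len=3):
    """Yield spaced versions of `word` whose chunks are all at most `max_len` long."""
    gaps = max(len(word) - 1, 0)  # number of places where a space may go
    for mask in range(2 ** gaps):
        chunks, current = [], ''
        for position, letter in enumerate(word):
            current += letter
            # Close the chunk after the last letter, or where bit `position` is set.
            if position == len(word) - 1 or (mask >> position) & 1:
                chunks.append(current)
                current = ''
        if all(len(chunk) <= max_len for chunk in chunks):
            yield ' '.join(chunks)


for variation in spaced_variations("WINE"):
    print(variation)
```

This still visits all 2^(n-1) masks and discards the invalid ones, so for long words a recursive approach that never builds an over-long chunk is likely to do less work.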
Answer 2 (score: 1)
My implementation for this problem.
#!/usr/bin/env python
# this is a problem of fitting partitions in the word
# we'll use itertools to generate these partitions
import itertools

word = 'WINE'
# this loop generates all possible partitions COUNTS (up to word length)
for partitions_count in range(1, len(word)+1):
    # this loop generates all possible combinations based on count
    for partitions in itertools.combinations(range(1, len(word)), r=partitions_count):
        # because of the way python splits words, we only care about the
        # difference *between* partitions, and not their distance from the
        # word's beginning
        diffs = list(partitions)
        for i in range(len(partitions)-1):
            diffs[i+1] -= partitions[i]
        # first, the whole word is up for taking by partitions
        splits = [word]
        # partition the word's remainder (what was not already "taken")
        # with each partition
        for p in diffs:
            remainder = splits.pop()
            splits.append(remainder[:p])
            splits.append(remainder[p:])
        # print the result
        print(splits)
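The script above prints every split with at least one cut and does not itself enforce the 3-letter limit (for a 4-letter word the two happen to coincide). A possible variant that slices at the absolute cut positions and filters by piece length is sketched below; `split_at`, `bounded_partitions`, and `max_len` are names made up for this illustration:

```python
from itertools import combinations


def split_at(word, cuts):
    """Split `word` at the given sorted cut indices."""
    bounds = (0,) + tuple(cuts) + (len(word),)
    # Pair each boundary with the next one to get the slice limits.
    return [word[a:b] for a, b in zip(bounds, bounds[1:])]


def bounded_partitions(word, max_len=3):
    """Yield every split of `word` whose pieces are all at most `max_len` long."""
    for n_cuts in range(len(word)):
        for cuts in combinations(range(1, len(word)), n_cuts):
            pieces = split_at(word, cuts)
            if all(len(piece) <= max_len for piece in pieces):
                yield pieces


for pieces in bounded_partitions('WINE'):
    print(pieces)
```

Slicing between consecutive boundaries avoids the step of converting cut positions into differences, at the cost of building a small tuple per combination.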
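Answer 3 (score: 1)
As an alternative answer, you can do this with the itertools module: use the groupby function to group the list, use combinations to build the list of index pairs that serve as the grouping key (i<=word.index(x)<=j), and finally use set to get the unique results.
Also note that you could deduplicate the index pairs up front: when you have pairs (i1,j1) and (i2,j2) with i1==0 and j2==3 and j1+1==i2, such as (0,1) and (2,3), the slices they produce are the same, so you need to remove one of them.
As a one-line comprehension: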
from itertools import combinations, groupby
subs=[[''.join(i) for i in j] for j in [[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in list(combinations(range(len(word)),2))]]
set(' '.join(j) for j in subs) # {'WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'}
Detailed demo:
>>> cl=list(combinations(range(len(word)),2))
>>> cl
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
>>> new_l=[[list(g) for k,g in groupby(word,lambda x: i<=word.index(x)<=j)] for i,j in cl]
>>> new_l
[[['W', 'I'], ['N', 'E']], [['W', 'I', 'N'], ['E']], [['W', 'I', 'N', 'E']], [['W'], ['I', 'N'], ['E']], [['W'], ['I', 'N', 'E']], [['W', 'I'], ['N', 'E']]]
>>> last=[[''.join(i) for i in j] for j in new_l]
>>> last
[['WI', 'NE'], ['WIN', 'E'], ['WINE'], ['W', 'IN', 'E'], ['W', 'INE'], ['WI', 'NE']]
>>> set(' '.join(j) for j in last)
{'WIN E', 'W IN E', 'W INE', 'WI NE', 'WINE'}
>>> for i in set(' '.join(j) for j in last):
...     print(i)
...
WIN E
W IN E
W INE
WI NE
WINE
>>>
Answer 4 (score: 0)
I think it could be something like this:
word = "ABCDE"
myList = []
for i in range(1, len(word)+1, 1):
    myList.append(word[:i])
    # these bounds reduce to range(1, i)
    for j in range(len(word[len(word[1:]):]), len(word)-len(word[i:]), 1):
        myList.append(word[j:i])
print(myList)
# remove duplicates while keeping first-seen order
print(sorted(set(myList), key=myList.index))
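Note that the loops above collect individual contiguous substrings rather than whole-word segmentations. The same enumeration can be sketched more directly with plain start and end indices; the function name `substrings` and the optional `max_len` limit are assumptions added for illustration:

```python
def substrings(word, max_len=None):
    """Return the distinct contiguous substrings of `word`, in first-seen order."""
    pieces = []
    for end in range(1, len(word) + 1):
        for start in range(end):
            # Skip slices longer than the optional limit.
            if max_len is None or end - start <= max_len:
                pieces.append(word[start:end])
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(pieces))


print(substrings("ABCDE", max_len=3))
```

Using `dict.fromkeys` for de-duplication avoids the repeated `myList.index` lookups, which scan the list once per element.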