Question

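I start with a list of words such as ["ONE","TWO","THREE","FOUR"].

Later, I join the list to create a single string: "ONETWOTHREEFOUR". While working with this string I end up with a list of indices, say [6,7,8,0,4] (which maps onto that string to give me the word "THROW", although, as pointed out in the comments, that is irrelevant to my question).

Now I want to know which items in the original list supplied the letters I used. I know I used the letters at positions [6,7,8,0,4] of the joined string. Given that list of string indices, I want the output {0,1,2}, because I used letters from every word in the original list except "FOUR".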
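To make the setup concrete, here is a minimal sketch of how the index list maps onto the joined string. The names `joined` and `letters` are only illustrative and are not part of my actual code.

```python
wordlist = ["ONE", "TWO", "THREE", "FOUR"]
joined = "".join(wordlist)  # "ONETWOTHREEFOUR"
stringpositions = [6, 7, 8, 0, 4]

# Pick out the letter at each position of the joined string
letters = "".join(joined[pos] for pos in stringpositions)
print(letters)  # THROW
```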

What I have tried so far:

wordlist = ["ONE","TWO","THREE","FOUR"]
stringpositions = [6,7,8,0,4]
wordlengths = tuple(len(w) for w in wordlist) #->(3, 3, 5, 4)
wordstarts = tuple(sum(wordlengths[:i]) for i in range(len(wordlengths))) #->(0, 3, 6, 11)

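words_used = set()
for pos in stringpositions:
    prev = 0
    for wordnumber,wordstart in enumerate(wordstarts):
        if pos < wordstart:
            # pos lies before this word's start, so it belongs to the previous word
            words_used.add(prev)
            break
        prev = wordnumber
    else:
        # no word start exceeded pos, so pos falls inside the last word
        words_used.add(prev)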

This seems very verbose. What is the best (and/or most Pythonic) way to do this?

Answer 1

This is the simplest way. If you want something more space-efficient, you may need to use some kind of binary search tree.

wordlist = ["ONE","TWO","THREE","FOUR"]
top = 0
inds = {}
for i,word in enumerate(wordlist):
    for k in range(top, top+len(word)):
        inds[k] = i
    top += len(word)

#do some magic
L = [6,7,8,0,4]
for i in L: print(inds[i])

Output:

2
2
2
0
1

You can of course call set() on the output if you want.
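As a rough sketch of the more space-efficient direction mentioned above, a binary search over the word start offsets (using the standard `bisect` module rather than an actual tree) stores one integer per word instead of one dictionary entry per character. The helper name `word_index` is hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

wordlist = ["ONE", "TWO", "THREE", "FOUR"]
stringpositions = [6, 7, 8, 0, 4]

# Start offset of each word in the joined string: [0, 3, 6, 11]
wordstarts = [0] + list(accumulate(len(w) for w in wordlist))[:-1]

def word_index(pos):
    """Return the index of the word that contains string position pos."""
    # bisect_right counts the starts that are <= pos; the last of those
    # is the start of the word containing pos
    return bisect_right(wordstarts, pos) - 1

words_used = {word_index(pos) for pos in stringpositions}
print(words_used)  # {0, 1, 2}
```

Each lookup costs O(log n) in the number of words, whereas the dictionary gives O(1) lookups at the cost of one entry per character. This sketch does not validate positions beyond the end of the string.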

Answer 2

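As clarified in the comments, the OP's goal is to determine which words were used based on the string positions used, not which letters were used, so the word/substring THROW is essentially irrelevant.

Here is a very short version: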

from itertools import chain

wordlist = ["ONE","TWO","THREE","FOUR"]
string = ''.join(wordlist) # "ONETWOTHREEFOUR"
stringpositions = [6,7,8,0,4]

# construct a list that maps every position in string to a single source word
# (chain.from_iterable is needed to flatten the per-word lists; plain chain()
# called with one generator would yield the lists themselves)
which_word = list(chain.from_iterable( [ii]*len(w) for ii, w in enumerate(wordlist) ))

# it's now trivial to use which_word to construct the set of words
# represented in the list stringpositions
words_used = set( which_word[pos] for pos in stringpositions )

print("which_word=", which_word)
print("words_used=", words_used)

==>

which_word= [0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3]
words_used= {0, 1, 2}

Edit: updated to flatten with itertools.chain instead of sum(generator, []), as suggested by @inspectorG4dget in the comments.
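For reference, a small sketch comparing the two flattening approaches mentioned in the edit note. The names `per_word`, `flat_sum` and `flat_chain` are only illustrative.

```python
from itertools import chain

wordlist = ["ONE", "TWO", "THREE", "FOUR"]

# One list of repeated word indices per word: [[0, 0, 0], [1, 1, 1], ...]
per_word = [[ii] * len(w) for ii, w in enumerate(wordlist)]

# sum() builds a new list at every step, so it copies elements repeatedly
flat_sum = sum(per_word, [])

# chain.from_iterable walks the sublists once without intermediate copies
flat_chain = list(chain.from_iterable(per_word))

print(flat_sum == flat_chain)  # True
```

Both give the same position-to-word list; the difference only matters for long word lists, where the repeated copying in `sum` grows quadratically.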

Mapping string positions to list positions