Question

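I have a nested list, words, that contains many duplicates, and a list, uniquewords, that holds the set of distinct words in words. I want to find the smallest index at which each word first appears. For example: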

words = [['apple',5],['apple',7],['apple',8],['pear',9],['pear',4],
         ['grape',6],['baby',3],['baby',2],['baby',87]]

uniquewords = ['apple','pear','grape','baby']

I want the final result to be:

[0,3,5,6]

I tried using enumerate(), because index() does not work on nested lists.

>>> a = []
>>> for i in range(len(uniquewords)):
...     for index,sublist in enumerate(words):
...         if uniquewords[i] in sublist:
...             a.append(min(index)) 
... 
Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
TypeError: 'int' object is not iterable

I suspect this fails because I have not told Python to append an index for each unique word. How do I get there?

Answer 1

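One approach is to build a dictionary that maps each word to its index with a simple for loop, adding a word only if it is not already in the dictionary. Then use map to look up the index of each word in uniquewords.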

d = {}
for idx, (word, _) in enumerate(words):
    if word not in d:
        d[word] = idx

res = list(map(d.__getitem__, uniquewords))

print(res)

[0, 3, 5, 6]
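
The same idea can be wrapped in a small function. This is only a sketch: the name first_indices is hypothetical and not part of the answer above, and it assumes every sublist has exactly two elements. dict.setdefault stores an index only the first time a word is seen, which replaces the explicit "not in" check.

```python
def first_indices(words, uniquewords):
    """Return, for each unique word, the index of its first sublist in words."""
    first_seen = {}
    for idx, (word, _) in enumerate(words):
        # setdefault keeps the existing value if the word was already recorded
        first_seen.setdefault(word, idx)
    return [first_seen[word] for word in uniquewords]


words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

res = first_indices(words, uniquewords)
print(res)  # [0, 3, 5, 6]
```

This makes a single pass over words, so it should scale better than calling words.index once per word.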

Answer 2

Following up on my comment:

# dictionary comprehension... make an empty list entry for each word
k = {word[0]:list() for word in words}
# iterate through the list appending the word occurrence list entries
for word in words:
    k[word[0]].append(word[1])
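
As written, the snippet above collects the second element of each sublist under its word (for example, 'apple' maps to [5, 7, 8]). If the positions are what is needed, one possible variation, offered as a sketch and not as the answerer's code, is to store the enumerate index instead and take the minimum per word. The names positions and res are illustrative.

```python
from collections import defaultdict

words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

# Map each word to the list of positions where it occurs in words
positions = defaultdict(list)
for idx, (word, _) in enumerate(words):
    positions[word].append(idx)

# The smallest position of each word is its first occurrence
res = [min(positions[word]) for word in uniquewords]
print(res)  # [0, 3, 5, 6]
```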

Answer 3

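Because of the way this list is laid out (equal words sit next to each other), we can use itertools.groupby and take the index of the first item of list(g) for each group g in groupby(words, key=lambda x: x[0]).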

from itertools import groupby

res = [words.index(list(g)[0]) for k, g in groupby(words, key=lambda x: x[0])]

Expanded:

res = []
for k, g in groupby(words, key=lambda x: x[0]):
    res.append(words.index(list(g)[0]))

print(res)
# [0, 3, 5, 6]
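
A possible refinement, sketched here under the same assumption that equal words are adjacent, is to group enumerate(words) so that each group already carries its positions. That avoids the extra words.index lookup for every group. The variable names are illustrative.

```python
from itertools import groupby

words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]

# Each item from enumerate is (index, [word, number]); group on the word
res = [
    next(group)[0]  # index of the first item in the group
    for _, group in groupby(enumerate(words), key=lambda pair: pair[1][0])
]
print(res)  # [0, 3, 5, 6]
```

Note that groupby only merges consecutive items, so a word that reappears later in the list would produce a second group and a second index.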

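Alternatively, we can search the sublists for each unique word, record the index, and break. The break stops the loop from collecting any further indices for that word.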

res = []
for i in uniquewords:
    for j in words:
        if i in j:
            res.append(words.index(j))
            break
print(res)
# [0, 3, 5, 6]
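
The search-and-break loop can also be written with enumerate and next, which yields the position directly without a second words.index scan. This is a sketch, not the answerer's code: it compares against sublist[0] instead of using "in", so a value in the second slot can never be mistaken for a word, and it assumes every word in uniquewords occurs in words (otherwise next raises StopIteration).

```python
words = [['apple', 5], ['apple', 7], ['apple', 8], ['pear', 9], ['pear', 4],
         ['grape', 6], ['baby', 3], ['baby', 2], ['baby', 87]]
uniquewords = ['apple', 'pear', 'grape', 'baby']

# next() stops at the first matching sublist, like the break above
res = [
    next(idx for idx, sublist in enumerate(words) if sublist[0] == word)
    for word in uniquewords
]
print(res)  # [0, 3, 5, 6]
```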

How to extract the minimum position of a string in a nested list

3 answers: