Getting the indices of structured elements of a list in Python

Time: 2014-04-04 08:30:04

Tags: python list

I have a list that looks like this:

mylist = ['name','mem','g1','g2','g3','foo','bar','qux','zoo','name','mem','foo','bar','qux','zoo']

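As we can see, the strings above are divided into two parts, each introduced by the separators 'name','mem'.

What I want to do is get two lists, each containing the indices of foo...zoo in mylist for one of the parts. The result should be this: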

firstpart_vals_id = [5,6,7,8]
secondpart_vals_id = [11,12,13,14]

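How can I achieve this in Python?

Everything in mylist is fixed except that the number of foo....zoo entries may vary; the length and content of the foo....zoo part is the same in both parts (symmetric).

Update: I tried a regex solution.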

>>> from itertools import groupby 
>>> import re 
>>> mj = re.compile(r'^val(\d+)$') 
>>> mylist = ['name','mem','g1','g2','g3','val1','val2','val3','val4','name','mem','val1','val2','val3','val4']
>>> [[x[0] for x in g] for k, g in groupby(enumerate(mylist), key= lambda x: mj.search(x[1].mj)) if k]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'str' object has no attribute 'mj'

2 answers:

Answer 0: (score: 4)

You can use itertools.groupby:

>>> from itertools import groupby
>>> mylist = ['name','mem','g1','g2','g3','val1','val2','val3','valN','name','mem','val1','val2','val3','valN']
>>> [[x[0] for x in g] for k, g in groupby(
                    enumerate(mylist), key= lambda x:x[1].startswith('val')) if k]
[[5, 6, 7, 8], [11, 12, 13, 14]]

Note that I used a simple str.startswith condition here; it can be replaced with a regex if needed.
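To make it easier to see what the one-liner is doing, here is the same idea unrolled into an explicit loop. This is only a sketch: the helper name `is_val` and the variable `runs` are made up for illustration, and it assumes the value entries all start with 'val' as in the list above.

```python
from itertools import groupby

mylist = ['name', 'mem', 'g1', 'g2', 'g3', 'val1', 'val2', 'val3', 'valN',
          'name', 'mem', 'val1', 'val2', 'val3', 'valN']


def is_val(pair):
    # pair is an (index, string) tuple produced by enumerate()
    return pair[1].startswith('val')


runs = []
for matched, group in groupby(enumerate(mylist), key=is_val):
    # groupby yields one group per run of consecutive items sharing a key
    indices = [index for index, _ in group]
    if matched:
        # keep only the runs whose key is True, i.e. the 'val...' runs
        runs.append(indices)

print(runs)  # [[5, 6, 7, 8], [11, 12, 13, 14]]
```

Each run of consecutive 'val...' entries becomes one inner list, so `runs[0]` and `runs[1]` correspond to the two lists asked for in the question.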

Update

Using a regex:

import re
from itertools import groupby

mylist = ['name','mem','g1','g2','g3','val1','val2','val3','val1','name','mem','val1','val2','val3','val4']
mj = re.compile(r'^val\d+$')
# bool() turns the match object (or None) into a plain True/False grouping key
print([[x[0] for x in g] for k, g in groupby(
                     enumerate(mylist), key=lambda x: bool(mj.search(x[1]))) if k])

Output:

[[5, 6, 7, 8], [11, 12, 13, 14]]
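Two details are worth spelling out. The attempt in the question fails because `x[1].mj` looks up an attribute named `mj` on the string; the string has to be passed to the pattern instead, as in `mj.search(x[1])`. The `bool()` call matters too: as far as I can tell, match objects compare by identity, so without it every match would get its own key and its own group. The sketch below compares both variants; `pattern` and `index_runs` are hypothetical names used only for this illustration.

```python
import re
from itertools import groupby

mylist = ['name', 'mem', 'g1', 'g2', 'g3', 'val1', 'val2', 'val3', 'val1',
          'name', 'mem', 'val1', 'val2', 'val3', 'val4']
pattern = re.compile(r'^val\d+$')


def index_runs(items, key):
    # Collect the indices of each consecutive run whose key is truthy.
    return [[i for i, _ in g]
            for k, g in groupby(enumerate(items), key=key) if k]


# Key is a match object or None: each match is a distinct object,
# so consecutive matches are not grouped together.
without_bool = index_runs(mylist, lambda x: pattern.search(x[1]))

# Key is True or False: consecutive matches share the key True.
with_bool = index_runs(mylist, lambda x: bool(pattern.search(x[1])))

print(without_bool)  # [[5], [6], [7], [8], [11], [12], [13], [14]]
print(with_bool)     # [[5, 6, 7, 8], [11, 12, 13, 14]]
```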

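Answer 1: (score: 1)

You can use list comprehensions to perform the basic steps needed (mapping and filtering a sequence). There are probably several ways to get the job done; the code below is one way (N.B. I haven't tested it).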

# mylist as in the other answer, with the value entries named val1..valN.
mylist = ['name','mem','g1','g2','g3','val1','val2','val3','valN','name','mem','val1','val2','val3','valN']

# First find every occurrence of "name"; we just ignore "mem".
name_indices = [i for (i, s) in enumerate(mylist) if s == 'name']
name_indices.sort()  # probably redundant, but we are going to rely on sorting later.

# Do something similar, but now we don't care about ordering, so use a set.
# You can use some other sequence type if you prefer. Of course we can use
# any condition we choose, not just s.startswith().
val_indices = set(i for (i, s) in enumerate(mylist) if s.startswith('val'))


# We want to build a dictionary of name index to all value indices following it.
nv_map = {}
# Pair each name index with the next one; len(mylist) closes the last section.
for ni, ni_next in zip(name_indices, name_indices[1:] + [len(mylist)]):
    # ni should be a name index, and ni_next should be the next higher one,
    # so insert all val_indices in that range into an element of nv_map.
    nv_map[ni] = set(i for i in val_indices if ni <= i < ni_next)

So we expect the resulting nv_map to be

{
    0 : {5,6,7,8},
    9 : {11,12,13,14}
}
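The question asks for two ordered lists rather than a dictionary of sets. Assuming nv_map has the shape shown above, one way to get there is to walk the keys in sorted order and sort each set. This is a sketch; the nv_map literal below is simply copied from the expected result.

```python
# nv_map copied from the expected result above: name index -> set of value indices
nv_map = {
    0: {5, 6, 7, 8},
    9: {11, 12, 13, 14},
}

# Sets are unordered, so sort both the keys and each set of indices.
parts = [sorted(nv_map[name_index]) for name_index in sorted(nv_map)]

# Unpacking like this assumes there are exactly two sections.
firstpart_vals_id, secondpart_vals_id = parts

print(firstpart_vals_id)   # [5, 6, 7, 8]
print(secondpart_vals_id)  # [11, 12, 13, 14]
```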