Question

I have a list of many words:

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

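I want to count the number of elements between (and including) the [tag] elements across the whole list. The goal is to be able to see a frequency distribution.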

Can I use range() to start and stop at string matches?

Answer 1

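First, find all the indices of [tag]; the difference between adjacent indices is the number of elements in each group, tags included.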

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
indices = [i for i, x in enumerate(my_list) if x == "[tag]"]
nums = []
for i in range(1,len(indices)):
    nums.append(indices[i] - indices[i-1])

A faster way to find all the indices is to use numpy, as follows:

import numpy as np
values = np.array(my_list)
searchval = '[tag]'
ii = np.where(values == searchval)[0]
print(ii)

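Another way to get the differences between adjacent indices is to pair each index with its successor. In Python 2 this is itertools.izip; in Python 3 the built-in zip is already lazy, so no import is needed:

diffs = [y - x for x, y in zip(indices, indices[1:])]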
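Differences between consecutive [tag] indices only cover the groups that are followed by another [tag], so the last group is left out. A minimal sketch of one way around that, assuming the list ends immediately after the final [/tag]; the names bounds and sizes are illustrative and not part of the answer above:

```python
my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']

# Positions of every opening tag.
indices = [i for i, x in enumerate(my_list) if x == '[tag]']

# Assumption: nothing follows the last '[/tag]', so len(my_list)
# can act as the end boundary of the final group.
bounds = indices + [len(my_list)]

# Size of each group, opening and closing tags included.
sizes = [b - a for a, b in zip(bounds, bounds[1:])]
print(sizes)
```

If elements can appear outside any tag pair, this shortcut overcounts, and matching each [tag] with its own [/tag] is the safer route.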

Answer 2

You can search a list with .index(value, [start, [stop]]).

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']
my_list.index('[tag]')   # returns 0, since it is the first element (index 0)
my_list.index('[/tag]')  # returns 6

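This gives you the length of the first group. On the next iteration, you just need to remember the index of the last closing tag and use it, plus one, as the starting point: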

my_list.index('[tag]', 7)   # returns 7
my_list.index('[/tag]', 7)  # returns 11

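Then do this in a loop until you reach the last closing tag in the list. Also remember that .index raises ValueError if the value is not present, so you need to handle that exception when it occurs.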
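The loop described above could look roughly like this. It is only a sketch: the function name tag_group_sizes and its parameters are made up for illustration, and it treats the ValueError from .index as the signal that no further complete pair exists.

```python
def tag_group_sizes(words, start='[tag]', stop='[/tag]'):
    """Hypothetical helper: sizes of each start..stop group, tags included."""
    sizes = []
    pos = 0
    while True:
        try:
            begin = words.index(start, pos)     # next opening tag
            end = words.index(stop, begin + 1)  # its closing tag
        except ValueError:
            # No further complete pair: stop searching.
            break
        sizes.append(end - begin + 1)
        pos = end + 1  # resume just past the closing tag
    return sizes


my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]',
           '[tag]', 'some', 'more', 'here', '[/tag]',
           '[tag]', 'and', 'more', '[/tag]']
print(tag_group_sizes(my_list))
```

An unmatched trailing [tag] is silently ignored here; raising an error instead would be an equally reasonable choice.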

Answer 3

I would go with the following, since the OP wants to count the actual values. (No doubt he has figured out how to do this by now.)

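starts = [k for k, x in enumerate(my_list) if x == '[tag]']
stops = [k for k, x in enumerate(my_list) if x == '[/tag]']
for start, stop in zip(starts, stops):
    # stop - start counts the opening tag plus the words; add 1 to count the closing tag too
    print(stop - start)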

Answer 4

This lets you find the number of elements in each tagged range, tags included:

MY_LIST = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]',
           'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']


def main():
    ranges = find_ranges(MY_LIST, '[tag]', '[/tag]')
    for index, pair in enumerate(ranges, 1):
        print('Range {}: Start = {}, Stop = {}'.format(index, *pair))
        start, stop = pair
        print('         Size of Range =', stop - start + 1)


def find_ranges(iterable, start, stop):
    range_start = None
    for index, value in enumerate(iterable):
        if value == start:
            if range_start is None:
                range_start = index
            else:
                raise ValueError('a start was duplicated before a stop')
        elif value == stop:
            if range_start is None:
                raise ValueError('a stop was seen before a start')
            else:
                yield range_start, index
                range_start = None

if __name__ == '__main__':
    main()

This example prints the following text so that you can see how it works:

Range 1: Start = 0, Stop = 6
         Size of Range = 7
Range 2: Start = 7, Stop = 11
         Size of Range = 5
Range 3: Start = 12, Stop = 15
         Size of Range = 4

Answer 5

Borrowing and slightly modifying the generator code from the selected answer to this question:

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

def group(seq, sep):
    g = []
    for el in seq:
        g.append(el)
        if el == sep:
            yield g
            g = []

counts = [len(x) for x in group(my_list,'[/tag]')]

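I changed the generator given in that answer so that it does not return an empty list at the end, and so that it includes the separator in the current list rather than putting it in the next one. Note that this assumes there is always a matching '[tag]' '[/tag]' pair, in that order, and that every element in the list lies within a pair.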

After running this, counts will be [7, 5, 4].
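Since the stated goal is a frequency distribution, the list of group sizes can be tallied with collections.Counter from the standard library. A small sketch, with the sizes hard-coded from the result above rather than recomputed; the name distribution is illustrative:

```python
from collections import Counter

# Group sizes as produced by the example above.
counts = [7, 5, 4]

# Map each group size to the number of groups that have that size.
distribution = Counter(counts)

for size, freq in sorted(distribution.items()):
    print('size', size, 'occurs', freq, 'time(s)')
```

With only three groups every size occurs once; on a longer list the same call shows which group lengths are most common.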

Answer 6

A solution using list comprehensions and string operations.

my_list = ['[tag]', 'there', 'are', 'many', 'words', 'here', '[/tag]', '[tag]', 'some', 'more', 'here', '[/tag]', '[tag]', 'and', 'more', '[/tag]']

# join the list into one comma-separated string (assumes no word contains a comma)
my_str = ','.join(my_list)

# split the giant string on the opening tag; drop the empty piece before the first [tag]
my_tags = [t for t in my_str.split('[tag]') if t]

# split each piece into words, stripping the leading/trailing commas left over from the join
my_words = [t.strip(',').split(',') for t in my_tags]

# count each list, adding one because the split removed [tag] itself
my_cnt = [1 + len(w) for w in my_words]  # [7, 5, 4]

As a one-liner:

# all as one list comprehension starting with just the string
[1 + len(t.strip(',').split(',')) for t in my_str.split('[tag]') if t]

Python - Count list elements within a specified range of values
