Most elegant way to separate a list based on a pattern (Python)

Time: 2017-06-17 11:44:49

Tags: python list

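I have a pandas column containing lists of the sequential logged actions that users perform while posting a photo in a mobile app, one list per whole logging session. Suppose a single list looks like this: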

my_list = [
      'action_a', 'action_b', 'action_c', 'action_z', 
      'action_j',
      'action_a','action_b', 
      'action_a', 'action_b', 'action_z']

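1) action_a - start of the photo upload

2) action_z - end of the photo upload

3) other actions_i - all actions that can occur between action_a and action_z.

4) There may be erroneous actions, such as 'action_j', that do not fall between 'action_a' and 'action_z'; we should not take them into account

5) The photo upload process may not be completed, so there can be paths such as 'action_a', 'action_b'.

GOAL = split my_list into sublists of all action paths that start with 'action_a' and either end with 'action_z' or end just before another 'action_a'. So the result should look like this: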

['action_a', 'action_b', 'action_c', 'action_z'] 
['action_a','action_b']
['action_a', 'action_b', 'action_z']

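So at the moment I am trying to solve it like this: first I drop all my_lists in which the number of 'action_z' is greater than the number of 'action_a', or in which there is no 'action_a' at all. Then I do: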

indices_a = [i for i, x in enumerate(my_list) if x == "action_a"]
indices_z = [i for i, x in enumerate(my_list) if x == "action_z"]

if(len(indices_z)<1):
    for i_a,x_a in enumerate(indices_a):
        if (i_a+1 != len(indices_a)):
            indices_z.append(indices_a[i_a+1]-1) 
        else: indices_z.append(len(my_list)-1) 
else:       
    for i_a,x_a in enumerate(indices_a):
        if (i_a+1 != len(indices_a)):
            if (indices_z[i_a] > indices_a[i_a+1] ):
                indices_z.insert(i_a, indices_a[i_a+1]-1)
        else:  indices_z.append(len(my_list)-1) 

res=[]
for i,j in zip(indices_a, indices_z):
    res.append(my_list[i:j+1] )

It seems to work. Is there a better way?

6 answers:

Answer 0: (score: 4)

I tried to simplify things a bit and came up with this logic:

result = []
curr_list = None

for item in my_list:
    if item == 'action_a':
        if curr_list is not None:
            # Only append if there is content
            result.append(curr_list)
        # Create a new list
        curr_list = []

    try:
        # Try to append the current item
        curr_list.append(item)

        if item == 'action_z':
            # Close the current list but don't initialize 
            # a new one until we encounter action_a
            result.append(curr_list)
            curr_list = None
    except AttributeError:
        # This means we haven't encountered action_a yet
        # Just ignore and move on
        pass

if curr_list is not None:
    # Append an "open" list if there is one
    result.append(curr_list)

for item in result:
    print(item)

Result:

['action_a', 'action_b', 'action_c', 'action_z']
['action_a', 'action_b']
['action_a', 'action_b', 'action_z']
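
The same state-machine idea can also be written as a generator function, which removes the need for the try/except block. This is only a minimal sketch: the name split_sessions and its start/end parameters are made up for illustration and are not part of the answer above.

```python
def split_sessions(actions, start='action_a', end='action_z'):
    """Yield sublists that begin at `start` and stop at `end` or before the next `start`."""
    current = None  # None means we are not inside an upload path
    for item in actions:
        if item == start:
            if current is not None:
                yield current  # previous path was never finished
            current = [item]
        elif current is not None:
            current.append(item)
            if item == end:
                yield current
                current = None
        # items seen while current is None are stray actions and are skipped
    if current is not None:
        yield current  # trailing unfinished path


my_list = [
    'action_a', 'action_b', 'action_c', 'action_z',
    'action_j',
    'action_a', 'action_b',
    'action_a', 'action_b', 'action_z']

paths = list(split_sessions(my_list))
for path in paths:
    print(path)
```

Because it is a generator, it can be applied lazily to each list in a column without building intermediate results.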

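Answer 1: (score: 3)

Following these rules:

  • ends with a: start a new list, unless the last list is empty, in which case add to it
  • ends with z: add to the previous list, then start a new list
  • else: add to the previous list, unless it is empty

Note that the and sublists[-1] condition can be removed from the z branch if a lone action_z should be added to the list on its own.

sublists=[[]]
for li in my_list:
    if li[-1]=='a':
        if sublists[-1]:
            sublists.append([li])
        else:
            sublists[-1].append(li)
    elif li[-1]=='z' and sublists[-1]:
        sublists[-1].append(li)
        sublists.append([])
    elif sublists[-1]:
        sublists[-1].append(li)

if not sublists[-1]:
    sublists.pop()

After this, printing sublists gives:

[['action_a', 'action_b', 'action_c', 'action_z'], ['action_a', 'action_b'], ['action_a', 'action_b', 'action_z']]

If needed, li[-1]=="[letter]" can always be replaced with a comparison against the full action name.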

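Answer 2: (score: 1)

This question is opinion-based. However, if a functional style fits your notion of elegance, I suggest using some kind of partitioning algorithm, such as group-by or partition-by.

These higher-order functions come in different flavors, but the basic idea is simple: you take a stream or list of elements and supply a function that tells the algorithm whether an element should be treated as the start of a new list (I call this a partition point). Personally, I think adding one level of nesting to the data structure looks cleaner; that is, you write a function that takes a list and returns a list of lists.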

# function that defines the start of a new sequence
def partition_begin(photo_action):
    return photo_action == 'action_a'

# function that defines the end of a sequence
def partition_end(photo_action):
    return photo_action == 'action_z'

# get a list of elements and define a starting and stopping function
# and return a list of lists separated by start and stop.
def partition_by(elements, partition_separator, partition_terminator):
    partitioned_stream = []
    for element in elements:
        if partition_separator(element):
            # start a new list and append it to the stream.
            partitioned_stream.append([element])
            continue
        if partition_terminator(element):
            # add element to the last sequence, but start a new list. 
            partitioned_stream[-1].append(element)
            partitioned_stream.append([])
            continue
        # standard append to list.
        partitioned_stream[-1].append(element)
    return partitioned_stream


my_list = [
      'action_a', 'action_b', 'action_c', 'action_z',
      'action_j',
      'action_a','action_b',
      'action_a', 'action_b', 'action_z']

print(partition_by(my_list, partition_begin, partition_end))

# [
#   ['action_a', 'action_b', 'action_c', 'action_z'],
#   ['action_j'],
#   ['action_a', 'action_b'],
#   ['action_a', 'action_b', 'action_z'],
#   []
# ]

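If you are working in a functional programming language this gets more interesting, because these algorithms usually let you nest different functions into the algorithm. You may have noticed that this code returns an empty list at the end, which might look odd, but you can get rid of it with a list comprehension or by simply filtering out empty elements.

# remove empty elements from a list
non_empty = lambda x: len(x) > 0
list(filter(non_empty, partition_by(my_list, partition_begin, partition_end)))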
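
The list-comprehension alternative mentioned above could look like the sketch below. It starts from the partitioned output listed earlier, copied in as a literal so the snippet runs on its own, and it additionally drops groups that do not start with 'action_a'. That extra check is an assumption taken from the question's goal, not something this answer does.

```python
# Output of partition_by for my_list, as listed above,
# including the stray ['action_j'] group and the trailing empty list
partitioned = [
    ['action_a', 'action_b', 'action_c', 'action_z'],
    ['action_j'],
    ['action_a', 'action_b'],
    ['action_a', 'action_b', 'action_z'],
    [],
]

# Keep only non-empty groups that actually begin an upload path
upload_paths = [group for group in partitioned
                if group and group[0] == 'action_a']

print(upload_paths)
```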

Answer 3: (score: 0)

You should first split the list at each occurrence of 'action_a', then remove everything that appears after 'action_z', as follows:

import itertools

def process_actions(actions, action_first='action_a', action_last='action_z'):
    '''Split the list at each action_first, then cut each part after action_last.'''
    # Split the list based on action_a
    _actions = [[action_first] + list(g)
                for k, g in itertools.groupby(actions, lambda x: x == action_first)
                if not k]
    # Remove all actions appearing after action_z
    _actions = [x[0:x.index(action_last) + 1] if action_last in x else x
                for x in _actions]
    return _actions
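
To see what the groupby step works with, the sketch below lists the runs that itertools.groupby produces for the sample list from the question. The variable name runs is illustrative only.

```python
import itertools

my_list = [
    'action_a', 'action_b', 'action_c', 'action_z',
    'action_j',
    'action_a', 'action_b',
    'action_a', 'action_b', 'action_z']

# groupby emits one group per run of consecutive items that share the same key;
# here the key is True for 'action_a' and False for everything else
runs = [(is_start, list(group))
        for is_start, group in itertools.groupby(my_list, lambda x: x == 'action_a')]

for is_start, group in runs:
    print(is_start, group)
```

Note that the stray 'action_j' ends up in the same run as the preceding 'action_z', which is why the second step has to cut each part after 'action_z'. Two consecutive 'action_a' items would also collapse into a single True run, and any items before the first 'action_a' would still form a False run, so this approach relies on the input being reasonably clean.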

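Answer 4: (score: 0)

I think you can simplify the solution by changing the representation of the problem.

How about changing the data representation from text to numbers and keeping the mapping in a dictionary? Something like: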

my_dict = {
  'action_a': 0,
  'action_b': 1,
  'action_c': 2,
  'action_z': 255, 
  'action_j': -1 ## and all the other values you want to dump
}

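You can then split the problem into two parts:

  • filter out all negative values
  • break the list into ascending sublists.

The second part is solved like this: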

[[z for z in y]
for x,y in 
itertools.groupby(
    itertools.zip_longest(
        map(
            lambda x: x[1]-x[0] > 0, 
            itertools.zip_longest(
                num_list,num_list[1:],fillvalue=0)
            ),num_list,
        fillvalue=True),
    lambda x: x[0])]

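where num_list is a list of integers >= 0

  • compute the running difference between consecutive pairs of elements in the list, flagging True if it is > 0
  • zip this with the original list
  • group by the True/False flag
  • unpack the group objects

The final list returned needs some more processing, such as picking pairs of groups to get the complete lists, but this should lead to a workable solution. It could be written more elegantly, but I hope this helps.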
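
For comparison, the two steps described above can be sketched with a plain loop instead of the nested groupby expression. This is a swapped-in technique, not the answer's code. It assumes, as the answer does, that actions inside one upload path map to strictly increasing numbers; the names inverse, runs and paths are illustrative.

```python
my_dict = {
    'action_a': 0,
    'action_b': 1,
    'action_c': 2,
    'action_z': 255,
    'action_j': -1,  # negative: an action we want to drop
}
my_list = [
    'action_a', 'action_b', 'action_c', 'action_z',
    'action_j',
    'action_a', 'action_b',
    'action_a', 'action_b', 'action_z']

# Step 1: map actions to numbers and filter out negative values
num_list = [my_dict[a] for a in my_list if my_dict[a] >= 0]

# Step 2: break the list into strictly ascending runs
runs = []
for n in num_list:
    if runs and n > runs[-1][-1]:
        runs[-1].append(n)   # still ascending: extend the current run
    else:
        runs.append([n])     # not ascending: start a new run

# Map back to action names, keeping only runs that begin with action_a (0)
inverse = {v: k for k, v in my_dict.items()}
paths = [[inverse[n] for n in run] for run in runs if run[0] == 0]

print(paths)
```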

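Answer 5: (score: 0)

Here is a function that starts a new sublist at any item given in the first argument and ends the sublist at any item given in the last argument. It returns a generator for efficiency.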

from itertools import chain
from collections.abc import Iterable

def list_breaker(input_list, first, last, keep_orphans=False):
    '''
    Breaks a list, `input_list`, into sublists.

    input_list : list
        The list of items to be split into sublists
    first : item or list of items
        The item/items that identify the start of a new sublist
    last : item or list of items
        The item/items that identify the end of a sublist
    keep_orphans : bool
        When `True`, all sublists are returned.  When `False`,
        only sublists that have a zeroth element in `first` or 
        a last element in `last` are returned.
    '''
    # convert inputs to lists
    if isinstance(first, str) or not isinstance(first, Iterable):
        first = [first]
    if isinstance(last, str) or not isinstance(last, Iterable):
        last = [last]

    # find the places to break the list.  
    breaks = [(lambda i,x: i if x in first else i+1)(i,x) 
              for i,x in enumerate(input_list) if x in chain(first, last)] + [None]
    # slice the list according to the breaks
    for i in range(len(breaks)-1):
        out = input_list[slice(*breaks[i:i+2])] 
        if keep_orphans and out:
            yield out
        if not keep_orphans and out:
            if out[0] in first or out[-1] in last:
                yield out

Tests:

# note I added an additional action to the end
my_list = [
      'action_a', 'action_b', 'action_c', 'action_z', 'action_j',
      'action_a','action_b', 'action_a', 'action_b', 'action_z', 'action_d']

list(list_breaker(my_list, 'action_a', 'action_z'))
# returns:
[['action_a', 'action_b', 'action_c', 'action_z'],
 ['action_a', 'action_b'],
 ['action_a', 'action_b', 'action_z']]

list(list_breaker(my_list, 'action_a', 'action_z', True))
# returns:
[['action_a', 'action_b', 'action_c', 'action_z'],
 ['action_j'],
 ['action_a', 'action_b'],
 ['action_a', 'action_b', 'action_z'],
 ['action_d']]

Start a new list upon encountering 'action_a' or 'action_c'.

list(list_breaker(my_list, ['action_a', 'action_c'], 'action_z'))
# returns:
[['action_a', 'action_b'],
 ['action_c', 'action_z'],
 ['action_a', 'action_b'],
 ['action_a', 'action_b', 'action_z']]

End a sublist upon encountering 'action_z' or 'action_c'.

list(list_breaker(my_list, 'action_a', ['action_c', 'action_z']))
# returns:
[['action_a', 'action_b', 'action_c'],
 ['action_z'],
 ['action_a', 'action_b'],
 ['action_a', 'action_b', 'action_z']]