Filtering a list of tuples to keep only the maximum and minimum values

Time: 2012-09-06 16:40:02

Tags: python

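I have generated a very long list of tuples (format shown below). Each tuple has a time as its first element and an event as its third element. The second element is always the same and identifies the list I'm working on among other, similar lists. There are many distinct third elements, and each of them has multiple entries with different time values in the first element.

I'm trying to filter the list so that, for each event (the third element of the tuple), only the entries with the minimum and maximum time (the first element) remain. I tried using list comprehensions but quickly got confused.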

('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3567', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3600', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3800', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3800', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3800', 'VOLTAGE DEVIATION', 'HORIZ_G .575')
('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')

Result after filtering:

('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3600', 'VOLTAGE DEVIATION', 'DIFICULT 230')
('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3800', 'VOLTAGE DEVIATION', '7MIHL G1.575')
('1.3800', 'VOLTAGE DEVIATION', 'HORIZ_G .575')
('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230')
('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5')
('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575')
('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')

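I'm trying the code below, but I get an error. I'm very new to this, so please tell me what I'm doing wrong. In the code, m1 is the list of tuples I generated with findall. I import ast at the top of the code.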

       m1 = re.findall(pattern1,wholefile)
       m1=[ast.literal_eval(t) for t in m1] 
       m1=[(float(a),b,c) for a,b,c in m1] 
       keys=sorted({t[2] for t in m1}) 
       for key in keys: 
           group=filter(lambda t: t[2]==key,m1)
           print '{}:\n\tmax: {}\n\tmin: {}'.format(key,max(group),min(group))

4 answers:

Answer 0 (score: 4)

Restructuring your tuples into a dict will make life easier.

from collections import defaultdict

d = defaultdict(list)
for t,_,v in your_tuple_list:
     d[v].append(t)

After that, d has one key per event, with the list of times associated with that event as its value.

It will look like this (more or less):

>>> d['DNLP2G23.575']
['1.3433'....]

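Now the problem is just finding the minimum and maximum of each list, which min() and max() make easy.

Once that's done, you'll have your data set in the shape you want; you can convert it back to tuples/lists/etc.

If you're keen, you can convert the lists to sets, which removes the duplicates and speeds up min/max, saving you time if you have a large pile of tuples to process.

You should also cast the times to float; you can do this in the main loop: d[v].append(float(t)). This makes sure max and min compare numbers rather than strings.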
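The steps described in this answer can be put together roughly as follows. This is only a sketch under stated assumptions: it is written for Python 3, the `events` list is a small hypothetical sample in the same shape as the question's data, and the helper name `min_max_per_event` is made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample in the same (time, label, event) shape as the question.
events = [
    ('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575'),
    ('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575'),
    ('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575'),
    ('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230'),
    ('1.3567', 'VOLTAGE DEVIATION', 'DIFICULT 230'),
    ('1.3600', 'VOLTAGE DEVIATION', 'DIFICULT 230'),
]


def min_max_per_event(tuples):
    """Return {event: (min_time, max_time)} with times converted to float."""
    times = defaultdict(set)  # a set drops duplicate times for the same event
    for t, _, event in tuples:
        times[event].add(float(t))
    return {event: (min(ts), max(ts)) for event, ts in times.items()}


summary = min_max_per_event(events)
for event, (lo, hi) in summary.items():
    print(event, lo, hi)
```

An event that occurs at only one time gets the same value for both the minimum and the maximum.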

Answer 1 (score: 3)

Use itertools.groupby:

>>> import itertools
>>> import operator
>>> import pprint
>>> results = []
>>> for key, group in itertools.groupby(tuplelist, operator.itemgetter(2)):
...    group = list(group)
...    results.append(min(group))
...    results.append(max(group))
...
>>> pprint.pprint(results)
[('1.3433', 'VOLTAGE DEVIATION', 'DNLP2G23.575'),
 ('1.3467', 'VOLTAGE DEVIATION', 'DNLP2G23.575'),
 ('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575'),
 ('1.3467', 'VOLTAGE DEVIATION', 'DNLP1_G1.575'),
 ('1.3533', 'VOLTAGE DEVIATION', 'DIFICULT 230'),
 ('1.3600', 'VOLTAGE DEVIATION', 'DIFICULT 230'),
 ('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575'),
 ('1.3800', 'VOLTAGE DEVIATION', '7MIHL G1.575'),
 ('1.3800', 'VOLTAGE DEVIATION', 'HORIZ_G .575'),
 ('1.3800', 'VOLTAGE DEVIATION', 'HORIZ_G .575'),
 ('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115'),
 ('1.3800', 'VOLTAGE DEVIATION', 'MEDBOWCO 115'),
 ('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230'),
 ('1.3800', 'VOLTAGE DEVIATION', 'STNDPSVC 230'),
 ('1.3867', 'VOLTAGE DEVIATION', 'MINERS  34.5'),
 ('1.3900', 'VOLTAGE DEVIATION', 'MINERS  34.5'),
 ('1.4233', 'VOLTAGE DEVIATION', 'FT CRK2 34.5'),
 ('1.4267', 'VOLTAGE DEVIATION', 'FT CRK2 34.5'),
 ('1.4800', 'VOLTAGE DEVIATION', 'HIPLN_G .575'),
 ('1.4833', 'VOLTAGE DEVIATION', 'HIPLN_G .575')]

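Notes:

  1. min/max compare the tuples element by element, in order. However, the first element is actually a string rather than a float, so you may want to pass a key argument to min and max so that they compare on a different value.
  2. This only works if all tuples with the same grouping key are adjacent in the list. That is the case in your example, but if it isn't, you may have to sort the list first.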
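Both notes can be addressed in one go. The sketch below is an assumption-laden illustration rather than the answerer's code: it targets Python 3, the function name `filter_min_max` and the event names `BUS A` / `BUS B` are hypothetical, and the sample is deliberately unsorted with times that compare differently as strings than as numbers.

```python
from itertools import groupby
from operator import itemgetter


def filter_min_max(tuples):
    """Keep, per event (index 2), the tuples with the smallest and largest time."""
    by_event = itemgetter(2)

    def by_time(t):
        # Compare times numerically instead of as strings.
        return float(t[0])

    results = []
    # groupby only merges adjacent items, so sort by the grouping key first.
    for _, group in groupby(sorted(tuples, key=by_event), key=by_event):
        group = list(group)
        results.append(min(group, key=by_time))
        results.append(max(group, key=by_time))
    return results


# Hypothetical, deliberately unsorted sample; '10.5' < '9.5' as strings.
sample = [
    ('9.5', 'VOLTAGE DEVIATION', 'BUS B'),
    ('1.0', 'VOLTAGE DEVIATION', 'BUS A'),
    ('10.5', 'VOLTAGE DEVIATION', 'BUS B'),
    ('2.0', 'VOLTAGE DEVIATION', 'BUS A'),
    ('1.5', 'VOLTAGE DEVIATION', 'BUS A'),
]

filtered = filter_min_max(sample)
for row in filtered:
    print(row)
```

Sorting by event changes the output order to event order; if the original time order matters, the result can be sorted again by time afterwards.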

Answer 2 (score: 1)

This works (as long as you really have a list of tuples whose first value is a float):

keys=sorted({t[2] for t in tups})
for key in keys:
    group=filter(lambda t: t[2]==key,tups)
    print '{}:\n\tmax: {}\n\tmin: {}'.format(key,max(group),min(group))

Prints:

7MIHL G1.575:
    max: (1.38, 'VOLTAGE DEVIATION', '7MIHL G1.575')
    min: (1.36, 'VOLTAGE DEVIATION', '7MIHL G1.575')
DIFICULT 230:
    max: (1.36, 'VOLTAGE DEVIATION', 'DIFICULT 230')
    min: (1.3533, 'VOLTAGE DEVIATION', 'DIFICULT 230')
DNLP1_G1.575:
    max: (1.3467, 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
    min: (1.3467, 'VOLTAGE DEVIATION', 'DNLP1_G1.575')
DNLP2G23.575:
    max: (1.3467, 'VOLTAGE DEVIATION', 'DNLP2G23.575')
    min: (1.3433, 'VOLTAGE DEVIATION', 'DNLP2G23.575')
FT CRK2 34.5:
    max: (1.4267, 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
    min: (1.4233, 'VOLTAGE DEVIATION', 'FT CRK2 34.5')
HIPLN_G .575:
    max: (1.4833, 'VOLTAGE DEVIATION', 'HIPLN_G .575')
    min: (1.48, 'VOLTAGE DEVIATION', 'HIPLN_G .575')
HORIZ_G .575:
    max: (1.38, 'VOLTAGE DEVIATION', 'HORIZ_G .575')
    min: (1.38, 'VOLTAGE DEVIATION', 'HORIZ_G .575')
MEDBOWCO 115:
    max: (1.38, 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
    min: (1.38, 'VOLTAGE DEVIATION', 'MEDBOWCO 115')
MINERS  34.5:
    max: (1.39, 'VOLTAGE DEVIATION', 'MINERS  34.5')
    min: (1.3867, 'VOLTAGE DEVIATION', 'MINERS  34.5')
STNDPSVC 230:
    max: (1.38, 'VOLTAGE DEVIATION', 'STNDPSVC 230')
    min: (1.38, 'VOLTAGE DEVIATION', 'STNDPSVC 230')

From your comment, it sounds like what you have is text that looks like tuples. To convert it into actual tuples:

import ast

tups=[ast.literal_eval(t) for t in tups]
tups=[(float(a),b,c) for a,b,c in tups]
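For readers on Python 3, here is a hedged sketch of the same approach end to end. The code in this answer is Python 2; in Python 3, filter returns a one-shot iterator, so calling max and then min on it would fail on the second call. The `raw` strings and the `report` dict below are illustrative assumptions, not part of the original answer.

```python
import ast

# Hypothetical input: strings that merely look like tuples.
raw = [
    "('1.3600', 'VOLTAGE DEVIATION', '7MIHL G1.575')",
    "('1.3800', 'VOLTAGE DEVIATION', '7MIHL G1.575')",
    "('1.3800', 'VOLTAGE DEVIATION', 'HORIZ_G .575')",
]

# Parse each string into a real tuple, then make the time a float.
tups = [ast.literal_eval(s) for s in raw]
tups = [(float(a), b, c) for a, b, c in tups]

report = {}
for key in sorted({t[2] for t in tups}):
    # A list (unlike a Python 3 filter object) can be scanned twice.
    group = [t for t in tups if t[2] == key]
    report[key] = (min(group), max(group))

for key, (lo, hi) in report.items():
    print('{}:\n\tmax: {}\n\tmin: {}'.format(key, hi, lo))
```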

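Answer 3 (score: 0)

This may be overkill if you only have a handful of tuples, but if you have a long list and are able to use external libraries, take a look at pandas. Assuming the variable holding the tuples is tuplelist, the following gives the output you want: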

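import pandas
df = pandas.DataFrame.from_records(tuplelist)
# For each (column 1, column 2) group, take the min and the max of the time column
df = pandas.concat([df.groupby([1, 2]).min(),
                    df.groupby([1, 2]).max()])
# sort_index() stands in for the bare DataFrame.sort(), which later pandas versions removed
df = df.sort_index().reset_index().reindex(columns=[0, 1, 2])
print(list(tuple(x) for x in df.values))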