Compressing a list of objects by a matching attribute in Python

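Time: 2016-01-15 14:55:35

Tags: python python-2.7

I have a list of objects that I want to "compress" into a smaller list of objects, based on a matching attribute (id) and the optional class parameters.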

class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

How can I get a new list that looks like this? It should prefer formtype="B" over formtype="A".

compressed = [
    Case("12345", "B", 12, 1, 1, "female"),
    Case("67890", "B", 34, 0, 1, "male"),
    Case("75321", "A", 2, 0, "male")
]

I tried compressing it with a dict, without any luck:

compressed = [Case(id=case.id, formtype=None, age=case.age) for event in caselist if case.formtype == 'A']

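3 Answers:

Answer 0: (score: 2)

Group by id and, for any repeated id that has a "B" form_type, keep the "B" object; otherwise just keep what is there. If you also want to use the attributes that are not set in "B", you can iterate over the attributes with getattr and setattr to fill in anything that B has left unset. You cannot hard-code what to set and what not to set unless you know in advance what is set in A and/or what is set in B: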

class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __iter__(self):
        for ele in ["case_id", "form_type", "age",
                    "fever", "cough", "gender"]:
            yield ele


caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

d = {}

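for c in caselist:
    if c.case_id not in d:
        # first time this id is seen: keep the object as-is
        d[c.case_id] = c
    elif d[c.case_id].form_type != "B" and c.form_type == "B":
        # a "B" form replaces the stored form; attributes it leaves
        # unset (None) are filled in from the form stored so far
        tmp = d[c.case_id]
        for attr in c:
            if getattr(c, attr) is None:
                setattr(c, attr, getattr(tmp, attr))
        d[c.case_id] = c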

caselist[:] = d.values()
print(caselist)
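The loop above fills in a "B" form only when the "A" form for the same id has already been seen. As a rough sketch of an order-independent variant (the names FIELDS, fill_missing and compress are my own, and the __repr__ is only there to make the printed result readable), something like this might work. Note that it modifies the Case objects that are passed in:

```python
class Case(object):
    # Attribute names in constructor order; FIELDS is a hypothetical helper.
    FIELDS = ("case_id", "form_type", "age", "fever", "cough", "gender")

    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __repr__(self):
        values = ", ".join(repr(getattr(self, name)) for name in self.FIELDS)
        return "Case(%s)" % values


def fill_missing(target, source):
    """Copy onto target every attribute that is still None there."""
    for name in Case.FIELDS:
        if getattr(target, name) is None:
            setattr(target, name, getattr(source, name))
    return target


def compress(cases):
    """Keep one Case per case_id, preferring the "B" form.

    The surviving object is mutated in place to absorb the other form's values.
    """
    merged = {}
    for case in cases:
        seen = merged.get(case.case_id)
        if seen is None:
            # First form for this id.
            merged[case.case_id] = case
        elif case.form_type == "B" and seen.form_type != "B":
            # "B" arrives after "A": "B" wins and absorbs A's values.
            merged[case.case_id] = fill_missing(case, seen)
        else:
            # Any other order: the stored form absorbs the new one.
            fill_missing(seen, case)
    return list(merged.values())
```

Calling compress(caselist) on the list from the question should then return three objects, with the two "B" forms carrying the fever and gender values of their "A" counterparts.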

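Answer 1: (score: 0)

This is a bit longer than what you were going for, but it works. It creates separate lists of the A forms and the B forms. It then iterates over the B forms and looks for a matching A form. If a match is found, it copies the A values that the B form is missing onto the B form.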

def merge(acases, bcases):
    newlist = []
    for b in bcases:
        for a in acases[:]:
            if b.id == a.id:
                if b.cough is None:
                    b.cough = a.cough
                if b.fever is None:
                    b.fever = a.fever
                if b.gender is None:
                    b.gender = a.gender
                newlist.append(b)
                acases.remove(a)
    newlist += acases
    return newlist


caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]

acases = [case for case in caselist if case.formtype == 'A']
bcases = [case for case in caselist if case.formtype == 'B']

caselist = merge(acases, bcases)

for i in caselist:
    print '{0} {1} {2} {3} {4} {5}'.format(i.id, i.formtype, i.age, i.cough, i.fever, i.gender)

12345 B 12 0 1 female
67890 B 34 1 0 male
75321 A 2 None 0 male
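One detail to watch when filling in missing values: 0 is a legitimate answer for fever or cough, but it is falsy in Python, so a truthiness test treats it as "not set". A minimal sketch of the difference (first_known and first_truthy are hypothetical helper names, not part of the code above):

```python
def first_known(preferred, fallback):
    """Return preferred unless it is missing (None), otherwise fallback."""
    return fallback if preferred is None else preferred


def first_truthy(preferred, fallback):
    """Truthiness-based version, shown only for comparison."""
    return preferred if preferred else fallback


# Suppose the B form recorded cough=0 and the A form recorded cough=1.
b_cough, a_cough = 0, 1

kept = first_known(b_cough, a_cough)    # 0: B's answer survives
lost = first_truthy(b_cough, a_cough)   # 1: B's 0 is silently replaced
```

Comparing against None keeps a recorded 0 intact and only falls back when the value was never supplied.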

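Here is another approach. It is more efficient than my previous answer, but not as efficient as @LeartS's answer. Both answers can handle different form layouts.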

def check_val(av, bv):
    # prefer B's value; fall back to A's only when B's is unset (None)
    if bv is None:
        return av
    return bv

caselist = [
    Case(case_id="12345", form_type="A", age=12, cough = 0, gender="female"),
    Case(case_id="12345", form_type="B", age=12, fever=10),
    Case(case_id="67890", form_type="A", age=34, fever=0, gender="male"),
    Case(case_id="67890", form_type="B", age=34, cough=1),
    Case(case_id="75321", form_type="A", age=2, fever=0, gender="male")
]

d={}
caselist.sort(key=lambda x: x.form_type, reverse=True)

for case in caselist:
    if case.case_id not in d and case.form_type == 'B':
        d[case.case_id] = case

    if case.form_type == 'A' and case.case_id in d:
        b = d[case.case_id]
        b.cough = check_val(case.cough, b.cough)
        b.fever = check_val(case.fever, b.fever)
        b.gender = check_val(case.gender, b.gender)
    else:
        d[case.case_id] = case

Answer 2: (score: -1)

Sometimes I think the trivial, explicit approach is also the best one. I would simply do this:

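[code block not preserved]

With your input, this gives the following output (after defining a __str__ function for the Case class):

In [1]: [str(c) for c in cases]
Out[1]:
['case_id: 67890, form_type: B, age: 34, fever: 0, cough: 1, gender: male',
 'case_id: 12345, form_type: B, age: 12, fever: 1, cough: 0, gender: female',
 'case_id: 75321, form_type: A, age: 2, fever: 0, cough: None, gender: male']

Note how id 75321 has cough None rather than 0. I think this is better, because you have no information about the cough parameter for that id. (Likewise, for id 12345 the correct cough value is 0, not 1; I think that is a typo in your example output.)

It also iterates over the original caselist only once and uses a dictionary for O(1) id lookups.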
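As a rough illustration of the single-pass, dictionary-based idea described above (this is a sketch under my own assumptions, not the answerer's original code), the merge could look something like the following. The names compress and DATA_FIELDS are hypothetical, the __str__ format is copied from the output shown above, and letting a "B" value win over a conflicting "A" value is a design choice of this sketch:

```python
class Case(object):
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __str__(self):
        template = ("case_id: {0}, form_type: {1}, age: {2}, "
                    "fever: {3}, cough: {4}, gender: {5}")
        return template.format(self.case_id, self.form_type, self.age,
                               self.fever, self.cough, self.gender)


# Attributes that may be merged between forms (hypothetical helper).
DATA_FIELDS = ("age", "fever", "cough", "gender")


def compress(caselist):
    """Merge cases sharing a case_id in one pass; inputs are left untouched."""
    merged = {}
    for case in caselist:
        current = merged.get(case.case_id)
        if current is None:
            # First form for this id: store a copy, not the original object.
            merged[case.case_id] = Case(case.case_id, case.form_type, case.age,
                                        case.fever, case.cough, case.gender)
            continue
        for name in DATA_FIELDS:
            value = getattr(case, name)
            if value is None:
                continue  # this form says nothing about the field
            # "B" values take precedence; "A" values only fill gaps.
            if case.form_type == "B" or getattr(current, name) is None:
                setattr(current, name, value)
        if case.form_type == "B":
            current.form_type = "B"
    return list(merged.values())
```

Each id costs one dictionary lookup, so the work grows linearly with the length of caselist. Because the merged objects are copies, the original list can still be inspected afterwards.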