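I have a list of objects that I want to "compress" into a smaller list of objects, based on a matching attribute (id) and the optional class parameters.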
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id  # the parameter is named id, not case_id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender
caselist = [
Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
Case(id="12345", formtype="B", age=12, cough=0),
Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
Case(id="67890", formtype="B", age=34, cough=1),
Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]
How can I get a new list that looks like this? It should prefer formtype="B" over formtype="A".
compressed = [
Case("12345", "B", 12, 1, 1, "female"),
Case("67890", "B", 34, 0, 1, "male"),
Case("75321", "A", 2, 0, "male")
]
I tried compressing it with a dict, without any luck:
compressed = [Case(id=case.id, formtype=None, age=case.age) for event in caselist if case.formtype == 'A']
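Answer 0 (score: 2)
Group by id and, for duplicated ids that include a "B" form_type, keep the "B" object; otherwise keep the object as it is. If you want to use any attributes that are not set in "B", you can iterate over the attributes with getattr and setattr and copy every previously unset attribute into B. You cannot hard-code what to set and what not to set unless you know in advance what is set in A and/or what is set in B: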
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __iter__(self):
        # Yield the attribute names so the merge loop can use getattr/setattr
        for ele in ["case_id", "form_type", "age",
                    "fever", "cough", "gender"]:
            yield ele
caselist = [
Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
Case(id="12345", formtype="B", age=12, cough=0),
Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
Case(id="67890", formtype="B", age=34, cough=1),
Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]
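d = {}
for c in caselist:
    if c.case_id not in d:
        # First object seen for this id: keep it for now
        d[c.case_id] = c
    elif d[c.case_id].form_type != "B" and c.form_type == "B":
        # A "B" form replaces the stored non-"B" form; copy over every
        # attribute the "B" form left unset (None)
        tmp = d[c.case_id]
        for attr in c:
            if getattr(c, attr) is None:
                setattr(c, attr, getattr(tmp, attr))
        d[c.case_id] = c
caselist[:] = d.values()
# vars() shows the attribute values instead of the default object repr
print([vars(c) for c in caselist])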
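The `__iter__` method above hard-codes the attribute names. As a hedged alternative, assuming every relevant field is a plain instance attribute, the built-in `vars()` can enumerate them, so newly added optional fields are picked up automatically. `fill_missing` is a hypothetical helper name, not part of the answer:

```python
class Case:
    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender


def fill_missing(target, source):
    """Copy every attribute that is None on target from source (hypothetical helper)."""
    # vars() returns the instance __dict__, so no attribute list is hard-coded;
    # list() takes a snapshot so setattr cannot disturb the iteration
    for name, value in list(vars(target).items()):
        if value is None:
            setattr(target, name, getattr(source, name))
    return target


a = Case(id="12345", formtype="A", age=12, fever=1, gender="female")
b = Case(id="12345", formtype="B", age=12, cough=0)
merged = fill_missing(b, a)
print(vars(merged))
```

This assumes both objects are instances of the same class, so `source` has every attribute that `target` has.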
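Answer 1 (score: 0)
This is a bit longer than what you were going for, but it works. It creates separate lists of A forms and B forms. Then it iterates over the B forms and looks for a matching A form. If it finds a match, it copies the A form's cough, fever and gender values into any of those fields that the B form left unset.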
def merge(acases, bcases):
    newlist = []
    for b in bcases:
        # Iterate over a copy so that removing from acases is safe
        for a in acases[:]:
            if b.case_id == a.case_id:
                # "is None" keeps a recorded 0 (e.g. cough=0) instead of overwriting it
                if b.cough is None:
                    b.cough = a.cough
                if b.fever is None:
                    b.fever = a.fever
                if b.gender is None:
                    b.gender = a.gender
                acases.remove(a)
        # Keep every B form, even one without a matching A form
        newlist.append(b)
    # A forms that had no matching B form are kept unchanged
    newlist += acases
    return newlist

caselist = [
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]
acases = [case for case in caselist if case.form_type == 'A']
bcases = [case for case in caselist if case.form_type == 'B']
caselist = merge(acases, bcases)
for i in caselist:
    print('{0} {1} {2} {3} {4} {5}'.format(i.case_id, i.form_type, i.age, i.cough, i.fever, i.gender))
12345 B 12 0 1 female
67890 B 34 1 0 male
75321 A 2 None 0 male
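Whether a missing field is detected with a truthiness test or with `is None` matters for this data, because `cough=0` and `fever=0` are real recorded values. A quick check, using two hypothetical helper names:

```python
def fill_if_falsy(b_value, a_value):
    # Replaces b_value whenever it is falsy, which includes a recorded 0
    return a_value if not b_value else b_value


def fill_if_none(b_value, a_value):
    # Replaces b_value only when it is genuinely missing
    return a_value if b_value is None else b_value


# B recorded cough=0, A has no cough value at all
print(fill_if_falsy(0, None))  # None: the recorded 0 is lost
print(fill_if_none(0, None))   # 0: the recorded value survives
print(fill_if_none(None, 1))   # 1: a missing value is filled from A
```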
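Here is another approach. It is more efficient than my previous answer, but less efficient than @LeartS's answer. Both of them can handle different form layouts: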
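def check_val(av, bv):
    # Keep the B value; fall back to the A value only when B left it unset.
    # (A truthiness test such as "not bv" would also discard a recorded 0.)
    if bv is None:
        return av
    return bv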
caselist = [
    Case(id="12345", formtype="A", age=12, cough=0, gender="female"),
    Case(id="12345", formtype="B", age=12, fever=10),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male")
]
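d = {}
# Sort so that all "B" forms come before the "A" forms ("B" > "A")
caselist.sort(key=lambda x: x.form_type, reverse=True)
for case in caselist:
    if case.case_id not in d and case.form_type == 'B':
        d[case.case_id] = case
    if case.form_type == 'A' and case.case_id in d:
        # Merge this A form's values into the B form already stored for the id
        b = d[case.case_id]
        b.cough = check_val(case.cough, b.cough)
        b.fever = check_val(case.fever, b.fever)
        b.gender = check_val(case.gender, b.gender)
    else:
        # B forms, and A forms with no B counterpart, are stored as they are
        d[case.case_id] = case
compressed = list(d.values())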
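The `reverse=True` sort works only because "B" happens to sort after "A". If more form types were ever added, an explicit ranking may be clearer. A sketch, assuming a hypothetical `FORM_PRIORITY` mapping in which a lower number means a more preferred form, and a lightweight `Form` stand-in for `Case`:

```python
from collections import namedtuple

# Lightweight stand-in for Case; only case_id and form_type matter here
Form = namedtuple("Form", ["case_id", "form_type"])

# Hypothetical ranking: lower number = preferred form
FORM_PRIORITY = {"B": 0, "A": 1}

forms = [Form("12345", "A"), Form("12345", "B"), Form("75321", "A")]
# sorted() is stable, so forms with equal priority keep their original order
ordered = sorted(forms, key=lambda f: FORM_PRIORITY[f.form_type])
print([f.form_type for f in ordered])  # ['B', 'A', 'A']
```

The same `key` function could be passed to `caselist.sort()` in place of the `reverse=True` sort.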
Answer 2 (score: -1)
Sometimes I think the trivial, explicit approach is also the best one. I would just do this:
Using your input, this gives the following output (after defining a __str__ method for the Case class):
In [1]: [str(c) for c in cases]
Out[1]:
['case_id: 67890, form_type: B, age: 34, fever: 0, cough: 1, gender: male',
 'case_id: 12345, form_type: B, age: 12, fever: 1, cough: 0, gender: female',
 'case_id: 75321, form_type: A, age: 2, fever: 0, cough: None, gender: male']
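Note that id 75321 has cough None rather than 0. I think that is better, because you have no information about the cough parameter for that id. (Likewise, for id 12345 the correct cough parameter is 0, not 1; I think that is a typo in your sample output.)
It also iterates over the original caselist only once and uses a dictionary for O(1) id lookups.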
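The code for this answer appears to be missing, so the following is only a sketch of a single-pass, dictionary-based merge in that spirit; it is not the answer's original code. The `merge_cases` name and the `FIELDS` tuple are hypothetical, the `__str__` format is chosen to match the output shown for this answer, and the sketch assumes "B" values take precedence while "A" values only fill fields that are None, whichever form appears first:

```python
class Case:
    # Hypothetical list of the fields to merge and print
    FIELDS = ("case_id", "form_type", "age", "fever", "cough", "gender")

    def __init__(self, id, formtype, age, fever=None, cough=None, gender=None):
        self.case_id = id
        self.form_type = formtype
        self.age = age
        self.fever = fever
        self.cough = cough
        self.gender = gender

    def __str__(self):
        return ", ".join(f"{name}: {getattr(self, name)}" for name in self.FIELDS)


def merge_cases(caselist):
    """Merge forms sharing a case_id in one pass; "B" values take precedence."""
    merged = {}
    for case in caselist:
        kept = merged.get(case.case_id)  # O(1) lookup by id
        if kept is None:
            merged[case.case_id] = case
            continue
        # The "B" form is the primary object; the other one only fills gaps
        primary, secondary = (case, kept) if case.form_type == "B" else (kept, case)
        for name in Case.FIELDS:
            if getattr(primary, name) is None:
                setattr(primary, name, getattr(secondary, name))
        merged[case.case_id] = primary
    return list(merged.values())


cases = merge_cases([
    Case(id="12345", formtype="A", age=12, fever=1, gender="female"),
    Case(id="12345", formtype="B", age=12, cough=0),
    Case(id="67890", formtype="A", age=34, fever=0, gender="male"),
    Case(id="67890", formtype="B", age=34, cough=1),
    Case(id="75321", formtype="A", age=2, fever=0, gender="male"),
])
for c in cases:
    print(c)
```

Because dictionaries preserve insertion order in Python 3.7+, the result follows the order in which each id first appears.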