Pandas: compare columns of a dataframe and add a new column with values based on conditions

Time: 2019-11-12 12:07:23

Tags: python python-3.x pandas

I have a dataframe,

ip_df:
     name class    sec    details
0    tom  I        a      [{'class':'I','sec':'a','subjects':['numbers','ethics']},{'class':'I','sec':'b','subjects':['numbers','moral-science']},{'class':'I','sec':'c','subjects':['moral-science','ethics']},{'class':'I','subjects':['numbers','ethics1']}]
1    sam  I        d      [{'class':'I','sec':'a','subjects':['numbers','ethics']},{'class':'I','sec':'b','subjects':['numbers','moral-science']},{'class':'I','sec':'c','subjects':['moral-science','ethics']},{'class':'I','subjects':['numbers','ethics1']}] 

and the resulting dataframe should be

op_df:
      name  class  sec   subjects
0     tom   I      a     ['numbers','ethics']
1     sam   I      d     ['numbers','ethics1']

op_df must be built according to the following conditions:

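  • Condition 1: check whether the row's 'class' and 'sec' both match an entry in the 'details' column; if so, add a new column named 'subjects' holding that entry's subjects
  • Condition 2: if no entry in 'details' matches both 'class' and 'sec', check whether 'class' alone matches; if so, add a new column named 'subjects' holding that entry's subjects
  • If neither condition 1 nor condition 2 is satisfied, put the default value [0,0] in the 'subjects' column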

1 answer:

Answer 0: (score: 2)

If you need the first value matched by either condition, use the next with iter trick, which supplies the default value [0, 0] when nothing matches:

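# df is the input dataframe (ip_df in the question)
final = []
for a, b, c in zip(df['class'], df['sec'], df['details']):
    out = []
    for x in c:
        # m1: this entry's class equals the row's class
        m1 = x['class'] == a
        if m1 and x.get('sec') == b:
            # condition 1: class and sec both match
            out.append(x['subjects'])
        elif m1 and 'sec' not in x:
            # condition 2: class matches and the entry has no 'sec' key
            out.append(x['subjects'])
    # first match in list order, or the default [0, 0] if nothing matched
    final.append(next(iter(out), [0,0]))

df['subjects'] = final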
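
For anyone who wants to try this on the question's data, here is a self-contained sketch of a small variant. It is not the answer's exact code: the loop above returns whichever matching entry comes first in the list, whereas this version always prefers a condition 1 match over a condition 2 match, regardless of the order of the entries. The helper name pick_subjects and the extra row 'ann' (class 'II', added only to exercise the default) are hypothetical and do not come from the question.

```python
from itertools import chain

import pandas as pd

# The 'details' list from the question, shared by every row.
details = [
    {'class': 'I', 'sec': 'a', 'subjects': ['numbers', 'ethics']},
    {'class': 'I', 'sec': 'b', 'subjects': ['numbers', 'moral-science']},
    {'class': 'I', 'sec': 'c', 'subjects': ['moral-science', 'ethics']},
    {'class': 'I', 'subjects': ['numbers', 'ethics1']},
]

# 'tom' and 'sam' are from the question; 'ann' is a hypothetical row
# whose class matches nothing, so it should receive the default.
ip_df = pd.DataFrame({
    'name': ['tom', 'sam', 'ann'],
    'class': ['I', 'I', 'II'],
    'sec': ['a', 'd', 'a'],
    'details': [details, details, details],
})


def pick_subjects(cls, sec, entries):
    """Hypothetical helper: condition 1 first, then condition 2, else [0, 0]."""
    same_class = [e for e in entries if e.get('class') == cls]
    # Condition 1: class and sec both match.
    exact = (e['subjects'] for e in same_class if e.get('sec') == sec)
    # Condition 2: class matches and the entry carries no 'sec' key.
    no_sec = (e['subjects'] for e in same_class if 'sec' not in e)
    # next() returns the first candidate, or the default if both are empty.
    return next(chain(exact, no_sec), [0, 0])


ip_df['subjects'] = [
    pick_subjects(c, s, d)
    for c, s, d in zip(ip_df['class'], ip_df['sec'], ip_df['details'])
]
op_df = ip_df.drop(columns='details')
print(op_df)
```

On the question's two rows both versions give the same result; they differ only when an entry without 'sec' appears in the list before an exact match.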