Pandas - Repeat DataFrame columns based on a reference dictionary

Time: 2017-06-22 21:00:27

Tags: python pandas dataframe

I need to rename and repeat my dataframe columns based on a reference dictionary. Below I have created a dummy dataframe:

import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata).set_index('id')  # set_index returns a new frame, so keep the result
df

        entity  entity2  entity3
id                              
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent

Now I have the following sample dict:

ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

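I now need to replace the column names based on the dict values, and if a column has multiple values, the column should be repeated once per value. Here is the dataframe I want: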

       entity_exp1  entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                      
json    present      present      present      absent      absent    absent
molly   absent       present      present      absent      absent    absent
tina    absent       present      present      absent      absent    absent
jake    present      absent       absent       present     present   present
molly   present      absent       absent       absent      absent    absent

4 answers:

Answer 0 (score: 1)

Option 1
Use pd.concat with a dictionary comprehension

pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)

      entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3 entity_exp1
id                                                                                
json       present      present       absent       absent       absent     present
molly      present      present       absent       absent       absent      absent
tina       present      present       absent       absent       absent      absent
jake        absent       absent      present      present      present     present
molly       absent       absent       absent       absent       absent     present
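
The output above lists the columns in alphabetical order rather than in the order requested in the question, and the ordering may differ between pandas versions. A minimal sketch of one way to pin the order, assuming Python 3.7+ (where dicts keep insertion order): pass a list of Series together with an explicit keys argument. The names pairs and result are illustrative and not part of the original answer.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Pair every new name with the source column it copies, in ref_dict order.
pairs = [(new, old) for old, news in ref_dict.items() for new in news]

# A list of Series plus explicit keys makes the column order explicit.
result = pd.concat([df[old] for _, old in pairs],
                   axis=1,
                   keys=[new for new, _ in pairs])
print(result.columns.tolist())
```

The column order here follows the order of ref_dict and of each list inside it, so reordering the dict reorders the result.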

Option 2
Slice the dataframe and rename the columns

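repeats = df.columns.map(lambda x: len(ref_dict[x]))
# reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job
d1 = df.reindex(columns=df.columns.repeat(repeats))
# flatten the lists of new names, following the existing column order
d1.columns = [name for col in df.columns for name in ref_dict[col]]
d1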

      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
id                                                                                
json      present      present      present       absent       absent       absent
molly      absent      present      present       absent       absent       absent
tina       absent      present      present       absent       absent       absent
jake      present       absent       absent      present      present      present
molly     present       absent       absent       absent       absent       absent

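Answer 1 (score: 0)

For each column in df, you can look up its new column names in ref_dict, create one new column per name, and finally delete the old column. You can try the following: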

# ref_dict maps each old column name to the list of new column names
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:  # one copy per new name
        df[new_column] = df[old_column]  # same content as the old column
    del df[old_column]  # drop the old column once its copies exist
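
The loop above modifies df in place, so the original columns are gone afterwards. As a rough sketch of the same idea that leaves the input untouched, the helper below works on a copy. The function name expand_columns and the guard that keeps a column whose old name is reused as a new name are assumptions added for illustration, not part of the answer.

```python
import pandas as pd


def expand_columns(frame, mapping):
    """Return a new frame in which each mapped column is copied once per new name."""
    out = frame.copy()
    for old_column, new_columns in mapping.items():
        for new_column in new_columns:
            out[new_column] = out[old_column]  # same content under the new name
        # Assumption: keep the column if its old name is also one of the new names.
        if old_column not in new_columns:
            out = out.drop(columns=old_column)
    return out


rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

expanded = expand_columns(df, ref_dict)
print(expanded)
```

Because new columns are appended and old ones dropped in dict order, the result should follow the order of ref_dict on Python 3.7+.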

Answer 2 (score: 0)

You can simply loop:

import pandas as pd

rawdata= {'id':['json','molly','tina','jake','molly'],
          'entity':['present','absent','absent','present','present'],
          'entity2':['present','present','present','absent','absent'],
          'entity3':['absent','absent','absent','present','absent']}
df= pd.DataFrame(rawdata)  # 'id' stays a regular column here; it becomes the index of df2 below
ref_dict= {'entity':['entity_exp1'],
           'entity2':['entity2_exp1','entity2_exp2'],
           'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}

# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]

df2['id'] = df['id']
df2.set_index('id', inplace=True)

print(df2)
      entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2  entity3_exp3  
id                                                                      
json      present      present      present       absent       absent        absent   
molly      absent      present      present       absent       absent        absent   
tina       absent      present      present       absent       absent        absent   
jake      present       absent       absent      present      present       present    
molly     present       absent       absent       absent       absent        absent   

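Answer 3 (score: 0)

You can reindex df using the dict keys as column names (each key repeated once per value), then rename the columns using the dict values.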

df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]: 
  entity_exp1 entity2_exp1 entity2_exp2 entity3_exp1 entity3_exp2 entity3_exp3
0     present      present      present       absent       absent       absent
1      absent      present      present       absent       absent       absent
2      absent      present      present       absent       absent       absent
3     present       absent       absent      present      present      present
4     present       absent       absent       absent       absent       absent
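
The output above has a default 0 to 4 index because df still carried id as an ordinary column, which the reindex then dropped. A minimal sketch of the same reindex-and-rename idea that keeps id as the index is shown below. It swaps sum(..., []) for itertools.chain, which is a stylistic substitution and not what the answer used, and the names old_names and new_names are illustrative.

```python
from itertools import chain

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
df = pd.DataFrame(rawdata).set_index('id')  # move 'id' to the index first
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

# Each old name repeated once per new name, e.g. entity2 appears twice.
old_names = list(chain.from_iterable([k] * len(v) for k, v in ref_dict.items()))
# All new names, flattened in the same order.
new_names = list(chain.from_iterable(ref_dict.values()))

df_new = df.reindex(columns=old_names)  # duplicates the columns
df_new.columns = new_names              # then gives each copy its new name
print(df_new)
```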