I need to rename and duplicate my dataframe columns according to a reference dictionary. Below I have created a dummy dataframe:
import pandas as pd

rawdata = {'id':['json','molly','tina','jake','molly'],'entity':['present','absent','absent','present','present'],'entity2':['present','present','present','absent','absent'],'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)
df = df.set_index('id')  # set_index returns a new frame, so assign it back
        entity  entity2  entity3
id
json   present  present   absent
molly   absent  present   absent
tina    absent  present   absent
jake   present   absent  present
molly  present   absent   absent
Now I have the following example dict:
ref_dict= {'entity':['entity_exp1'],'entity2':['entity2_exp1','entity2_exp2'],'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}
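I now need to replace the column names according to the dict values, and if a column maps to more than one value, the column should be duplicated once per value. This is the dataframe I want: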
       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
Answer 0 (score: 1)
Option 1
Use pd.concat with a dictionary comprehension
pd.concat({k: df[v] for v, l in ref_dict.items() for k in l}, axis=1)
       entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3  entity_exp1
id
json        present       present        absent        absent        absent      present
molly       present       present        absent        absent        absent       absent
tina        present       present        absent        absent        absent       absent
jake         absent        absent       present       present       present      present
molly        absent        absent        absent        absent        absent      present
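For anyone who wants to run Option 1 end to end, here is a minimal self-contained sketch. It assumes `id` has really been made the index (the result of `set_index` has to be assigned back), and the helper name `expand_columns` is made up for illustration; it is not a pandas function.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata).set_index('id')


def expand_columns(frame, mapping):
    """Hypothetical helper: one copy of each source column per new name."""
    # key = new column name, value = the source column it copies
    pieces = {new: frame[old] for old, names in mapping.items() for new in names}
    return pd.concat(pieces, axis=1)


result = expand_columns(df, ref_dict)
print(result)
```

The column order of the result can differ between pandas versions (older releases sorted the dict keys, as in the output above), so select the columns explicitly afterwards if a particular order matters.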
Option 2
Slice the dataframe and rename the columns
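# how many times each original column must be repeated
repeats = [len(ref_dict[col]) for col in df.columns]
# reindex_axis was removed from pandas; reindex(columns=...) does the same job
d1 = df.reindex(columns=df.columns.repeat(repeats))
# flatten the lists of new names, in column order
d1.columns = [name for col in df.columns for name in ref_dict[col]]
d1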
       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
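Answer 1 (score: 0)
For each column in df, you can look up its new column names in ref_dict, create a new column for each of them, and finally delete the old column. You can try the following: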
# for each old column and its list of new column names in ref_dict
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:  # for each new_column in new_columns defined
        df[new_column] = df[old_column]  # the content remains same as old column
    del df[old_column]  # now remove the old column
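Note that this loop changes df in place. If you need to keep the original frame, something like the following sketch should work. The name `expanded` and the guard against a new name equal to the old one are my additions, not part of the answer above.

```python
import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata).set_index('id')

expanded = df.copy()  # work on a copy so df itself stays unchanged
for old_column, new_columns in ref_dict.items():
    for new_column in new_columns:
        expanded[new_column] = expanded[old_column]
    # only drop the old column if it was not reused as one of the new names
    if old_column not in new_columns:
        expanded = expanded.drop(columns=old_column)

print(expanded)
```

Because each new column is appended at the end and the old one is dropped afterwards, the columns end up in the order the dict lists them.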
Answer 2 (score: 0)
You can simply loop:
rawdata= {'id':['json','molly','tina','jake','molly'],
'entity':['present','absent','absent','present','present'],
'entity2':['present','present','present','absent','absent'],
'entity3':['absent','absent','absent','present','absent']}
df = pd.DataFrame(rawdata)  # 'id' stays a regular column here; it becomes the index of df2 below
ref_dict= {'entity':['entity_exp1'],
'entity2':['entity2_exp1','entity2_exp2'],
'entity3':['entity3_exp1','entity3_exp2','entity3_exp3']}
# here comes the new part:
df2 = pd.DataFrame()
for key, val in sorted(ref_dict.items()):
    for subval in val:
        df2[subval] = df[key]
df2['id'] = df['id']
df2.set_index('id', inplace=True)
print(df2)
       entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
id
json       present       present       present        absent        absent        absent
molly       absent       present       present        absent        absent        absent
tina        absent       present       present        absent        absent        absent
jake       present        absent        absent       present       present       present
molly      present        absent        absent        absent        absent        absent
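Answer 3 (score: 0)
You can reindex df using the dict keys as column names (each key repeated once per value), then rename the columns using the dict values.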
df_new = df.reindex(columns=sum([[k]*len(v) for k,v in ref_dict.items()],[]))
df_new.columns=sum(ref_dict.values(),[])
df_new
Out[573]:
   entity_exp1  entity2_exp1  entity2_exp2  entity3_exp1  entity3_exp2  entity3_exp3
0      present       present       present        absent        absent        absent
1       absent       present       present        absent        absent        absent
2       absent       present       present        absent        absent        absent
3      present        absent        absent       present       present       present
4      present        absent        absent        absent        absent        absent
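Flattening with `sum(list_of_lists, [])` copies the growing list on every step, which is fine for a handful of columns but gets slow for large dicts. A possible variant of the same reindex-then-rename idea using `itertools.chain` is sketched below. The names `old_names` and `new_names` are just illustrative, and here `id` is assumed to be set as the index, so the result keeps the `id` labels rather than 0 to 4.

```python
from itertools import chain

import pandas as pd

rawdata = {'id': ['json', 'molly', 'tina', 'jake', 'molly'],
           'entity': ['present', 'absent', 'absent', 'present', 'present'],
           'entity2': ['present', 'present', 'present', 'absent', 'absent'],
           'entity3': ['absent', 'absent', 'absent', 'present', 'absent']}
ref_dict = {'entity': ['entity_exp1'],
            'entity2': ['entity2_exp1', 'entity2_exp2'],
            'entity3': ['entity3_exp1', 'entity3_exp2', 'entity3_exp3']}

df = pd.DataFrame(rawdata).set_index('id')

# each old name repeated once per new name, e.g. ['entity', 'entity2', 'entity2', ...]
old_names = list(chain.from_iterable([k] * len(v) for k, v in ref_dict.items()))
# the new names flattened in the same order
new_names = list(chain.from_iterable(ref_dict.values()))

df_new = df.reindex(columns=old_names)  # duplicates the columns
df_new.columns = new_names              # then renames them positionally
print(df_new)
```

This relies on the column labels of df being unique; reindexing a frame whose existing columns contain duplicates raises an error.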