Single-level DataFrame:
data1 = {'Sr.No.': Sr_no,
'CompanyNames': Company_Names,
'YourChoice1': Your_Choice,
'YourChoice2': Your_Choice}
df1 = pd.DataFrame(data1, columns = pd.Index(['Sr.No.', 'CompanyNames','YourChoice1','YourChoice2'], name='key'))
Output of the single-level DataFrame in the CSV file:
3-level DataFrame:
form = {'I1': {'F1': {'PD': ['1','2','3','4','5','6','7','8','9'],
'CD': ['1','2','3','4','5','6','7','8','9']},
'F2': {'PD': ['1','2','3','4','5','6','7','8','9'],
'CD': ['1','2','3','4','5','6','7','8','9']},
'F3': {'PD': ['1','2','3','4','5','6','7','8','9'],
'CD': ['1','2','3','4','5','6','7','8','9']}
},
'I2': {'F1': {'PD': ['1','2','3','4','5','6','7','8','9'],
'CD': ['1','2','3','4','5','6','7','8','9']},
'F2': {'PD': ['1','2','3','4','5','6','7','8','9'],
'CD': ['1','2','3','4','5','6','7','8','9']}
}
}
headers,values,data = CSV_trial.DATA(form)
cols = pd.MultiIndex.from_tuples(headers, names=['ind','field','data'])
df2 = pd.DataFrame(data, columns=cols)
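CSV_trial.DATA is not shown here. As a rough sketch of what such a helper might do, the hypothetical function flatten_form below (my own name, not part of the original code) walks the nested dict and returns one (ind, field, data) tuple per column, plus the rows. It assumes every innermost list has the same length, and it uses a shortened form dict for illustration.

```python
import pandas as pd

def flatten_form(form):
    """Hypothetical stand-in for CSV_trial.DATA (an assumption, not the real helper)."""
    headers, values = [], []
    for ind, fields in form.items():
        for field, series in fields.items():
            for name, column in series.items():
                headers.append((ind, field, name))  # one 3-level column label
                values.append(column)               # that column's values
    data = list(zip(*values))  # transpose the columns into rows
    return headers, values, data

# Shortened sample input with the same shape as the form dict above
form = {'I1': {'F1': {'PD': ['1', '2', '3'], 'CD': ['4', '5', '6']}},
        'I2': {'F1': {'PD': ['7', '8', '9'], 'CD': ['a', 'b', 'c']}}}

headers, values, data = flatten_form(form)
cols = pd.MultiIndex.from_tuples(headers, names=['ind', 'field', 'data'])
df2 = pd.DataFrame(data, columns=cols)
print(df2)
```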
I want to merge these DataFrames, with df1 on the left and df2 on the right ...
Answer 0 (score: 0)
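A simple approach is to convert the single-level df to 3 levels and then concatenate the two dfs, which now have the same structure.
Import the necessary packages: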
import pandas as pd
import numpy as np
Create a native 3-level index. You could also read it from CSV, XML, etc.
native_lvl_3_index_tup = [('A','foo1', 1), ('A','foo2', 3),
('B','foo1', 1), ('B','foo2', 3),
('C','foo1', 1), ('C','foo2', 3)]
variables = [33871648, 37253956,
18976457, 19378102,
20851820, 25145561]
native_lvl_3_index = pd.MultiIndex.from_tuples(native_lvl_3_index_tup)
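A function that converts a native single-level index to a 3-level one:
def single_to_3_lvl(single_index_list, val_lvl_0, val_lvl_1):
    # Prefix every label with the same two outer-level values
    multiindex_tuple = [(val_lvl_0, val_lvl_1, i) for i in single_index_list]
    return pd.MultiIndex.from_tuples(multiindex_tuple)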
Use this function to get an artificial 3-level index:
single_index = [1,2,3,4,5,6]
artificial_multiindex = single_to_3_lvl(single_index,'A','B')
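To check what the conversion produces, here is a small self-contained sketch that repeats the function and inspects the result. It also builds the same index with pd.MultiIndex.from_product, which is an alternative I am suggesting, not something the steps above rely on.

```python
import pandas as pd

def single_to_3_lvl(single_index_list, val_lvl_0, val_lvl_1):
    # Same helper as above: prefix every label with two constant outer levels
    tuples = [(val_lvl_0, val_lvl_1, i) for i in single_index_list]
    return pd.MultiIndex.from_tuples(tuples)

idx = single_to_3_lvl([1, 2, 3], 'A', 'B')
print(idx.nlevels)   # number of levels in the index
print(idx.tolist())  # the labels as a list of 3-tuples

# Alternative construction: the Cartesian product of the three level lists
alt = pd.MultiIndex.from_product([['A'], ['B'], [1, 2, 3]])
print(idx.equals(alt))
```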
Create the DataFrames, transposing to move the MultiIndex onto the columns (as in the question):
df1 = pd.DataFrame(variables,artificial_multiindex).T
df2 = pd.DataFrame(variables,native_lvl_3_index).T
I used the same variables in both DataFrames. You can control the concatenation by passing join='outer' or join='inner' to pd.concat():
result = pd.concat([df1,df2],axis = 1)
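With axis=1, the join argument decides how the row indexes of the two frames are aligned. The sketch below uses made-up frames whose rows only partly overlap (the labels and values are illustrative assumptions) to show the difference.

```python
import pandas as pd

left_cols = pd.MultiIndex.from_tuples([('A', 'B', 1), ('A', 'B', 2)])
right_cols = pd.MultiIndex.from_tuples([('C', 'foo1', 1), ('C', 'foo2', 3)])

# Row labels overlap only at 1
left = pd.DataFrame([[10, 20], [30, 40]], index=[0, 1], columns=left_cols)
right = pd.DataFrame([[50, 60], [70, 80]], index=[1, 2], columns=right_cols)

outer = pd.concat([left, right], axis=1, join='outer')  # union of row labels
inner = pd.concat([left, right], axis=1, join='inner')  # intersection of row labels

print(outer)
print(inner)
```

The outer result keeps rows 0, 1 and 2 and fills the gaps with NaN; the inner result keeps only row 1.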
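The variable result holds the concatenated DataFrame. If you already have a DataFrame with a single-level index, you can swap in the 3-level index. Assigning it directly is positional; reindex() would instead match labels, and the flat labels do not match the 3-level tuples:
single_level_df = pd.DataFrame(variables, index=single_index)
single_level_df.index = artificial_multiindex
reindexed = single_level_df.T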
Again, I transpose (.T) so that the MultiIndex ends up on the columns. You can set this up differently when creating the DataFrame.
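Applied to the frames in the question, the same idea can be sketched without transposing by lifting df1's flat column names to three levels directly. The sample values below are made up, and padding the two lower levels with empty strings is my own choice; any placeholder label would work.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df1 and df2
df1 = pd.DataFrame({'Sr.No.': [1, 2], 'CompanyNames': ['Acme', 'Globex']})
cols = pd.MultiIndex.from_tuples(
    [('I1', 'F1', 'PD'), ('I1', 'F1', 'CD')], names=['ind', 'field', 'data'])
df2 = pd.DataFrame([['1', '2'], ['3', '4']], columns=cols)

# Lift each flat column name to a 3-tuple, padding the lower levels with ''
df1_3lvl = df1.copy()
df1_3lvl.columns = pd.MultiIndex.from_tuples(
    [(name, '', '') for name in df1.columns], names=cols.names)

# Both frames now have 3-level columns, so they sit side by side cleanly
result = pd.concat([df1_3lvl, df2], axis=1)
print(result)
```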
I hope my answer helps. I used some code from this link: https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html