Merging a single-level DataFrame with a 3-level DataFrame

Time: 2019-01-07 14:28:39

Tags: python python-3.x pandas

Single-level DataFrame:

   data1 = {'Sr.No.': Sr_no,
     'CompanyNames': Company_Names,
     'YourChoice1': Your_Choice,
     'YourChoice2': Your_Choice}

   df1 = pd.DataFrame(data1, columns = pd.Index(['Sr.No.', 'CompanyNames','YourChoice1','YourChoice2'], name='key'))

[Image: output of the single-level DataFrame in a CSV file]

3-level DataFrame:

   form = {'I1': {'F1': {'PD': ['1','2','3','4','5','6','7','8','9'],
                   'CD': ['1','2','3','4','5','6','7','8','9']},

            'F2': {'PD': ['1','2','3','4','5','6','7','8','9'],
                   'CD': ['1','2','3','4','5','6','7','8','9']},

            'F3': {'PD': ['1','2','3','4','5','6','7','8','9'],
                   'CD': ['1','2','3','4','5','6','7','8','9']}
            },


     'I2': {'F1': {'PD': ['1','2','3','4','5','6','7','8','9'],
                   'CD': ['1','2','3','4','5','6','7','8','9']},

            'F2': {'PD': ['1','2','3','4','5','6','7','8','9'],
                   'CD': ['1','2','3','4','5','6','7','8','9']}
            }
     }

   headers,values,data = CSV_trial.DATA(form)
   cols = pd.MultiIndex.from_tuples(headers, names=['ind','field','data'])
   df2 = pd.DataFrame(data, columns=cols)

[Image: output of the 3-level DataFrame in a CSV file]

I want to merge these DataFrames, with df1 on the left and df2 on the right.

[Image: desired output]

Can anyone help me with this?

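1 Answer:

Answer 0: (score: 0)

A simple approach is to convert the single-level df to 3 levels and then concatenate the two dfs, which at that point share the same structure.

Import the necessary packages: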

import pandas as pd
import numpy as np

Create the native 3-level index. You could also read it from csv, xml, etc.

native_lvl_3_index_tup = [('A','foo1', 1), ('A','foo2', 3),
     ('B','foo1', 1), ('B','foo2', 3),
     ('C','foo1', 1), ('C','foo2', 3)]

variables = [33871648, 37253956,
           18976457, 19378102,
           20851820, 25145561]

native_lvl_3_index = pd.MultiIndex.from_tuples(native_lvl_3_index_tup)

A function that converts a native single-level index to 3 levels:

def single_to_3_lvl(single_index_list,val_lvl_0,val_lvl_1):
    multiindex_tuple = [(val_lvl_0,val_lvl_1,i) for i in single_index_list]
    return pd.MultiIndex.from_tuples(multiindex_tuple)

Use this function to get an artificial 3-level index:

single_index = [1,2,3,4,5,6]
artificial_multiindex = single_to_3_lvl(single_index,'A','B')

Create the DataFrames, transposing them to move the MultiIndex into the columns (as in the question):

df1 = pd.DataFrame(variables,artificial_multiindex).T 
df2 = pd.DataFrame(variables,native_lvl_3_index).T

I used the same variables in both DataFrames. You can control the concatenation by setting join='outer' or join='inner' in pd.concat():

result = pd.concat([df1,df2],axis = 1)
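As a rough illustration of what the join argument changes, here is a minimal sketch. The frames left and right are hypothetical and not part of the answer; they only exist so that the row labels partly overlap. With axis=1 the join setting decides which row labels survive:

```python
import pandas as pd

# Hypothetical frames whose row labels only partly overlap.
left = pd.DataFrame({'x': [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({'y': [10, 20, 30]}, index=[1, 2, 3])

# join='outer' (the default) keeps the union of row labels and fills gaps with NaN.
outer = pd.concat([left, right], axis=1, join='outer')

# join='inner' keeps only the row labels present in both frames.
inner = pd.concat([left, right], axis=1, join='inner')

print(outer.index.tolist())  # [0, 1, 2, 3]
print(inner.index.tolist())  # [1, 2]
```

In the answer's example df1 and df2 both have the single row label 0 after transposing, so either setting gives the same result there.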

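The variable result holds the concatenated DataFrame. If you already have a DataFrame with a single-level index, you can replace that index with its 3-level equivalent:

single_level_df = pd.DataFrame(variables, index=single_index)
# Assign the new index directly: reindex() looks up existing labels, and the flat labels do not match the 3-level tuples
single_level_df.index = single_to_3_lvl(single_level_df.index, 'A', 'B')
reindexed = single_level_df.T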

Again, I transpose (.T) so that the index ends up in the columns. You can set this up differently when creating the DataFrame.
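Applied to the layout in the question, the same idea might look like the sketch below. The frames flat_df and multi_df are small hypothetical stand-ins for the question's df1 and df2 (the company names are made up), and padding the lower levels with empty strings is my assumption about how the header should look, not something the question specifies:

```python
import pandas as pd

# Hypothetical stand-in for the question's single-level df1.
flat_df = pd.DataFrame({'Sr.No.': [1, 2],
                        'CompanyNames': ['Acme', 'Globex']})

# Hypothetical stand-in for the question's 3-level df2.
cols = pd.MultiIndex.from_tuples(
    [('I1', 'F1', 'PD'), ('I1', 'F1', 'CD')],
    names=['ind', 'field', 'data'])
multi_df = pd.DataFrame([['1', '1'], ['2', '2']], columns=cols)

# Pad each flat column name with empty strings for the two lower levels.
lifted = flat_df.copy()
lifted.columns = pd.MultiIndex.from_tuples(
    [(name, '', '') for name in flat_df.columns],
    names=cols.names)

# Both frames now have 3-level columns, so they sit side by side.
combined = pd.concat([lifted, multi_df], axis=1)
print(combined.columns.nlevels)  # 3
```

The flat columns stay on the left and the 3-level columns on the right, which matches the order the question asks for.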

I hope my answer helps. I used some code from this link: https://jakevdp.github.io/PythonDataScienceHandbook/03.05-hierarchical-indexing.html