
Time: 2015-03-03 01:06:07

Tags: python pandas


print(df1)

              A_1               A_2               A_3                .....
              Value_V  Value_y  Value_V  Value_y  Value_V  Value_y

instance200   50       0        6500     1        50       0
instance201   100      0        6400     1        50       0


print(df2)

              PV         Estimate

instance200   2002313    1231233
instance201   2134124    1124724


             PV        Estimate   A_1               A_2               A_3                .....
                                  Value_V  Value_y  Value_V  Value_y  Value_V  Value_y

instance200  2002313   1231233    50       0        6500     1        50       0
instance201  2134124   1124724    100      0        6400     1        50       0


             PV        Estimate   (A_1,Value_V) (A_1,Value_y) (A_2,Value_V) (A_2,Value_y)  .....

instance200  2002313   1231233    50             0             6500         1
instance201  2134124   1124724    100            0             6400         1 


4 answers:

Answer 0 (score: 6)


df3 = df1.copy()
df3[df2.columns] = df2


                A_1             A_2             A_3               PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y                  
instance200      50       0    6500       1      50       0  2002313  1231233
instance201     100       0    6400       1      50       0  2134124  1124724
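
For readers who want to reproduce this, here is a minimal, self-contained sketch of the assignment approach. The frames are rebuilt from the values printed in the question. The per-column loop is an assumption made for portability across pandas versions; it is not the exact statement from the answer. Each new key is padded with an empty string on the lower level.

```python
import pandas as pd

# Frames rebuilt from the tables printed in the question.
cols = pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]])
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=["instance200", "instance201"],
    columns=cols,
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]},
    index=["instance200", "instance201"],
)

# Work on a copy so df1 keeps its original six columns.
df3 = df1.copy()

# Assigning a plain label to a frame with two column levels stores it
# under (label, ""), so the new columns land at the end of df3.
for col in df2.columns:
    df3[col] = df2[col]

print(df3.columns.tolist()[-2:])
```

The new columns are appended on the right; to get them on the left, reorder afterwards or use the concat-based answer.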

Answer 1 (score: 3)


In [11]: df1
                A_1             A_2             A_3
            Value_V Value_y Value_V Value_y Value_V Value_y
instance200      50       0    6500       1      50       0
instance201     100       0    6400       1      50       0

In [12]: df2
                  PV  Estimate
instance200  2002313   1231233
instance201  2134124   1124724

In [13]: df2.columns = pd.MultiIndex.from_arrays([df2.columns, [None] * len(df2.columns)])

In [14]: df2
                  PV Estimate
                 NaN      NaN
instance200  2002313  1231233
instance201  2134124  1124724


In [15]: pd.concat([df1, df2], axis=1)
                A_1             A_2             A_3               PV Estimate
            Value_V Value_y Value_V Value_y Value_V Value_y      NaN      NaN
instance200      50       0    6500       1      50       0  2002313  1231233
instance201     100       0    6400       1      50       0  2134124  1124724

Note: to put the df2 columns first, use pd.concat([df2, df1], axis=1).
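
The same idea as a standalone script, with the df2 columns placed first. This is a sketch with one deliberate variation: the second level is filled with empty strings rather than None, so the header prints blank instead of NaN. The frames are rebuilt from the question's tables, and the name `padded` is only illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables printed in the question.
cols = pd.MultiIndex.from_product([["A_1", "A_2", "A_3"], ["Value_V", "Value_y"]])
df1 = pd.DataFrame(
    [[50, 0, 6500, 1, 50, 0], [100, 0, 6400, 1, 50, 0]],
    index=["instance200", "instance201"],
    columns=cols,
)
df2 = pd.DataFrame(
    {"PV": [2002313, 2134124], "Estimate": [1231233, 1124724]},
    index=["instance200", "instance201"],
)

# Give a copy of df2 a second, empty column level so both frames have two levels.
padded = df2.copy()
padded.columns = pd.MultiIndex.from_arrays(
    [df2.columns, [""] * len(df2.columns)]
)

# df2's columns come first because the padded frame is listed first.
result = pd.concat([padded, df1], axis=1)
print(result)
```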


Answer 2 (score: 0)


import pandas as pd


def concat(df1, df2):
    """
    Concatenate two dataframes df1 and df2 along the columns, even if they
    have a different number of hierarchical column levels.

    If one dataframe has more column levels than the other, blank strings
    are added as the upper column levels of the other dataframe.
    """

    def pad_upper(df, n):
        # return a copy of df with n blank levels added above its columns
        labels = [c if isinstance(c, tuple) else (c,) for c in df.columns]
        df = df.copy()
        df.columns = pd.MultiIndex.from_tuples([("",) * n + c for c in labels])
        return df

    nLevels1 = df1.columns.nlevels
    nLevels2 = df2.columns.nlevels
    diff = abs(nLevels2 - nLevels1)

    if nLevels1 < nLevels2:
        # df1 has fewer levels: expand it with blank strings
        df1 = pad_upper(df1, diff)
    elif nLevels1 > nLevels2:
        # df2 has fewer levels: expand it with blank strings
        df2 = pad_upper(df2, diff)

    # the number of levels now matches, so concat as normal
    return pd.concat([df1, df2], axis=1)


gender  f  m
n       2  1
y       2  2

gender        f                         m             
age         old        young          old        young
location london paris london paris london paris london
n             1     0      1     0      0     1      0
y             0     1      0     1      1     0      1


             f                         m                   
            old        young          old        young      
         london paris london paris london paris london  f  m
n             1     0      1     0      0     1      0  2  1
y             0     1      0     1      1     0      1  2  2
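
A sketch that reproduces the third table from the first two, using the values shown above. It is self-contained, so the padding step is written as a small hypothetical helper, `pad_top`, rather than calling the function from this answer; the column level names (gender, age, location) are left out for brevity.

```python
import pandas as pd


def pad_top(df, nlevels):
    """Return a copy of df whose columns have nlevels levels, padded with "" on top."""
    missing = nlevels - df.columns.nlevels
    if missing <= 0:
        return df
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [("",) * missing + (c if isinstance(c, tuple) else (c,)) for c in df.columns]
    )
    return out


# The single-level frame (counts per gender).
small = pd.DataFrame({"f": [2, 2], "m": [1, 2]}, index=["n", "y"])

# The three-level frame (gender / age / location).
wide_cols = pd.MultiIndex.from_tuples([
    ("f", "old", "london"), ("f", "old", "paris"),
    ("f", "young", "london"), ("f", "young", "paris"),
    ("m", "old", "london"), ("m", "old", "paris"),
    ("m", "young", "london"),
])
wide = pd.DataFrame(
    [[1, 0, 1, 0, 0, 1, 0], [0, 1, 0, 1, 1, 0, 1]],
    index=["n", "y"],
    columns=wide_cols,
)

# Pad the small frame to three levels, then concatenate side by side.
result = pd.concat([wide, pad_top(small, wide.columns.nlevels)], axis=1)
print(result)
```

Because the blanks go on top, the labels f and m end up on the lowest header row, as in the last table above.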


Answer 3 (score: 0)

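I made a wrapper for the pandas.concat function that accepts dataframes with an unequal number of levels.

The empty levels are added from below. The advantage is that a series can be accessed with df_cols.c (where df_cols is the frame concatenated along the columns), and that, when printed, it is clear that 'c' is not a sub-column of ('CC', 'one').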



import numpy as np
import pandas as pd


def concat(dfs, axis=0, *args, **kwargs):
    """
    Wrapper for `pandas.concat`; concatenate pandas objects even if they have
    an unequal number of levels on the concatenation axis.

    Levels containing empty strings are added from below (when concatenating
    along columns) or from the right (when concatenating along rows) to match
    the maximum number found in the dataframes.

    Parameters
    ----------
    dfs : Iterable
        Dataframes that must be concatenated.
    axis : int, optional
        Axis along which concatenation must take place. The default is 0.

    Returns
    -------
    pandas.DataFrame
        Concatenated Dataframe.

    Notes
    -----
    Any arguments and kwarguments are passed onto the `pandas.concat` function.

    See also
    --------
    pandas.concat
    """
    def index(df):
        return df.columns if axis == 1 else df.index

    def add_levels(df):
        need = want - index(df).nlevels
        if need > 0:
            df = pd.concat([df], keys=[('',) * need], axis=axis)  # prepend empty levels
            for i in range(want - need):  # move empty levels to bottom
                df = df.swaplevel(i, i + need, axis=axis)
        return df

    dfs = list(dfs)  # allow a second pass even if a generator was passed in
    want = np.max([index(df).nlevels for df in dfs])
    dfs = [add_levels(df) for df in dfs]
    return pd.concat(dfs, axis=axis, *args, **kwargs)
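
To illustrate the point about df_cols.c, here is a small self-contained sketch. It does not call the wrapper above; it uses a hypothetical helper, `pad_bottom`, that appends empty levels below the existing labels. The frames and the ('CC', 'two') column are made up for illustration.

```python
import pandas as pd


def pad_bottom(df, nlevels):
    """Return a copy of df whose columns have nlevels levels, padded with "" below."""
    missing = nlevels - df.columns.nlevels
    if missing <= 0:
        return df
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [(c if isinstance(c, tuple) else (c,)) + ("",) * missing for c in df.columns]
    )
    return out


# A two-level frame and a single-level frame (illustrative values).
wide = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples([("CC", "one"), ("CC", "two")]),
)
narrow = pd.DataFrame({"c": [5, 6]})

df_cols = pd.concat([wide, pad_bottom(narrow, 2)], axis=1)

# Because the empty label sits below "c", selecting "c" yields a Series;
# attribute access (df_cols.c) goes through the same lookup.
series_c = df_cols["c"]
print(df_cols)
print(type(series_c).__name__)
```

With the empty level on the bottom, "c" is printed on the top header row next to "CC" rather than underneath it.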