Concatenating new observations to a DataFrame

Time: 2018-07-19 21:10:50

Tags: python pandas dataframe concatenation

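I'm trying to concatenate new observations onto a DataFrame. I get what I believe is the correct answer, but the system keeps coming back to me with "ValueError: Can only compare identically-labeled DataFrame objects". Can anyone tell me why I get this ValueError when my result looks correct?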
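That message is what pandas raises when two DataFrames are compared with == and their index or columns are not identical. A minimal sketch, assuming (the question does not say this) that the checking system compares the submitted Employee frame against an expected frame with ==; the frames a and b below are hypothetical:

```python
import pandas as pd

# Hypothetical frames: same values, but the columns are in a different order.
a = pd.DataFrame({"x": [1, 2], "y": [3, 4]})
b = pd.DataFrame({"y": [3, 4], "x": [1, 2]})

try:
    # Element-wise == requires identical row AND column labels (including order).
    a == b
except ValueError as exc:
    # The exact wording of the message varies slightly between pandas versions.
    print("ValueError:", exc)

# Reordering b's columns to match a makes the labels identical,
# so the comparison works and every cell compares equal.
print((a == b[a.columns]).all().all())  # True
```

So a result that prints the right values can still fail such a check if its column order or index differs from the expected frame.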

Here is the problem:

Suppose the DataFrame Employee is as follows:

      Department      Title  Year Education Sex
Name                                           
Bob           IT    analyst     1  Bachelor   M
Sam        Trade  associate     3       PHD   M
Peter         HR         VP     8    Master   M
Jake          IT    analyst     2    Master   M

and another DataFrame, new_observations, is:

         Department Education Sex      Title  Year
Mary             IT             F         VP   9.0
Amy               ?       PHD   F  associate   5.0
Jennifer      Trade    Master   F  associate   NaN
John             HR    Master   M    analyst   2.0
Judy             HR  Bachelor   F    analyst   2.0

Update Employee with these new observations.

Here is my code:

    import numpy as np
    import pandas as pd

    Employee = pd.DataFrame({"Name": ["Bob", "Sam", "Peter", "Jake"],
                             "Education": ["Bachelor", "PHD", "Master", "Master"],
                             "Sex": ["M", "M", "M", "M"],
                             "Year": [1, 3, 8, 2],
                             "Department": ["IT", "Trade", "HR", "IT"],
                             "Title": ["analyst", "associate", "VP", "analyst"]})

    Employee = Employee.set_index('Name')

    new_observations = pd.DataFrame({
        "Name": ["Mary", "Amy", "Jennifer", "John", "Judy"],
        "Department": ["IT", "?", "Trade", "HR", "HR"],
        "Education": ["", "PHD", "Master", "Master", "Bachelor"],
        "Sex": ["F", "F", "F", "M", "F"],
        "Title": ["VP", "associate", "associate", "analyst", "analyst"],
        # np.nan is a real missing value; the string "NaN" would be stored as text
        "Year": [9.0, 5.0, np.nan, 2.0, 2.0]},
        columns=["Name", "Department", "Education", "Sex", "Title", "Year"])

    new_observations = new_observations.set_index('Name')

    # Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0
    Employee = Employee.append(new_observations, sort=False)

Here is my result:

[screenshot: code result]

I also tried:

    Employee = pd.concat([Employee, new_observations], axis=1, sort=False)

1 answer:

Answer 0 (score: 0)

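With axis=1, pd.concat places the two frames side by side, aligned on the index, rather than stacking the new rows underneath. Use axis=0 instead; it is the default, so you don't need to pass the axis at all:

    pd.concat([Employee, new_observations], sort=False)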

Output:

         Education Sex Year Department      Title
Name                                             
Bob       Bachelor   M    1         IT    analyst
Sam            PHD   M    3      Trade  associate
Peter       Master   M    8         HR         VP
Jake        Master   M    2         IT    analyst
Mary                 F    9         IT         VP
Amy            PHD   F    5          ?  associate
Jennifer    Master   F  NaN      Trade  associate
John        Master   M    2         HR    analyst
Judy      Bachelor   F    2         HR    analyst
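The difference between the two attempts can be checked directly. This sketch rebuilds the question's frames, with one assumption: Jennifer's missing Year is written as np.nan rather than the string "NaN". It then compares the shapes produced by axis=0 and axis=1:

```python
import numpy as np
import pandas as pd

# Rebuild the question's frames. Assumption: Jennifer's missing Year is
# meant to be a real missing value, so np.nan replaces the string "NaN".
Employee = pd.DataFrame(
    {
        "Name": ["Bob", "Sam", "Peter", "Jake"],
        "Education": ["Bachelor", "PHD", "Master", "Master"],
        "Sex": ["M", "M", "M", "M"],
        "Year": [1, 3, 8, 2],
        "Department": ["IT", "Trade", "HR", "IT"],
        "Title": ["analyst", "associate", "VP", "analyst"],
    }
).set_index("Name")

new_observations = pd.DataFrame(
    {
        "Name": ["Mary", "Amy", "Jennifer", "John", "Judy"],
        "Department": ["IT", "?", "Trade", "HR", "HR"],
        "Education": ["", "PHD", "Master", "Master", "Bachelor"],
        "Sex": ["F", "F", "F", "M", "F"],
        "Title": ["VP", "associate", "associate", "analyst", "analyst"],
        "Year": [9.0, 5.0, np.nan, 2.0, 2.0],
    }
).set_index("Name")

# axis=0 (the default): new rows go underneath, columns are matched by name.
stacked = pd.concat([Employee, new_observations], sort=False)

# axis=1: frames are placed side by side and aligned on the index, so every
# column name appears twice and each row is NaN in one half.
side_by_side = pd.concat([Employee, new_observations], axis=1, sort=False)

print(stacked.shape)       # (9, 5)
print(side_by_side.shape)  # (9, 10)
```

With axis=0 the result has the nine employees as rows and the five original columns (Year becomes a float column because of the missing value). With axis=1 every column name appears twice, so that attempt cannot match a five-column expected table.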