Replace NaN in a DataFrame based on values in a second DataFrame

Time: 2018-04-12 04:56:12

Tags: pandas dataframe

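There seem to be a lot of questions about setting values in one DataFrame conditionally on another, but I can't find one that covers what I need. Both dataframes below are small samples; the real ones each have thousands of columns. I have a DataFrame (df1) that looks like this: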

             IBM    BA      CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50     NaN        NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN       NaN
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        NaN       NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN

I have another dataframe (df2) whose columns have the same names as the last three columns of df1, but which holds specific dates:

    IBM EARN    BA EARN     CAT EARN
0   1/22/2018   2/1/2018    1/26/2018
1   10/19/2017  10/26/2017  10/25/2017
2   7/20/2017   7/27/2017   7/26/2017
3   4/20/2017   4/27/2017   4/26/2017
4   1/23/2017   1/26/2017   1/27/2017
5   10/19/2016  10/27/2016  10/26/2016
6   7/20/2016   7/28/2016   7/27/2016

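I want to place a 1 in df1 wherever df2 has a corresponding date. So the (partial) result would look like this, but it would continue for the whole list of dates in df2.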

             IBM     BA     CAT     IBM EARN    BA EARN   CAT EARN
Date
1/22/2018   163.13  65.94   76.50    **1**       NaN       NaN
1/23/2018   163.17  65.94   76.51     NaN        NaN       NaN
1/24/2018   167.26  67.43   79.23     NaN        NaN       NaN
1/25/2018   166.28  67.77   80.57     NaN        NaN       NaN
1/26/2018   166.58  68.37   80.87     NaN        NaN        **1**
1/27/2018   166.77  68.87   81.07     NaN        NaN       NaN
1/28/2018   167.98  68.57   81.07     NaN        NaN       NaN
2/1/2018    167.98  68.77   81.59     NaN        **1**     NaN
2/2/2018    167.98  69.07   81.87     NaN        NaN       NaN

Please let me know if you can offer a solution.

2 Answers:

Answer 0: (score: 2)

Use numpy.where to replace the values, checking membership with Index.isin for each column of the second DataFrame:

import numpy as np

for col in df2.columns:
    # 1 where the date in df1's index appears in this df2 column, NaN otherwise
    df1[col] = np.where(df1.index.isin(df2[col]), 1, np.nan)
print(df1)

              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date
1/22/2018  163.13  65.94  76.50       1.0      NaN       NaN
1/23/2018  163.17  65.94  76.51       NaN      NaN       NaN
1/24/2018  167.26  67.43  79.23       NaN      NaN       NaN
1/25/2018  166.28  67.77  80.57       NaN      NaN       NaN
1/26/2018  166.58  68.37  80.87       NaN      NaN       1.0
1/27/2018  166.77  68.87  81.07       NaN      NaN       NaN
1/28/2018  167.98  68.57  81.07       NaN      NaN       NaN
2/1/2018   167.98  68.77  81.59       NaN      1.0       NaN
2/2/2018   167.98  69.07  81.87       NaN      NaN       NaN
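For a self-contained check of the loop approach (np.where combined with Index.isin), the sketch below rebuilds a four-row subset of the question's sample. The subset and the construction code are assumptions made for illustration; only the loop itself comes from the answer.

```python
import numpy as np
import pandas as pd

# Assumed four-row subset of the question's df1; dates stay plain strings.
df1 = pd.DataFrame(
    {
        "IBM": [163.13, 163.17, 166.58, 167.98],
        "BA": [65.94, 65.94, 68.37, 68.77],
        "CAT": [76.50, 76.51, 80.87, 81.59],
        "IBM EARN": np.nan,
        "BA EARN": np.nan,
        "CAT EARN": np.nan,
    },
    index=pd.Index(
        ["1/22/2018", "1/23/2018", "1/26/2018", "2/1/2018"], name="Date"
    ),
)

# Assumed two-row subset of the question's df2.
df2 = pd.DataFrame(
    {
        "IBM EARN": ["1/22/2018", "10/19/2017"],
        "BA EARN": ["2/1/2018", "10/26/2017"],
        "CAT EARN": ["1/26/2018", "10/25/2017"],
    }
)

# For each earnings column, mark rows whose index label appears in df2[col].
for col in df2.columns:
    df1[col] = np.where(df1.index.isin(df2[col]), 1, np.nan)

print(df1)
```

The comparison here is between strings, so a date written as "01/22/2018" in one frame would not match "1/22/2018" in the other.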

Edit:

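A non-loop solution uses DataFrame.isin with a dictionary of lists created by to_dict, then casts the boolean mask to integers. The calls below are adjusted for recent pandas versions, where the original df1.index[:, None], to_dict('l') and drop(df2.columns, 1) spellings are no longer accepted:

import numpy as np
import pandas as pd

# first create a DataFrame by repeating the index of df1 across df2's columns
# https://stackoverflow.com/a/45118399
arr = np.broadcast_to(df1.index.to_numpy()[:, None], (len(df1), len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=df1.index)
# {column: list of dates} makes isin check each column against its own dates
df3 = df3.isin(df2.to_dict('list')).astype(int)
print(df3)

           IBM EARN  BA EARN  CAT EARN
Date
1/22/2018         1        0         0
1/23/2018         0        0         0
1/24/2018         0        0         0
1/25/2018         0        0         0
1/26/2018         0        0         1
1/27/2018         0        0         0
1/28/2018         0        0         0
2/1/2018          0        1         0
2/2/2018          0        0         0

# replace the empty columns of df1 with the indicator columns
df1 = df1.drop(columns=df2.columns).join(df3)
print(df1)

              IBM     BA    CAT  IBM EARN  BA EARN  CAT EARN
Date
1/22/2018  163.13  65.94  76.50         1        0         0
1/23/2018  163.17  65.94  76.51         0        0         0
1/24/2018  167.26  67.43  79.23         0        0         0
1/25/2018  166.28  67.77  80.57         0        0         0
1/26/2018  166.58  68.37  80.87         0        0         1
1/27/2018  166.77  68.87  81.07         0        0         0
1/28/2018  167.98  68.57  81.07         0        0         0
2/1/2018   167.98  68.77  81.59         0        1         0
2/2/2018   167.98  69.07  81.87         0        0         0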
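The non-loop version relies on how DataFrame.isin treats a dictionary: each column is tested only against the list stored under its own name. A minimal sketch of that behavior, using a made-up frame and dictionary (the names frame, allowed, mask and flags are hypothetical):

```python
import pandas as pd

# Toy frame: both columns hold the same values on purpose.
frame = pd.DataFrame({"a": ["x", "y", "z"], "b": ["x", "y", "z"]})

# Per-column lists of accepted values, shaped like df2.to_dict('list').
allowed = {"a": ["x"], "b": ["y", "z"]}

# Column "a" is checked against ["x"], column "b" against ["y", "z"].
mask = frame.isin(allowed)

# Casting the boolean mask to int gives 1/0 indicator columns.
flags = mask.astype(int)
print(flags)
```

This is why the answer first repeats df1's index into every column: once each column contains the dates, a single isin call compares them with the matching column of df2.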

Answer 1: (score: 1)

You can try this, since the dates are your index:

In [18]: df1['IBMEARN'] = np.where(df1.index.isin(df2.IBMEARN),1,0)

In [19]: df1['BAEARN'] = np.where(df1.index.isin(df2.BAEARN),1,0)

In [21]: df1['CATEARN'] = np.where(df1.index.isin(df2.CATEARN),1,0)
In [22]: df1
Out[22]: 
              IBM     BA    CAT  IBMEARN  BAEARN  CATEARN
DATE                                                     
1/22/2018  163.13  65.94  76.50        1       0        0
1/23/2018  163.17  65.94  76.51        0       0        0
1/24/2018  167.26  67.43  79.23        0       0        0
1/25/2018  166.28  67.77  80.57        0       0        0
1/26/2018  166.58  68.37  80.87        0       0        1
1/27/2018  166.77  68.87  81.07        0       0        0
1/28/2018  167.98  68.57  81.07        0       0        0
2/1/2018   167.98  68.77  81.59        0       1        0
2/2/2018   167.98  69.07  81.87        0       0        0
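Both answers compare the date labels exactly as they are stored. If the two frames might format the same date differently, one option is to parse both sides before testing membership. The sketch below assumes month/day/year strings and uses a made-up two-row sample; neither assumption comes from the original thread.

```python
import numpy as np
import pandas as pd

# Assumed sample: df2 writes the first date with a leading zero.
df1 = pd.DataFrame(
    {"IBM": [163.13, 166.58]},
    index=pd.Index(["1/22/2018", "1/26/2018"], name="Date"),
)
df2 = pd.DataFrame({"IBM EARN": ["01/22/2018", "10/19/2017"]})

# Parse the index once; the format string is an assumption about the data.
dates = pd.to_datetime(df1.index, format="%m/%d/%Y")

for col in df2.columns:
    earn_dates = pd.to_datetime(df2[col], format="%m/%d/%Y")
    # Membership is now tested on timestamps rather than raw strings.
    df1[col] = np.where(dates.isin(earn_dates), 1, 0)

print(df1)
```

A plain string comparison would miss the first row here, because "1/22/2018" and "01/22/2018" differ as text.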