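There seem to be many questions about filling a DataFrame based on conditions from another DataFrame, but I could not find one that covers what I need. Both DataFrames below are small samples; the real ones each have thousands of columns. I have a DataFrame (df1) that looks like this: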
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 NaN NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN NaN
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN NaN NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
I have another DataFrame (df2) whose columns match the last three columns of df1, but which holds specific dates:
IBM EARN BA EARN CAT EARN
0 1/22/2018 2/1/2018 1/26/2018
1 10/19/2017 10/26/2017 10/25/2017
2 7/20/2017 7/27/2017 7/26/2017
3 4/20/2017 4/27/2017 4/26/2017
4 1/23/2017 1/26/2017 1/27/2017
5 10/19/2016 10/27/2016 10/26/2016
6 7/20/2016 7/28/2016 7/27/2016
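I want to put a 1 in df1 wherever df2 contains the corresponding date. The (partial) result would look like this, and it would continue for every date listed in df2.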
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 **1** NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN **1**
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN **1** NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
Please let me know if you can offer a solution.
Answer 0 (score: 2)
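Loop over each column of the second DataFrame (df2):
import numpy as np

for col in df2.columns:
    # 1 where the index date appears in this df2 column, NaN elsewhere
    df1[col] = np.where(df1.index.isin(df2[col]), 1, np.nan)
print (df1)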
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 1.0 NaN NaN
1/23/2018 163.17 65.94 76.51 NaN NaN NaN
1/24/2018 167.26 67.43 79.23 NaN NaN NaN
1/25/2018 166.28 67.77 80.57 NaN NaN NaN
1/26/2018 166.58 68.37 80.87 NaN NaN 1.0
1/27/2018 166.77 68.87 81.07 NaN NaN NaN
1/28/2018 167.98 68.57 81.07 NaN NaN NaN
2/1/2018 167.98 68.77 81.59 NaN 1.0 NaN
2/2/2018 167.98 69.07 81.87 NaN NaN NaN
Membership is checked with Index.isin, and numpy.where replaces the values.
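The two calls named above can be tried in isolation on a tiny frame. This is a minimal sketch: the three-row df1 and two-row df2 here are hypothetical miniatures of the data in the question, not the asker's real frames.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of df1: string dates as the index, one price column,
# and one empty flag column.
df1 = pd.DataFrame(
    {"IBM": [163.13, 163.17, 167.26], "IBM EARN": [np.nan, np.nan, np.nan]},
    index=pd.Index(["1/22/2018", "1/23/2018", "1/24/2018"], name="Date"),
)
# Hypothetical miniature of df2: earnings dates stored as strings.
df2 = pd.DataFrame({"IBM EARN": ["1/22/2018", "10/19/2017"]})

# Boolean array: True where an index label appears among the earnings dates.
mask = df1.index.isin(df2["IBM EARN"])

# Turn True into 1 and False into NaN.
df1["IBM EARN"] = np.where(mask, 1, np.nan)
print(df1)
```

Because NaN is a float, the flag column ends up with a float dtype, which is why the output above prints 1.0 rather than 1.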
Edit:
A non-loop solution with DataFrame.isin:
import numpy as np
import pandas as pd

# first create a DataFrame by repeating the index of df1
# https://stackoverflow.com/a/45118399
arr = np.broadcast_to(df1.index.to_numpy()[:, None], (len(df1), len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=df1.index)
df3 = df3.isin(df2.to_dict('list')).astype(int)
print (df3)
IBM EARN BA EARN CAT EARN
Date
1/22/2018 1 0 0
1/23/2018 0 0 0
1/24/2018 0 0 0
1/25/2018 0 0 0
1/26/2018 0 0 1
1/27/2018 0 0 0
1/28/2018 0 0 0
2/1/2018 0 1 0
2/2/2018 0 0 0
df1 = df1.drop(columns=df2.columns).join(df3)
print (df1)
IBM BA CAT IBM EARN BA EARN CAT EARN
Date
1/22/2018 163.13 65.94 76.50 1 0 0
1/23/2018 163.17 65.94 76.51 0 0 0
1/24/2018 167.26 67.43 79.23 0 0 0
1/25/2018 166.28 67.77 80.57 0 0 0
1/26/2018 166.58 68.37 80.87 0 0 1
1/27/2018 166.77 68.87 81.07 0 0 0
1/28/2018 167.98 68.57 81.07 0 0 0
2/1/2018 167.98 68.77 81.59 0 1 0
2/2/2018 167.98 69.07 81.87 0 0 0
Here DataFrame.isin is given a dictionary of lists created from df2, and the resulting boolean mask is cast to integers.
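To see why the dictionary matters, the sketch below runs the same idea on hypothetical miniature data. It uses np.tile instead of np.broadcast_to so the repeated array is an ordinary writable copy. That is a substitution made for this illustration, not the approach in the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature data: three index dates and two earnings columns.
dates = pd.Index(["1/22/2018", "1/26/2018", "2/1/2018"], name="Date")
df2 = pd.DataFrame({
    "IBM EARN": ["1/22/2018", "10/19/2017"],
    "BA EARN": ["2/1/2018", "10/26/2017"],
})

# Repeat the index labels once per df2 column (np.tile returns a fresh copy).
arr = np.tile(dates.to_numpy()[:, None], (1, len(df2.columns)))
df3 = pd.DataFrame(arr, columns=df2.columns, index=dates)

# With a dict, isin tests each column only against the list stored under
# that column's name, so "2/1/2018" is flagged for BA EARN but not IBM EARN.
lookup = df2.to_dict("list")
flags = df3.isin(lookup).astype(int)
print(flags)
```

Passing a plain list to isin would instead flag a date in every column where it appears, regardless of which ticker it belongs to.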
Answer 1 (score: 1)
You can try this, since the dates are your index:
In [18]: df1['IBMEARN'] = np.where(df1.index.isin(df2.IBMEARN),1,0)
In [19]: df1['BAEARN'] = np.where(df1.index.isin(df2.BAEARN),1,0)
In [21]: df1['CATEARN'] = np.where(df1.index.isin(df2.CATEARN),1,0)
In [22]: df1
Out[22]:
IBM BA CAT IBMEARN BAEARN CATEARN
DATE
1/22/2018 163.13 65.94 76.50 1 0 0
1/23/2018 163.17 65.94 76.51 0 0 0
1/24/2018 167.26 67.43 79.23 0 0 0
1/25/2018 166.28 67.77 80.57 0 0 0
1/26/2018 166.58 68.37 80.87 0 0 1
1/27/2018 166.77 68.87 81.07 0 0 0
1/28/2018 167.98 68.57 81.07 0 0 0
2/1/2018 167.98 68.77 81.59 0 1 0
2/2/2018 167.98 69.07 81.87 0 0 0
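Both answers compare the dates as text, so they rely on df1's index and df2's values being written identically. If the two sources format dates differently, parsing both sides first may help. The sketch below assumes month/day/year strings, as printed in the post, and uses a hypothetical zero-padded date in df2 to show the difference.

```python
import numpy as np
import pandas as pd

# Assumption: dates are month/day/year strings, as printed in the post.
df1 = pd.DataFrame(
    {"IBM": [163.13, 166.58, 167.98]},
    index=pd.Index(["1/22/2018", "1/26/2018", "2/1/2018"], name="Date"),
)
# Hypothetical: the same earnings date, but written with a zero-padded month.
df2 = pd.DataFrame({"IBM EARN": ["01/22/2018", "10/19/2017"]})

# As plain strings, "1/22/2018" and "01/22/2018" do not match.
as_text = df1.index.isin(df2["IBM EARN"])

# Parsing both sides with an explicit format makes the comparison date-based.
idx = pd.to_datetime(df1.index, format="%m/%d/%Y")
earn = pd.to_datetime(df2["IBM EARN"], format="%m/%d/%Y")
df1["IBM EARN"] = np.where(idx.isin(earn), 1, 0)
print(df1)
```

If the real data already uses one consistent text format, this extra parsing step is unnecessary.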