Question

I have two DataFrames.

df1:

userID    ID    Sex   Date   Month    Year   Security
  John    45   Male     31      03    1975        Low
   Tom    22   Male     01      01    1990       High
  Mary    33 Female     23      05    1990     Medium
  Hary    56   Male     15      09    1970       High

df2:

userID    ID    Sex   Date   Month    Year
  Hari    45   Male     31      03    1975
  Luka    22   Male     01      01    1990
 Johan    33 Female     23      05    1990
 Irfan    56   Male     29      09    1971
  John    45   Male     31      03    1975
   Tom    22   Male     01      01    1990
  Mary    34 Female     34      05    1980
  Hary    56   Male     15      09    1970

I want to compare df2 with df1 and keep only the rows of df2 whose values in the columns (userID, ID, Date, Month, Year) also appear in df1.

So my new df2 should look like this:

  John    45   Male     31      03    1975
   Tom    22   Male     01      01    1990
  Hary    56   Male     15      09    1970

What is the best way to do this in pandas? Can anyone help me?

Answer 1

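Just use a simple merge, followed by dropna. The left merge leaves NaN in Security for every df2 row that has no match in df1, and dropna then removes those rows: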

df2.merge(df1,how='left').dropna().drop(columns='Security')
Out[318]: 
  userID  ID   Sex  Date  Month  Year
4   John  45  Male    31      3  1975
5    Tom  22  Male     1      1  1990
7   Hary  56  Male    15      9  1970
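
One caveat with the dropna step: it would also discard a matched row that happens to contain NaN in some other column. A minimal sketch of a variant that avoids this, assuming the frames are rebuilt from the question with Date and Month stored as integers (my assumption, the question does not show dtypes), is to let merge record the match status with indicator=True and filter on that:

```python
import pandas as pd

# Data reconstructed from the question; integer Date/Month is an assumption.
df1 = pd.DataFrame({
    "userID": ["John", "Tom", "Mary", "Hary"],
    "ID": [45, 22, 33, 56],
    "Sex": ["Male", "Male", "Female", "Male"],
    "Date": [31, 1, 23, 15],
    "Month": [3, 1, 5, 9],
    "Year": [1975, 1990, 1990, 1970],
    "Security": ["Low", "High", "Medium", "High"],
})
df2 = pd.DataFrame({
    "userID": ["Hari", "Luka", "Johan", "Irfan", "John", "Tom", "Mary", "Hary"],
    "ID": [45, 22, 33, 56, 45, 22, 34, 56],
    "Sex": ["Male", "Male", "Female", "Male", "Male", "Male", "Female", "Male"],
    "Date": [31, 1, 23, 29, 31, 1, 34, 15],
    "Month": [3, 1, 5, 9, 3, 1, 5, 9],
    "Year": [1975, 1990, 1990, 1971, 1975, 1990, 1980, 1970],
})

key_cols = ["userID", "ID", "Date", "Month", "Year"]

# Left merge against the de-duplicated keys of df1: one output row per df2 row,
# in df2's order, with _merge set to "both" where a match exists.
merged = df2.merge(
    df1[key_cols].drop_duplicates(), on=key_cols, how="left", indicator=True
)

# Use the match flags as a positional mask so df2 keeps its original index.
result = df2[(merged["_merge"] == "both").to_numpy()]
print(result)
```

Filtering df2 itself, rather than returning the merged frame, means no Security column has to be dropped afterwards.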

Answer 2

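Define the key columns to merge on, then perform an inner merge between df2 and the key columns of df1. merge defaults to how='inner', so you don't need to specify it explicitly. Subsetting df1 to only these key columns ensures the merge doesn't carry any of its other columns over into df2.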

key_cols = ['userID', 'ID', 'Date', 'Month', 'Year']
df2.merge(df1.loc[:, df1.columns.isin(key_cols)])

Output:

  userID  ID   Sex  Date  Month  Year
0   John  45  Male    31      3  1975
1    Tom  22  Male     1      1  1990
2   Hary  56  Male    15      9  1970
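
For anyone who wants to run this end to end, here is a small self-contained sketch. The frames are rebuilt from the question, with Date and Month typed as integers (an assumption on my part), and df1[key_cols] is used as an equivalent way to take the key-only slice:

```python
import pandas as pd

cols = ["userID", "ID", "Sex", "Date", "Month", "Year"]

# Rows copied from the question; integer Date/Month is an assumption.
df1 = pd.DataFrame(
    [
        ["John", 45, "Male", 31, 3, 1975, "Low"],
        ["Tom", 22, "Male", 1, 1, 1990, "High"],
        ["Mary", 33, "Female", 23, 5, 1990, "Medium"],
        ["Hary", 56, "Male", 15, 9, 1970, "High"],
    ],
    columns=cols + ["Security"],
)
df2 = pd.DataFrame(
    [
        ["Hari", 45, "Male", 31, 3, 1975],
        ["Luka", 22, "Male", 1, 1, 1990],
        ["Johan", 33, "Female", 23, 5, 1990],
        ["Irfan", 56, "Male", 29, 9, 1971],
        ["John", 45, "Male", 31, 3, 1975],
        ["Tom", 22, "Male", 1, 1, 1990],
        ["Mary", 34, "Female", 34, 5, 1980],
        ["Hary", 56, "Male", 15, 9, 1970],
    ],
    columns=cols,
)

key_cols = ["userID", "ID", "Date", "Month", "Year"]

# With no `on` argument, merge joins on every column the two frames share,
# which here is exactly key_cols because df1 was cut down to those columns.
result = df2.merge(df1[key_cols])
print(result)
```

Note that the merged frame gets a fresh 0..n-1 index, so the original row labels of df2 are not preserved. If df1 could contain duplicate key rows, calling drop_duplicates() on the slice first would keep the merge from repeating rows of df2.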

How to compare two DataFrames based on certain column values and drop rows in pandas
