Question

I am working with the following data frames. The original data frames are quite large, with thousands of rows, so for illustration I am using much simpler ones.

My first df (df1) is the following:

        ID      value
    0   3       7387
    1   8       4784
    2   11      675
    3   21      900

And there is another, much larger df, say df2:

        x            y          final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -5.89        1.90       4
    3   -4.56        2.67       5
    4   -3.46        1.34       8
    5   -4.67        1.23       8
    6   -1.99        3.44       8
    7   -5.67        2.40       11
    8   -7.56        1.66       11
    9   -9.00        3.12       21
    10  -8.01        3.11       21 
    11  -7.90        3.19       22

Now, from the first df, I want to consider only the "ID" column and match its values against the "final_id" column in the second data frame (df2).

I want to create another df which contains only the filtered rows of df2, i.e. only the rows whose "final_id" is 3, 8, 11 or 21 (as per the "ID" column of df1).

Below would be the resultant df:

         x            y         final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -3.46        1.34       8
    3   -4.67        1.23       8
    4   -1.99        3.44       8
    5   -5.67        2.40       11
    6   -7.56        1.66       11
    7   -9.00        3.12       21
    8   -8.01        3.11       21

We can see that rows 2, 3 and 11 from df2 have been removed from the resultant df.

Please help.

Answer 1

You can use isin to create a mask and then use the boolean mask to subset your df2:

mask = df2["final_id"].isin(df1["ID"])
print(df2[mask])

        x      y    final_id
0   -7.35   2.09    3
1   -6.00   2.76    3
4   -3.46   1.34    8
5   -4.67   1.23    8
6   -1.99   3.44    8
7   -5.67   2.40    11
8   -7.56   1.66    11
9   -9.00   3.12    21
10  -8.01   3.11    21
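
Note that the output above keeps the original row labels from df2 (0, 1, 4, 5, ...), whereas the expected output in the question is numbered 0 to 8. Below is a minimal, self-contained sketch of the same isin approach that also renumbers the rows. The sample frames are rebuilt from the question, the first frame is called df1 as in the question's wording, and the name result is only an illustrative choice.

```python
import pandas as pd

# Sample frames rebuilt from the question; df1 is the small lookup frame.
df1 = pd.DataFrame({"ID": [3, 8, 11, 21], "value": [7387, 4784, 675, 900]})
df2 = pd.DataFrame({
    "x": [-7.35, -6.00, -5.89, -4.56, -3.46, -4.67,
          -1.99, -5.67, -7.56, -9.00, -8.01, -7.90],
    "y": [2.09, 2.76, 1.90, 2.67, 1.34, 1.23,
          3.44, 2.40, 1.66, 3.12, 3.11, 3.19],
    "final_id": [3, 3, 4, 5, 8, 8, 8, 11, 11, 21, 21, 22],
})

# Boolean mask: True where df2's final_id appears in df1's ID column.
mask = df2["final_id"].isin(df1["ID"])

# Select the matching rows, then renumber them from 0 so the result
# is labelled like the expected output in the question.
result = df2[mask].reset_index(drop=True)
print(result)
```

If the original row labels from df2 are useful later (for example, to trace rows back to df2), skip the reset_index call.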

