Question

I have two dataframes,

df1,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one
1   NaN      2      Thanks for reading                          two has
2   Ram      1      Ram is two of the good cricket player       three
3   ganesh   1      one driver                                  four
4   NaN      2      good buddies                                NaN


 df2,
    values
    member of four
    one of three friends
    sri is a cricketer
    Rahul has two brothers

I want to replace df1["key"] with the df2 value whenever the key is present in df2.values.

I tried, df1["key"]=df2[df2["values"].str.contains("|".join(df2["values"].tolist()),na=False)]

But my output comes out in the same order,

I want,

output_df,
    Name    Stage   Description                                 key
0   Sri      1      Sri is one of the good singer in this two   one of three friends
1   NaN      2      Thanks for reading                          Rahul has two brothers
2   Ram      1      Ram is two of the good cricket player       one of three friends
3   ganesh   1      one driver                                  member of four
4   NaN      2      good buddies                                NaN

Answer 1

I'd use arrays of sets, test for subsets with <=, and rely on numpy broadcasting.

import numpy as np
import pandas as pd

# Split a string into its set of whitespace-separated words
setify = lambda x: set(x.split())
v = df2['values'].values.astype(str)
k = df1['key'].values.astype(str)
i = df1.index

# These are the sets (object arrays, one set per row)
a = np.array([setify(x) for x in k.tolist()])
b = np.array([setify(x) for x in v.tolist()])

# This is the broadcasting: matches[r, c] is True when
# every word of key r appears in df2 value c
matches = (a[:, None] <= b)

# Check that each row has at least one match
any_ = matches.any(1)
# Check that the key wasn't null in the first place
nul_ = df1['key'].notnull().values
mask = any_ & nul_

# Use argmax to find where the first set match is.  There
# may be more than one match.  I chose to use `assign`,
# so I used `mask` to pass a slice of the series
# that targets only the correct rows.
df1.assign(key1=pd.Series(v[matches.argmax(1)], i)[mask])

     Name  Stage                                Description      key                    key1
0     Sri      1  Sri is one of the good singer in this two      one    one of three friends
1     NaN      2                         Thanks for reading  two has  Rahul has two brothers
2     Ram      1      Ram is two of the good cricket player    three    one of three friends
3  ganesh      1                                 one driver     four          member of four
4     NaN      2                               good buddies      NaN                     NaN
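
For comparison, here is a minimal plain-Python sketch of the same subset test without numpy broadcasting. It rebuilds the sample frames from the question (only the columns needed here), and the helper name `first_match` is my own, not part of pandas. Like the code above, it keeps the first df2 value whose words contain every word of the key.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Description column omitted).
df1 = pd.DataFrame({
    "Name": ["Sri", np.nan, "Ram", "ganesh", np.nan],
    "Stage": [1, 2, 1, 1, 2],
    "key": ["one", "two has", "three", "four", np.nan],
})
df2 = pd.DataFrame({"values": [
    "member of four",
    "one of three friends",
    "sri is a cricketer",
    "Rahul has two brothers",
]})

# Split every df2 value into a set of words once, up front.
candidates = [(value, set(value.split())) for value in df2["values"]]

def first_match(key):
    """Hypothetical helper: first df2 value containing every word of `key`."""
    if pd.isna(key):
        return np.nan
    words = set(key.split())
    for value, value_words in candidates:
        if words <= value_words:  # subset test, same idea as above
            return value
    return np.nan

result = df1.assign(key1=df1["key"].map(first_match))
print(result)
```

Passing `key=` instead of `key1=` to `assign` should overwrite the original column, which is closer to the output asked for in the question. This loop is likely slower than the broadcast version on large frames, but it is easier to step through.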
