给定两个数据框如下:
df1:
id address price
0 1 8563 Parker Ave. Lexington, NC 27292 3
1 2 242 Bellevue Lane Appleton, WI 54911 3
2 3 771 Greenview Rd. Greenfield, IN 46140 5
3 4 93 Hawthorne Street Lakeland, FL 33801 6
4 5 8952 Green Hill Street Gettysburg, PA 17325 3
5 6 7331 S. Sherwood Dr. New Castle, PA 16101 4
df2:
state street quantity
0 PA S. Sherwood 12
1 IN Hawthorne Street 3
2 NC Parker Ave. 7
假设 state
的 street
和 df2
都包含在 address
的 df2
中,然后将 df2
合并到 {{1 }}。
我怎么能在 Pandas 中做到这一点?谢谢。
预期结果df1
:
df
我的测试代码:
id address ... street quantity
0 1 8563 Parker Ave. Lexington, NC 27292 ... Parker Ave. 7.00
1 2 242 Bellevue Lane Appleton, WI 54911 ... NaN NaN
2 3 771 Greenview Rd. Greenfield, IN 46140 ... NaN NaN
3 4 93 Hawthorne Street Lakeland, FL 33801 ... NaN NaN
4 5 8952 Green Hill Street Gettysburg, PA 17325 ... NaN NaN
5 6 7331 S. Sherwood Dr. New Castle, PA 16101 ... S. Sherwood 12.00
[6 rows x 6 columns]
输出:
df2['addr'] = df2['state'].astype(str) + df2['street'].astype(str)
pat = '|'.join(r'\b{}\b'.format(x) for x in df2['addr'])
df1['addr']= df1['address'].str.extract('\('+ pat + ')', expand=False)
df = df1.merge(df2, on='addr', how='left')
答案 0 :(得分:1)
k="|".join(df2['street'].to_list())
df1=df1.assign(temp=df1['address'].str.findall(k).str.join(', '), temp1=df1['address'].str.split(",").str[-1])
dfnew=pd.merge(df1,df2, how='left', left_on=['temp','temp1'], right_on=['street',"state"])
答案 1 :(得分:1)
尝试:
pat_state = f"({'|'.join(df2['state'])})"
pat_street = f"({'|'.join(df2['street'])})"
df1['street'] = df1['address'].str.extract(pat=pat_street)
df1['state'] = df1['address'].str.extract(pat=pat_state)
df1.loc[df1['state'].isna(),'street'] = np.NAN
df1.loc[df1['street'].isna(),'state'] = np.NAN
df1 = df1.merge(df2, left_on=['state','street'], right_on=['state','street'], how ='left')