I have two DataFrames in Python, as shown below.
df1
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 f2b6769129 97bb97bebc
46 ca0464878d e276539bc2
51 62f2905a7a 8dfabd6d61
57 21032ca3bc 1f7e5e0c6e
62 f7e7fdd8ce eb6cf4af99
64 f536998bbb 7fc39eacd1
80 6069198f63 d873a71620
99 0ba61a6f66 a6cf7af3eb
102 e8b579b776 c8048fd459
df2
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Arthur Anderson
46 Teresa Johns
51 Louise Hurwitz
57 Timothy Addy
62 Jeffery Wilson
64 Andres Tuller
80 Daniel Green
99 Frank Nader
102 Faith Young
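I want to join these two DataFrames on CUSTOMER_KEY (which I can do with merge), and then concatenate parts of a few columns to form new strings in the resulting DataFrame. The result I am looking for from the DataFrames above is as follows: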
result_df
CUSTOMER_KEY LAST_NAME FIRST_NAME
30 Artf2b676 And97bb97
46 Terca0464 Johe27653
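Basically, as in the sample output above, take the first 3 characters of last_name from df2 and the first 6 characters of last_name from df1, and concatenate them into the new column. The same applies to the other column.
How can I achieve this?
Thanks and regards,
Bala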
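Answer 0: (score: 3)
Use the str accessor to slice the strings and concatenate the pieces with +.
The code below does not merge: it relies on df1 and df2 having the same index and the same row order. If that is not the case, merge on CUSTOMER_KEY first.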
df2['LAST_NAME']=df2['LAST_NAME'].str[:3]+df1['LAST_NAME'].str[:6]
df2['FIRST_NAME']=df2['FIRST_NAME'].str[:3]+df1['FIRST_NAME'].str[:6]
df2
Out[768]:
CUSTOMER_KEY LAST_NAME FIRST_NAME
0 30 Artf2b676 And97bb97
1 46 Terca0464 Johe27653
2 51 Lou62f290 Hur8dfabd
3 57 Tim21032c Add1f7e5e
4 62 Jeff7e7fd Wileb6cf4
5 64 Andf53699 Tul7fc39e
6 80 Dan606919 Gred873a7
7 99 Fra0ba61a Nada6cf7a
8 102 Faie8b579 Youc8048f
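A caveat on the snippet above: adding two Series aligns rows by index label, not by CUSTOMER_KEY, so it only pairs the right customers when both frames list them in the same order. The following is a minimal sketch of a key-based variant, assuming CUSTOMER_KEY is unique in each frame. The shuffled three-row sample and the variable names (naive, left, right, aligned) are made up for illustration.

```python
import pandas as pd

# Illustrative sample: df2 is deliberately in a different row order than df1.
df1 = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46, 51],
    "LAST_NAME": ["f2b6769129", "ca0464878d", "62f2905a7a"],
    "FIRST_NAME": ["97bb97bebc", "e276539bc2", "8dfabd6d61"],
})
df2 = pd.DataFrame({
    "CUSTOMER_KEY": [51, 30, 46],
    "LAST_NAME": ["Louise", "Arthur", "Teresa"],
    "FIRST_NAME": ["Hurwitz", "Anderson", "Johns"],
})

# Positional version: row 0 of df2 (customer 51) is paired with
# row 0 of df1 (customer 30), which is not what we want here.
naive = df2["LAST_NAME"].str[:3] + df1["LAST_NAME"].str[:6]

# Key-based version: index both frames by CUSTOMER_KEY and put df1
# into df2's key order before concatenating.
cols = ["LAST_NAME", "FIRST_NAME"]
left = df2.set_index("CUSTOMER_KEY")
right = df1.set_index("CUSTOMER_KEY").reindex(left.index)

aligned = left.copy()
for col in cols:
    aligned[col] = left[col].str[:3] + right[col].str[:6]
aligned = aligned.reset_index()
```

With this approach, a key that is present in df2 but missing from df1 ends up as NaN in the combined columns rather than being silently paired with another customer.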
Answer 1: (score: 1)
Use merge + str
import pandas as pd
df = pd.DataFrame([
['30','f2b6769129','97bb97bebc'],
['46','ca0464878d','e276539bc2'],
['51','62f2905a7a','8dfabd6d61'],
['57','21032ca3bc','1f7e5e0c6e'],
['62','f7e7fdd8ce','eb6cf4af99'],
['64','f536998bbb','7fc39eacd1'],
['80','6069198f63','d873a71620'],
['99','0ba61a6f66','a6cf7af3eb'],
['102','e8b579b776','c8048fd459']]
)
df2 = pd.DataFrame([
['30','Arthur','Anderson'],
['46','Teresa','Johns'],
['51','Louise','Hurwitz'],
['57','Timothy','Addy'],
['62','Jeffery','Wilson'],
['64','Andres','Tuller'],
['80','Daniel','Green'],
['99','Frank','Nader'],
['102','Faith','Young']]
)
keys = ['CUSTOMER_KEY','LAST_NAME','FIRST_NAME']
df.columns = keys
df2.columns = keys
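# Merge on the key; overlapping columns get the _1 (df) and _2 (df2) suffixes.
df_join = pd.merge(df, df2, on="CUSTOMER_KEY", suffixes=('_1', '_2'))
# First 3 characters of the name (df2) + first 6 characters of the hash (df).
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0,3)+df_join['LAST_NAME_1'].str.slice(0,6)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0,3)+df_join['FIRST_NAME_1'].str.slice(0,6)
result_df = df_join[keys]
result_df.head()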
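The merge-then-slice steps can also be wrapped in a small function. This is only a sketch, not part of the original answer: the helper name combine_names, its parameters, and the _hashed/_plain suffixes are hypothetical. It assumes both frames share the same column names and that each key appears at most once per frame, which validate="one_to_one" checks.

```python
import pandas as pd

def combine_names(hashed, plain, key="CUSTOMER_KEY",
                  cols=("LAST_NAME", "FIRST_NAME"),
                  n_plain=3, n_hashed=6):
    """Hypothetical helper: merge on `key`, then join string prefixes."""
    merged = pd.merge(
        hashed, plain, on=key, how="inner",
        suffixes=("_hashed", "_plain"),
        validate="one_to_one",  # raises MergeError on duplicated keys
    )
    out = merged[[key]].copy()
    for col in cols:
        # Prefix of the readable name followed by a prefix of the hash.
        out[col] = (merged[f"{col}_plain"].str[:n_plain]
                    + merged[f"{col}_hashed"].str[:n_hashed])
    return out

# Three rows taken from the question's data.
hashed = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46, 102],
    "LAST_NAME": ["f2b6769129", "ca0464878d", "e8b579b776"],
    "FIRST_NAME": ["97bb97bebc", "e276539bc2", "c8048fd459"],
})
plain = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46, 102],
    "LAST_NAME": ["Arthur", "Teresa", "Faith"],
    "FIRST_NAME": ["Anderson", "Johns", "Young"],
})

result_df = combine_names(hashed, plain)
```

Because the merge is an inner join, customers present in only one of the two frames are dropped from the result; how="left" or how="outer" would keep them with NaN in the combined columns.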