Python Pandas - Merge two data frames on a column and concatenate substrings

Date: 2017-11-06 03:02:54

Tags: python pandas

I have two data frames in Python, shown below:

df1 
CUSTOMER_KEY    LAST_NAME  FIRST_NAME   
30          f2b6769129  97bb97bebc  
46          ca0464878d  e276539bc2  
51          62f2905a7a  8dfabd6d61  
57          21032ca3bc  1f7e5e0c6e  
62          f7e7fdd8ce  eb6cf4af99  
64          f536998bbb  7fc39eacd1  
80          6069198f63  d873a71620  
99          0ba61a6f66  a6cf7af3eb
102         e8b579b776  c8048fd459

df2
CUSTOMER_KEY    LAST_NAME   FIRST_NAME
30          Arthur      Anderson      
46          Teresa      Johns     
51          Louise      Hurwitz     
57          Timothy         Addy     
62          Jeffery     Wilson      
64          Andres      Tuller      
80          Daniel      Green      
99          Frank       Nader      
102         Faith       Young

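I want to join these two data frames on CUSTOMER_KEY (which I can do with merge), and then concatenate several of their columns to form new strings in the resulting data frame. From the data frames above, the result I expect looks like this: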

result_df
CUSTOMER_KEY LAST_NAME  FIRST_NAME
30           Artf2b676  And97bb97
46           Terca0464  Johe27653

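Basically, I want the first 3 characters of LAST_NAME from df2 followed by the first 6 characters of LAST_NAME from df1, i.e. substring(last_name,1,3) from df2 and substring(last_name,1,6) from df1, concatenated into a new column. The same goes for the other columns.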
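The substring(...) notation above reads like SQL's SUBSTRING(str, start, length), which counts positions from 1, while pandas string slicing counts from 0. A minimal sketch of that mapping, using a hypothetical helper named sql_substring (not a pandas function) built on the real Series.str.slice method:

```python
import pandas as pd


def sql_substring(s: pd.Series, start: int, length: int) -> pd.Series:
    """Hypothetical helper: SQL-style SUBSTRING(s, start, length), 1-based."""
    begin = start - 1  # SQL position 1 is pandas/Python index 0
    return s.str.slice(begin, begin + length)


# Two values taken from the question's df2 and df1
names = pd.Series(["Arthur", "Teresa"])
hashes = pd.Series(["f2b6769129", "ca0464878d"])

# First 3 characters of the name + first 6 characters of the hash
combined = sql_substring(names, 1, 3) + sql_substring(hashes, 1, 6)
print(combined.tolist())  # ['Artf2b676', 'Terca0464']
```

So substring(x, 1, n) is simply x.str[:n] in pandas, which is what both answers below use.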

How can I achieve this?

Thanks and regards,

Bala

2 Answers:

Answer 0 (score: 3)

Using str slicing. If the two frames do not list the customers in the same row order, merge them on CUSTOMER_KEY first.

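# "+" pairs rows by index label (here the default 0..8 row index), so df1 and df2
# must list the customers in the same order for this to be correct
df2['LAST_NAME'] = df2['LAST_NAME'].str[:3] + df1['LAST_NAME'].str[:6]
df2['FIRST_NAME'] = df2['FIRST_NAME'].str[:3] + df1['FIRST_NAME'].str[:6]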

df2
Out[768]: 
   CUSTOMER_KEY  LAST_NAME FIRST_NAME
0            30  Artf2b676  And97bb97
1            46  Terca0464  Johe27653
2            51  Lou62f290  Hur8dfabd
3            57  Tim21032c  Add1f7e5e
4            62  Jeff7e7fd  Wileb6cf4
5            64  Andf53699  Tul7fc39e
6            80  Dan606919  Gred873a7
7            99  Fra0ba61a  Nada6cf7a
8           102  Faie8b579  Youc8048f
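Answer 0 matches rows by their default row index, so it only works when both frames list the customers in the same order. A sketch of an alternative that aligns on the key instead, assuming CUSTOMER_KEY is unique in each frame: set_index makes "+" match rows by CUSTOMER_KEY. The two-row frames are a subset of the question's data, with df1's rows deliberately reversed to show the alignment:

```python
import pandas as pd

# Subset of the question's data; df1's rows are in the opposite order to df2's
df1 = pd.DataFrame({
    "CUSTOMER_KEY": [46, 30],
    "LAST_NAME": ["ca0464878d", "f2b6769129"],
    "FIRST_NAME": ["e276539bc2", "97bb97bebc"],
})
df2 = pd.DataFrame({
    "CUSTOMER_KEY": [30, 46],
    "LAST_NAME": ["Arthur", "Teresa"],
    "FIRST_NAME": ["Anderson", "Johns"],
})

# Index both frames by the key so that "+" aligns rows by CUSTOMER_KEY,
# not by row position
left = df2.set_index("CUSTOMER_KEY")
right = df1.set_index("CUSTOMER_KEY")

result_df = (
    pd.DataFrame({
        col: left[col].str[:3] + right[col].str[:6]
        for col in ["LAST_NAME", "FIRST_NAME"]
    })
    .rename_axis("CUSTOMER_KEY")  # make sure the key column is named after reset_index
    .reset_index()
)
print(result_df)
```

Unlike merge, this keeps customers that exist in only one frame (their combined strings become NaN), which makes missing keys easy to spot.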

Answer 1 (score: 1)

Using merge + str:

import pandas as pd
df = pd.DataFrame([
    ['30','f2b6769129','97bb97bebc'],
    ['46','ca0464878d','e276539bc2'],
    ['51','62f2905a7a','8dfabd6d61'],
    ['57','21032ca3bc','1f7e5e0c6e'],
    ['62','f7e7fdd8ce','eb6cf4af99'],
    ['64','f536998bbb','7fc39eacd1'],
    ['80','6069198f63','d873a71620'],
    ['99','0ba61a6f66','a6cf7af3eb'],
    ['102','e8b579b776','c8048fd459']]
)

df2 = pd.DataFrame([
    ['30','Arthur','Anderson'],
    ['46','Teresa','Johns'],
    ['51','Louise','Hurwitz'],
    ['57','Timothy','Addy'],
    ['62','Jeffery','Wilson'],
    ['64','Andres','Tuller'],
    ['80','Daniel','Green'],
    ['99','Frank','Nader'],
    ['102','Faith','Young']]
)

keys = ['CUSTOMER_KEY','LAST_NAME','FIRST_NAME']
df.columns = keys
df2.columns = keys
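# df supplies the hashed values (suffix _1), df2 the plain names (suffix _2)
df_join = pd.merge(df, df2, on="CUSTOMER_KEY", suffixes=('_1', '_2'))
# First 3 characters of the name + first 6 characters of the hash, as in the expected output
df_join['LAST_NAME'] = df_join['LAST_NAME_2'].str.slice(0, 3) + df_join['LAST_NAME_1'].str.slice(0, 6)
df_join['FIRST_NAME'] = df_join['FIRST_NAME_2'].str.slice(0, 3) + df_join['FIRST_NAME_1'].str.slice(0, 6)
result_df = df_join[keys]

print(result_df)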
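If more column pairs need the same treatment, the merge-then-slice approach can be wrapped in a loop. A sketch under these assumptions: merge_and_combine is a hypothetical helper, both frames use the same column names, and validate="one_to_one" (a real pd.merge option) makes pandas raise a MergeError if CUSTOMER_KEY repeats on either side:

```python
import pandas as pd


def merge_and_combine(names, hashes, key, cols, n_name=3, n_hash=6):
    """Hypothetical helper: merge on `key`, then build name prefix + hash prefix."""
    # Inner join on the key; raises pandas.errors.MergeError on duplicate keys
    merged = pd.merge(names, hashes, on=key, suffixes=("_name", "_hash"),
                      validate="one_to_one")
    out = merged[[key]].copy()
    for col in cols:
        out[col] = (merged[f"{col}_name"].str[:n_name]
                    + merged[f"{col}_hash"].str[:n_hash])
    return out


# Subset of the question's data
df1 = pd.DataFrame({"CUSTOMER_KEY": [30, 46],
                    "LAST_NAME": ["f2b6769129", "ca0464878d"],
                    "FIRST_NAME": ["97bb97bebc", "e276539bc2"]})
df2 = pd.DataFrame({"CUSTOMER_KEY": [30, 46],
                    "LAST_NAME": ["Arthur", "Teresa"],
                    "FIRST_NAME": ["Anderson", "Johns"]})

result_df = merge_and_combine(df2, df1, "CUSTOMER_KEY", ["LAST_NAME", "FIRST_NAME"])
print(result_df)
```

Naming the suffixes after their role ("_name", "_hash") rather than "_1"/"_2" makes it harder to swap the two sides by accident.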