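Follow-up: Creating new columns based on the value of another column in pandas

Time: 2019-12-27 04:21:23

Tags: python pandas

This is a follow-up to my previous question, "Creating new columns based on value from another column in pandas".

My goal now is: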

   Code           Name  Level1  Level1Name  Level2     Level2Name  Level3   Level3Name
0     A            USA       A         USA
1    AM  Massachusetts       A         USA      AM  Massachusetts
2   AMB         Boston       A         USA      AM  Massachusetts     AMB       Boston
3   AMS    Springfield       A         USA      AM  Massachusetts     AMS  Springfield
4     D        Germany       D     Germany
5    DB    Brandenburg       D     Germany      DB    Brandenburg
6   DBB         Berlin       D     Germany      DB    Brandenburg     DBB       Berlin
7   DBD        Dresden       D     Germany      DB    Brandenburg     DBD      Dresden

So far, based on Scott Boston's code, I have:

match   0   1   2
0       A   A   A
1       A   AM  AM
2       A   AM  AMB
3       A   AM  AMS
4       D   D   D
5       D   DB  DB
6       D   DB  DBB
7       D   DB  DBD

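My idea is to loop over each column and drop the values whose length differs from the rest of the values in that column, but I can't seem to work out the logic.

Sample code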

import pandas as pd

df = pd.read_excel(r'/Users/BoBoMann/Desktop/Sequence.xlsx')

df['Codes'] = [[*i] for i in df['Code']]
df_level = df['Code'].str.extractall('(.)')[0].unstack('match').fillna('').cumsum(axis=1)
df_level

Thank you for your help!

3 answers:

Answer 0 (score: 1)

Let's try this:

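# Split each code into a list of its characters
df['Codes'] = [[*i] for i in df['Code']]
# One column per character position; positions a code does not have become ''
df_level = df['Code'].str.extractall('(.)')[0].unstack('match', fill_value='')
# Cumulative concatenation builds the prefixes; mask blanks out the levels a code does not reach
df_level = df_level.cumsum(axis=1).mask(df_level == '')
# Code -> Name lookup
s_map = df.explode('Codes').drop_duplicates('Code', keep='last').set_index('Code')['Name']
df_level.columns = [f'Level{i+1}' for i in df_level.columns]
# Map every level column through the lookup to get the matching name columns
df_level_names =  pd.concat([df_level[i].map(s_map) for i in df_level.columns], 
                            axis=1, 
                            keys=df_level.columns+'Name')
df_out = df.join([df_level, df_level_names]).drop('Codes', axis=1)
df_out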

Output:

  Code           Name Level1 Level2 Level3 Level1Name     Level2Name   Level3Name
0    A            USA      A    NaN    NaN        USA            NaN          NaN
1   AM  Massachusetts      A     AM    NaN        USA  Massachusetts          NaN
2  AMB         Boston      A     AM    AMB        USA  Massachusetts       Boston
3  AMS    Springfield      A     AM    AMS        USA  Massachusetts  Springfield
4    D        Germany      D    NaN    NaN    Germany            NaN          NaN
5   DB    Brandenburg      D     DB    NaN    Germany    Brandenburg          NaN
6  DBB         Berlin      D     DB    DBB    Germany    Brandenburg       Berlin
7  DBD        Dresden      D     DB    DBD    Germany    Brandenburg      Dresden

Answer 1 (score: 0)

This approach uses apply with helper functions:

import pandas as pd
l = ['A', 'AM', 'AMB', 'AMS', 'D', 'DB', 'DBB', 'DBD']
df = pd.DataFrame(l).rename(columns={0:'code'})

def level2(col):
  if len(col) == 1:
    return ''
  elif len(col) >= 2:
    return col[:2]

def level3(col):
  if len(col) <= 2:
    return ''
  elif len(col) > 2:
    return col[:3]

df['Level1'] = df['code'].apply(lambda col: col[0])
df['Level2'] = df['code'].apply(level2)
df['Level3'] = df['code'].apply(level3)

print(df)

Output:

  code Level1 Level2 Level3
0    A      A              
1   AM      A     AM       
2  AMB      A     AM    AMB
3  AMS      A     AM    AMS
4    D      D              
5   DB      D     DB       
6  DBB      D     DB    DBB
7  DBD      D     DB    DBD

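These functions could also be refactored into a single function, but you get the point. I suggest using apply over other pandas methods because apply is easier to remember and customize. Hope this helps.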
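As a rough illustration of the single-function refactoring mentioned above, here is a minimal sketch. The helper name `level` and its parameter `n` are made up for this example and are not part of the original answer; the level number is passed through `apply` via its `args` parameter.

```python
import pandas as pd

def level(code, n):
    """Return the first n characters of code, or '' if code is shorter than n."""
    return code[:n] if len(code) >= n else ''

df = pd.DataFrame({'code': ['A', 'AM', 'AMB', 'AMS', 'D', 'DB', 'DBB', 'DBD']})

# One column per level; the level number n is handed to the helper via args.
for n in range(1, 4):
    df[f'Level{n}'] = df['code'].apply(level, args=(n,))

print(df)
```

This should give the same three columns as the separate functions, and a deeper hierarchy only needs a larger upper bound in `range`.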

Answer 2 (score: 0)

I took a different approach: assuming you don't have too many levels, iterate over the code lengths.

import pandas as pd
df=pd.DataFrame({
    'Code':['A','AM','AMB'],
    'Name':['USA','Massachusetts',"Boston"]
})
# prepare
res=pd.DataFrame({
    'Code':[]
})
df['len']=df['Code'].str.len()
cols=[]
for x in range(df['len'].max()):
    dfX=df[df['len']==x+1].copy()
    dfX['prefix']=dfX['Code'].str.slice(stop=x)

    dfX=dfX.merge(res,how='left',left_on='prefix',right_on='Code')

    dfX[f'Level{x+1}']=dfX['Code_x']
    dfX[f'Level{x+1}Name']=dfX['Name']
    dfX[f'Code']=dfX['Code_x']
    cols+=[f'Level{x+1}',f'Level{x+1}Name']
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    res=pd.concat([res,dfX[['Code']+cols]],sort=False)

res

Output:

  Code Level1 Level1Name Level2     Level2Name Level3 Level3Name
0    A      A        USA    NaN            NaN    NaN        NaN
0   AM      A        USA     AM  Massachusetts    NaN        NaN
0  AMB      A        USA     AM  Massachusetts    AMB     Boston

Level 1 is added to the lookup table first, then levels 2 and 3, and so on. The code looks ugly, but hopefully it is easy to understand.
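The lookup-table idea can also be sketched more compactly by slicing each code into its prefixes and mapping those prefixes through a Code -> Name Series. This is only a sketch under two assumptions that the question does not state explicitly: every code is unique, and each level adds exactly one character to the code. The variable names (`names`, `code_len`, `out`) are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Code': ['A', 'AM', 'AMB', 'AMS', 'D', 'DB', 'DBB', 'DBD'],
    'Name': ['USA', 'Massachusetts', 'Boston', 'Springfield',
             'Germany', 'Brandenburg', 'Berlin', 'Dresden'],
})

# Lookup table: Code -> Name (assumes every code is unique).
names = df.set_index('Code')['Name']
code_len = df['Code'].str.len()

out = df.copy()
for n in range(1, code_len.max() + 1):
    # Prefix of length n, kept only for codes at least n characters long.
    prefix = df['Code'].str[:n].where(code_len >= n)
    out[f'Level{n}'] = prefix
    # Missing prefixes, and prefixes absent from the lookup, map to NaN.
    out[f'Level{n}Name'] = prefix.map(names)

print(out)
```

Levels a code does not reach are left as missing values rather than empty strings; a `fillna('')` at the end would turn them into blanks if that is preferred.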