Create a column in a pandas DataFrame by splitting values

Date: 2020-08-07 02:49:57

Tags: python-3.x pandas

I have a pandas DataFrame like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
df

    col1        col2
0   AA_L8_ZZ    AAA_L8_1D
1   AA_L8_YY    AA_L8_2D
2   AA_L80_XX   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y

I want to create a column col3, defined as:

col3 = (first two parts of col1, split on _) + _ + (third part of col2, split on _)
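As a plain-Python sketch of the rule above, assuming every value has exactly three _-separated parts (as in the sample data); the helper name build_col3 is hypothetical and not part of pandas:

```python
def build_col3(v1: str, v2: str) -> str:
    """Join the first two '_'-separated parts of v1 with the third part of v2 (hypothetical helper)."""
    p1 = v1.split('_')
    p2 = v2.split('_')
    return '_'.join([p1[0], p1[1], p2[2]])


col1 = ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC']
col2 = ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']

# Apply the rule row by row
col3 = [build_col3(a, b) for a, b in zip(col1, col2)]
print(col3)  # ['AA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']
```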

My expected output:

    col1        col2        col3
0   AA_L8_ZZ    AAA_L8_1D   AA_L8_1D
1   AA_L8_YY    AA_L8_2D    AA_L8_2D
2   AA_L80_XX   AA_L80_5C   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y    AA_L8_6Y

3 answers:

Answer 0 (score: 2)

Let's try some regex:

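# extract() returns a DataFrame whose column 0 holds the capture group; .add() concatenates the two element-wise
df['col3'] = df['col1'].str.extract('^(.*_.*_)').add(df['col2'].str.extract('^.*_.*_([^_]*)'))[0]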
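To see what the two patterns in this answer capture, here is a check with Python's built-in re module on some of the sample strings (pandas' str.extract returns the first unnamed capture group as column 0, which is why the answer selects [0] at the end). Because .* is greedy, the col1 pattern keeps everything up to the last _, so for a hypothetical four-part value it would keep three parts rather than two:

```python
import re

PREFIX = re.compile(r'^(.*_.*_)')        # col1: everything up to and including the last '_'
SUFFIX = re.compile(r'^.*_.*_([^_]*)')   # col2: the text after the last '_'

for v1, v2 in [('AA_L8_ZZ', 'AAA_L8_1D'), ('AA_L80_XX', 'AA_L80_5C')]:
    prefix = PREFIX.match(v1).group(1)
    suffix = SUFFIX.match(v2).group(1)
    print(prefix, suffix, prefix + suffix)
# AA_L8_ 1D AA_L8_1D
# AA_L80_ 5C AA_L80_5C

# Greedy matching on a hypothetical four-part value keeps three parts, not two
print(PREFIX.match('AA_L8_X_ZZ').group(1))  # AA_L8_X_
```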

Answer 1 (score: 2)

You can use the following str accessor methods:

df['col3'] = (df['col1'].str.rsplit('_', n=1).str[0]
                        .str.cat(df['col2'].str.rsplit('_', n=1).str[-1], 
                                 sep='_'))
df

Output:

        col1       col2       col3
0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
1   AA_L8_YY   AA_L8_2D   AA_L8_2D
2  AA_L80_XX  AA_L80_5C  AA_L80_5C
3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y

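rsplit splits starting from the end (the right), and the n parameter limits the number of splits. .str[n] indexes into the list produced by the split, and cat concatenates the strings with sep='_'.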
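pandas' Series.str.rsplit(pat, n) applies Python's str.rsplit to each element, with n playing the role of maxsplit, so the behaviour can be checked on single strings. The four-part value at the end is a hypothetical input that is not in the question; it shows that rsplit(..., n=1).str[0] keeps everything except the last part, which equals "the first two parts" only when a value has exactly three parts:

```python
value1 = 'AA_L8_ZZ'
value2 = 'AAA_L8_1D'

# rsplit with maxsplit=1 splits once, starting from the right
print(value1.rsplit('_', 1))   # ['AA_L8', 'ZZ']
print(value2.rsplit('_', 1))   # ['AAA_L8', '1D']

# [0] keeps the left piece of col1, [-1] the right piece of col2;
# '_'.join on two pieces mirrors .str.cat(..., sep='_')
col3 = '_'.join([value1.rsplit('_', 1)[0], value2.rsplit('_', 1)[-1]])
print(col3)                    # AA_L8_1D

# Hypothetical four-part value: rsplit keeps three parts, split(...)[:2] keeps two
four = 'AA_L8_X_ZZ'
print(four.rsplit('_', 1)[0])          # AA_L8_X
print('_'.join(four.split('_')[:2]))   # AA_L8
```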

Answer 2 (score: 1)

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})

#defining a list to store the contents for col3
a = []

#convert each value to str, split on '_', and join the first two parts of col1 with the third part of col2
for i,j in zip(df.col1, df.col2):
    a.append(str(i).split('_')[0]+"_"+str(i).split('_')[1]+"_"+str(j).split('_')[2])

#defining new column and assigning the value to it
df['col3'] =  a

print(df)
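The loop above can also be written as a vectorized sketch using Series.str.split(..., expand=True), which spreads the parts of each value into numbered columns. This is an alternative technique, not the answerer's code, and like the loop it assumes every value has at least three _-separated parts:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
                   'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})

# expand=True returns a DataFrame whose columns 0, 1, 2 hold the split parts
parts1 = df['col1'].str.split('_', expand=True)
parts2 = df['col2'].str.split('_', expand=True)

# first two parts of col1 + third part of col2, joined with '_'
df['col3'] = parts1[0] + '_' + parts1[1] + '_' + parts2[2]
print(df)
```

Splitting once per column avoids calling split three times per row, as the loop does.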