Question

I have a pandas dataframe like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
df

    col1        col2
0   AA_L8_ZZ    AAA_L8_1D
1   AA_L8_YY    AA_L8_2D
2   AA_L80_XX   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y

I want to create a new column, col3:

col3 = (the first 2 parts of 'col1', split on _) + _ + (the 3rd part of 'col2', split on _)

My expected output:

    col1        col2        col3
0   AA_L8_ZZ    AAA_L8_1D   AA_L8_1D
1   AA_L8_YY    AA_L8_2D    AA_L8_2D
2   AA_L80_XX   AA_L80_5C   AA_L80_5C
3   AA_L8_CC    AA_L8_6Y    AA_L8_6Y

Answer 1

Let's try some regex:

df['col3'] = df['col1'].str.extract('^(.*_.*_)').add(df['col2'].str.extract('^.*_.*_([^_]*)'))[0]

For the sample data, this gives the col3 values shown in the expected output of the question.
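
To see what each of the two patterns captures, here is a small sketch that applies them to a single row with Python's built-in re module. The variable names (col1_value, col2_value, prefix, suffix) are made up for illustration and are not part of the answer above.

```python
import re

# One sample row from the question (illustrative variable names).
col1_value = 'AA_L80_XX'
col2_value = 'AA_L80_5C'

# '^(.*_.*_)' is greedy, so the group runs up to and including the last underscore.
prefix = re.match(r'^(.*_.*_)', col1_value).group(1)

# '^.*_.*_([^_]*)' consumes the same prefix and captures what follows it.
suffix = re.match(r'^.*_.*_([^_]*)', col2_value).group(1)

print(prefix)           # AA_L80_
print(suffix)           # 5C
print(prefix + suffix)  # AA_L80_5C
```

Because the captured prefix already ends in an underscore, adding the two pieces needs no extra separator.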

Answer 2

You can use the str accessor methods:

df['col3'] = (df['col1'].str.rsplit('_', n=1).str[0]
                        .str.cat(df['col2'].str.rsplit('_', n=1).str[-1], 
                                 sep='_'))
df

Output:

        col1       col2       col3
0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
1   AA_L8_YY   AA_L8_2D   AA_L8_2D
2  AA_L80_XX  AA_L80_5C  AA_L80_5C
3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y

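rsplit splits starting from the end (the right), and the n parameter limits the number of splits. .str[n] indexes into the list produced by the split, and cat concatenates the strings, joined with sep='_'.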
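
The steps described above can be looked at one at a time. This is a minimal sketch on two rows of the sample data; the intermediate names (head, tail, combined) are only for illustration.

```python
import pandas as pd

col1 = pd.Series(['AA_L8_ZZ', 'AA_L80_XX'])
col2 = pd.Series(['AAA_L8_1D', 'AA_L80_5C'])

# n=1 splits once, at the rightmost underscore, giving a two-item list per row.
print(col1.str.rsplit('_', n=1).tolist())  # [['AA_L8', 'ZZ'], ['AA_L80', 'XX']]

# .str[0] keeps everything before the last underscore of col1,
# .str[-1] keeps everything after the last underscore of col2.
head = col1.str.rsplit('_', n=1).str[0]
tail = col2.str.rsplit('_', n=1).str[-1]

# str.cat joins the two Series element-wise with the given separator.
combined = head.str.cat(tail, sep='_')
print(combined.tolist())  # ['AA_L8_1D', 'AA_L80_5C']
```

Note that this splits from the right, so it matches the "first 2 parts" and "3rd part" wording of the question only when every value contains exactly two underscores, as in the sample data.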

Answer 3

import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})

#defining a list to store the contents for col3
a = []

#extracting the values by first changing the elements of both columns into string and then joining the extracted values and inserting into the list 
for i,j in zip(df.col1, df.col2):
    a.append(str(i).split('_')[0]+"_"+str(i).split('_')[1]+"_"+str(j).split('_')[2])

#defining new column and assigning the value to it
df['col3'] =  a

print(df)

Create a column in a pandas dataframe by splitting values

3 answers: