I have a pandas DataFrame like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
df
        col1       col2
0   AA_L8_ZZ  AAA_L8_1D
1   AA_L8_YY   AA_L8_2D
2  AA_L80_XX  AA_L80_5C
3   AA_L8_CC   AA_L8_6Y
I want to create a new column, col3:
col3 = (the first 2 parts of "col1", split on _) + _ + (the 3rd part of "col2", split on _)
My expected output:
        col1       col2       col3
0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
1   AA_L8_YY   AA_L8_2D   AA_L8_2D
2  AA_L80_XX  AA_L80_5C  AA_L80_5C
3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y
Answer 0 (score: 2)
Let's try some regular expressions:
df['col3'] = df['col1'].str.extract('^(.*_.*_)').add(df['col2'].str.extract('^.*_.*_([^_]*)'))[0]
The first pattern captures the first two parts of col1 (including the trailing underscore), the second captures the third part of col2, and add concatenates them.
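To make the two captures easier to see, here is a minimal sketch of the same regex idea split into steps. It is a variant, not the answer's exact code: it assumes `[^_]*` instead of the greedy `.*` (so the prefix stays pinned to the first two fields even if a value has more than three), and the names `prefix` and `suffix` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
    'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y'],
})

# First two underscore-separated fields of col1, trailing underscore included.
# expand=False returns a Series because the pattern has a single group.
prefix = df['col1'].str.extract(r'^([^_]*_[^_]*_)', expand=False)

# Third underscore-separated field of col2.
suffix = df['col2'].str.extract(r'^[^_]*_[^_]*_([^_]*)', expand=False)

# Element-wise string concatenation of the two Series.
df['col3'] = prefix + suffix
print(df)
```

Rows that do not match either pattern would end up as NaN in col3, which may or may not be what you want.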
Answer 1 (score: 2)
You can use the following str accessor methods:
df['col3'] = (df['col1'].str.rsplit('_', n=1).str[0]
                        .str.cat(df['col2'].str.rsplit('_', n=1).str[-1],
                                 sep='_'))
df
Output:
        col1       col2       col3
0   AA_L8_ZZ  AAA_L8_1D   AA_L8_1D
1   AA_L8_YY   AA_L8_2D   AA_L8_2D
2  AA_L80_XX  AA_L80_5C  AA_L80_5C
3   AA_L8_CC   AA_L8_6Y   AA_L8_6Y
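rsplit splits starting from the end (the right), and the n parameter limits the number of splits. .str[n] indexes into the list produced by the split, and cat concatenates the strings using sep='_'.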
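The pieces described above can be seen in isolation in a small sketch. The Series `s` and the names `parts`, `head`, `tail`, and `joined` are made up for this illustration and are not part of the answer.

```python
import pandas as pd

s = pd.Series(['AA_L8_ZZ', 'AA_L80_XX'])

# n=1 performs a single split, starting from the right,
# so each element becomes [everything_before_last_underscore, last_field].
parts = s.str.rsplit('_', n=1)

head = parts.str[0]    # element 0 of each list
tail = parts.str[-1]   # last element of each list

# str.cat joins two Series element-wise with the given separator.
joined = head.str.cat(tail, sep='_')

print(parts.tolist())
print(head.tolist())
print(tail.tolist())
print(joined.tolist())
```

Because the split starts from the right, `.str[0]` equals "the first two parts" only when every value has exactly three underscore-separated parts, as in the question's data.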
Answer 2 (score: 1)
import pandas as pd
import numpy as np
df = pd.DataFrame({'col1':['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'], 'col2':['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y']})
# List that collects the values for col3
a = []
# Cast each element to str, split it on '_', then join the first two
# parts of col1 with the third part of col2
for i, j in zip(df.col1, df.col2):
    parts = str(i).split('_')
    a.append(parts[0] + "_" + parts[1] + "_" + str(j).split('_')[2])
# Assign the collected values to the new column
df['col3'] = a
print(df)
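The same split-and-join logic can also be written without an explicit Python loop. The following is only a sketch of one possible vectorized equivalent, assuming every value has at least three underscore-separated parts; `parts1` and `parts2` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'col1': ['AA_L8_ZZ', 'AA_L8_YY', 'AA_L80_XX', 'AA_L8_CC'],
    'col2': ['AAA_L8_1D', 'AA_L8_2D', 'AA_L80_5C', 'AA_L8_6Y'],
})

# expand=True turns each split into its own column (0, 1, 2, ...).
parts1 = df['col1'].str.split('_', expand=True)
parts2 = df['col2'].str.split('_', expand=True)

# Parts 0 and 1 of col1, then part 2 of col2, joined with underscores.
df['col3'] = parts1[0] + '_' + parts1[1] + '_' + parts2[2]
print(df)
```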