My table looks like this:
username  email          name          phone1  phone2
1920      abc@gmail.com  TSteve/Nancy  a       b
I would like it to become:
username  email          first_name  last_name  phone1  phone2
1920      abc@gmail.com  Steve       T          a
1920-2                   Nancy       T                  b
The table is stored as a CSV file.
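To summarize:
1) Split the "name" column into "first_name" and "last_name". The first letter ("T" in this case) is taken out and moved to "last_name", and the rest is split on "/" into two rows, one for "Steve" and one for "Nancy". The slash itself is removed.
2) phone1 stays on the first row, but phone2 moves to the next row. (I will merge phone1 and phone2 later.)
3) The username on the second row is the same as on the first row, with "-2" appended at the end.
I have spent 3 days trying several things, but they all failed. If you could guide me through the steps, it would be very helpful for my learning.
Thanks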
Answer 0 (score: 1)
I think you need:
# get all columns except 'name' (difference() returns them sorted)
cols = df.columns.difference(['name']).tolist()
# move those columns into a MultiIndex, split 'name' on '/', then stack so
# that each name gets its own row
df = (df.set_index(cols)['name']
        .str.split('/', expand=True)
        .stack()
        .reset_index(name='first_name'))
# boolean mask: True for the first row of each original record.
# 'level_4' is the automatic name of the unnamed index level created by
# stack(); it is level 4 only because there are 4 columns in cols
m = df['level_4'].eq(0)
# drop the helper column level_4
df = df.drop('level_4', axis=1)
# last name = first letter of the first row, forward filled to the rows below
df['last_name'] = df['first_name'].str[0].where(m).ffill()
# on the first row, strip that leading letter from first_name
df['first_name'] = df['first_name'].mask(m, df['first_name'].str[1:])
# blank out email and phone1 on the later rows, and phone2 on the first row
df['email'] = df['email'].where(m, '')
df['phone1'] = df['phone1'].where(m, '')
df['phone2'] = df['phone2'].mask(m, '')
# append '-2' to username on the second rows
df['username'] = df['username'].where(m, df['username'].astype(str) + '-2')
print(df)
            email phone1 phone2 username first_name last_name
0   abc@gmail.com      a            1920      Steve         T
1                             b   1920-2      Nancy         T
2  abcd@gmail.com      a            1921      Steve         K
3                             b   1921-2      Nancy         K
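For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same transformation. It is not the code above but a variant that uses explode() instead of set_index/stack, so it does not depend on the auto-generated 'level_4' column name. The sample CSV text, the second record (inferred from the printed output), the helper column "_row" and the function name split_names are all assumptions made for illustration. Like the code above, it assumes each record holds exactly two names.

```python
import io

import pandas as pd

# Hypothetical sample data. The second record is inferred from the printed
# output above, not taken from the question.
CSV_TEXT = """username,email,name,phone1,phone2
1920,abc@gmail.com,TSteve/Nancy,a,b
1921,abcd@gmail.com,KSteve/Nancy,a,b
"""


def split_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per first name, following the three rules in the question."""
    out = df.copy()
    out["username"] = out["username"].astype(str)
    # Remember which original record each row comes from.
    out["_row"] = range(len(out))
    # The first letter of "name" is the last-name initial; the rest holds
    # the first names separated by "/".
    out["last_name"] = out["name"].str[0]
    out["first_name"] = out["name"].str[1:].str.split("/")
    # One output row per first name, with a fresh 0..n-1 index.
    out = out.explode("first_name", ignore_index=True)
    # Position of each row inside its original record: 0, 1, ...
    pos = out.groupby("_row").cumcount()
    first = pos.eq(0)
    # email and phone1 stay on the first row, phone2 moves to the later row.
    out["email"] = out["email"].where(first, "")
    out["phone1"] = out["phone1"].where(first, "")
    out["phone2"] = out["phone2"].mask(first, "")
    # Later rows get a "-2" (or "-3", ...) suffix on the username.
    suffix = ("-" + (pos + 1).astype(str)).where(~first, "")
    out["username"] = out["username"] + suffix
    cols = ["username", "email", "first_name", "last_name", "phone1", "phone2"]
    return out[cols]


df = pd.read_csv(io.StringIO(CSV_TEXT), dtype=str)
result = split_names(df)
print(result)
```

To use it on a real file, replace io.StringIO(CSV_TEXT) with the path to the CSV. The columns come back in the order asked for in the question, so result.to_csv(..., index=False) should write the desired layout directly.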