I have two DataFrames: one holds the individual records, and the other holds the configuration rules for those records. Here are the DataFrames:
df1:
employee_Id first_Name last_Name email_Address
0 E1000 Manas Jani jam@xyz.com
1 E2000 Jim Kong jik@xyz.com
2 E3000 Olila Jayavarman olj@xyz.com
3 E4000 Lisa Kopkingg lik@xyz.com
4 E5000 Kishore Pindhar kip@xyz.com
5 E6000 Gobi Nadar gon@xyz.com
df2:
Input_file_name Is_key Config_file_name Value
0 Employee ID Y employee_Id idTypeCode:001
4 EntityID N entity_Id entity_Id:01
The single DataFrame I need to end up with looks like this:
Result_df:
employee_Id first_Name last_Name email_Address idTypeCode entity_Id
0 E1000 Manas Jani jam@xyz.com 001 01
1 E2000 Jim Kong jik@xyz.com 001 01
2 E3000 Olila Jayavarman olj@xyz.com 001 01
3 E4000 Lisa Kopkingg lik@xyz.com 001 01
4 E5000 Kishore Pindhar kip@xyz.com 001 01
5 E6000 Gobi Nadar gon@xyz.com 001 01
I can't figure out how to add the contents of the Value column to the final DataFrame as new columns.
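Answer 0: (score: 1)
It is not entirely clear what you are trying to do, but I hope this helps.
First, process the configuration DataFrame (df2) to extract the values.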
import pandas as pd
import io
# test data
zz = """Input_file_name Is_key Config_file_name Value
0 Employee ID Y employee_Id idTypeCode:001
4 Entity ID N entity_Id entity_Id:01
"""
# sep=r'\s+' splits on whitespace; it replaces delim_whitespace=True, which is deprecated in recent pandas
df = pd.read_table(io.StringIO(zz), sep=r'\s+')
extract = df['Value'].str.split(':',expand=True).transpose()
extract.columns = extract.iloc[0]
extract = extract.drop(extract.index[0]).reset_index(drop=True)
print(extract)
# 0 idTypeCode entity_Id
# 0 001 01
Then combine the two.
# test data
zz = """employee_Id first_Name last_Name email_Address
0 E1000 Manas Jani jam@xyz.com
1 E2000 Jim Kong jik@xyz.com
2 E3000 Olila Jayavarman olj@xyz.com
3 E4000 Lisa Kopkingg lik@xyz.com
4 E5000 Kishore Pindhar kip@xyz.com
5 E6000 Gobi Nadar gon@xyz.com
"""
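empl = pd.read_table(io.StringIO(zz), sep=r'\s+')
# Keep the column names: ignore_index=True with axis=1 would relabel the columns 0..n-1.
# extract has a single row, so forward-fill copies its values down to every employee row.
pd.concat([empl, extract], axis=1, join='outer').ffill()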
# employee_Id first_Name last_Name email_Address idTypeCode entity_Id
# 0 E1000 Manas Jani jam@xyz.com 001 01
# 1 E2000 Jim Kong jik@xyz.com 001 01
# 2 E3000 Olila Jayavarman olj@xyz.com 001 01
# 3 E4000 Lisa Kopkingg lik@xyz.com 001 01
# 4 E5000 Kishore Pindhar kip@xyz.com 001 01
# 5 E6000 Gobi Nadar gon@xyz.com 001 01
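As an alternative to concatenating and forward-filling, the name:value pairs can be turned into a dictionary and broadcast onto every row with DataFrame.assign. This is only a minimal sketch: the small df1 and df2 built here are cut-down stand-ins for the frames in the question, and the names parts, constants and result are made up for illustration. It also assumes every entry in Value contains a colon.

```python
import pandas as pd

# Cut-down stand-ins for df1 and df2 from the question (illustrative only)
df1 = pd.DataFrame({
    "employee_Id": ["E1000", "E2000"],
    "first_Name": ["Manas", "Jim"],
})
df2 = pd.DataFrame({
    "Config_file_name": ["employee_Id", "entity_Id"],
    "Value": ["idTypeCode:001", "entity_Id:01"],
})

# Split each "name:value" string once, at the first colon
parts = df2["Value"].str.split(":", n=1, expand=True)

# Map each new column name to its constant (kept as a string, so leading zeros survive)
constants = dict(zip(parts[0], parts[1]))

# assign() broadcasts each scalar to every row of df1
result = df1.assign(**constants)
print(result)
```

Because the values are assigned as scalars, this does not depend on how the two frames are indexed, and no NaN filling is needed.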