I want to create a DataFrame from a list. The problem is that my column names are also in the list.
List:
['Input_file_column_name,Is_key,Config_file_column_name,Value\nEmployee ID,Y,identifierValue,identityTypeCode:001\nCumb ID,N,identifierValue,identityTypeCode:002\nFirst Name,N,first_Name \nLast Name,N,last_Name \nEmail,N,email_Address \nEntityID,N,entity_Id,entity_Id:01\nSourceCode,N,sourceCode,sourceCode:AHRWB\n']
Desired DataFrame:
  Input_file_column_name Is_key Config_file_column_name                 Value
0            Employee ID      Y         identifierValue  identityTypeCode:001
1                Cumb ID      N         identifierValue  identityTypeCode:002
5               EntityID      N               entity_Id          entity_Id:01
6             SourceCode      N              sourceCode      sourceCode:AHRWB
How do I convert this? Should I convert the list into a dictionary first, or is there a way to do it directly?
Code:
import pandas as pd
with open('onboard_config.txt') as myFile:
    text = myFile.read()
result = text.split("regex")
print result
df=pd.DataFrame[[sub.split(",") for sub in result]]
Answer 0 (score: 2)
It looks like you need splitlines, followed by Series.str.split:
df=pd.Series(l[0].splitlines()).str.split(',',expand=True).T.set_index(0).T.dropna()
df
Out[1183]:
0 Input_file_column_name  ...                 Value
1            Employee ID  ...  identityTypeCode:001
2                Cumb ID  ...  identityTypeCode:002
6               EntityID  ...          entity_Id:01
7             SourceCode  ...      sourceCode:AHRWB

[4 rows x 4 columns]
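The one-liner packs several steps into a single chain. As a rough sketch of what it does, the same result can be built step by step. The intermediate names (`lines`, `parts`) are only illustrative, and `l` is assumed to be the one-element list from the question.

```python
import pandas as pd

# Assumption: `l` is the one-element list shown in the question.
l = ['Input_file_column_name,Is_key,Config_file_column_name,Value\n'
     'Employee ID,Y,identifierValue,identityTypeCode:001\n'
     'Cumb ID,N,identifierValue,identityTypeCode:002\n'
     'First Name,N,first_Name \n'
     'Last Name,N,last_Name \n'
     'Email,N,email_Address \n'
     'EntityID,N,entity_Id,entity_Id:01\n'
     'SourceCode,N,sourceCode,sourceCode:AHRWB\n']

# One string per line; the header line comes first.
lines = pd.Series(l[0].splitlines())

# One column per comma-separated field; short rows are padded with missing values.
parts = lines.str.split(',', expand=True)

# Use the first row as column names and keep the remaining rows as data.
df = parts.iloc[1:].copy()
df.columns = parts.iloc[0].tolist()

# Drop the rows that have no Value field.
df = df.dropna()
print(df)
```

The index keeps the original line positions (1, 2, 6, 7), which is why it differs from the 0-based index in the question's desired output; calling `reset_index(drop=True)` would renumber it if that matters.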
Answer 1 (score: 0)
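import pandas as pd

# `lst` stands for the one-element list from the question
# (renamed from `list` so the built-in is not shadowed)
split = lst[0].splitlines()  # splitlines() leaves no empty trailing entry, unlike split('\n')
df = []
for i in split:
    df.append(i.split(','))
columns = df[0]  # the first row holds the column names
df = df[1:]      # the remaining rows are the data
pd.DataFrame(df, columns=columns).dropna()  # dropna() removes rows without a Value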
This will give you the desired df.
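Since the string is effectively CSV text, another option is to hand it to `pd.read_csv` through an in-memory buffer. This is only a sketch and not part of the answer above; the name `data` is a placeholder for the one-element list from the question.

```python
import io

import pandas as pd

# Assumption: `data` is the one-element list shown in the question.
data = ['Input_file_column_name,Is_key,Config_file_column_name,Value\n'
        'Employee ID,Y,identifierValue,identityTypeCode:001\n'
        'Cumb ID,N,identifierValue,identityTypeCode:002\n'
        'First Name,N,first_Name \n'
        'Last Name,N,last_Name \n'
        'Email,N,email_Address \n'
        'EntityID,N,entity_Id,entity_Id:01\n'
        'SourceCode,N,sourceCode,sourceCode:AHRWB\n']

# Wrap the string in a file-like object so read_csv can parse it;
# the first line is used as the header by default.
df = pd.read_csv(io.StringIO(data[0]))

# Rows with fewer fields than the header get NaN in Value; drop them.
df = df.dropna()
print(df)
```

If the text comes from a file such as onboard_config.txt in the first place, passing the file path straight to `pd.read_csv` would skip the intermediate list entirely.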