Creating a DataFrame from a list with multiple columns

Time: 2018-12-07 18:07:13

Tags: python pandas python-2.7

I want to create a DataFrame from a list. The problem is that my column names are also inside the list.

The list:

['Input_file_column_name,Is_key,Config_file_column_name,Value\nEmployee ID,Y,identifierValue,identityTypeCode:001\nCumb ID,N,identifierValue,identityTypeCode:002\nFirst Name,N,first_Name \nLast Name,N,last_Name   \nEmail,N,email_Address   \nEntityID,N,entity_Id,entity_Id:01\nSourceCode,N,sourceCode,sourceCode:AHRWB\n']

Desired DataFrame:

Input_file_column_name Is_key Config_file_column_name                 Value
0            Employee ID      Y         identifierValue  identityTypeCode:001
1                Cumb ID      N         identifierValue  identityTypeCode:002
5               EntityID      N               entity_Id          entity_Id:01
6             SourceCode      N              sourceCode      sourceCode:AHRWB

How can I do this conversion? Should I turn the list into a dictionary first, or is there a way to do it directly?

Code:

import pandas as pd
with open('onboard_config.txt') as myFile:
  text = myFile.read()
result = text.split("regex")
print result 

df=pd.DataFrame[[sub.split(",") for sub in result]]

2 answers:

Answer 0: (score: 2)

It looks like you need splitlines, followed by Series.str.split. Here l is the list from the question:

df=pd.Series(l[0].splitlines()).str.split(',',expand=True).T.set_index(0).T.dropna()
df
Out[1183]: 
0 Input_file_column_name          ...                          Value
1            Employee ID          ...           identityTypeCode:001
2                Cumb ID          ...           identityTypeCode:002
6               EntityID          ...                   entity_Id:01
7             SourceCode          ...               sourceCode:AHRWB
[4 rows x 4 columns]
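
For readers who find the chained one-liner hard to follow, here is a minimal step-by-step sketch of the same idea. It is an illustration, not the answer's exact code: it uses `iloc` slicing to separate the header row instead of the `.T.set_index(0).T` idiom, and the intermediate names (`lines`, `parts`, `header`, `body`) are made up for this example. The sample list is copied from the question.

```python
import pandas as pd

# Sample input copied from the question: a one-element list of CSV-like text.
l = ['Input_file_column_name,Is_key,Config_file_column_name,Value\n'
     'Employee ID,Y,identifierValue,identityTypeCode:001\n'
     'Cumb ID,N,identifierValue,identityTypeCode:002\n'
     'First Name,N,first_Name \n'
     'Last Name,N,last_Name   \n'
     'Email,N,email_Address   \n'
     'EntityID,N,entity_Id,entity_Id:01\n'
     'SourceCode,N,sourceCode,sourceCode:AHRWB\n']

# One string per line, header line included.
lines = pd.Series(l[0].splitlines())

# One column per comma-separated field; lines with fewer fields get None.
parts = lines.str.split(',', expand=True)

# The first row holds the column names, the rest are the records.
header = parts.iloc[0].tolist()
body = parts.iloc[1:].copy()
body.columns = header

# Drop the records that have no Value field.
df = body.dropna()
print(df)
```

The rows keep their original positions as index labels (1, 2, 6, 7), as in the output above. Call `df.reset_index(drop=True)` if you would rather have a fresh 0-based index.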

Answer 1: (score: 0)

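    import pandas as pd

    # `lst` is the one-element list from the question
    # (renamed from `list` so the built-in is not shadowed).
    rows = [line.split(',') for line in lst[0].split('\n')]

    columns = rows[0]   # the first line holds the column names
    data = rows[1:]     # the remaining lines are the records
    df = pd.DataFrame(data, columns=columns)

This gives you a DataFrame with the right columns. Rows with fewer than four fields are padded with None, so append .dropna() if you only want the complete rows shown in the question.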
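
A variation on the loop above, assuming you only want rows that have a value for every column: filter the incomplete rows in plain Python before building the DataFrame. This is a sketch under that assumption, not part of the original answer. The function name `rows_to_frame` and the variable `sample` are hypothetical, and stripping whitespace from each field is an extra step added here.

```python
import pandas as pd


def rows_to_frame(text):
    """Build a DataFrame from CSV-like text, keeping only complete rows."""
    # splitlines() avoids the empty trailing entry that split('\n') produces;
    # strip() removes the stray spaces around some fields.
    records = [[field.strip() for field in line.split(',')]
               for line in text.splitlines() if line.strip()]
    header, body = records[0], records[1:]
    # Keep only the rows that have exactly one field per column.
    complete = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(complete, columns=header)


# Sample input copied from the question.
sample = ['Input_file_column_name,Is_key,Config_file_column_name,Value\n'
          'Employee ID,Y,identifierValue,identityTypeCode:001\n'
          'Cumb ID,N,identifierValue,identityTypeCode:002\n'
          'First Name,N,first_Name \n'
          'Last Name,N,last_Name   \n'
          'Email,N,email_Address   \n'
          'EntityID,N,entity_Id,entity_Id:01\n'
          'SourceCode,N,sourceCode,sourceCode:AHRWB\n']

df = rows_to_frame(sample[0])
print(df)
```

Because the filtering happens before the DataFrame is built, the result is indexed 0 to 3 rather than keeping the original line positions.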