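How to assign column names in Pandas from a Python list

Date: 2019-04-19 00:23:23

Tags: python pandas list dataframe

I have a Python list of lists that I want to convert to a pandas DataFrame. I want the DataFrame to look like this: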

  table_id            created     Mb (etc.)
1 NetworkClicks       2018-10-26  0.22
2 NetworkImpressions  2018-10-26  1519.24

(6 rows in total, based on the sample list below)

The column names are inside each inner list, e.g. Mb, created, modified, table_id.

Sample list:

ls_all = [
    [(u'Mb', u'928.11'), (u'created', datetime.date(2018, 10, 25)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'4,378'), (u'table_id', u'NetworkActiveViews'), (u'Tb', u'0.91')],
    [(u'Mb', u'800.67'), (u'created', datetime.date(2018, 10, 26)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'3,577'), (u'table_id', u'NetworkBackfillActiveViews'), (u'Tb', u'0.78')],
    [(u'Mb', u'2.44'), (u'created', datetime.date(2018, 10, 26)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'11'), (u'table_id', u'NetworkBackfillClicks'), (u'Tb', u'0.00')],
    [(u'Mb', u'1190.52'), (u'created', datetime.date(2018, 10, 26)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'5,269'), (u'table_id', u'NetworkBackfillImpressions'), (u'Tb', u'1.16')],
    [(u'Mb', u'0.22'), (u'created', datetime.date(2018, 10, 26)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'1'), (u'table_id', u'NetworkClicks'), (u'Tb', u'0.00')],
    [(u'Mb', u'1519.24'), (u'created', datetime.date(2018, 10, 26)), (u'modified', datetime.date(2019, 4, 18)), (u'Rows_Mil', u'7,089'), (u'table_id', u'NetworkImpressions'), (u'Tb', u'1.48')]
]

I tried df = pd.DataFrame(ls_all, columns=ls_all[0])

But it gave me this DataFrame:

    (Mb, 928.11)  ...  (Tb, 0.91)
0   (Mb, 928.11)  ...  (Tb, 0.91)
1   (Mb, 800.67)  ...  (Tb, 0.78)
2     (Mb, 2.44)  ...  (Tb, 0.00)
3  (Mb, 1190.52)  ...  (Tb, 1.16)
4     (Mb, 0.22)  ...  (Tb, 0.00)
5  (Mb, 1519.24)  ...  (Tb, 1.48)

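2 Answers:

Answer 0: (score: 3)

Use a list of dicts instead of a list of lists of tuples. Each inner list is already a sequence of (key, value) pairs, so dict() converts it directly.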

import pandas as pd

# Each inner list of (key, value) tuples becomes one dict, i.e. one row.
list_of_dicts = [dict(x) for x in ls_all]

df = pd.DataFrame(list_of_dicts)

        Mb Rows_Mil    Tb     created    modified                    table_id
0   928.11    4,378  0.91  2018-10-25  2019-04-18          NetworkActiveViews
1   800.67    3,577  0.78  2018-10-26  2019-04-18  NetworkBackfillActiveViews
2     2.44       11  0.00  2018-10-26  2019-04-18       NetworkBackfillClicks
3  1190.52    5,269  1.16  2018-10-26  2019-04-18  NetworkBackfillImpressions
4     0.22        1  0.00  2018-10-26  2019-04-18               NetworkClicks
5  1519.24    7,089  1.48  2018-10-26  2019-04-18          NetworkImpressions
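The question asks for the columns in the order table_id, created, Mb, while the column order produced from a list of dicts can vary with the pandas and Python versions in use. A minimal sketch of one way to control this, assuming a two-row sample cut down from the question's ls_all (the name ls_sample is made up for illustration):

```python
import datetime

import pandas as pd

# Hypothetical two-row sample, abbreviated from the question's ls_all.
ls_sample = [
    [('Mb', '0.22'), ('created', datetime.date(2018, 10, 26)), ('table_id', 'NetworkClicks')],
    [('Mb', '1519.24'), ('created', datetime.date(2018, 10, 26)), ('table_id', 'NetworkImpressions')],
]

# dict() turns each list of (key, value) pairs into one record.
records = [dict(row) for row in ls_sample]

# With a list of dicts, columns= selects and orders the columns by key.
df = pd.DataFrame(records, columns=['table_id', 'created', 'Mb'])
print(df)
```

Note that the Mb values stay as strings here, exactly as they are in the source list.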

Answer 1: (score: 0)

I like the list-of-dicts approach above; here is another way:

Extract the values from each inner list

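lists = []

# Use "row" rather than "list" so the built-in list type is not shadowed.
for row in ls_all:
    temp = [x[1] for x in row]  # keep only the value of each (key, value) pair
    lists.append(temp)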

Get the column names from the first inner list

columns = [x[0] for x in ls_all[0]]

Load into a DataFrame

df = pd.DataFrame(lists, columns=columns)

Result

        Mb     created    modified Rows_Mil                    table_id    Tb
0   928.11  2018-10-25  2019-04-18    4,378          NetworkActiveViews  0.91
1   800.67  2018-10-26  2019-04-18    3,577  NetworkBackfillActiveViews  0.78
2     2.44  2018-10-26  2019-04-18       11       NetworkBackfillClicks  0.00
3  1190.52  2018-10-26  2019-04-18    5,269  NetworkBackfillImpressions  1.16
4     0.22  2018-10-26  2019-04-18        1               NetworkClicks  0.00
5  1519.24  2018-10-26  2019-04-18    7,089          NetworkImpressions  1.48
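This approach takes the column names from the first inner list only, so it relies on every inner list holding the same keys in the same order. A minimal sketch that makes that assumption explicit, again using a made-up two-row sample (ls_sample) abbreviated from ls_all:

```python
import datetime

import pandas as pd

# Hypothetical two-row sample, abbreviated from the question's ls_all.
ls_sample = [
    [('Mb', '0.22'), ('created', datetime.date(2018, 10, 26)), ('table_id', 'NetworkClicks')],
    [('Mb', '1519.24'), ('created', datetime.date(2018, 10, 26)), ('table_id', 'NetworkImpressions')],
]

# Column names come from the first row.
columns = [key for key, _ in ls_sample[0]]

# Fail early if any row lists different keys or a different key order,
# because the values would otherwise land under the wrong column.
for row in ls_sample:
    if [key for key, _ in row] != columns:
        raise ValueError('rows do not share the same keys in the same order')

# Keep only the value of each (key, value) pair.
values = [[value for _, value in row] for row in ls_sample]

df = pd.DataFrame(values, columns=columns)
print(df)
```

If the key order can differ between rows, the list-of-dicts approach from the other answer is the safer choice, since it matches values to columns by key rather than by position.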