Pandas MultiIndex to and from SQL

Time: 2017-01-24 19:28:14

Tags: python sql sqlite pandas

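I have a groupby result with a multi-level column index (MultiIndex) that I am trying to write to and read back from an SQLite database. By default, pandas converts each column label into a string that looks like a tuple (which is great), but the problem is that when I read the table back, the MultiIndex is lost and I am left with tuple strings as column headers.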

Here is an example:

import pandas as pd
import numpy as np
import sqlite3

# Create a dataframe
data = {'Pets and Fruits'  : ["Apples", "Oranges", "Puppies", "Ducks"]*5,
        'C1'     : [1., 2., 3., 4.]*5,
        'C2'     : [1., 2., 3., 4.]*5,}
df = pd.DataFrame(data)

# Groupby dataframe
df = df.groupby("Pets and Fruits").agg(['sum', 'mean'])

# Create a sqlite database
db = sqlite3.connect("Fruits and Pets.sqlite")

# Send the group to the database
df.to_sql(name="fruits_and_pets", con=db, if_exists='replace')

# Read back the table
df_read = pd.read_sql_query('''SELECT * FROM fruits_and_pets''',con=db,index_col="Pets and Fruits")

print(df)
print(df_read)
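To see what actually lands in the database, the minimal sketch below writes the same grouped frame to an in-memory SQLite database (`:memory:` is an assumption, used so no file is created) and lists the stored column names with SQLite's `PRAGMA table_info`. Each MultiIndex column is stored under the `str()` of its tuple, which is why the read-back frame has flat string headers.

```python
import sqlite3

import pandas as pd

data = {'Pets and Fruits': ["Apples", "Oranges", "Puppies", "Ducks"] * 5,
        'C1': [1., 2., 3., 4.] * 5,
        'C2': [1., 2., 3., 4.] * 5}
df = pd.DataFrame(data).groupby("Pets and Fruits").agg(['sum', 'mean'])

# In-memory database (assumption) so the example leaves no file behind
db = sqlite3.connect(":memory:")
df.to_sql(name="fruits_and_pets", con=db, if_exists='replace')

# PRAGMA table_info returns one row per column; field 1 is the column name
stored_names = [row[1] for row in db.execute("PRAGMA table_info(fruits_and_pets)")]
print(stored_names)

# The tuple columns were stored under their string form, e.g. this one:
print(str(('C1', 'sum')))
```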

Before going into the database:

                   C1         C2     
                  sum mean   sum mean
Pets and Fruits                      
Apples            5.0  1.0   5.0  1.0
Ducks            20.0  4.0  20.0  4.0
Oranges          10.0  2.0  10.0  2.0
Puppies          15.0  3.0  15.0  3.0

Read back from the database:

                 ('C1', 'sum')  ('C1', 'mean')  ('C2', 'sum')  ('C2', 'mean')
Pets and Fruits                                                              
Apples                     5.0             1.0            5.0             1.0
Ducks                     20.0             4.0           20.0             4.0
Oranges                   10.0             2.0           10.0             2.0
Puppies                   15.0             3.0           15.0             3.0

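I can convert the DataFrame back to a MultiIndex with df_read.columns = pd.MultiIndex.from_tuples([eval(x) for x in df_read.columns]), but I'm wondering whether there is a better way, or a built-in option I'm missing?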
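A slightly safer variant of the `eval` workaround uses `ast.literal_eval` from the standard library, which only parses Python literals such as tuples and strings, so an unexpected column name stored in the table cannot execute code. This is a minimal sketch that assumes every column header is a tuple string written by `to_sql`; the helper name `restore_multiindex` and the small `flat` frame are hypothetical.

```python
import ast

import pandas as pd


def restore_multiindex(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn "('C1', 'sum')"-style headers back into a MultiIndex.

    Assumes every column name is the string form of a tuple, as written by to_sql.
    """
    # literal_eval parses the tuple text without evaluating arbitrary code
    tuples = [ast.literal_eval(name) for name in frame.columns]
    restored = frame.copy()
    restored.columns = pd.MultiIndex.from_tuples(tuples)
    return restored


# A frame shaped like df_read above: flat, tuple-string headers
flat = pd.DataFrame([[5.0, 1.0]],
                    columns=["('C1', 'sum')", "('C1', 'mean')"],
                    index=pd.Index(["Apples"], name="Pets and Fruits"))
print(restore_multiindex(flat))
```

This still relies on the string form pandas chooses for tuple column names, so it will fail if any column in the table is not a tuple string (for example, a plain column added later).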

1 answer:

Answer 0 (score: 0)

import pandas as pd
import numpy as np
import sqlite3

# Create a dataframe
data = {'Pets and Fruits'  : ["Apples", "Oranges", "Puppies", "Ducks"]*5,
        'C1'     : [1., 2., 3., 4.]*5,
        'C2'     : [1., 2., 3., 4.]*5,}
df = pd.DataFrame(data)

# Groupby dataframe
df = df.groupby("Pets and Fruits").agg(['sum', 'mean'])
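# Reshape to long format: one row per (fruit, C, agg_type) with a 'value' column.
# ignore_index=False (pandas >= 1.1) keeps the fruit index so reset_index() turns it into a column.
df = df.melt(var_name=['C', 'agg_type'], ignore_index=False).reset_index()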
# Create a sqlite database
db = sqlite3.connect("Fruits and Pets.sqlite")

# Send the group to the database
df.to_sql(name="fruits_and_pets", con=db, if_exists='replace',index=False)

# Read back the table
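# Read back the long table, using the two column levels as the row index
df_read = pd.read_sql_query('''SELECT * FROM fruits_and_pets''', con=db, index_col=["C", "agg_type"])
# Spread 'Pets and Fruits' back into columns, then transpose so the fruits become rows
df_read = df_read.pivot(columns='Pets and Fruits').T
# Drop the leftover 'value' level from the row index
df_read.index = df_read.index.droplevel(0)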
print(df)
print(df_read)
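The answer's idea, storing the data in long format and pivoting it back after reading, can be wrapped in a pair of helpers. This is a minimal sketch assuming pandas 1.1 or later (for `melt(ignore_index=False)` and list-valued `pivot(columns=...)`), an in-memory SQLite database, a single named row index, and a trusted table name; the helper names `write_long` and `read_wide` and the level names `C` and `agg_type` are hypothetical.

```python
import sqlite3

import pandas as pd


def write_long(frame, name, con, level_names=("C", "agg_type")):
    """Hypothetical helper: store a frame with MultiIndex columns as a long table."""
    # One row per (index value, column tuple); reset_index() keeps the row labels
    long_df = frame.melt(var_name=list(level_names), ignore_index=False).reset_index()
    long_df.to_sql(name=name, con=con, if_exists="replace", index=False)


def read_wide(name, con, index_name, level_names=("C", "agg_type")):
    """Hypothetical helper: read the long table and pivot it back to wide form."""
    # The table name is interpolated directly, so it must come from trusted code
    long_df = pd.read_sql_query(f'SELECT * FROM "{name}"', con=con)
    return long_df.pivot(index=index_name, columns=list(level_names), values="value")


# Usage with the same grouped frame as in the question
data = {'Pets and Fruits': ["Apples", "Oranges", "Puppies", "Ducks"] * 5,
        'C1': [1., 2., 3., 4.] * 5,
        'C2': [1., 2., 3., 4.] * 5}
df = pd.DataFrame(data).groupby("Pets and Fruits").agg(['sum', 'mean'])

db = sqlite3.connect(":memory:")
write_long(df, "fruits_and_pets", db)
restored = read_wide("fruits_and_pets", db, "Pets and Fruits")
print(restored)
```

Passing `index`, `columns` and `values` to `pivot` directly avoids the transpose and `droplevel` steps. The restored columns may come back in sorted order (for example `mean` before `sum`), so reorder them afterwards if the original order matters.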