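I have a groupby result with a multi-level column index that I'm trying to write to and read back from an SQLite database. By default, pandas converts the column index into strings that look like tuples (which is great). The problem is that when I read the table back, the MultiIndex is lost and I'm left with tuple-like strings as the column headers.
Here is an example: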
import pandas as pd
import numpy as np
import sqlite3
# Create a dataframe
data = {'Pets and Fruits' : ["Apples", "Oranges", "Puppies", "Ducks"]*5,
'C1' : [1., 2., 3., 4.]*5,
'C2' : [1., 2., 3., 4.]*5,}
df = pd.DataFrame(data)
# Groupby dataframe
df = df.groupby("Pets and Fruits").agg(['sum', 'mean'])
# Create a sqlite database
db = sqlite3.connect("Fruits and Pets.sqlite")
# Send the group to the database
df.to_sql(name="fruits_and_pets", con=db, if_exists='replace')
# Read back the table
df_read = pd.read_sql_query('''SELECT * FROM fruits_and_pets''',con=db,index_col="Pets and Fruits")
print(df)
print(df_read)
Before going into the database:
                   C1          C2
                  sum  mean   sum  mean
Pets and Fruits
Apples            5.0   1.0   5.0   1.0
Ducks            20.0   4.0  20.0   4.0
Oranges          10.0   2.0  10.0   2.0
Puppies          15.0   3.0  15.0   3.0
After reading the table back from the database:
                 ('C1', 'sum')  ('C1', 'mean')  ('C2', 'sum')  ('C2', 'mean')
Pets and Fruits
Apples                     5.0             1.0            5.0             1.0
Ducks                     20.0             4.0           20.0             4.0
Oranges                   10.0             2.0           10.0             2.0
Puppies                   15.0             3.0           15.0             3.0
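I can convert the columns back to a MultiIndex with
df_read.columns = pd.MultiIndex.from_tuples([eval(x) for x in df_read.columns])
but I'd like to know whether there is a better way, or something built in that I'm missing.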
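If the string-to-tuple conversion is kept, a slightly safer version of the same one-liner swaps eval for ast.literal_eval, which only parses Python literals and will not execute arbitrary code that happens to sit in a column name. This is a minimal sketch assuming the question's toy data and an in-memory database (":memory:") in place of the .sqlite file; it is not a pandas built-in, just the question's workaround with a safer parser.

```python
import ast
import sqlite3

import pandas as pd

# Same toy data as in the question
data = {"Pets and Fruits": ["Apples", "Oranges", "Puppies", "Ducks"] * 5,
        "C1": [1., 2., 3., 4.] * 5,
        "C2": [1., 2., 3., 4.] * 5}
df = pd.DataFrame(data).groupby("Pets and Fruits").agg(["sum", "mean"])

# In-memory database instead of the question's file (assumption for this sketch)
db = sqlite3.connect(":memory:")
df.to_sql(name="fruits_and_pets", con=db, if_exists="replace")
df_read = pd.read_sql_query("SELECT * FROM fruits_and_pets", con=db,
                            index_col="Pets and Fruits")

# Column names come back as strings such as "('C1', 'sum')";
# ast.literal_eval turns each one back into a tuple without running code
df_read.columns = pd.MultiIndex.from_tuples(
    [ast.literal_eval(c) for c in df_read.columns])
print(df_read)
```

literal_eval raises a ValueError on anything that is not a literal, so a malformed or unexpected column name fails loudly instead of being executed.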
Answer 0 (score: 0)
import pandas as pd
import numpy as np
import sqlite3
# Create a dataframe
data = {'Pets and Fruits' : ["Apples", "Oranges", "Puppies", "Ducks"]*5,
'C1' : [1., 2., 3., 4.]*5,
'C2' : [1., 2., 3., 4.]*5,}
df = pd.DataFrame(data)
# Groupby dataframe
df = df.groupby("Pets and Fruits").agg(['sum', 'mean'])
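# Turn the group keys into an ordinary column so melt can keep them as an id column
df['Pets and Fruits'] = df.index.values
# Reshape to long format: one row per (fruit, C, agg_type), with the number in 'value'
df = df.melt(id_vars='Pets and Fruits', var_name=['C', 'agg_type'])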
# Create a sqlite database
db = sqlite3.connect("Fruits and Pets.sqlite")
# Send the group to the database
df.to_sql(name="fruits_and_pets", con=db, if_exists='replace',index=False)
# Read back the long table, indexed by (C, agg_type)
df_read = pd.read_sql_query('''SELECT * FROM fruits_and_pets''', con=db, index_col=["C", "agg_type"])
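# Spread the fruits into columns, then transpose so fruits become rows and (C, agg_type) the columns
df_read = df_read.pivot(columns='Pets and Fruits').T
# Drop the leftover 'value' level from the row index
df_read.index = df_read.index.droplevel(0)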
print(df)
print(df_read)
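A variant of the same long-format idea that avoids adding the index as a column inside the MultiIndex: melt with ignore_index=False keeps the group keys as the index, and pivot accepts a list for columns, so the two-level header can be rebuilt in one call (both features assume pandas 1.1 or later). This is a sketch assuming the question's toy data and an in-memory database. pivot sorts the columns (mean before sum), so the final reindex only restores the original order here because the original df is still in scope.

```python
import sqlite3

import pandas as pd

# Same toy data as in the question
data = {"Pets and Fruits": ["Apples", "Oranges", "Puppies", "Ducks"] * 5,
        "C1": [1., 2., 3., 4.] * 5,
        "C2": [1., 2., 3., 4.] * 5}
df = pd.DataFrame(data).groupby("Pets and Fruits").agg(["sum", "mean"])

# Wide -> long: one row per (fruit, C, agg_type); ignore_index=False keeps the fruits as index
long_df = df.melt(var_name=["C", "agg_type"], ignore_index=False).reset_index()

# In-memory database (assumption for this sketch); plain string column names only
db = sqlite3.connect(":memory:")
long_df.to_sql(name="fruits_and_pets", con=db, index=False)
df_read = pd.read_sql_query("SELECT * FROM fruits_and_pets", con=db)

# Long -> wide: the C and agg_type columns become the two column levels
df_back = df_read.pivot(index="Pets and Fruits", columns=["C", "agg_type"], values="value")

# Optional: restore the original column order and drop the level names
df_back = df_back.reindex(columns=df.columns)
df_back.columns.names = [None, None]
print(df_back)
```

Storing the data long has the side benefit that the SQL table has ordinary column names, so it stays easy to query from SQL directly.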