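What is the simplest way to convert a Dataset object to a Pandas DataFrame?

Time: 2018-04-23 20:38:04

Tags: python dataframe orm sqlalchemy dataset

The Python Dataset module is built on SQLAlchemy and exposes a function named all() that returns all records in a table. all() returns an iterable Dataset object.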

users = db['user'].all()

for user in db['user']:
   print(user['age'])

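What is the simplest way to convert a Dataset object to a Pandas DataFrame object?

To be clear, I am interested in using Dataset's own functionality, since it has already loaded the table into a Dataset object.

4 Answers:

Answer 0: (score: 2)

This worked for me: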

import dataset
import pandas
db = dataset.connect('sqlite:///db.sqlite3')
data = list(db['my_table'].all())
dataframe = pandas.DataFrame(data=data)

Answer 1: (score: 0)

import pandas as pd
df = pd.DataFrame(data=db['user'])
df

Similarly,

pd.DataFrame(db['user'])

should do the same thing.

You can also specify the columns or the index:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
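
As a rough sketch of what specifying columns or an index can look like: the rows below are a hypothetical stand-in for what iterating db['user'] yields (one dict-like record per row), so the example runs without a database. The column names id, names and ages are assumptions chosen for illustration.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical stand-in for the rows yielded by db['user'];
# each record is a dict-like mapping of column name to value.
rows = [
    OrderedDict([("id", 1), ("names", "Bob"), ("ages", 31)]),
    OrderedDict([("id", 2), ("names", "Jane"), ("ages", 30)]),
]

# columns= selects and orders the columns taken from each record.
df_cols = pd.DataFrame(rows, columns=["names", "ages"])

# set_index() promotes an existing column (here the primary key) to the index.
df_indexed = pd.DataFrame(rows).set_index("id")

print(df_cols)
print(df_indexed)
```

Passing columns= at construction time leaves out unwanted columns such as id up front, while set_index() keeps the key available for lookups like df_indexed.loc[2].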

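Answer 2: (score: 0)

After investing a lot of time in the Dataset module, I found that the result of all() can be iterated into a list and then turned into a pandas DataFrame. Is there a better way?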

import dataset
import pandas as pd

# create dataframe
df = pd.DataFrame()
names = ['Bob', 'Jane', 'Alice', 'Ricky']
ages = [31, 30, 31, 30]
df['names'] = names
df['ages'] = ages

print(df)

# create a dict oriented as records from dataframe
user = df.to_dict(orient='records')

# using dataset module instantiate database
db = dataset.connect('sqlite:///mydatabase.db')

# create a reference to a table
table = db['user']

# insert the complete dict into database
table.insert_many(user)

# use Dataset .all() to retrieve all table's rows
from_sql = table.all()  # custom ResultIter type (iterable)

# iterate ResultIter type into a list
data = []
for row in from_sql:
    data.append(row)

# create dataframe from list and ordereddict keys
df_new = pd.DataFrame(data, columns=from_sql.keys)

# drop() returns a new DataFrame rather than modifying df_new in place,
# so assign the result back to actually remove the id column
df_new = df_new.drop(columns=['id'])

print(df_new)
'''
   names  ages
0    Bob    31
1   Jane    30
2  Alice    31
3  Ricky    30

   names  ages
0    Bob    31
1   Jane    30
2  Alice    31
3  Ricky    30

'''
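
On dropping the id column: DataFrame.drop returns a new DataFrame by default and leaves the original untouched. A small sketch of that behaviour, using a hand-built frame as a hypothetical stand-in for the one read back from the database:

```python
import pandas as pd

# Hypothetical stand-in for the frame built from table.all().
df_new = pd.DataFrame({"id": [1, 2], "names": ["Bob", "Jane"], "ages": [31, 30]})

# Calling drop() without using its return value changes nothing in df_new.
df_new.drop(columns=["id"])
still_has_id = "id" in df_new.columns

# Assigning the result back keeps the change.
df_new = df_new.drop(columns=["id"])

print(still_has_id)
print(list(df_new.columns))
```

Passing inplace=True to drop() is the other option, which is what the helper function in the next answer does.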

Answer 3: (score: 0)

I have written a few helper functions that make this process simpler:

import dataset
import pandas as pd

def df_dataset_save(df, table_name, db_name='db'):
    try:
        df = df.to_dict(orient='records')
        db = dataset.connect('sqlite:///' + db_name + '.sqlite')
        table = db[table_name]
        table.insert_many(df)
        return 'success'
    except Exception as e:
        print(e)
        return None


def df_dataset_query_all(table_name, db_name='db', ids=False):
    try:
        db = dataset.connect('sqlite:///' + db_name + '.sqlite')
        table = db[table_name]
        from_sql = table.all()
        data = []
        for row in from_sql:
            data.append(row)
        df = pd.DataFrame(data, columns=from_sql.keys)
        if not ids:
            df.drop('id', axis=1, inplace=True)
        return df
    except Exception as e:
        print(e)
        return None


# create dataframe
users = pd.DataFrame()
names = ['Bob', 'Jane', 'Alice', 'Ricky']
ages = [31, 30, 31, 30]
users['names'] = names
users['ages'] = ages

# save dataframe
df_dataset_save(users, 'users')

# query saved dataframe
new_user = df_dataset_query_all('users')

print(new_user)

'''
    names  ages
0     Bob    31
1    Jane    30
2   Alice    31
3   Ricky    30
'''