The Python Dataset module is built on SQLAlchemy and exposes a function named all() that returns all records in a table. all() returns an iterable Dataset object.
users = db['user'].all()
for user in db['user']:
    print(user['age'])
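What is the simplest way to convert a Dataset object into a Pandas DataFrame object?
To be clear, I am interested in using Dataset's own functionality, since it has already loaded the table into a Dataset object.
Answer 0: (score: 2)
This worked for me: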
import dataset
import pandas
db = dataset.connect('sqlite:///db.sqlite3')
data = list(db['my_table'].all())
dataframe = pandas.DataFrame(data=data)
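A minimal sketch of why the list() call helps, assuming the result of all() behaves like a one-shot iterator of dict-like rows. The fake_all() generator is a hypothetical stand-in for db['my_table'].all(), so the example runs without a database:

```python
from collections import OrderedDict

import pandas as pd


def fake_all():
    """Hypothetical stand-in for db['my_table'].all(): yields one dict-like row at a time."""
    yield OrderedDict([("id", 1), ("names", "Bob"), ("ages", 31)])
    yield OrderedDict([("id", 2), ("names", "Jane"), ("ages", 30)])


rows = fake_all()
data = list(rows)        # materialize every row exactly once
leftover = list(rows)    # the iterator is exhausted now, so this list is empty
dataframe = pd.DataFrame(data=data)
print(dataframe)
```

Materializing the rows first means the same list can be reused after the iterator itself has been consumed.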
Answer 1: (score: 0)
import pandas as pd
df = pd.DataFrame(data=db['user'])
df
Similarly,
pd.DataFrame(db['user'])
should do the same thing.
You can also specify the columns or the index:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html
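As a rough illustration of those two parameters, assuming the table rows arrive as plain dicts with id, names and ages keys (the rows list is made up for the example, not read from a real table):

```python
import pandas as pd

# Made-up rows standing in for what iterating db['user'] might yield
rows = [
    {"id": 1, "names": "Bob", "ages": 31},
    {"id": 2, "names": "Jane", "ages": 30},
]

# columns= selects and orders the dict keys to keep; 'id' is left out here
df = pd.DataFrame(rows, columns=["names", "ages"])

# index= supplies the row labels; here the ids are reused as the index
df_by_id = pd.DataFrame(rows, columns=["names", "ages"],
                        index=[r["id"] for r in rows])
print(df_by_id)
```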
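Answer 2: (score: 0)
After investing a lot of time in the Dataset module, I found that the result of all() can be iterated into a list and then turned into a pandas DataFrame. Is there a better way?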
import dataset
import pandas as pd
# create dataframe
df = pd.DataFrame()
names = ['Bob', 'Jane', 'Alice', 'Ricky']
ages = [31, 30, 31, 30]
df['names'] = names
df['ages'] = ages
print(df)
# create a dict oriented as records from dataframe
user = df.to_dict(orient='records')
# using dataset module instantiate database
db = dataset.connect('sqlite:///mydatabase.db')
# create a reference to a table
table = db['user']
# insert the complete dict into database
table.insert_many(user)
# use Dataset .all() to retrieve all table's rows
from_sql = table.all() # custom ResultIter type (iterable)
# iterate ResultIter type into a list
data = []
for row in from_sql:
    data.append(row)
# create dataframe from list and ordereddict keys
df_new = pd.DataFrame(data, columns=from_sql.keys)
# drop() returns a new DataFrame, so assign the result to actually remove the id column
df_new = df_new.drop(columns=['id'])
print(df_new)
'''
   names  ages
0    Bob    31
1   Jane    30
2  Alice    31
3  Ricky    30
   names  ages
0    Bob    31
1   Jane    30
2  Alice    31
3  Ricky    30
'''
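On dropping the id column: DataFrame.drop() returns a new DataFrame and leaves the original unchanged unless the result is assigned back or inplace=True is passed. A small sketch with made-up data:

```python
import pandas as pd

# Made-up frame shaped like the one read back from the table
df_new = pd.DataFrame({"id": [1, 2], "names": ["Bob", "Jane"], "ages": [31, 30]})

df_new.drop(columns=["id"])             # result is discarded; df_new keeps 'id'
still_there = "id" in df_new.columns    # True

df_new = df_new.drop(columns=["id"])    # keep the returned DataFrame instead
print(df_new)
```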
Answer 3: (score: 0)
I have created some helper functions that make this process simpler:
import dataset
import pandas as pd
def df_dataset_save(df, table_name, db_name='db'):
    try:
        df = df.to_dict(orient='records')
        db = dataset.connect('sqlite:///' + db_name + '.sqlite')
        table = db[table_name]
        table.insert_many(df)
        return 'success'
    except Exception as e:
        print(e)
        return None

def df_dataset_query_all(table_name, db_name='db', ids=False):
    try:
        db = dataset.connect('sqlite:///' + db_name + '.sqlite')
        table = db[table_name]
        from_sql = table.all()
        data = []
        for row in from_sql:
            data.append(row)
        df = pd.DataFrame(data, columns=from_sql.keys)
        if not ids:
            df.drop('id', axis=1, inplace=True)
        return df
    except Exception as e:
        print(e)
        return None

# create dataframe
users = pd.DataFrame()
names = ['Bob', 'Jane', 'Alice', 'Ricky']
ages = [31, 30, 31, 30]
users['names'] = names
users['ages'] = ages
# save dataframe
df_dataset_save(users, 'users')
# query saved dataframe
new_user = df_dataset_query_all('users')
print(new_user)
'''
   names  ages
0    Bob    31
1   Jane    30
2  Alice    31
3  Ricky    30
'''