Dynamically importing a CSV and mapping it to SQLAlchemy

Time: 2019-01-27 11:05:34

Tags: pandas sqlite sqlalchemy flask-sqlalchemy

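I am creating a database with SQLAlchemy in a Flask application. I want to populate it from an existing CSV file, selecting only some of its columns, so I am using pandas here together with the model classes.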

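I need to add Company objects and commit them dynamically, but my attempt does not work. The CSV file is small, about 20,000 records, but I still can't add them by hand. Any suggestions for adding them dynamically?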

from sqlalchemy.ext.declarative import declarative_base 
from sqlalchemy.orm import relationship
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from flask import jsonify

Base = declarative_base()


class Company(Base):
    __tablename__ = 'forbesglobal2000_2016'

    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    profits = Column(String(250), nullable=False)
    marketValue = Column(String(250), nullable=False)
    revenue = Column(String(250), nullable=False)
    industry = Column(String(250), nullable=False)





class SIC(Base):
    __tablename__ = "SIC"


    id = Column(Integer, primary_key=True)
    SIC = Column(Integer, nullable=False)
    Industry_name = Column(String(250),ForeignKey('forbesglobal2000_2016.industry'))
    Indusrty = relationship(Company)


# configuration part
engine = create_engine('sqlite:///CompainesData.db')

Base.metadata.create_all(engine)

import sqlalchemy
from sqlalchemy.orm import sessionmaker
from database_setup import *
import pandas as pd
# opening connection with database

engine = create_engine('sqlite:///CompainesData.db')
Base.metadata.bind = engine
# Clear database
Base.metadata.drop_all(engine)
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)
session = DBSession()

df = pd.read_csv("forbesglobal2000-2016.csv")
df1 = pd.read_csv("SIC.csv")

# market valuation, revenue, profits and industry
profit_column = df.profits
name_column = df.name
industry_column = df.industry
revenue_column = df.revenue
marketvalue_column = df.marketValue
industry_column_f = df1.Description
SIC_column = df1.SICCode


company = []
i = 1
while i < name_column.__len__():
    company[i] = Company(name = name_column[i] ,     industry=industry_column[i], marketValue = marketvalue_column[i] , profits =     profit_column[i] ,
                     revenue = revenue_column[i] )

    i = i +1
for i in company:
    session.add(i)
    session.commit()


# printing test
com = session.query(Company).all()
for f in com:
    print(f.name)
    print(f.industry)
    print(f.profits)
    print(f.revenue)
    print(f.marketValue)

2 Answers:

Answer 0 (score: 0)

I think the index should start at 0 rather than 1:

i = 1

should be

i = 0

Could you try that?
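Beyond the starting index, the loop in the question would likely still fail: `company` is an empty list, so `company[i] = ...` raises `IndexError` whichever index it starts from. The following is a minimal sketch of one way to build the objects instead, assuming the CSV columns are named as in the question. The sample rows and the in-memory SQLite database are made up for illustration and stand in for the real CSV and `CompainesData.db`.

```python
import pandas as pd
from sqlalchemy import Column, Integer, String, create_engine
# Same import as in the question; SQLAlchemy 1.4+ also exposes it as
# sqlalchemy.orm.declarative_base.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()


class Company(Base):
    __tablename__ = "forbesglobal2000_2016"

    id = Column(Integer, primary_key=True)
    name = Column(String(250), nullable=False)
    profits = Column(String(250), nullable=False)
    marketValue = Column(String(250), nullable=False)
    revenue = Column(String(250), nullable=False)
    industry = Column(String(250), nullable=False)


# Assumption: an in-memory database replaces the file-based one.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Assumption: made-up rows replace pd.read_csv("forbesglobal2000-2016.csv").
df = pd.DataFrame(
    {
        "name": ["Alpha Corp", "Beta Ltd", "Gamma Inc"],
        "industry": ["Banking", "Oil & Gas", "Retail"],
        "marketValue": ["10", "20", "30"],
        "profits": ["1", "2", "3"],
        "revenue": ["5", "8", "9"],
    }
)

# Assigning to an index of an empty list fails, as in the original loop.
company = []
try:
    company[0] = "anything"
    assignment_failed = False
except IndexError:
    assignment_failed = True

# Build one Company per row, keeping only the columns the model defines.
columns = ["name", "industry", "marketValue", "profits", "revenue"]
company = [Company(**row) for row in df[columns].to_dict(orient="records")]

# Add every object, then commit once instead of once per row.
session.add_all(company)
session.commit()

names = [c.name for c in session.query(Company).order_by(Company.id)]
print(names)
```

Committing once after `add_all` avoids a separate transaction per row, which tends to matter at around 20,000 records.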

Answer 1 (score: 0)

If you want to load data from a CSV file into the database, you can simply use the df.to_sql() function. For example:

df.to_sql(con=engine, name=airlines.__tablename__, if_exists='replace',index=False)

Note index=False, which keeps the pandas index from being written as an extra column.
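One caveat with `if_exists='replace'`: pandas drops the existing table and recreates it from the DataFrame, so a schema created beforehand (including the `id` primary key) is lost. Here is a small sketch comparing it with `if_exists='append'`. It assumes the table layout from the question and uses a plain `sqlite3` connection, which `to_sql` also accepts, in place of the SQLAlchemy engine to keep the example self-contained. The sample rows and the extra `rank` column are made up.

```python
import sqlite3

import pandas as pd

TABLE = "forbesglobal2000_2016"

# Assumption: in-memory database with the table created up front,
# mirroring what Base.metadata.create_all(engine) would produce.
con = sqlite3.connect(":memory:")
con.execute(
    f"CREATE TABLE {TABLE} ("
    "id INTEGER PRIMARY KEY, "
    "name VARCHAR(250) NOT NULL, "
    "profits VARCHAR(250) NOT NULL, "
    "marketValue VARCHAR(250) NOT NULL, "
    "revenue VARCHAR(250) NOT NULL, "
    "industry VARCHAR(250) NOT NULL)"
)

# Assumption: made-up rows standing in for the CSV, with one column
# ("rank") that should not be stored.
df = pd.DataFrame(
    {
        "rank": [1, 2],
        "name": ["Alpha Corp", "Beta Ltd"],
        "profits": ["1", "2"],
        "marketValue": ["10", "20"],
        "revenue": ["5", "8"],
        "industry": ["Banking", "Oil & Gas"],
    }
)


def column_names(connection, table):
    """Return the column names SQLite reports for the given table."""
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


# Select only the wanted columns; "append" inserts into the existing table,
# so SQLite assigns the id values itself.
columns = ["name", "profits", "marketValue", "revenue", "industry"]
df[columns].to_sql(TABLE, con=con, if_exists="append", index=False)
rows = con.execute(f"SELECT id, name FROM {TABLE} ORDER BY id").fetchall()
after_append = column_names(con, TABLE)

# "replace" drops the table and rebuilds it from the DataFrame alone.
df[columns].to_sql(TABLE, con=con, if_exists="replace", index=False)
after_replace = column_names(con, TABLE)

print(rows)
print(after_append)
print(after_replace)
```

If the ORM model is queried afterwards, `append` is probably the safer choice, since the mapped `id` column still exists.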