Pandas + SQLAlchemy: Extract multiple specific columns from CSV to SQLite via Pandas

Time: 2018-10-31 15:59:58

Tags: python pandas sqlalchemy

I have a .csv file with more than 100 columns and several thousand rows:

>  Datetime          A        B       C      D       E     ...   FA      FB
> 01.01.2014 00:00  15,15   15,15   32,43   15,15   33,27       82,59   1,38
> 01.01.2014 01:00  12,96   12,96   32,49   12,96   30,07       82,59   1,38
> 01.01.2014 02:00  12,09   12,09   28,43   12,09   23,01       82,59   1,38
> 01.01.2014 03:00  11,7    11,7    27,63   11,7    11,04       82,59   1,38
> 01.01.2014 04:00  11,66   11,66   25,99   11,66   9,09        82,59   1,38
>       ...         ...     ...     ...     ...     ...         ...     ...
> 01.10.2018 23:00  9,85    9,85    17,2    9,85    10,44       92,15   1,09

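Now I need to extract this data column by column and export it to a sqlite3 database, pairing each value column with Datetime like this: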

Datetime and A
Datetime and B
Datetime and C
...
Datetime and FB

so that the database table ends up looking like this:

> Datetime          Value    ID
> 01.01.2014 00:00  15,15    A   
> 01.01.2014 01:00  12,96    A   
> 01.01.2014 02:00  12,09    A
> ...               ...      ...
> 01.01.2014 00:00  15,15    FB   
> 01.01.2014 01:00  12,96    FB   
> 01.01.2014 02:00  12,09    FB

I managed to write some data with the following code:

import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String, Numeric, DateTime
from sqlalchemy.orm import sessionmaker
from datetime import datetime
import pandas as pd

Base = declarative_base()

# Declaration of the class in order to write into the database. This structure is standard and should align with SQLAlchemy's doc.
class Values_1(Base):
    __tablename__ = 'Timeseries_Values'

    ID = Column(Integer, primary_key=True)
    Date = Column(DateTime, primary_key=True)
    Value = Column(Numeric)

def main(fileToRead):
    # Set up of the table in db and the file to import
    fileToRead = r'data.csv'
    tableToWriteTo = 'Timeseries_Values'        

    df = pd.read_csv(fileToRead, sep=';', decimal=',', parse_dates=['Date'], dayfirst=True)

    df.columns = ['Datetime', 'A']

    engine = create_engine('sqlite:///data.db')
    conn = engine.connect()

    metadata = sqlalchemy.schema.MetaData(bind=engine, reflect=True)
    table = sqlalchemy.Table(tableToWriteTo, metadata, autoload=True)

    # Open the session
    Session = sessionmaker(bind=engine)
    session = Session()

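    # listToWrite was not defined in the original snippet; assumed here to be
    # the dataframe rows as a list of dicts, one dict per row to insert
    listToWrite = df.to_dict(orient='records')

    conn.execute(table.insert(), listToWrite)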

    session.commit()

    session.close()

So this works for a single combination ("Datetime and A"), but how can I add all the other combinations automatically?

Thanks a lot in advance.

1 Answer:

Answer 0: (score: 0)

This is only a partial answer, but the crux of the problem seems to be that you need to melt the dataframe:

df

                         A      B      C      D      E
Datetime                                              
2014-01-01 00:00:00  15,15  15,15  32,43  15,15  33,27
2014-01-01 01:00:00  12,96  12,96  32,49  12,96  30,07
2014-01-01 02:00:00  12,09  12,09  28,43  12,09  23,01
2014-01-01 03:00:00   11,7   11,7  27,63   11,7  11,04
2014-01-01 04:00:00  11,66  11,66  25,99  11,66   9,09
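Note that the values in the frame above are still strings with decimal commas ("15,15"), so melting it as-is would carry text into the Value column. A minimal sketch of reading the file so that the values become floats and Datetime becomes a real datetime index might look like this. It assumes a semicolon separator (taken from the read_csv call in the question) and the day-first date format shown in the sample; csv_text is a hypothetical stand-in for the real file.

```python
import io

import pandas as pd

# Hypothetical two-column sample standing in for the real CSV file.
# Assumptions: semicolon separator, decimal commas, "dd.mm.yyyy HH:MM" dates.
csv_text = (
    "Datetime;A;B\n"
    "01.01.2014 00:00;15,15;32,43\n"
    "01.01.2014 01:00;12,96;32,49\n"
)

# decimal="," makes pandas parse "15,15" as the float 15.15
df = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")

# Parse the timestamps with an explicit format, then use them as the index
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d.%m.%Y %H:%M")
df = df.set_index("Datetime")

print(df.dtypes)
```

For the real file, the io.StringIO(csv_text) argument would be replaced by the path to the CSV.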

Reset the index and melt:

df1 = df.reset_index().melt('Datetime', var_name='ID', value_name='Value' )

              Datetime ID  Value
0  2014-01-01 00:00:00  A  15,15
1  2014-01-01 01:00:00  A  12,96
2  2014-01-01 02:00:00  A  12,09
3  2014-01-01 03:00:00  A   11,7
4  2014-01-01 04:00:00  A  11,66
5  2014-01-01 00:00:00  B  15,15
6  2014-01-01 01:00:00  B  12,96
7  2014-01-01 02:00:00  B  12,09
8  2014-01-01 03:00:00  B   11,7
9  2014-01-01 04:00:00  B  11,66
10 2014-01-01 00:00:00  C  32,43
11 2014-01-01 01:00:00  C  32,49
12 2014-01-01 02:00:00  C  28,43
13 2014-01-01 03:00:00  C  27,63
14 2014-01-01 04:00:00  C  25,99
15 2014-01-01 00:00:00  D  15,15
16 2014-01-01 01:00:00  D  12,96
17 2014-01-01 02:00:00  D  12,09
18 2014-01-01 03:00:00  D   11,7
19 2014-01-01 04:00:00  D  11,66
20 2014-01-01 00:00:00  E  33,27
21 2014-01-01 01:00:00  E  30,07
22 2014-01-01 02:00:00  E  23,01
23 2014-01-01 03:00:00  E  11,04
24 2014-01-01 04:00:00  E   9,09
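To round out the partial answer, here is a rough sketch of how the melted frame could then be written to SQLite in one call with DataFrame.to_sql. It is an illustration under assumptions, not the asker's setup: the small wide frame is made up, the table name Timeseries_Values is borrowed from the question, and a plain sqlite3 in-memory connection is used to keep the example self-contained (to_sql also accepts a SQLAlchemy engine such as the one created in the question).

```python
import sqlite3

import pandas as pd

# Hypothetical wide frame with numeric values and a datetime index
wide = pd.DataFrame(
    {"A": [15.15, 12.96], "B": [32.43, 32.49]},
    index=pd.to_datetime(["2014-01-01 00:00", "2014-01-01 01:00"]),
)
wide.index.name = "Datetime"

# One row per (Datetime, column) pair: columns Datetime, ID, Value
long_df = wide.reset_index().melt("Datetime", var_name="ID", value_name="Value")

# SQLite has no native datetime type, so store the timestamps as ISO-8601 text
long_df["Datetime"] = long_df["Datetime"].dt.strftime("%Y-%m-%d %H:%M:%S")

# Assumption: in-memory database; pass a file path for a persistent one
conn = sqlite3.connect(":memory:")
long_df.to_sql("Timeseries_Values", conn, if_exists="append", index=False)

# Check that every original column ended up as its own ID group
rows = conn.execute(
    "SELECT ID, COUNT(*) FROM Timeseries_Values GROUP BY ID ORDER BY ID"
).fetchall()
conn.close()
print(rows)  # [('A', 2), ('B', 2)]
```

Because melt handles every value column at once, no explicit loop over the "Datetime and A", "Datetime and B", ... combinations is needed.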