Importing a CSV table into SQL Server with Python pandas

Posted: 2020-07-20 22:47:48

Tags: python sql-server pandas csv

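This is the biggest help request I have ever written. My process so far has been to narrow things down until I had the smallest code that still produces the error.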

I downloaded the attribute_active CSV table from https://www.ffiec.gov/npw/FinancialReport/DataDownload.

I'm trying to load it into a SQL Server table. It won't import directly, and the process eventually has to be automated anyway, so I need to figure it out in Python.

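An interesting note: I can import this file directly into an MS Access database. That produces an error table, but if I tell Access that the double-quote character delimits strings, the file imports correctly.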

So I used Access to generate the error table. For example, it tells me that row 53 is a problem (row 54 if you count the header).

The first few lines of the Access import error table are:

Error   Field   Row
Type Conversion Failure ORG_TYPE_CD 53
Type Conversion Failure STATE_CD    53
Type Conversion Failure ID_FDIC_CERT    53
Type Conversion Failure SLHC_TYPE_IND   53
Type Conversion Failure CNTRY_INC_CD    53
Type Conversion Failure ORG_TYPE_CD 56
Type Conversion Failure STATE_CD    56
Type Conversion Failure ID_FDIC_CERT    56
Type Conversion Failure SLHC_TYPE_IND   56
Type Conversion Failure CNTRY_INC_CD    56
Type Conversion Failure ORG_TYPE_CD 523
Type Conversion Failure STATE_CD    523
Type Conversion Failure ID_FDIC_CERT    523
Type Conversion Failure SLHC_TYPE_IND   523
Type Conversion Failure CNTRY_INC_CD    523
Type Conversion Failure ID_FDIC_CERT    610
Type Conversion Failure SLHC_TYPE_IND   610
Type Conversion Failure CNTRY_INC_CD    610
Type Conversion Failure ORG_TYPE_CD 714
Type Conversion Failure STATE_CD    714
Type Conversion Failure ID_FDIC_CERT    714
Type Conversion Failure SLHC_TYPE_IND   714
Type Conversion Failure CNTRY_INC_CD    714
Type Conversion Failure ORG_TYPE_CD 759
Type Conversion Failure STATE_CD    759
Type Conversion Failure ID_FDIC_CERT    759
Type Conversion Failure SLHC_TYPE_IND   759
Type Conversion Failure CNTRY_INC_CD    759
Type Conversion Failure ORG_TYPE_CD 796
Type Conversion Failure STATE_CD    796
Type Conversion Failure ID_FDIC_CERT    796
Type Conversion Failure SLHC_TYPE_IND   796
Type Conversion Failure CNTRY_INC_CD    796

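The problem is that when I look at the raw text of these rows and fields, nothing distinguishes them from the same fields in neighboring rows. I do think it matters that Access imports without errors once I explicitly tell it to use double quotes for strings, but I do the same thing in pandas!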
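One way to compare those rows more closely is to print the raw physical lines with repr(). This makes tabs, stray quotes, carriage returns, and non-ASCII characters visible, even though they look identical in an editor. The sketch below is only an illustration: the helper name show_raw_lines is hypothetical, and it assumes Access's row 53 corresponds to physical line 54 once the header is counted. Physical line numbers drift if any quoted field contains an embedded newline.

```python
import io


def show_raw_lines(text_file, line_numbers):
    """Return {line_number: repr(line)} for the requested 1-based physical lines."""
    wanted = set(line_numbers)
    found = {}
    for number, line in enumerate(text_file, start=1):
        if number in wanted:
            # repr() exposes tabs, stray quotes, '\r' and non-ASCII characters
            found[number] = repr(line)
            if len(found) == len(wanted):
                break  # stop reading once every requested line is collected
    return found


if __name__ == "__main__":
    # In-memory stand-in for the real file. With the download you might call
    # open(full_filename, newline="") (newline="" keeps '\r\n' visible, and the
    # file's actual encoding may need to be passed) and ask for lines [54, 57].
    sample = io.StringIO('h1,h2\n1,"a"\n2,"b\tc"\n')
    print(show_raw_lines(sample, [3]))
```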

I'm using the following code to try to load it into SQL Server:

import os

import pandas as pd

# raw_data_dir, init_connection(), conn and cursor are module-level globals
# defined elsewhere in ingest_source.py (not shown here).


def ingest_npw_attributes_active():
    filename = 'CSV_ATTRIBUTES_ACTIVE.CSV'
    full_filename = os.path.join(raw_data_dir, filename)
    print(full_filename)
    dt = {'AUTH_REG_DIST_FRS': int,
          'STREET_LINE2': "string",
          'ID_THRIFT': "string",
          'ID_TAX': int
          }

    init_connection()
    df = pd.read_csv(full_filename, dtype=dt, header=0, quotechar='"')

    # one "?" placeholder per stored-procedure parameter
    place_holder = "?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,"
    place_holder += "?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?"
    sql_str = "exec dbo.save_attributes_active " + place_holder

    # build assignment string for sql (never used: `values` is reassigned below)
    values = "("
    for col_name in df.columns:
        values += "row['" + col_name + "'], "
    values = values[:-1] + ")"

    for index, row in df.iterrows():
        if index < 2:
            print(row)
            values = (row['#ID_RSSD'], row['D_DT_START'], row['D_DT_END'], row['BHC_IND'], row['BROAD_REG_CD'],
                      row['CHTR_AUTH_CD'], row['CHTR_TYPE_CD'], row['FBO_4C9_IND'], row['FHC_IND'], row['FUNC_REG'],
                      row['INSUR_PRI_CD'], row['MBR_FHLBS_IND'], row['MBR_FRS_IND'], row['SEC_RPTG_STATUS'],
                      row['EST_TYPE_CD'], row['BANK_CNT'], row['BNK_TYPE_ANALYS_CD'], row['D_DT_EXIST_CMNC'],
                      row['D_DT_EXIST_TERM'], row['FISC_YREND_MMDD'], row['D_DT_INSUR'], row['D_DT_OPEN'],
                      row['FNCL_SUB_HOLDER'], row['FNCL_SUB_IND'], row['IBA_GRNDFTHR_IND'], row['IBF_IND'],
                      row['ID_RSSD_HD_OFF'], row['MJR_OWN_MNRTY'], row['NM_LGL'], row['NM_SHORT'], row['NM_SRCH_CD'],
                      row['ORG_TYPE_CD'], row['REASON_TERM_CD'], row['CNSRVTR_CD'], row['ENTITY_TYPE'],
                      row['AUTH_REG_DIST_FRS'], row['ACT_PRIM_CD'], row['CITY'], row['CNTRY_NM'], row['ID_CUSIP'],
                      row['STATE_ABBR_NM'], row['PLACE_CD'], row['STATE_CD'], row['STATE_HOME_CD'], row['STREET_LINE1'],
                      row['STREET_LINE2'], row['ZIP_CD'], row['ID_THRIFT'], row['ID_THRIFT_HC'], row['DOMESTIC_IND'],
                      row['ID_ABA_PRIM'], row['ID_FDIC_CERT'], row['ID_NCUA'], row['COUNTY_CD'], row['DIST_FRS'],
                      row['ID_OCC'], row['CNTRY_CD'], row['DT_END'], row['DT_EXIST_CMNC'], row['DT_EXIST_TERM'],
                      row['DT_INSUR'], row['DT_OPEN'], row['DT_START'], row['ID_TAX'], row['PROV_REGION'], row['URL'],
                      row['SLHC_IND'], row['SLHC_TYPE_IND'], row['PRIM_FED_REG'], row['STATE_INC_CD'], row['CNTRY_INC_CD'],
                      row['STATE_INC_ABBR_NM'], row['CNTRY_INC_NM'], row['ID_LEI'], row['IHC_IND'])

    # runs once, with the tuple from the last row the loop kept (index 1)
    # print(values)
    return_key = cursor.execute(sql_str, values).fetchval()
    print('return_key =', return_key)

    # pyodbc connections default to autocommit=False: unless init_connection()
    # enables autocommit, call conn.commit() first or the insert is rolled back
    conn.close()
    return

My SQL Server table is defined as follows:

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[npwAttributesActive]
(
    [id] [bigint] IDENTITY(1,1) NOT NULL,
    [ID_RSSD] [int] NOT NULL,
    [D_DT_START] [nvarchar](50) NULL,
    [D_DT_END] [nvarchar](50) NULL,
    [BHC_IND] [int] NULL,
    [BROAD_REG_CD] [int] NULL,
    [CHTR_AUTH_CD] [int] NULL,
    [CHTR_TYPE_CD] [int] NULL,
    [FBO_4C9_IND] [int] NULL,
    [FHC_IND] [int] NULL,
    [FUNC_REG] [int] NULL,
    [INSUR_PRI_CD] [int] NULL,
    [MBR_FHLBS_IND] [int] NULL,
    [MBR_FRS_IND] [int] NULL,
    [SEC_RPTG_STATUS] [int] NULL,
    [EST_TYPE_CD] [int] NULL,
    [BANK_CNT] [nvarchar](1) NULL,
    [BNK_TYPE_ANALYS_CD] [int] NULL,
    [D_DT_EXIST_CMNC] [nvarchar](50) NULL,
    [D_DT_EXIST_TERM] [nvarchar](50) NULL,
    [FISC_YREND_MMDD] [int] NULL,
    [D_DT_INSUR] [nvarchar](50) NULL,
    [D_DT_OPEN] [nvarchar](50) NULL,
    [FNCL_SUB_HOLDER] [int] NULL,
    [FNCL_SUB_IND] [int] NULL,
    [IBA_GRNDFTHR_IND] [int] NULL,
    [IBF_IND] [int] NULL,
    [ID_RSSD_HD_OFF] [int] NULL,
    [MJR_OWN_MNRTY] [int] NULL,
    [NM_LGL] [nvarchar](150) NULL,
    [NM_SHORT] [nvarchar](50) NULL,
    [NM_SRCH_CD] [int] NULL,
    [ORG_TYPE_CD] [int] NULL,
    [REASON_TERM_CD] [int] NULL,
    [CNSRVTR_CD] [int] NULL,
    [ENTITY_TYPE] [nvarchar](50) NULL,
    [AUTH_REG_DIST_FRS] [int] NULL,
    [ACT_PRIM_CD] [nvarchar](50) NULL,
    [CITY] [nvarchar](50) NULL,
    [CNTRY_NM] [nvarchar](50) NULL,
    [ID_CUSIP] [nvarchar](50) NULL,
    [STATE_ABBR_NM] [nvarchar](50) NULL,
    [PLACE_CD] [int] NULL,
    [STATE_CD] [int] NULL,
    [STATE_HOME_CD] [int] NULL,
    [STREET_LINE1] [nvarchar](50) NULL,
    [STREET_LINE2] [nvarchar](50) NULL,
    [ZIP_CD] [nvarchar](50) NULL,
    [ID_THRIFT] [int] NULL,
    [ID_THRIFT_HC] [nvarchar](50) NULL,
    [DOMESTIC_IND] [nvarchar](50) NULL,
    [ID_ABA_PRIM] [int] NULL,
    [ID_FDIC_CERT] [int] NULL,
    [ID_NCUA] [int] NULL,
    [COUNTY_CD] [int] NULL,
    [DIST_FRS] [int] NULL,
    [ID_OCC] [int] NULL,
    [CNTRY_CD] [int] NULL,
    [DT_END] [int] NULL,
    [DT_EXIST_CMNC] [int] NULL,
    [DT_EXIST_TERM] [int] NULL,
    [DT_INSUR] [int] NULL,
    [DT_OPEN] [int] NULL,
    [DT_START] [int] NULL,
    [ID_TAX] [int] NULL,
    [PROV_REGION] [nvarchar](50) NULL,
    [URL] [nvarchar](50) NULL,
    [SLHC_IND] [int] NULL,
    [SLHC_TYPE_IND] [int] NULL,
    [PRIM_FED_REG] [nvarchar](50) NULL,
    [STATE_INC_CD] [int] NULL,
    [CNTRY_INC_CD] [int] NULL,
    [STATE_INC_ABBR_NM] [nvarchar](50) NULL,
    [CNTRY_INC_NM] [nvarchar](50) NULL,
    [ID_LEI] [nvarchar](50) NULL,
    [IHC_IND] [int] NULL
) ON [PRIMARY]

The stored procedure is:

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[save_attributes_active]
    @ID_RSSD       int,
    @D_DT_START    nvarchar(50),
    @D_DT_END      nvarchar(50),
    @BHC_IND       int,
    @BROAD_REG_CD  int,
    @CHTR_AUTH_CD  int,
    @CHTR_TYPE_CD  int,
    @FBO_4C9_IND   int,
    @FHC_IND       int,
    @FUNC_REG      int,
    @INSUR_PRI_CD  int,
    @MBR_FHLBS_IND      int,
    @MBR_FRS_IND        int,
    @SEC_RPTG_STATUS    int,
    @EST_TYPE_CD        int,
    @BANK_CNT           nvarchar(1),
    @BNK_TYPE_ANALYS_CD int,
    @D_DT_EXIST_CMNC    nvarchar(50),
    @D_DT_EXIST_TERM    nvarchar(50),
    @FISC_YREND_MMDD    int,
    @D_DT_INSUR         nvarchar(50),
    @D_DT_OPEN          nvarchar(50),
    @FNCL_SUB_HOLDER    int,
    @FNCL_SUB_IND       int,
    @IBA_GRNDFTHR_IND   int,
    @IBF_IND            int,
    @ID_RSSD_HD_OFF     int,
    @MJR_OWN_MNRTY      int,
    @NM_LGL             nvarchar(150),
    @NM_SHORT           nvarchar(50),
    @NM_SRCH_CD         int,
    @ORG_TYPE_CD        int,
    @REASON_TERM_CD     int,
    @CNSRVTR_CD         int,
    @ENTITY_TYPE        nvarchar(50),
    @AUTH_REG_DIST_FRS  int,
    @ACT_PRIM_CD        nvarchar(50),
    @CITY               nvarchar(50),
    @CNTRY_NM           nvarchar(50),
    @ID_CUSIP           nvarchar(50),
    @STATE_ABBR_NM      nvarchar(50),
    @PLACE_CD           int,
    @STATE_CD           int,
    @STATE_HOME_CD      int,
    @STREET_LINE1       nvarchar(50),
    @STREET_LINE2       nvarchar(50),
    @ZIP_CD             nvarchar(50),
    @ID_THRIFT          int,
    @ID_THRIFT_HC       nvarchar(50),
    @DOMESTIC_IND       nvarchar(50),
    @ID_ABA_PRIM        int,
    @ID_FDIC_CERT       int,
    @ID_NCUA            int,
    @COUNTY_CD          int,
    @DIST_FRS           int,
    @ID_OCC             int,
    @CNTRY_CD           int,
    @DT_END             int,
    @DT_EXIST_CMNC      int,
    @DT_EXIST_TERM      int,
    @DT_INSUR           int,
    @DT_OPEN            int,
    @DT_START           int,
    @ID_TAX             int,
    @PROV_REGION        nvarchar(50),
    @URL                nvarchar(50),
    @SLHC_IND           int,
    @SLHC_TYPE_IND      int,
    @PRIM_FED_REG       nvarchar(50),
    @STATE_INC_CD       int,
    @CNTRY_INC_CD       int,
    @STATE_INC_ABBR_NM  nvarchar(50),
    @CNTRY_INC_NM       nvarchar(50),
    @ID_LEI             nvarchar(50),
    @IHC_IND            int
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO dbo.npwAttributesActive ([ID_RSSD], [D_DT_START], [D_DT_END], [BHC_IND], [BROAD_REG_CD], [CHTR_AUTH_CD], [CHTR_TYPE_CD], [FBO_4C9_IND], [FHC_IND], [FUNC_REG], [INSUR_PRI_CD],
         [MBR_FHLBS_IND], [MBR_FRS_IND], [SEC_RPTG_STATUS], [EST_TYPE_CD], [BANK_CNT], [BNK_TYPE_ANALYS_CD], [D_DT_EXIST_CMNC], [D_DT_EXIST_TERM], [FISC_YREND_MMDD], [D_DT_INSUR], [D_DT_OPEN], 
         [FNCL_SUB_HOLDER], [FNCL_SUB_IND], [IBA_GRNDFTHR_IND], [IBF_IND], [ID_RSSD_HD_OFF], [MJR_OWN_MNRTY], [NM_LGL], [NM_SHORT], [NM_SRCH_CD], [ORG_TYPE_CD], [REASON_TERM_CD], [CNSRVTR_CD], 
         [ENTITY_TYPE], [AUTH_REG_DIST_FRS], [ACT_PRIM_CD], [CITY], [CNTRY_NM], [ID_CUSIP], [STATE_ABBR_NM], [PLACE_CD], [STATE_CD], [STATE_HOME_CD], [STREET_LINE1], [STREET_LINE2], [ZIP_CD], 
         [ID_THRIFT], [ID_THRIFT_HC], [DOMESTIC_IND], [ID_ABA_PRIM], [ID_FDIC_CERT], [ID_NCUA], [COUNTY_CD], [DIST_FRS], [ID_OCC], [CNTRY_CD], [DT_END], [DT_EXIST_CMNC], [DT_EXIST_TERM], [DT_INSUR],
         [DT_OPEN], [DT_START], [ID_TAX], [PROV_REGION], [URL], [SLHC_IND], [SLHC_TYPE_IND], [PRIM_FED_REG], [STATE_INC_CD], [CNTRY_INC_CD],
         [STATE_INC_ABBR_NM], [CNTRY_INC_NM], [ID_LEI], [IHC_IND] ) 
    VALUES (@ID_RSSD, @D_DT_START, @D_DT_END, @BHC_IND, @BROAD_REG_CD, @CHTR_AUTH_CD, @CHTR_TYPE_CD, @FBO_4C9_IND, @FHC_IND, @FUNC_REG, @INSUR_PRI_CD,
        @MBR_FHLBS_IND, @MBR_FRS_IND, @SEC_RPTG_STATUS, @EST_TYPE_CD, @BANK_CNT, @BNK_TYPE_ANALYS_CD, @D_DT_EXIST_CMNC, @D_DT_EXIST_TERM, @FISC_YREND_MMDD,
        @D_DT_INSUR, @D_DT_OPEN, @FNCL_SUB_HOLDER, @FNCL_SUB_IND, @IBA_GRNDFTHR_IND, @IBF_IND, @ID_RSSD_HD_OFF, @MJR_OWN_MNRTY, @NM_LGL, @NM_SHORT, @NM_SRCH_CD, 
        @ORG_TYPE_CD, @REASON_TERM_CD, @CNSRVTR_CD, @ENTITY_TYPE, @AUTH_REG_DIST_FRS, @ACT_PRIM_CD, @CITY, @CNTRY_NM, @ID_CUSIP, @STATE_ABBR_NM, @PLACE_CD, @STATE_CD,
        @STATE_HOME_CD, @STREET_LINE1, @STREET_LINE2, @ZIP_CD, @ID_THRIFT, @ID_THRIFT_HC, @DOMESTIC_IND, @ID_ABA_PRIM, @ID_FDIC_CERT, @ID_NCUA, @COUNTY_CD, @DIST_FRS,
        @ID_OCC, @CNTRY_CD, @DT_END, @DT_EXIST_CMNC, @DT_EXIST_TERM, @DT_INSUR, @DT_OPEN, @DT_START, @ID_TAX, @PROV_REGION, @URL, @SLHC_IND, @SLHC_TYPE_IND,
        @PRIM_FED_REG, @STATE_INC_CD, @CNTRY_INC_CD, @STATE_INC_ABBR_NM, @CNTRY_INC_NM, @ID_LEI, @IHC_IND
    )

    SELECT 1
END

However, I get the following error:

Traceback (most recent call last):
  File "C:/Users/kgreen/Source/Repos/MapTools/ingest_source.py", line 142, in <module>
    main()
  File "C:/Users/kgreen/Source/Repos/MapTools/ingest_source.py", line 136, in main
    ingest_npw_data()
  File "C:/Users/kgreen/Source/Repos/MapTools/ingest_source.py", line 130, in ingest_npw_data
    ingest_npw_attributes_active()
  File "C:/Users/kgreen/Source/Repos/MapTools/ingest_source.py", line 122, in ingest_npw_attributes_active
    return_key = cursor.execute(sql_str, values).fetchval()
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Parameter 19 (""): The supplied value is not a valid instance of data type float. Check the source data for invalid values. An example of an invalid value is data of numeric type with scale greater than precision. (8023) (SQLExecDirectW)')

Appendix: the first few rows of the CSV

#ID_RSSD,D_DT_START,D_DT_END,BHC_IND,BROAD_REG_CD,CHTR_AUTH_CD,CHTR_TYPE_CD,FBO_4C9_IND,FHC_IND,FUNC_REG,INSUR_PRI_CD,MBR_FHLBS_IND,MBR_FRS_IND,SEC_RPTG_STATUS,EST_TYPE_CD,BANK_CNT,BNK_TYPE_ANALYS_CD,D_DT_EXIST_CMNC,D_DT_EXIST_TERM,FISC_YREND_MMDD,D_DT_INSUR,D_DT_OPEN,FNCL_SUB_HOLDER,FNCL_SUB_IND,IBA_GRNDFTHR_IND,IBF_IND,ID_RSSD_HD_OFF,MJR_OWN_MNRTY,NM_LGL,NM_SHORT,NM_SRCH_CD,ORG_TYPE_CD,REASON_TERM_CD,CNSRVTR_CD,ENTITY_TYPE,AUTH_REG_DIST_FRS,ACT_PRIM_CD,CITY,CNTRY_NM,ID_CUSIP,STATE_ABBR_NM,PLACE_CD,STATE_CD,STATE_HOME_CD,STREET_LINE1,STREET_LINE2,ZIP_CD,ID_THRIFT,ID_THRIFT_HC,DOMESTIC_IND,ID_ABA_PRIM,ID_FDIC_CERT,ID_NCUA,COUNTY_CD,DIST_FRS,ID_OCC,CNTRY_CD,DT_END,DT_EXIST_CMNC,DT_EXIST_TERM,DT_INSUR,DT_OPEN,DT_START,ID_TAX,PROV_REGION,URL,SLHC_IND,SLHC_TYPE_IND,PRIM_FED_REG,STATE_INC_CD,CNTRY_INC_CD,STATE_INC_ABBR_NM,CNTRY_INC_NM,ID_LEI,IHC_IND
37,"04/15/2009 00:00:00","12/31/9999 00:00:00",0,1,2,200,0,0,0,7,1,0,0,1,,0,,"12/31/9999 00:00:00",0,"01/01/1934 00:00:00","09/01/1904 00:00:00",0,0,0,0,0,0,"BANK OF HANCOCK COUNTY                                                                                                  ","BANK OF HANCOCK CTY           ",1072861144,1,0,0,"NMB",6,"52211 ","SPARTA","UNITED STATES                           ","0","GA",72584,13,0,"12855 BROAD STREET","0","31087    ",16553,"0","Y",61107146,10057,0,141,6,0,1007,99991231,0,99991231,19340101,19040901,20090415,0,"0","0",0,0,"FDIC",0,0,"0 ","","0",0
73,"12/31/2008 00:00:00","12/31/9999 00:00:00",0,2,1,330,0,0,0,3,0,0,0,1,,0,,"12/31/9999 00:00:00",0,"01/04/1971 00:00:00","01/01/1936 00:00:00",0,0,0,0,0,0,"UTILITY EMPLOYEES FEDERAL CREDIT UNION                                                                                  ","UTILITY EMPL FCU              ",788018087,6,0,0,"FCU",12,"52213 ","HOQUIAM","UNITED STATES                           ","0","WA",0,53,0,"220 MYRTLE STREET","0","98550    ",0,"0","Y",325179988,0,1851,27,12,0,1007,99991231,0,99991231,19710104,19360101,20081231,910591861,"0","0",0,0,"NCUA",0,0,"0 ","","0",0
242,"01/01/2012 00:00:00","12/31/9999 00:00:00",0,1,2,200,0,0,0,7,1,1,0,1,,0,,"12/31/9999 00:00:00",0,"01/01/1934 00:00:00","01/01/1922 00:00:00",0,0,0,0,0,0,"FIRST COMMUNITY BANK XENIA-FLORA                                                                                        ","FIRST CMNTY BK XENIA FLORA    ",574907456,1,0,0,"SMB",8,"52211 ","XENIA","UNITED STATES                           ","0","IL",83739,17,0,"260 FRONT STREET","0","62899    ",0,"0","Y",81220537,3850,0,25,8,0,1007,99991231,0,99991231,19340101,19220101,20120101,370274860,"0","WWW.FCBXENIAFLORA.COM/INDEX.HTML",0,0,"FRS",17,0,"IL","","0",0
279,"01/01/2012 00:00:00","12/31/9999 00:00:00",0,1,2,300,0,0,0,7,1,0,0,1,,0,,"12/31/9999 00:00:00",0,"01/01/1997 00:00:00","01/01/1934 00:00:00",0,0,0,0,0,0,"MINEOLA COMMUNITY BANK, SSB                                                                                             ","MINEOLA CMNTY BK SSB          ",98889854,6,0,0,"SSB",11,"52211 ","MINEOLA","UNITED STATES                           ","0","TX",48648,48,0,"215 W BROAD","0","75773    ",2523,"0","Y",311972526,28868,0,499,11,0,1007,99991231,0,99991231,19970101,19340101,20120101,750440734,"0","0",0,0,"FDIC",48,0,"TX","","0",0
354,"12/04/2019 00:00:00","12/31/9999 00:00:00",0,1,2,200,0,0,0,7,0,0,0,1,,0,"01/01/1901 00:00:00","12/31/9999 00:00:00",0,"03/21/1934 00:00:00","01/01/1901 00:00:00",0,0,0,0,0,0,"BISON STATE BANK                                                                                                        ","BISON ST BK                   ",715161421,1,0,0,"NMB",10,"52211 ","BISON","UNITED STATES                           ","0","KS",6950,20,0,"223 MAIN STREET","0","67520    ",0,"0","Y",101107475,14083,0,165,10,0,1007,99991231,19010101,99991231,19340321,19010101,20191204,0,"0","0",0,0,"FDIC",0,0,"0 ","","0",0

1 answer:

Answer 0 (score: 1)

Without seeing the full CSV dataset it is hard to be sure, but your problem likely comes down to blank values in some CSV cells.

CSV -> MS Access type conversion

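Without a predefined table, MS Access infers field types from the first rows it reads. Before you specified the double quote as the text qualifier, Access read "" as a zero-length string, which fails if Access had already defined the corresponding field as an integer, double, or date/time.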

Pandas -> SQL Server type conversion

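Similarly, if the dtype argument does not cover a column, pandas infers that column's type from the values it reads. Empty strings can then cause problems when you move the data frame into a SQL Server table. Judging by your error, some CSV cells contain "" (or nothing at all) and are being sent to non-string parameters: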

Parameter 19 (""): The supplied value is not a valid instance of data type float.
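The following small in-memory reproduction shows where the float comes from. The three-column CSV text is made up but follows the shape of the sample rows. In those rows, BANK_CNT is blank in every record, and here FISC_YREND_MMDD is also left blank in the second record. This is a sketch of pandas' default parsing only; no database is involved.

```python
import io

import numpy as np
import pandas as pd

# Made-up extract: BANK_CNT blank in every row, FISC_YREND_MMDD blank in row 2.
CSV_TEXT = "ID_RSSD,BANK_CNT,FISC_YREND_MMDD\n37,,0\n73,,\n"

df = pd.read_csv(io.StringIO(CSV_TEXT))

# ID_RSSD stays int64, but any column holding a blank becomes float64 with NaN.
blank_is_nan = bool(np.isnan(df.loc[1, "FISC_YREND_MMDD"]))

# Forcing an int dtype does not help: pandas refuses NaN in an int64 column.
try:
    pd.read_csv(io.StringIO(CSV_TEXT), dtype={"FISC_YREND_MMDD": int})
    int_dtype_rejected = False
except ValueError:
    int_dtype_rejected = True

if __name__ == "__main__":
    print(df.dtypes)
    print("blank became NaN:", blank_is_nan)
    print("int dtype rejected:", int_dtype_rejected)
```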

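By default, read_csv turns empty cells into NaN. In an integer column, this makes the whole column float instead of int. Even in a text column, the missing value is stored as NaN, which is a float. SQL Server's float type has no NaN, so pyodbc cannot pass such a value and the call fails. There may not be a single fix for your case, but consider the following options, which you may need to combine: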

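  • Use read_csv's na_filter option. It stops blanks from being replaced with NaN, but numeric columns that contain empty strings become object (string) columns instead, which may still affect the SQL Server import.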

    df = pd.read_csv(full_filename, dtype=dt, na_filter=False)
    

    Note that quotechar='"' is already the default, and header=0 is probably redundant unless you also pass names to replace the header row.

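  • Read the file as usual, then convert all columns to object and replace the NaN values with None. pyodbc sends None as NULL, and it may then try to convert the remaining values to the final parameter types.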

    df = pd.read_csv(full_filename, dtype=dt)
    
    df = df.astype(object).where(pd.notnull(df), None)
    

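    Alternatively, dump the file into a staging table whose columns are all varchar, then run an UPDATE or MERGE that casts the values into the final table. All of this can be handled inside a stored procedure.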

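  • Handle each numeric column individually to anticipate blanks, for example by putting a placeholder value such as 9999 into blank integers and cleaning it up after the migration. Replacing '' only matters if blanks were kept as empty strings (for example with na_filter=False). The dict form of replace is used because older pandas versions treat replace(x, None) as a forward fill.

    df['D_DT_EXIST_TERM'] = df['D_DT_EXIST_TERM'].replace({'': None})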
    
    df['FISC_YREND_MMDD'] = df['FISC_YREND_MMDD'].fillna(9999).astype('int')
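The options above can be tried side by side on a tiny in-memory sample before running them against the full file. This sketch assumes a made-up three-column extract. For the second option, it converts NaN to None one value at a time while building the parameter tuples, a per-value variant of the astype(object).where(...) line that yields plain Python objects for pyodbc.

```python
import io

import pandas as pd

# Made-up extract shaped like the FFIEC rows: blank date in row 1, blank int in row 2.
CSV_TEXT = (
    "ID_RSSD,D_DT_EXIST_CMNC,FISC_YREND_MMDD\n"
    "37,,0\n"
    '354,"01/01/1901 00:00:00",\n'
)

# Option 1: na_filter=False keeps blanks as '' instead of NaN.
raw = pd.read_csv(io.StringIO(CSV_TEXT), na_filter=False)

# Option 2: parse normally, then map every NaN to None while building parameter tuples.
df = pd.read_csv(io.StringIO(CSV_TEXT))
params = [
    tuple(None if pd.isna(value) else value for value in row)
    for row in df.itertuples(index=False, name=None)
]

# Option 3: per-column sentinel for blank integers (9999 means "was blank").
fixed = pd.read_csv(io.StringIO(CSV_TEXT))
fixed["FISC_YREND_MMDD"] = fixed["FISC_YREND_MMDD"].fillna(9999).astype("int")

if __name__ == "__main__":
    print(raw.dtypes)
    print(params)
    print(fixed["FISC_YREND_MMDD"].tolist())
```

Each tuple in params could then be passed to cursor.execute(sql_str, ...) in place of the hand-built values tuple.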
    

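For the staging-table route, the typed insert for 75 columns is tedious to write by hand, so a small generator can help. The following is a minimal sketch under several assumptions. The staging table name dbo.npwAttributesActive_staging and the helper build_staging_insert are hypothetical. The staging table is assumed to have one varchar/nvarchar column per final column, with the same names. The sketch uses a plain INSERT ... SELECT, which suffices for a full reload, rather than the UPDATE or MERGE mentioned above. It relies on TRY_CONVERT, available from SQL Server 2012, which turns unconvertible values into NULL silently, so compare row counts afterwards.

```python
def build_staging_insert(final_table, staging_table, column_types):
    """Build INSERT ... SELECT moving all-text staging rows into typed columns.

    column_types maps column name -> SQL Server type of the final column.
    """
    target_cols = ", ".join(f"[{name}]" for name in column_types)
    select_exprs = ", ".join(
        # Trim, then NULLIF maps ''/whitespace to NULL: converting '' to int gives 0, not NULL.
        # TRY_CONVERT returns NULL instead of raising an error on unconvertible text.
        f"TRY_CONVERT({sql_type}, NULLIF(LTRIM(RTRIM([{name}])), ''))"
        for name, sql_type in column_types.items()
    )
    return (
        f"INSERT INTO {final_table} ({target_cols})\n"
        f"SELECT {select_exprs}\n"
        f"FROM {staging_table};"
    )


if __name__ == "__main__":
    # Only three of the 75 columns shown; the staging table name is hypothetical.
    print(build_staging_insert(
        "dbo.npwAttributesActive",
        "dbo.npwAttributesActive_staging",
        {"ID_RSSD": "int", "D_DT_EXIST_CMNC": "nvarchar(50)", "ORG_TYPE_CD": "int"},
    ))
```

The trim also strips the trailing padding seen in fields such as NM_LGL and ZIP_CD. Drop LTRIM(RTRIM(...)) for text columns if that padding must be preserved.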
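Finally, you could avoid pandas entirely, since it is a heavyweight library best suited to data analysis, and migrate with Python's built-in csv module (csv.reader). Blank fields arrive as empty strings that are easy to map to None, rather than as NumPy's np.nan float. Many blogs, tutorials, and Q&A posts cover this approach. For finer control over how CSV columns line up with SQL columns, look at csv.DictReader.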
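The csv route can be sketched as follows. It assumes the stored procedure's parameter order matches the CSV header order, which holds for the header and the values tuple in the question. The helper names rows_as_params, build_exec_sql, and load_csv are hypothetical. Values are passed as text, and SQL Server is left to convert them to the procedure's int parameters. Blank cells become None, so they arrive as NULL rather than being converted to 0.

```python
import csv
import io


def rows_as_params(csv_file, columns=None):
    """Yield one parameter tuple per CSV record, mapping blank cells to None.

    columns defaults to the CSV header, so tuple order follows the file.
    """
    reader = csv.DictReader(csv_file)
    columns = columns or reader.fieldnames
    for record in reader:
        # csv yields '' for empty fields, quoted or not; pyodbc sends None as NULL.
        # A short record leaves missing keys as None (DictReader's default restval).
        yield tuple(
            None if record[col] is None or record[col].strip() == "" else record[col]
            for col in columns
        )


def build_exec_sql(param_count):
    """EXEC statement with one '?' placeholder per stored-procedure parameter."""
    return "EXEC dbo.save_attributes_active " + ", ".join("?" * param_count)


def load_csv(conn, path):
    """Call the stored procedure once per CSV record, then commit (pyodbc connection)."""
    cursor = conn.cursor()
    # The file's real encoding is an assumption; pass encoding=... if needed.
    with open(path, newline="") as csv_file:
        for params in rows_as_params(csv_file):
            cursor.execute(build_exec_sql(len(params)), params)
    conn.commit()


if __name__ == "__main__":
    sample = io.StringIO('#ID_RSSD,BANK_CNT,NM_SHORT\n37,,"BANK OF X"\n')
    print(list(rows_as_params(sample)))
    print(build_exec_sql(3))
```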