How to convert multiple CSV files into multiple tables in SQLite using Python 3?

Date: 2017-10-27 00:55:26

Tags: python-3.x sqlite csv

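I am trying to import multiple CSV files into an SQLite database, each into its own table (using Python 3 in a Jupyter notebook). The name of each file should become the name of its table. I have defined a function that converts the files' encoding to UTF-8, as follows: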

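```python
import sqlite3
import glob
import csv
import sys
import os  # needed for os.path.join / os.path.basename

def convert_to_utf8(dirname):
    # Re-encode every cp1252 *.csv file in dirname as UTF-8, dropping NUL characters.
    # Note: the cleaned copy is written to "<name>.csv.fix"; the original .csv is left unchanged.
    for filename in glob.glob(os.path.join(dirname, '*.csv')):
        ifp = open(filename, "rt", encoding='cp1252')
        input_data = ifp.read()
        ifp.close()
        ofp = open(filename + ".fix", "wt", encoding='utf-8')
        for c in input_data:
            if c != '\0':  # skip NUL characters
                ofp.write(c)
        ofp.close()
    return
```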
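A minimal sketch of a variant of this function that writes the cleaned copies under their original `.csv` names into a separate output directory and returns the paths it wrote, so that a later loop can read exactly those files. Like the original, it assumes the sources are cp1252; `clean_csv_files` and `dst_dir` are illustrative names, not part of the original code:

```python
import glob
import os


def clean_csv_files(src_dir, dst_dir, src_encoding="cp1252"):
    """Write NUL-free UTF-8 copies of every *.csv in src_dir into dst_dir.

    Returns the list of written paths, so the caller can load exactly
    these files instead of re-globbing the source directory.
    """
    os.makedirs(dst_dir, exist_ok=True)
    written = []
    for src_path in sorted(glob.glob(os.path.join(src_dir, "*.csv"))):
        # newline="" keeps the original line endings untouched for the csv module
        with open(src_path, "r", encoding=src_encoding, newline="") as src:
            text = src.read()
        dst_path = os.path.join(dst_dir, os.path.basename(src_path))
        with open(dst_path, "w", encoding="utf-8", newline="") as dst:
            # drop every NUL character in one pass instead of writing char by char
            dst.write(text.replace("\0", ""))
        written.append(dst_path)
    return written
```

For example, `clean_csv_files(staging_dir_name_1, os.path.join(staging_dir_name_1, "clean"))` would leave the originals alone and put the cleaned files in a hypothetical `clean` subfolder. A non-recursive `glob` of `staging_dir_name_1` will not pick that subfolder up.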

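All the files are in the same folder; `staging_dir_name_1` is where they are located. I have the following code to turn the CSV files into tables. Some of it comes from similar questions on Stack Overflow: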

```python
convert_to_utf8(staging_dir_name_1)
conn = sqlite3.connect("medicare_hospital_compare_1.db")
c = conn.cursor()
for filename in glob.glob(os.path.join(staging_dir_name_1, '*.csv')):
    with open(filename, "rb") as f:
        data = csv.DictReader(f)
        cols = data.fieldnames  # the header row becomes the column names
        tablename = os.path.splitext(os.path.basename(filename))[0]  # file name without extension

        # Recreate the table from scratch on every run
        sql_str = "drop table if exists %s" % tablename
        c.execute(sql_str)

        sql_str = "create table if not exists %s (%s)" % (tablename, ','.join(["%s text" % col for col in cols]))
        c.execute(sql_str)

        # One "?" placeholder per column; each CSV row becomes one inserted row
        sql_str = "insert into %s values (%s)" % (tablename, ','.join(["?" for col in cols]))
        c.executemany(sql_str, (list(map(row.get, cols)) for row in data))

conn.commit()
```
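A minimal sketch of the same loop that rests on a few assumptions. The files are UTF-8 (for example, already cleaned), and they are opened in text mode with `newline=""` as the `csv` documentation recommends. NUL characters are stripped while reading. Table and column names are wrapped in double quotes, because SQLite cannot bind identifiers as `?` parameters and names taken from file names or headers may contain spaces. `load_csv_dir` and `quote_ident` are hypothetical helper names:

```python
import csv
import glob
import os


def quote_ident(name):
    """Quote an SQLite identifier (table or column name) with double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_csv_dir(conn, dirname, encoding="utf-8"):
    """Load every *.csv file in dirname into its own table, named after the file."""
    cur = conn.cursor()
    for path in sorted(glob.glob(os.path.join(dirname, "*.csv"))):
        table = quote_ident(os.path.splitext(os.path.basename(path))[0])
        # Text mode plus newline="" is what the csv module expects in Python 3.
        with open(path, "r", encoding=encoding, newline="") as f:
            # Strip NUL characters line by line; DictReader accepts any iterable of str.
            data = csv.DictReader(line.replace("\0", "") for line in f)
            cols = data.fieldnames
            if not cols:  # empty file: nothing to create
                continue
            cur.execute("DROP TABLE IF EXISTS %s" % table)
            cur.execute("CREATE TABLE %s (%s)" % (
                table, ", ".join("%s TEXT" % quote_ident(col) for col in cols)))
            placeholders = ", ".join("?" * len(cols))
            # Missing trailing fields come back as None and are stored as NULL.
            cur.executemany(
                "INSERT INTO %s VALUES (%s)" % (table, placeholders),
                ([row.get(col) for col in cols] for row in data))
    conn.commit()
```

With the names from the question, usage would look like `load_csv_dir(sqlite3.connect("medicare_hospital_compare_1.db"), staging_dir_name_1)`. Only the values go through `?` placeholders. The quoted identifiers are still built with string formatting, which is why `quote_ident` doubles any embedded `"`.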

But when I run this, I get the following error:

```
Error                                     Traceback (most recent call last)
<ipython-input-29-be7c1f43e4c5> in <module>()
      2     with open(filename, "rb") as f:
      3         data = csv.DictReader(f)
----> 4         cols = data.fieldnames
      5         tablename = os.path.splitext(os.path.basename(filename))[0]
      6 

C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
     96         if self._fieldnames is None:
     97             try:
---> 98                 self._fieldnames = next(self.reader)
     99             except StopIteration:
    100                 pass

Error: iterator should return strings, not bytes (did you open the file in text mode?)
```
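In Python 3 the `csv` module works on `str`, not `bytes`. The file has to be opened in text mode (`"rt"` or just `"r"`), and the `csv` documentation recommends passing `newline=""`. If only a binary handle is available, it can be wrapped in `io.TextIOWrapper`. The small illustration below uses an in-memory `io.BytesIO` as a stand-in for a file opened with `"rb"` and assumes the data is UTF-8:

```python
import csv
import io

raw = io.BytesIO(b"id,name\r\n1,Alpha\r\n2,Beta\r\n")  # stand-in for open(path, "rb")

# Handing the binary stream straight to csv fails in Python 3.
try:
    next(csv.reader(raw))
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)  # "iterator should return strings, not bytes ..."

raw.seek(0)
# Decoding the bytes to text first makes DictReader work as intended.
text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
reader = csv.DictReader(text)
fieldnames = reader.fieldnames            # ['id', 'name']
names = [row["name"] for row in reader]   # ['Alpha', 'Beta']
```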

Can anyone help me with this? I have been thinking about it for a while, but I still can't figure out how to fix it.

**===UPDATE===**

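Now that I have changed `'rb'` to `'rt'`, I get a new error about NULL bytes. I thought the first function had already removed all the null characters.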

```
Error                                     Traceback (most recent call last)
<ipython-input-77-68d56c0b4cf2> in <module>()
      3 
      4         data = csv.DictReader(f)
----> 5         cols = data.fieldnames
      6         table = os.path.splitext(os.path.basename(filename))[0]
      7 

C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
     96         if self._fieldnames is None:
     97             try:
---> 98                 self._fieldnames = next(self.reader)
     99             except StopIteration:
    100                 pass

Error: line contains NULL byte
```
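Two things may explain this error. First, `convert_to_utf8` saves its output as `<name>.csv.fix`, but the loading loop globs `*.csv`. The loop therefore still reads the original files, NUL bytes included, and in the Python version shown in the traceback the csv reader rejects NUL characters. Second, a file with many NUL bytes is often UTF-16 encoded rather than cp1252. That is an assumption to check against the actual data. Below is a minimal sketch of both checks on in-memory samples. `looks_like_utf16` and `read_rows_without_nul` are hypothetical helpers, and the byte-order-mark test is only a heuristic: UTF-16 files without a BOM will not be detected.

```python
import codecs
import csv
import io


def looks_like_utf16(first_bytes):
    """Heuristic check: does the data start with a UTF-16 byte-order mark?"""
    return first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))


def read_rows_without_nul(text_lines):
    """Parse CSV from an iterable of str lines, removing NUL characters first."""
    return list(csv.reader(line.replace("\0", "") for line in text_lines))


# Python's "utf-16" codec writes a BOM, and for ASCII text every character
# is paired with a zero byte, which is exactly what a NUL-byte error hints at.
utf16_sample = "id,name\r\n1,Alpha\r\n".encode("utf-16")
is_utf16 = looks_like_utf16(utf16_sample[:4])
nul_count = utf16_sample.count(b"\x00")  # one zero byte per ASCII character here

# Decoding with the right codec removes the NUL bytes entirely...
rows_decoded = read_rows_without_nul(io.StringIO(utf16_sample.decode("utf-16")))
# ...while stripping NULs rescues text that merely contains stray NUL characters.
rows_stripped = read_rows_without_nul(io.StringIO("id,name\r\n1,Al\x00pha\r\n"))
```

For a real file, one could read the first few bytes with `open(path, "rb")`, choose `encoding="utf-16"` when `looks_like_utf16` returns true, and then open the file in text mode with that encoding and `newline=""` before handing it to the csv reader.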

0 answers

No answers