我试图将多个csv文件导入sqlite数据库到多个表中(在python3中使用jupyter notebook)。每个文件的名称将是表的名称。我已经定义了一个函数来将编码转换为utf8,如下所示:
import sqlite3
import glob
import csv
import sys
def convert_to_utf8(dirname):
for filename in glob.glob(os.path.join(dirname, '*.csv')):
ifp = open(filename, "rt", encoding='cp1252')
input_data = ifp.read()
ifp.close()
ofp = open(filename + ".fix", "wt", encoding='utf-8')
for c in input_data:
if c != '\0':
ofp.write(c)
ofp.close()
return
所有文件都在同一个文件夹中。 staging_dir_name_1是文件所在的位置。我有以下代码将csv文件转换为表格,一些代码来自StackFlow中的类似问题:
convert_to_utf8(staging_dir_name_1)
conn = sqlite3.connect("medicare_hospital_compare_1.db")
c = conn.cursor()
for filename in glob.glob(os.path.join(staging_dir_name_1, '*.csv')):
with open(filename, "rb") as f:
data = csv.DictReader(f)
cols = data.fieldnames
tablename = os.path.splitext(os.path.basename(filename))[0]
sql_str = "drop table if exists %s" % tablename
c.execute(sql_str)
sql_str = "create table if not exists %s (%s)" % (tablename, ','.join(["%s text" % col for col in cols]))
c.execute(sql_str)
sql_str = "insert into %s values (%s)" % (tablename, ','.join(["?" for col in cols]))
c.executemany(sql_str, (list(map(row.get, cols)) for row in data))
conn.commit()
但是当我运行这个时我得到了这个错误
> Error Traceback (most recent call
> last) <ipython-input-29-be7c1f43e4c5> in <module>()
> 2 with open(filename, "rb") as f:
> 3 data = csv.DictReader(f)
> ----> 4 cols = data.fieldnames
> 5 tablename = os.path.splitext(os.path.basename(filename))[0]
> 6
>
> C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
> 96 if self._fieldnames is None:
> 97 try:
> ---> 98 self._fieldnames = next(self.reader)
> 99 except StopIteration:
> 100 pass
>
> Error: iterator should return strings, not bytes (did you open the
> file in text mode?)
有人可以帮我解决这个问题吗?我已经考虑了一段时间,但仍然无法弄清楚如何解决这个问题。
**===UPDATE===**
现在我已经将'rb'更改为'rt',我得到一个新的错误全NULL值,我认为第一个函数已经删除了所有空值
Error Traceback (most recent call last)
<ipython-input-77-68d56c0b4cf2> in <module>()
3
4 data = csv.DictReader(f)
----> 5 cols = data.fieldnames
6 table = os.path.splitext(os.path.basename(filename))[0]
7
C:\Users\dupin\Anaconda3\lib\csv.py in fieldnames(self)
96 if self._fieldnames is None:
97 try:
---> 98 self._fieldnames = next(self.reader)
99 except StopIteration:
100 pass
Error: line contains NULL byte