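I am trying to create a SQLite database from a CSV file. After some searching, it seems this can be done with a pandas DataFrame. I have tried following several tutorials and the documentation, but I cannot figure out this error. Here is my code: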
# Import libraries
import pandas, csv, sqlite3
# Create sqlite database and cursor
conn = sqlite3.connect('test.db')
c = conn.cursor()
# Create the table of pitches
c.execute("""CREATE TABLE IF NOT EXISTS pitches (
pitch_type text,
game_date text,
release_speed real
)""")
conn.commit()
df = pandas.read_csv('test2.csv')
df.to_sql('pitches', conn, if_exists='append', index=False)
conn.close()
When I run this code, I get the following error:
sqlite3.OperationalError: table pitches has no column named SL
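SL is the first value in the first row of the CSV file. I don't understand why a CSV value is being treated as a column name, unless pandas assumes the first row of the CSV is a header and tries to match it against the column names in the table. I don't think that is the cause either, because I tried changing the first value to an actual column name and got the same error.
Edit:
When the CSV has a header row, the DataFrame looks like this: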
pitch_type game_date release_speed
0 SL 8/31/2017 81.9
1 SL 8/31/2017 84.1
2 SL 8/31/2017 81.9
... ... ... ...
2919 SL 8/1/2017 82.3
2920 CU 8/1/2017 78.7
[2921 rows x 3 columns]
and I get the following error message:
sqlite3.OperationalError: table pitches has no column named game_date
When I remove the header row from the CSV file, the DataFrame looks like this:
SL 8/31/2017 81.9
0 SL 8/31/2017 84.1
1 SL 8/31/2017 81.9
2 SL 8/31/2017 84.1
... .. ... ...
2918 SL 8/1/2017 82.3
2919 CU 8/1/2017 78.7
[2920 rows x 3 columns]
and I get the following error message:
sqlite3.OperationalError: table pitches has no column named SL
Edit #2:
Following this answer, I tried removing the table creation from the code entirely, using the following code:
# Import libraries
import pandas, csv, sqlite3
# Create sqlite database and cursor
conn = sqlite3.connect('test.db')
c = conn.cursor()
df = pandas.read_csv('test2.csv')
df.to_sql('pitches', conn, if_exists='append', index=False)
conn.close()
and I still get the same error:
sqlite3.OperationalError: table pitches has no column named SL
Edit #3:
I changed the table creation code to the following:
# Create the table of pitches
dropTable = 'DROP TABLE pitches'
c.execute(dropTable)
createTable = "CREATE TABLE IF NOT EXISTS pitches(pitch_type text, game_date text, release_speed real)"
c.execute(createTable)
and now it works. I am not sure what actually changed, because it looks essentially the same to me, but it works.
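One possible explanation, offered as an assumption rather than a confirmed diagnosis: test.db may already have contained a pitches table from an earlier run, created with different column names. In that case CREATE TABLE IF NOT EXISTS does nothing, and if_exists='append' keeps inserting into the old table until it is dropped. The sketch below uses only the sqlite3 module and an in-memory database; the leftover column names (SL, date, speed) and the column_names helper are hypothetical.

```python
import sqlite3

# In-memory database standing in for test.db (assumption for illustration)
conn = sqlite3.connect(":memory:")

# Hypothetical leftover table from an earlier run, with different column names
conn.execute("CREATE TABLE pitches (SL text, date text, speed real)")

create_sql = """CREATE TABLE IF NOT EXISTS pitches (
    pitch_type text,
    game_date text,
    release_speed real
)"""

# No-op here: a table named pitches already exists, so its schema is kept
conn.execute(create_sql)


def column_names(connection, table):
    """Return the column names of a table (hypothetical helper)."""
    # PRAGMA table_info yields one row per column; index 1 is the column name
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]


stale_columns = column_names(conn, "pitches")
print(stale_columns)

# Dropping the table first lets the CREATE statement actually take effect
conn.execute("DROP TABLE pitches")
conn.execute(create_sql)
fresh_columns = column_names(conn, "pitches")
print(fresh_columns)

conn.close()
```

If the first printed list does not match the DataFrame's columns, appending to that table would fail in the way described above.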
Answer 0 (score: 1)
If you are trying to create a table from a CSV file, you can run sqlite3 and do the following:
sqlite> .mode csv
sqlite> .import c:/path/to/file/myfile.csv myTableName
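Answer 1 (score: 1)
Check your column names. I was able to run your code successfully without errors. The names variable gets all the column names from the sqlite table, and you can compare them with the DataFrame headers using df.columns.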
# Import libraries
import pandas as pd, csv, sqlite3
# Create sqlite database and cursor
conn = sqlite3.connect('test.db')
c = conn.cursor()
# Create the table of pitches
c.execute("""CREATE TABLE IF NOT EXISTS pitches (
pitch_type text,
game_date text,
release_speed real
)""")
conn.commit()
test = conn.execute('SELECT * from pitches')
names = [description[0] for description in test.description]
print(names)
df = pd.DataFrame([['SL','8/31/2017','81.9']],columns = ['pitch_type','game_date','release_speed'])
df.to_sql('pitches', conn, if_exists='append', index=False)
conn.execute('SELECT * from pitches').fetchall()
>> [('SL', '8/31/2017', 81.9), ('SL', '8/31/2017', 81.9)]
My guess is that your column headers may contain some whitespace.
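If whitespace in the headers is indeed the problem, a minimal sketch of the check and the fix could look like the following. The CSV text is made up for illustration, and an in-memory database stands in for test.db. It compares the DataFrame columns with the table columns before and after stripping whitespace with df.columns.str.strip().

```python
import io
import sqlite3

import pandas as pd

# Hypothetical CSV whose header cells carry stray spaces
csv_text = "pitch_type, game_date , release_speed\nSL,8/31/2017,81.9\n"
df = pd.read_csv(io.StringIO(csv_text))

conn = sqlite3.connect(":memory:")  # stand-in for test.db
conn.execute(
    "CREATE TABLE pitches (pitch_type text, game_date text, release_speed real)"
)

# Column names as SQLite sees them (index 1 of each PRAGMA row is the name)
table_columns = [row[1] for row in conn.execute("PRAGMA table_info(pitches)")]

# DataFrame columns that have no exact match in the table
missing_before = [col for col in df.columns if col not in table_columns]
print(missing_before)

# Strip leading and trailing whitespace from every header
df.columns = df.columns.str.strip()
missing_after = [col for col in df.columns if col not in table_columns]
print(missing_after)

# With matching names, appending to the existing table succeeds
df.to_sql("pitches", conn, if_exists="append", index=False)
rows = conn.execute("SELECT * FROM pitches").fetchall()
print(rows)

conn.close()
```

Passing skipinitialspace=True to read_csv is another option, but it only removes spaces that follow a delimiter, so trailing spaces would remain.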
Answer 2 (score: 0)
As the pandas read_csv documentation says:
header : int or list of ints, default 'infer'
Row number(s) to use as the column names, and the start of the
data. Default behavior is to infer the column names: if no names
are passed the behavior is identical to ``header=0`` and column
names are inferred from the first line of the file, if column
names are passed explicitly then the behavior is identical to
``header=None``. Explicitly pass ``header=0`` to be able to
replace existing names. The header can be a list of integers that
specify row locations for a multi-index on the columns
e.g. [0,1,3]. Intervening rows that are not specified will be
skipped (e.g. 2 in this example is skipped). Note that this
parameter ignores commented lines and empty lines if
``skip_blank_lines=True``, so header=0 denotes the first line of
data rather than the first line of the file.
This means that, by default, read_csv uses the first line of the file as the column names.
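For a CSV file without a header row, a small sketch of the difference, using made-up data in place of test2.csv: with the default header setting the first data row becomes the column names, while header=None together with names= keeps every row as data and assigns the names the table expects.

```python
import io

import pandas as pd

# Hypothetical header-less CSV content standing in for test2.csv
csv_text = "SL,8/31/2017,81.9\nSL,8/31/2017,84.1\n"

# Default behaviour: the first line is consumed as the header
inferred = pd.read_csv(io.StringIO(csv_text))
print(list(inferred.columns), len(inferred))

# Explicit names: no line is treated as a header, so no data row is lost
explicit = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["pitch_type", "game_date", "release_speed"],
)
print(list(explicit.columns), len(explicit))
```

A DataFrame read the second way has column names that match the pitches table, so to_sql with if_exists='append' has matching columns to insert into.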