我有以下代码:
import pandas as pd
import time
import numpy as np
import os
dir_path = os.path.dirname(os.path.abspath(__file__))
db_path = 'sqlite:///' + dir_path + '/example.db'
df = pd.DataFrame(np.random.normal(0, 1, (500000, 10)), columns=['col%d' % i for i in range(10)])
writes = []
reads = []
for i in range(1000):
if os.path.exists(dir_path + '/example.db'):
print 'db removed from disk'
os.remove(dir_path + '/example.db')
t0 = time.time()
df.to_sql('example', db_path)
t1 = time.time()
print 'time to write:', t1 - t0
df = pd.read_sql('SELECT * FROM example', db_path)
t2 = time.time()
print "time to read:", t2 - t1
writes.append(t1 - t0)
reads.append(t2 - t1)
print 'Average writing time', np.mean(writes), ' n times=', len(writes)
print 'Average reading time', np.mean(reads), ' n times=', len(reads)
它崩溃为:
(mlcne_venv) [lpuggini@machinelearn-1 Desktop]$ python read_write_time.py
db removed from disk
time to write: 16.2156860828
time to read: 3.31190609932
Average writing time 16.2156860828 n times= 1
Average reading time 3.31190609932 n times= 1
db removed from disk
time to write: 15.6938099861
time to read: 3.74449300766
Average writing time 15.9547480345 n times= 2
Average reading time 3.52819955349 n times= 2
db removed from disk
Traceback (most recent call last):
File "read_write_time.py", line 23, in <module>
df.to_sql('example', db_path)
File "/home/lpuggini/VirtualEnvs/mlcne_venv/lib/python2.7/site-packages/pandas/core/generic.py", line 1201, in to_sql
chunksize=chunksize, dtype=dtype)
File "/home/lpuggini/VirtualEnvs/mlcne_venv/lib/python2.7/site-packages/pandas/io/sql.py", line 470, in to_sql
chunksize=chunksize, dtype=dtype)
File "/home/lpuggini/VirtualEnvs/mlcne_venv/lib/python2.7/site-packages/pandas/io/sql.py", line 1148, in to_sql
table.insert(chunksize)
File "/home/lpuggini/VirtualEnvs/mlcne_venv/lib/python2.7/site-packages/pandas/io/sql.py", line 642, in insert
keys, data_list = self.insert_data()
File "/home/lpuggini/VirtualEnvs/mlcne_venv/lib/python2.7/site-packages/pandas/io/sql.py", line 609, in insert_data
"duplicate name in index/columns: {0}".format(err))
ValueError: duplicate name in index/columns: cannot insert level_0, already exists
(mlcne_venv) [lpuggini@machinelearn-1 Desktop]$
可能是熊猫虫吗?
答案 0 :(得分:2)
每次下一次迭代都会为您提供一个新的索引列。如果您从数据库中读取后添加了输出命令print df.head()
,则会得到以下信息:
1st iteration:
index col0 col1 col2 col3 col4 col5 \
0 0 1.562089 2.297787 -0.490346 0.935589 -0.151018 -0.920552
1 1 -0.868018 1.212738 1.386166 -0.422851 -0.243167 -0.362809
2 2 -0.420472 0.069789 2.706735 0.814905 1.095517 0.570413
3 3 -0.341983 -0.727765 0.549108 0.785910 0.687795 2.158346
4 4 -0.476875 0.525177 -0.532743 0.382937 -0.824136 1.276228
col6 col7 col8 col9
0 -1.179577 1.746765 0.468008 0.751347
1 0.810910 -0.386281 -0.512447 -1.124378
2 2.764994 2.735384 -0.449755 -0.393979
3 -0.895257 -0.396221 -1.149718 -0.382153
4 0.177278 -0.212614 0.185388 0.688391
2nd iteration:
level_0 index col0 col1 col2 col3 col4 col5 \
0 0 0 1.562089 2.297787 -0.490346 0.935589 -0.151018 -0.920552
1 1 1 -0.868018 1.212738 1.386166 -0.422851 -0.243167 -0.362809
2 2 2 -0.420472 0.069789 2.706735 0.814905 1.095517 0.570413
3 3 3 -0.341983 -0.727765 0.549108 0.785910 0.687795 2.158346
4 4 4 -0.476875 0.525177 -0.532743 0.382937 -0.824136 1.276228
col6 col7 col8 col9
0 -1.179577 1.746765 0.468008 0.751347
1 0.810910 -0.386281 -0.512447 -1.124378
2 2.764994 2.735384 -0.449755 -0.393979
3 -0.895257 -0.396221 -1.149718 -0.382153
4 0.177278 -0.212614 0.185388 0.688391
然后崩溃。
只需将写入命令更改为
df.to_sql('example', db_path, index=False)
因此它不会每次都创建一个新索引。