MemoryError from the "to_sql" command in a Jupyter Notebook

Date: 2019-11-30 05:17:34

Tags: python amazon-web-services amazon-ec2 jupyter-notebook amazon-sagemaker

I am working in a Jupyter notebook on AWS SageMaker. I have done text processing on a dataset with 5000 rows, and I want to write the result to a SQLite database with the following code.

import sqlite3

conn = sqlite3.connect('final_2.sqlite')
c = conn.cursor()
conn.text_factory = str
final.to_sql('Reviews', conn, schema=None, if_exists='replace')

The write produces a 2.09 GB file and then stops running. When I open this file, it is not recognized as a valid file. I then tried writing to a .csv file instead, but the same problem occurs. When I download and open the csv, I get the following error:

Error! Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/tornado/web.py", line 1699, in _execute
    result = await result
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/tornado/gen.py", line 209, in wrapper
    yielded = next(result)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/handlers.py", line 112, in get
    path=path, type=type, format=format, content=content,
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 438, in get
    model = self._file_model(path, content=content, format=format)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/filemanager.py", line 365, in _file_model
    content, format = self._read_file(os_path, format)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.6/site-packages/notebook/services/contents/fileio.py", line 309, in _read_file
    bcontent = f.read()
MemoryError

Saving disabled.
See Console for more details.

I checked my available disk space from Python, and there is still about 30 GB free.
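For reference, a minimal way to check free disk space from Python (assuming the data lives on the root filesystem; adjust the path if it does not) is:

import shutil

# disk_usage returns total, used, and free bytes for the filesystem
# containing the given path.
total, used, free = shutil.disk_usage('/')
print('Free space: %.1f GB' % (free / 2**30))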

Could someone let me know what is going wrong here? Thanks!

1 Answer:

Answer 0 (score: 0)

This exact problem happened to me, and I solved it by increasing the RAM size.

The problem occurs because the to_sql command tries to convert the entire dataframe to SQL at once, and in doing so it runs out of memory.

The way around it is to load the data in batches, as follows:

batch_size = 10000

# Write the dataframe in slices of batch_size rows, so only one
# batch has to be converted to SQL at a time.
for i in range(0, len(final), batch_size):
    final.iloc[i:i + batch_size].to_sql('Reviews', con=conn, schema=None, if_exists='append')
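Alternatively, pandas can do the batching itself through the chunksize parameter of to_sql, which writes the rows in groups of the given size instead of all at once. A sketch, using the same names as above:

# 'replace' also drops any partially written 'Reviews' table left over
# from an earlier failed run, avoiding duplicate rows.
final.to_sql('Reviews', con=conn, schema=None, if_exists='replace', chunksize=10000)

Note that the manual loop above uses if_exists='append', so if a previous run already created a partial 'Reviews' table, you should drop it first.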