家伙!希望有人可以帮我解决这个问题。
我通过SQLAlchemy执行一个查询,它返回我需要在python脚本上处理的~6kk行(它的历史数据)。我有一些函数可以使用pandas dataframe读取和处理数据。这里有函数:
def consulta_db_cancelamentos(db_con, query):
engine = create_engine(db_con, pool_recycle=3600)
con = engine.connect()
query_result = con.execution_options(stream_results=True).execute(query)
query_result_list = []
while True:
rows = query_result.fetchmany(10000)
if not rows:
break
for row in rows:
data = row['data'],\
row['plano'],\
row['usuario_id'],\
row['timestamp_cancelamentos'],\
row['timestamp'],\
row['status']
query_result_list.append(data)
df = pd.DataFrame()
if df.empty:
df = pd.DataFrame(query_result_list)
else:
df.append(pd.DataFrame(query_result_list))
df_cor = corrige_cancelamentos_df(df, '2017-01-01', '2017-12-15')
con.close()
return df_cor
正如您所看到的,我已经尝试读取数据并将其存储/存储在10k行块中。当我尝试执行整个脚本时,我在函数上遇到了这个错误(我还包括main()
上引发的错误):
Traceback (most recent call last):
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1318, in _fetchmany_impl
row = self._fetchone_impl()
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1308, in _fetchone_impl
self.__buffer_rows()
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1295, in __buffer_rows
self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 485, in fetchmany
row = self.read_next()
File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 446, in read_next
return self._conv_row(self._result._read_rowdata_packet_unbuffered())
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1430, in _read_rowdata_packet_unbuffered
packet = self.connection._read_packet()
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1008, in _read_packet
recv_data = self._read_bytes(bytes_to_read)
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1037, in _read_bytes
CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
pymysql.err.OperationalError: (2013, 'Lost connection to MySQL server during query')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/aiquis/EI/cancelamentos_testes5.py", line 180, in <module>
main()
File "/home/aiquis/EI/cancelamentos_testes5.py", line 164, in main
cancelamentos_df_corrigido = consulta_db_cancelamentos(db_param, query_cancelamentos)
File "/home/aiquis/EI/cancelamentos_testes5.py", line 14, in consulta_db_cancelamentos
rows = query_result.fetchmany(1000)
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1166, in fetchmany
self.cursor, self.context)
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/base.py", line 1413, in _handle_dbapi_exception
exc_info
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/util/compat.py", line 186, in reraise
raise value.with_traceback(tb)
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1159, in fetchmany
l = self.process_rows(self._fetchmany_impl(size))
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1318, in _fetchmany_impl
row = self._fetchone_impl()
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1308, in _fetchone_impl
self.__buffer_rows()
File "/home/aiquis/.local/lib/python3.5/site-packages/sqlalchemy/engine/result.py", line 1295, in __buffer_rows
self.__rowbuffer = collections.deque(self.cursor.fetchmany(size))
File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 485, in fetchmany
row = self.read_next()
File "/usr/local/lib/python3.5/dist-packages/pymysql/cursors.py", line 446, in read_next
return self._conv_row(self._result._read_rowdata_packet_unbuffered())
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1430, in _read_rowdata_packet_unbuffered
packet = self.connection._read_packet()
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1008, in _read_packet
recv_data = self._read_bytes(bytes_to_read)
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1037, in _read_bytes
CR.CR_SERVER_LOST, "Lost connection to MySQL server during query")
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query') (Background on this error at: http://sqlalche.me/e/e3q8)
Exception ignored in: <bound method MySQLResult.__del__ of <pymysql.connections.MySQLResult object at 0x7f8c543dc198>>
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1345, in __del__
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1447, in _finish_unbuffered_query
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 991, in _read_packet
File "/usr/local/lib/python3.5/dist-packages/pymysql/connections.py", line 1022, in _read_bytes
AttributeError: 'NoneType' object has no attribute 'settimeout'
[Finished in 602.4s]
我编写consulta_db_cancelamentos
的方式已经是对SO和SQLAlchemy文档进行一些搜索的结果。假设我无法访问MySQL服务器管理。
当我限制我的查询仅为一个usuario_id
带来结果(例如约50行)时,它可以正常工作。我在MySQL Workbench上执行了相同的查询,持续时间/提取时间为251.998秒/ 357.541秒
答案 0 :(得分:1)
我遇到了这样的问题。将stream_results=True
放入查询后,我遇到了问题。
对我来说,参数net_write_timeout
引起了麻烦。当stream_results=True
且应用程序未立即读取从MySQL服务器发送的所有数据时,则在MySQL服务器端,与应用程序的写通信(发送数据包)被阻止。 net_write_timeout
似乎是一个参数,用于控制允许连接多少秒,以阻止MySQL服务器写入套接字操作。
因此,我将net_write_timeout
参数从60
(默认)更改为3600
,并解决了问题。
答案 1 :(得分:0)
解决了在MySQL服务器中执行此命令的问题:
set global max_allowed_packet = 67108864;