Pandas: read_sql with multiple statements in one query does not return rows

Asked: 2019-07-16 19:09:07

Tags: python mysql pandas

I am trying to use pd.read_sql(sql, uri) to load the top 10 prescribed items from a database into a DataFrame, but it raises the following error:

~\AppData\Local\Continuum\anaconda3\envs\GISProjects\lib\site-packages\sqlalchemy\engine\result.py in _non_result(self, default)
   1168         if self._metadata is None:
   1169             raise exc.ResourceClosedError(
-> 1170                 "This result object does not return rows. "
   1171                 "It has been closed automatically."
   1172             )

ResourceClosedError: This result object does not return rows. It has been closed automatically.

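My query uses user-defined variables to keep track of the rank, so that it returns the top ten prescriptions per practice. It works when I run it in MySQL Workbench, but it fails when I run it through pd.read_sql():
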
sql = """
SET @current_practice = 0;
SET @practice_rank = 0;
select practice, bnf_code_9, total_items, practice_rank
FROM (select a.practice,
             a.bnf_code_9,
             a.total_items,
             @practice_rank := IF(@current_practice = a.practice, @practice_rank + 1, 1) AS practice_rank,
             @current_practice := a.practice
      FROM (select rp.practice, rp.bnf_code_9, sum(rp.items) as total_items
            from rx_prescribed rp
            where ignore_flag = '0'
            group by practice, bnf_code_9) a
      order by a.practice, a.total_items desc) ranked
where practice_rank <= 10;
"""
df = pd.read_sql(sql, uri)

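I expected the data to come back as a pandas DataFrame, but I get the error instead. I think it comes from the first statement, which sets a variable. The first two statements are required so that the query returns the top 10.

Without the first two statements the query runs, but it returns "1" in every row of the practice_rank column instead of the expected values 1, 2, 3, and so on.

Is there a way to run multiple statements and get back the result of the last statement executed?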

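1 answer:

Answer 0: (score: 0)

Short answer

The call stack behind pandas.read_sql() is: pandas > SQLAlchemy > MySQLdb or pymysql > MySQL database. The database drivers mysqlclient (MySQLdb) and pymysql do not like multiple SQL statements in a single execute() call. Here the first statement is a SET, which produces no result set, hence the "does not return rows" error. Split the statements into separate calls.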

Solution

import pandas as pd
from sqlalchemy import create_engine

# mysqldb is the default, use mysql+pymysql to use the pymysql driver
# URI format: mysql<+driver>://<user:password@>localhost/database
engine = create_engine('mysql://localhost/test')

# First two lines starting with SET removed
sql = '''
SELECT practice, bnf_code_9, total_items, practice_rank
FROM (
    SELECT
        a.practice,
        a.bnf_code_9,
        a.total_items,
        @practice_rank := IF(@current_practice = a.practice, @practice_rank + 1, 1) AS practice_rank,
        @current_practice := a.practice
    FROM (
        SELECT
            rp.practice, rp.bnf_code_9, sum(rp.items) AS total_items
        FROM rx_prescribed rp
        WHERE ignore_flag = '0'
        GROUP BY practice, bnf_code_9
    ) a
    ORDER BY a.practice, a.total_items DESC
) ranked
WHERE practice_rank <= 10;
'''

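# User-defined variables are scoped to the session, so the SET statements
# and the SELECT must all run on the same connection.
with engine.connect() as con:
    # exec_driver_sql() passes the string straight to the driver (SQLAlchemy 1.4+);
    # on older versions, con.execute('SET ...') with a plain string does the same job.
    con.exec_driver_sql('SET @current_practice = 0;')
    con.exec_driver_sql('SET @practice_rank = 0;')

    df = pd.read_sql(sql, con)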

print(df)

Result:

   practice  bnf_code_9  total_items  practice_rank
0         2           3          6.0              1
1         6           1          9.0              1
2         6           2          4.0              2
3         6           4          3.0              3
4        17           1          0.0              1
5        42          42         42.0              1

I created a test database for your question with the following code.

DROP TABLE IF EXISTS rx_prescribed;
CREATE TABLE rx_prescribed (
    id INT AUTO_INCREMENT PRIMARY KEY,
    practice INT,
    bnf_code_9 INT,
    items INT,
    ignore_flag INT
);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (2, 3, 4, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (2, 3, 2, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 1, 9, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 2, 4, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (6, 4, 3, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (9, 11, 1, 1);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (17, 1, 0, 0);
INSERT INTO rx_prescribed (practice, bnf_code_9, items, ignore_flag) VALUES (42, 42, 42, 0);

Tested on MariaDB 10.3.
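
If you would rather keep the whole script in one string instead of editing it by hand, the "split into separate calls" step can be sketched as below. The helper name split_statements is hypothetical (it is not part of pandas or SQLAlchemy), and the split is deliberately naive: it assumes no semicolon appears inside a string literal or comment. The final SELECT is shortened to a placeholder here.

```python
def split_statements(script):
    """Return the individual statements of a SQL script.

    Naive sketch: splits on ';', so it assumes no semicolons appear
    inside string literals or comments.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


# Placeholder script: two SET statements followed by the query to read.
script = """
SET @current_practice = 0;
SET @practice_rank = 0;
SELECT 1 AS practice_rank;
"""

# Everything except the last statement is setup; the last one returns rows.
*setup, query = split_statements(script)
print(setup)  # ['SET @current_practice = 0', 'SET @practice_rank = 0']
print(query)  # SELECT 1 AS practice_rank
```

Inside the with engine.connect() as con: block from the solution, each entry of setup would then be executed on con one at a time, and query would be passed to pd.read_sql(query, con) on that same connection.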