
Time: 2018-08-25 14:01:25

Tags: python sql sqlite

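I want to delete duplicate rows from my database. The following statement runs fine in Navicat, but it fails when I execute it from Python through SQLAlchemy.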

delete from Proxy_Main 
where (Proxy_Main.ip,Proxy_Main.port) 
       in (select ip,port from Proxy_Main group by ip,port 
           having count(*) > 1) 
   and rowid not in (select min(rowid) from Proxy_Main 
                     group by ip,port having count(*)>1)


Error message:

  

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "(": syntax error [SQL: 'delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count(*) > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count(*)>1)'] (Background on this error at: http://sqlalche.me/e/e3q8)

    @staticmethod
    def execute(sql):
        conn = engine.connect()
        conn.execute(sql)
        conn.close()

    @staticmethod
    def deduplication():
        SqlHelper.execute('delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count(*) > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count(*)>1)')

2 answers:

Answer 0: (score: 0)

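The row-value comparison you are attempting in the IN clause is only available in SQLite 3.15+. As stated at the very bottom of the linked documentation page: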

  

Row values were added to SQLite version 3.15.0 (2016-10-14). Attempts to use row values in prior versions of SQLite will generate syntax errors.

Check your version (SELECT sqlite_version();). Because the query works on newer versions, upgrading is an option if you can (see the working SQL Fiddle example).
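The version check can also be done from Python itself. The sketch below uses only the standard-library sqlite3 module; the helper names supports_row_values and runtime_sqlite_version are made up for illustration. Note that the SQLite library linked into your Python build may be older than the one bundled with Navicat, which could explain why the statement works in one place and not the other.

```python
import sqlite3


def supports_row_values():
    """Return True if the SQLite library linked into Python is 3.15.0 or newer."""
    # sqlite_version_info is the version of the underlying SQLite library,
    # not the version of the Python sqlite3 module.
    return sqlite3.sqlite_version_info >= (3, 15, 0)


def runtime_sqlite_version():
    """Ask SQLite for its version with the same query suggested above."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute("SELECT sqlite_version()").fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print("SQLite version:", runtime_sqlite_version())
    print("Row values supported:", supports_row_values())
```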

Alternatively, consider a more ANSI-SQL solution (i.e., one that is portable across RDBMSs) that joins to an aggregate subquery:

DELETE FROM Proxy_Main 
WHERE rowid IN
   (SELECT p.rowid
    FROM Proxy_Main p
    INNER JOIN 
         (SELECT ip, port, MIN(rowid) As min_id 
          FROM Proxy_Main 
          GROUP BY ip, port
          HAVING COUNT(*) > 1) AS agg
    ON p.ip = agg.ip AND p.port = agg.port
    AND p.rowid <> agg.min_id);

Fiddle Demo (click Run at the top)


Note that you can pass a multi-line query in Python using a triple-quoted string.

@staticmethod
def deduplication():
    sql = """DELETE FROM Proxy_Main 
             WHERE rowid IN
                (SELECT p.rowid
                 FROM Proxy_Main p
                 INNER JOIN 
                      (SELECT ip, port, MIN(rowid) As min_id 
                       FROM Proxy_Main 
                       GROUP BY ip, port
                       HAVING COUNT(*) > 1) AS agg
                 ON p.ip = agg.ip AND p.port = agg.port
                 AND p.rowid <> agg.min_id);"""

    SqlHelper.execute(sql)
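To see what the join-based delete does, here is a small self-contained sketch that runs it against an in-memory database using the standard-library sqlite3 module rather than SQLAlchemy. The table layout (ip TEXT, port INTEGER) and the sample rows are assumptions made up for the demo; only the DELETE statement comes from the answer above.

```python
import sqlite3

# Same statement as above: delete every row whose (ip, port) pair is
# duplicated, except the one with the smallest rowid in each group.
DEDUP_SQL = """
DELETE FROM Proxy_Main
WHERE rowid IN
   (SELECT p.rowid
    FROM Proxy_Main p
    INNER JOIN
         (SELECT ip, port, MIN(rowid) AS min_id
          FROM Proxy_Main
          GROUP BY ip, port
          HAVING COUNT(*) > 1) AS agg
    ON p.ip = agg.ip AND p.port = agg.port
    AND p.rowid <> agg.min_id);
"""


def demo_dedup():
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema, for illustration only.
        conn.execute("CREATE TABLE Proxy_Main (ip TEXT, port INTEGER)")
        conn.executemany(
            "INSERT INTO Proxy_Main (ip, port) VALUES (?, ?)",
            [
                ("1.1.1.1", 80),     # rowid 1, kept (first of its group)
                ("1.1.1.1", 80),     # rowid 2, duplicate
                ("2.2.2.2", 8080),   # rowid 3, unique
                ("1.1.1.1", 80),     # rowid 4, duplicate
                ("1.1.1.1", 443),    # rowid 5, same ip but different port
            ],
        )
        conn.execute(DEDUP_SQL)
        conn.commit()
        return conn.execute(
            "SELECT rowid, ip, port FROM Proxy_Main ORDER BY rowid"
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in demo_dedup():
        print(row)
```

Because this form does not rely on row values, it should also run on SQLite versions older than 3.15.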

Answer 1: (score: 0)

Set max_num_rendered_ti_fields_per_task = -1 in the airflow.cfg file (version 2.0).

This worked for me.