I want to delete duplicate rows from the database. The following statement runs fine in Navicat.
delete from Proxy_Main
where (Proxy_Main.ip,Proxy_Main.port)
in (select ip,port from Proxy_Main group by ip,port
having count(*) > 1)
and rowid not in (select min(rowid) from Proxy_Main
group by ip,port having count(*)>1)
Error message:
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) near "(": syntax error [SQL: 'delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count(*) > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count(*)>1)'] (Background on this error at: http://sqlalche.me/e/e3q8)
@staticmethod
def execute(sql):
    conn = engine.connect()
    conn.execute(sql)
    conn.close()

@staticmethod
def deduplication():
    SqlHelper.execute('delete from Proxy_Main where (Proxy_Main.ip,Proxy_Main.port) in (select ip,port from Proxy_Main group by ip,port having count(*) > 1) and rowid not in (select min(rowid) from Proxy_Main group by ip,port having count(*)>1)')
Answer 0: (score: 0)
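The row-value comparison you are attempting in the IN clause is only available in SQLite 3.15+. As noted at the very bottom of the linked documentation page:
Row values were added to SQLite in version 3.15.0 (2016-10-14). Attempts to use row values in prior versions of SQLite will generate syntax errors.
Check your version (SELECT sqlite_version();). Since the query works on newer versions, upgrade if you can (see the runnable example on SQL Fiddle).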
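A quick way to check the version from Python is sketched below. It assumes SQLAlchemy is using the default pysqlite driver, which relies on the SQLite library linked into Python's standard `sqlite3` module; that library can be older than the one bundled with a GUI tool such as Navicat, which would explain why the same statement works there. The helper name `supports_row_values` is made up for this illustration.

```python
import sqlite3


def supports_row_values(version: str = sqlite3.sqlite_version) -> bool:
    """Return True if the given SQLite version is 3.15.0 or newer."""
    # Keep at most the first three numeric components, e.g. "3.8.11.1" -> [3, 8, 11].
    parts = [int(p) for p in version.split(".")[:3]]
    # Pad short versions such as "3.15" so the tuple comparison is well defined.
    parts += [0] * (3 - len(parts))
    return tuple(parts) >= (3, 15, 0)


if __name__ == "__main__":
    # sqlite3.sqlite_version is the version of the SQLite library itself,
    # not the version of the Python sqlite3 module.
    print(sqlite3.sqlite_version, supports_row_values())
```

If this reports a version below 3.15.0, the row-value form of the query will fail with a syntax error regardless of how it is passed through SQLAlchemy.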
Alternatively, consider a more ANSI-SQL solution (i.e., one that is portable across RDBMSs) that joins to an aggregate subquery:
DELETE FROM Proxy_Main
WHERE rowid IN
(SELECT p.rowid
FROM Proxy_Main p
INNER JOIN
(SELECT ip, port, MIN(rowid) As min_id
FROM Proxy_Main
GROUP BY ip, port
HAVING COUNT(*) > 1) AS agg
ON p.ip = agg.ip AND p.port = agg.port
AND p.rowid <> agg.min_id);
Fiddle Demo (click Run at the top)
Note that in Python you can pass a multi-line query using a triple-quoted string.
@staticmethod
def deduplication():
    sql = """DELETE FROM Proxy_Main
             WHERE rowid IN
               (SELECT p.rowid
                FROM Proxy_Main p
                INNER JOIN
                  (SELECT ip, port, MIN(rowid) As min_id
                   FROM Proxy_Main
                   GROUP BY ip, port
                   HAVING COUNT(*) > 1) AS agg
                ON p.ip = agg.ip AND p.port = agg.port
                AND p.rowid <> agg.min_id);"""
    SqlHelper.execute(sql)
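To see what the join-based query keeps and removes, here is a small self-contained sketch that runs it against an in-memory database using the standard `sqlite3` module instead of SQLAlchemy. The two-column table layout, the sample rows, and the `dedup_demo` function are assumptions made up for the demonstration; the real Proxy_Main table likely has more columns.

```python
import sqlite3

# Same join-based DELETE as above; it avoids row values, so it also
# runs on SQLite versions older than 3.15.0.
DEDUP_SQL = """
DELETE FROM Proxy_Main
WHERE rowid IN
  (SELECT p.rowid
   FROM Proxy_Main p
   INNER JOIN
     (SELECT ip, port, MIN(rowid) AS min_id
      FROM Proxy_Main
      GROUP BY ip, port
      HAVING COUNT(*) > 1) AS agg
   ON p.ip = agg.ip AND p.port = agg.port
   AND p.rowid <> agg.min_id);
"""


def dedup_demo():
    """Build a throwaway table, remove duplicates, return the surviving rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema, for illustration only.
    conn.execute("CREATE TABLE Proxy_Main (ip TEXT, port INTEGER)")
    conn.executemany(
        "INSERT INTO Proxy_Main (ip, port) VALUES (?, ?)",
        [
            ("10.0.0.1", 8080),  # rowid 1, kept (lowest rowid of its group)
            ("10.0.0.1", 8080),  # rowid 2, duplicate
            ("10.0.0.2", 3128),  # rowid 3, unique
            ("10.0.0.1", 8080),  # rowid 4, duplicate
            ("10.0.0.2", 8080),  # rowid 5, unique (same ip, different port)
        ],
    )
    conn.execute(DEDUP_SQL)
    rows = conn.execute(
        "SELECT ip, port FROM Proxy_Main ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in dedup_demo():
        print(row)
```

For each (ip, port) pair that occurs more than once, only the row with the smallest rowid survives; rows that were already unique are left alone.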
Answer 1: (score: 0)
Set max_num_rendered_ti_fields_per_task = -1 in the airflow.cfg file (version 2.0).
This worked for me.