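I have 7 related tables. One of them has a timestamp column, and I want to delete all rows older than 30 days. These are very large deletes, though: I am talking about tens of millions of records. When I delete those records from the main table, I also have to go through the other 6 tables and delete the related records there.
My question is: what is the best way to optimize this?
I am considering PARTITION, but only one table has the timestamp column. I am worried that if I drop old partitions in the main table, the related records will still exist in the other 6 tables. Related records are linked through the fields (sid, cid).
For context, I am using snort and barnyard, which are IDS processors.
I am using MySQL 5.1.73 with MyISAM tables.
Here is an excerpt from the cleanup log: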
StartTime,EndTime,TimeElapsed,AffectedRows
Wed Jan 6 01:00:01 EST 2016,Wed Jan 6 01:45:11 EST 2016,45:10,2911807
Thu Jan 7 01:00:02 EST 2016,Thu Jan 7 01:25:29 EST 2016,25:27,2230255
Fri Jan 8 01:00:01 EST 2016,Fri Jan 8 01:24:18 EST 2016,24:17,1400470
Sat Jan 9 01:00:02 EST 2016,Sat Jan 9 05:47:10 EST 2016,287:8,23360088
Sun Jan 10 01:00:01 EST 2016,Sun Jan 10 10:06:16 EST 2016,546:15,44970072
Mon Jan 11 01:00:01 EST 2016,Mon Jan 11 09:40:39 EST 2016,520:38,43948091
Here is my original cleanup script:
/usr/bin/mysql --defaults-extra-file=/old/.my.cnf snort_db >> /root/snortcleaner.log 2>&1 <<EOF
use snort_db;
DROP TRIGGER IF EXISTS delete_old;
DELIMITER //
CREATE TRIGGER delete_old AFTER DELETE ON event
FOR EACH ROW
BEGIN
DELETE FROM data WHERE data.cid = old.cid AND data.sid = old.sid;
DELETE FROM iphdr WHERE iphdr.cid = old.cid AND iphdr.sid = old.sid;
DELETE FROM icmphdr WHERE icmphdr.cid = old.cid AND icmphdr.sid = old.sid;
DELETE FROM tcphdr WHERE tcphdr.cid = old.cid AND tcphdr.sid = old.sid;
DELETE FROM udphdr WHERE udphdr.cid = old.cid AND udphdr.sid = old.sid;
DELETE FROM opt WHERE opt.cid = old.cid AND opt.sid = old.sid;
END //
DELIMITER ;
EOF
# Send the main MySQL command: delete all records between the oldest timestamp and 31 days before NOW().
# Fetch the oldest timestamp and delete from (that timestamp minus 1 hour) up to 31 days before NOW(). If the oldest timestamp is more recent than 31 days, the range is empty and the DELETE affects 0 rows; otherwise it deletes every row in the range.
OLDEST_TIMESTAMP=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "SELECT timestamp FROM event ORDER BY timestamp ASC LIMIT 1;")
NUM_AFFECTED=$(mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -se "DELETE FROM event WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 31 DAY); SELECT ROW_COUNT();")
Here is my current cleanup script:
DELETE FROM event WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR) AND DATE_SUB(NOW(), INTERVAL 31 DAY);
DELETE FROM data USING data LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM iphdr USING iphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM icmphdr USING icmphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM tcphdr USING tcphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM udphdr USING udphdr LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
DELETE FROM opt USING opt LEFT OUTER JOIN event USING (sid,cid) WHERE event.sid IS NULL;
I keep switching back and forth between the two because I do not know which is faster, but the reality is that both are too slow.
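Answer 0 (score: 0)
Try defining the foreign keys with ON DELETE CASCADE, so you do not need to create a trigger or manually join and delete the related records.
The example below shows how to create a relationship with a cascading delete:
CREATE TABLE parent (
    id INT NOT NULL,
    PRIMARY KEY (id)
) ENGINE=INNODB;
CREATE TABLE child (
    id INT,
    parent_id INT,
    INDEX par_ind (parent_id),
    FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
) ENGINE=INNODB;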
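One caveat for the setup in the question: MyISAM parses FOREIGN KEY clauses but does not enforce them, so the tables would have to be converted to InnoDB first. Below is a minimal sketch that generates that DDL for the snort tables named in the question. It assumes event has a primary key or index on (sid, cid), that the column types match across tables, and that the constraint names fk_<table>_event (made up here) are unused.

```shell
#!/bin/sh
# Print the DDL needed before ON DELETE CASCADE can work on the snort tables.
# Assumptions (not from the original post): event is indexed on (sid, cid),
# sid/cid have the same types everywhere, and the constraint names
# fk_<table>_event are hypothetical and free to use.
cascade_ddl() {
    # The parent table has to be InnoDB as well as the children.
    echo "ALTER TABLE event ENGINE=InnoDB;"
    for t in data iphdr icmphdr tcphdr udphdr opt; do
        echo "ALTER TABLE ${t} ENGINE=InnoDB;"
        echo "ALTER TABLE ${t} ADD CONSTRAINT fk_${t}_event FOREIGN KEY (sid, cid) REFERENCES event (sid, cid) ON DELETE CASCADE;"
    done
}
```

Calling cascade_ddl prints the statements so they can be reviewed before being piped into mysql. Adding a constraint fails if a child table still holds rows with no matching event row, so existing orphans need to be removed first.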
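Answer 1 (score: 0)
We solved a problem like this by creating and dropping partitions. You first partition the table by date (best practice: automate this with MySQL events), and when you need to remove old data you just drop the relevant partitions. The operation takes effect almost immediately, without the delay or locking of a large DELETE.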
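A rough sketch of the drop step, assuming daily partitions on event named pYYYYMMDD (a naming scheme invented for this example). The function only builds the ALTER TABLE statement; it does not run it. Note that MySQL requires the partitioning column to be part of every unique key on the table, including the primary key, and that dropping a partition of event does nothing to the rows in the other six tables, so those would still need their own cleanup.

```shell
#!/bin/sh
# Build one DROP PARTITION statement for every partition older than a cutoff.
# Hypothetical convention: partitions are named pYYYYMMDD, one per day.
# Usage: drop_old_partitions_sql CUTOFF_YYYYMMDD partition_name...
drop_old_partitions_sql() {
    cutoff=$1
    shift
    old=""
    for name in "$@"; do
        day=${name#p}                      # strip the leading "p"
        if [ "$day" -lt "$cutoff" ]; then
            old="${old:+$old, }$name"      # comma-separate the names
        fi
    done
    # Print nothing when no partition is old enough.
    if [ -n "$old" ]; then
        echo "ALTER TABLE event DROP PARTITION ${old};"
    fi
}
```

The partition names could be read from INFORMATION_SCHEMA.PARTITIONS and the cutoff computed by the calling script.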
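Answer 2 (score: 0)
How about saving the IDs of the rows you are about to delete into a temporary table before deleting them?
You can then change the cleanup script from joining against the large table (where id IS NULL) to joining against the small(er) table (where id IS NOT NULL).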
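Answer 3 (score: 0)
I would do two things:
First, define the foreign keys in the other tables with ON DELETE CASCADE.
Second, rather than carving out rows by hour, add a LIMIT to a simple delete: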
DELETE FROM event
WHERE timestamp < DATE_SUB(NOW(), INTERVAL 31 DAY)
LIMIT 500000
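and keep re-running it until no rows are affected, or for as many runs as experience tells you are needed.
Tune 500000 so that it is as large as possible without causing the query to die.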
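The re-run loop could be scripted roughly like this. It is a sketch, not a drop-in replacement: run_sql and purge_in_batches are names made up here, the connection options are copied from the question, and it assumes the related tables are cleaned up by the cascade (or by a separate step).

```shell
#!/bin/sh
# Hypothetical wrapper around the mysql client. -N drops the column header so
# that only the ROW_COUNT() value is printed.
run_sql() {
    mysql --defaults-extra-file=/old/.my.cnf -Dsnort_db -N -s -e "$1"
}

# Delete old event rows in batches until a batch affects 0 rows or the batch
# cap is reached. Prints the total number of rows deleted.
# Usage: purge_in_batches BATCH_SIZE MAX_BATCHES
purge_in_batches() {
    batch_size=$1
    max_batches=$2
    total=0
    batches=0
    while [ "$batches" -lt "$max_batches" ]; do
        affected=$(run_sql "DELETE FROM event WHERE timestamp < DATE_SUB(NOW(), INTERVAL 31 DAY) LIMIT ${batch_size}; SELECT ROW_COUNT();") || return 1
        # Stop on anything that is not a plain non-negative integer.
        case "$affected" in
            ''|*[!0-9]*) echo "unexpected row count: $affected" >&2; return 1 ;;
        esac
        batches=$((batches + 1))
        if [ "$affected" -eq 0 ]; then
            break
        fi
        total=$((total + affected))
    done
    echo "$total"
}
```

For example, `purge_in_batches 500000 100` would stop after at most 100 batches, which keeps a single nightly run bounded even when a large backlog has built up.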
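Answer 4 (score: 0)
Change your script to:
make sure every table has an index on cid
collect the cid values to delete into a small table, then delete from each table by joining against those values
Something like: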
CREATE TABLE IF NOT EXISTS deleted_cids (sid INT, cid INT); -- ensure same datatypes as sid and cid in the other tables
TRUNCATE deleted_cids;
-- Rows are related by (sid, cid), so keep both columns; matching on cid alone could delete unrelated rows that share a cid
INSERT INTO deleted_cids
SELECT sid, cid FROM event
WHERE timestamp BETWEEN DATE_SUB('${OLDEST_TIMESTAMP}', INTERVAL 1 HOUR)
AND DATE_SUB(NOW(), INTERVAL 31 DAY)
LIMIT 100000; -- Choose largest LIMIT that gives acceptable execution time
DELETE event FROM deleted_cids, event WHERE event.sid = deleted_cids.sid AND event.cid = deleted_cids.cid;
DELETE data FROM deleted_cids, data WHERE data.sid = deleted_cids.sid AND data.cid = deleted_cids.cid;
DELETE iphdr FROM deleted_cids, iphdr WHERE iphdr.sid = deleted_cids.sid AND iphdr.cid = deleted_cids.cid;
DELETE icmphdr FROM deleted_cids, icmphdr WHERE icmphdr.sid = deleted_cids.sid AND icmphdr.cid = deleted_cids.cid;
DELETE tcphdr FROM deleted_cids, tcphdr WHERE tcphdr.sid = deleted_cids.sid AND tcphdr.cid = deleted_cids.cid;
DELETE udphdr FROM deleted_cids, udphdr WHERE udphdr.sid = deleted_cids.sid AND udphdr.cid = deleted_cids.cid;
DELETE opt FROM deleted_cids, opt WHERE opt.sid = deleted_cids.sid AND opt.cid = deleted_cids.cid;
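The advantage here is that each delete is a single index-based statement that removes all of the target rows, so it should execute quickly.
By tuning the LIMIT and how often you run it, you can find the right balance for your server load. I would run small batches frequently, so that the server never stalls because of this process.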