The database is Firebird 2.1. If you are not familiar with it, here is the SQL reference for the SELECT statement:
http://ibexpert.net/ibe/index.php?n=Doc.DataRetrieval
Function reference:
http://www.firebirdsql.org/file/documentation/reference_manuals/reference_material/html/langrefupd21.html
I am happy with an answer in any SQL dialect [I will convert it].
Table schema:
CREATE TABLE EVENT_MASTER (
EVENT_ID BIGINT NOT NULL,
EVENT_TIME BIGINT NOT NULL,
DATA_F1 VARCHAR(40),
DATA_F2 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);
The bad news is that EVENT_TIME is stored as the number of seconds elapsed since the epoch.
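For readers who want to sanity-check the timestamps, here is a minimal Python sketch that converts an EVENT_TIME value to a UTC date. It assumes the epoch is the Unix epoch (1970-01-01 00:00:00 UTC), which the question does not state explicitly; the helper name `epoch_to_utc` is made up for illustration.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds):
    """Interpret `seconds` as seconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

# EVENT_TIME of the first sample row (EVENT_ID 25327).
first_event = epoch_to_utc(1297824698)
print(first_event.isoformat())  # 2011-02-16T02:51:38+00:00
```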
Sample data:
"EVENT_ID","EVENT_TIME","DATA_F1","DATA_F2"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25329,1297824773,"8604","A"
25330,1297824793,"8604","A"
25331,1297824809,"8604","1"
25332,1297824811,"8604","GREY"
25333,1297824812,"8604","A"
25334,1297824825,"8604","GREY"
25335,1297824831,"8604","A"
25336,1297824833,"8604","GREY"
25337,1297824838,"8604","A"
25338,1297824840,"8604","1"
25339,1297824850,"8604","A"
25340,1297824864,"8604","A"
25341,1297824875,"8804","GREY" //notice DATA_F1 is different
25342,1297824876,"8604","G"
25343,1297824877,"8604","A"
25344,1297824880,"8604","GREY"
25345,1297824895,"8604","1"
25346,1297824899,"8604","A"
25347,1297824918,"8604","GREY"
25348,1297824930,"8604","YELLOW"
25349,1297824939,"8604","GREY"
25350,1297824940,"8604",""
25351,1297824944,"8604","A"
25352,1297824945,"8604","1"
25353,1297824954,"8604","B"
25354,1297824958,"8604",""
25355,1297824964,"8604","1"
25356,1297824966,"8604","GREY"
25357,1297824974,"8604","1"
25358,1297824981,"8604","GREY"
25359,1297824983,"8604",""
25360,1297824998,"8604","GREY"
25361,1297825003,"8604","2"
25362,1297825009,"8604","G"
25363,1297825018,"8604","GREY"
25364,1297825026,"8604","F"
25365,1297825045,"8604","GREY"
25366,1297825046,"8604","1"
Expected output:
Rows whose ("DATA_F1", "DATA_F2") pair is distinct within X minutes, based on EVENT_TIME.
For example:
25341,1297824875,"8804","GREY"
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25332,1297824811,"8604","GREY"
25348,1297824930,"8604","YELLOW"
..etc
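Requirement: a SELECT that eliminates redundant records occurring within 5 minutes of each other [the range is computed from the EVENT_TIME column].
So far I have been trying to follow this pattern: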
SELECT * FROM EVENT_MASTER inner join (
SELECT distinct DATA_F1, DATA_F2 FROM EVENT_MASTER where /*the hard stuff that i need help with: (EVENT_TIME difference within X minutes)*/
) as RemovedDup ON /*EVENT_MASTER.EVENT_ID = problem is i cant select RemovedDup ID otherwise distinct becomes useless!!*/
Please help as soon as possible.
Thanks,
Edit
Adding the output of Andrei K.'s answer:
25331,1297824809,"8604","1"
25327,1297824698,"8604","A"
25342,1297824876,"8604","G"
25332,1297824811,"8604","GREY"
25328,1297824770,"8604","I"
25341,1297824875,"8804","GREY"
25350,1297824940,"8604",""
25352,1297824945,"8604","1" /*bug: time still within 300 seconds, this same as first record*/
25361,1297825003,"8604","2"
25351,1297824944,"8604","A"
25353,1297824954,"8604","B"
25364,1297825026,"8604","F"
25362,1297825009,"8604","G"
25347,1297824918,"8604","GREY"
25372,1297825087,"8604","ORANGE"
25348,1297824930,"8604","YELLOW"
25382,1297825216,"8604","1"
25387,1297825270,"8604","B"
25394,1297825355,"8604","BLUE"
25381,1297825211,"8604","GREY"
Edit 2: Output of Russell's query. The output is wrong and the query is very slow.
1297824698,"8604","A"
1297824770,"8604","I"
1297824809,"8604","1"
1297824811,"8604","GREY"
1297824825,"8604","GREY"
1297824840,"8604","1"
1297824875,"8804","GREY"
1297824876,"8604","G"
1297824880,"8604","GREY"
1297824918,"8604","GREY"
1297824930,"8604","YELLOW"
1297824939,"8604","GREY"
1297824940,"8604",""
1297824945,"8604","1"
1297824954,"8604","B"
1297824964,"8604","1"
1297824998,"8604","GREY"
1297825003,"8604","2"
1297825018,"8604","GREY"
1297825026,"8604","F"
1297825045,"8604","GREY"
1297825046,"8604","1"
1297825063,"8604","1"
1297825079,"8604","GREY"
1297825087,"8604","ORANGE"
1297825094,"8604","GREY"
1297825100,"8604","1"
1297825133,"8604","GREY"
1297825176,"8604","GREY"
1297825216,"8604","1"
Edit 3:
As Russell requested, here are all rows WHERE DATA_F1 = '8604' AND DATA_F2 = 'GREY':
25332,1297824811,"8604","GREY"
25334,1297824825,"8604","GREY"
25336,1297824833,"8604","GREY"
25344,1297824880,"8604","GREY"
25347,1297824918,"8604","GREY"
25349,1297824939,"8604","GREY"
25356,1297824966,"8604","GREY"
25358,1297824981,"8604","GREY"
25360,1297824998,"8604","GREY"
25363,1297825018,"8604","GREY"
25365,1297825045,"8604","GREY"
25367,1297825059,"8604","GREY"
25371,1297825079,"8604","GREY"
25373,1297825094,"8604","GREY"
25376,1297825116,"8604","GREY"
25378,1297825133,"8604","GREY"
25380,1297825176,"8604","GREY"
25381,1297825211,"8604","GREY"
25384,1297825234,"8604","GREY"
25389,1297825286,"8604","GREY"
25390,1297825314,"8604","GREY"
25391,1297825323,"8604","GREY"
25393,1297825343,"8604","GREY"
25396,1297825370,"8604","GREY"
25397,1297825387,"8604","GREY"
25399,1297825416,"8604","GREY"
25401,1297825436,"8604","GREY"
25402,1297825445,"8604","GREY"
25404,1297825454,"8604","GREY"
50282,1299137344,"8604","GREY"
380151,1309849420,"8604","GREY"
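As of now [October 11, 2011, 5 AM GMT] no fully correct answer has been posted, and Andrei K.'s is still the best attempt. So, SQL experts, please help me find a solution, otherwise I will start to think that SQL cannot handle this requirement! Can it?
Note: event_time is not unique, so multiple events can occur within the same second.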
Answer 0: (score: 4)
If redundant rows are rows registered within 5 minutes of each other that have the same data_f1 and data_f2, try something like this:
SELECT
e2.event_id,
e2.event_time,
e2.data_f1,
e2.data_f2
FROM
(SELECT trunc(event_time / 300), data_f1, data_f2, min(event_id) as e_id
FROM event_master
GROUP BY 1, 2, 3) e1
JOIN
event_master e2 ON e1.e_id = e2.event_id
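One property of grouping by `event_time / 300` is worth spelling out: the buckets are fixed 300-second slots, so two events less than five minutes apart can still land in different buckets. The Python sketch below is only an illustration (the `bucket` helper is a made-up name) and uses the two "8604","1" rows that the question flags as a bug.

```python
WINDOW = 300  # seconds, i.e. 5 minutes

def bucket(event_time, window=WINDOW):
    """Fixed-slot bucket number, mirroring trunc(event_time / 300)."""
    return event_time // window

# EVENT_IDs 25331 and 25352 from the question, both ("8604", "1").
first, second = 1297824809, 1297824945
gap = second - first
same_bucket = bucket(first) == bucket(second)
print(gap, bucket(first), bucket(second), same_bucket)
# 136 4326082 4326083 False
```

The two events are only 136 seconds apart, yet they fall on opposite sides of a bucket boundary, so both survive the GROUP BY.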
Answer 1: (score: 2)
You can try this:
SELECT * FROM EVENT_MASTER
WHERE EVENT_TIME > (SELECT TIME_TO_SEC(NOW()) - 300) -- MySQL functions, not Firebird
GROUP BY DATA_F1, DATA_F2
Hope this helps.
Answer 2: (score: 2)
I am not familiar with Firebird, but I am working from the documentation, so if that is correct, this should work.
SELECT DISTINCT MIN(A.EVENT_TIME) as MINEVENT_TIME, B.DATA_F1, B.DATA_F2
FROM EVENT_MASTER as A
JOIN EVENT_MASTER as B ON A.EVENT_TIME BETWEEN B.EVENT_TIME-299 AND B.EVENT_TIME
AND B.DATA_F1 = A.DATA_F1 AND B.DATA_F2 = A.DATA_F2
GROUP BY B.DATA_F1, B.DATA_F2, B.EVENT_TIME
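This has been syntax-checked but not tested.
Answer 3: (score: 2)
This assumes that all records have distinct values in event_time (otherwise they will exclude each other).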
SELECT
*
FROM
event_master AS data
WHERE
NOT EXISTS (
SELECT * FROM event_master
WHERE event_time > data.event_time - 300
AND event_time <= data.event_time
AND event_id <> data.event_id -- do not let a row exclude itself
)
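If multiple events occur with the same value in event_time, can we assume that an event with a higher event_id never occurs before an event with a lower event_id? If so, you can modify the query above as follows: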
SELECT
*
FROM
event_master AS data
WHERE
NOT EXISTS (
SELECT * FROM event_master
WHERE event_time > data.event_time - 300
AND event_time <= data.event_time
AND event_id < data.event_id
)
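If multiple events occur at the same time, the one with the lowest event_id is selected.
For performance, make sure the table has an index in which event_time is the first indexed field.
Answer 4: (score: 2)
As I understand it, you want the distinct values of DATA_F1 and DATA_F2, but only within a 5-minute "window"; after that, the same values may appear again, right? (Sorry if I have misunderstood the question, it has been a long day...) I do not know much about Firebird, but this is how you would do it in MS SQL Server: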
SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM
EVENT_MASTER AS a LEFT JOIN EVENT_MASTER AS b
ON a.DATA_F1=b.DATA_F1 AND
a.DATA_F2=b.DATA_F2 AND
a.EVENT_TIME<b.EVENT_TIME AND
b.EVENT_TIME-a.EVENT_TIME<=5*60
WHERE
b.EVENT_ID IS NULL
Also, while testing, please try the modified version below as well:
SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM
EVENT_MASTER AS a LEFT JOIN EVENT_MASTER AS b
ON a.DATA_F1=b.DATA_F1 AND
a.DATA_F2=b.DATA_F2 AND
a.EVENT_ID<b.EVENT_ID AND
a.EVENT_TIME<=b.EVENT_TIME AND
b.EVENT_TIME-a.EVENT_TIME<=5*60
WHERE
b.EVENT_ID IS NULL
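Added: OK, it looks like we have the correct result. Here is how I would suggest optimizing this baby (since I see that Firebird supports the EXISTS keyword, I have rewritten the query below):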
SELECT a.EVENT_ID, a.DATA_F1, a.DATA_F2, a.EVENT_TIME FROM EVENT_MASTER AS a
WHERE NOT EXISTS (SELECT * FROM EVENT_MASTER AS b
WHERE a.DATA_F1=b.DATA_F1 AND
a.DATA_F2=b.DATA_F2 AND
a.EVENT_ID<b.EVENT_ID AND
a.EVENT_TIME<=b.EVENT_TIME AND
b.EVENT_TIME-a.EVENT_TIME<=5*60)
Also, please add the following index:
CREATE INDEX IX_SPEED ON EVENT_MASTER (EVENT_ID DESC, EVENT_TIME ASC, DATA_F1 ASC, DATA_F2 ASC)
Hope this helps!
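To make the NOT EXISTS condition concrete, here is a small Python sketch of the same predicate: a row is kept only if no later row with the same DATA_F1 and DATA_F2 follows it within the window. The function name and the six timestamps are hypothetical, chosen only to show the behaviour on a chain of closely spaced events.

```python
def keep_if_no_later_match(rows, window=300):
    """rows: (event_id, event_time, data_f1, data_f2) tuples.

    Keep row `a` unless some row `b` has the same data fields, a higher
    event_id, and an event_time at most `window` seconds after a's.
    """
    kept = []
    for a_id, a_time, a_f1, a_f2 in rows:
        has_later_match = any(
            a_f1 == b_f1 and a_f2 == b_f2
            and a_id < b_id
            and a_time <= b_time
            and b_time - a_time <= window
            for b_id, b_time, b_f1, b_f2 in rows
        )
        if not has_later_match:
            kept.append(a_id)
    return kept

# Hypothetical chain: events 200 s apart, then a 400 s gap.
chain = [(i + 1, t, "8604", "GREY") for i, t in enumerate([0, 200, 400, 600, 800, 1200])]
kept_ids = keep_if_no_later_match(chain)
print(kept_ids)  # [5, 6]
```

On such a chain the predicate keeps only the last event before each gap longer than the window, so a long run of events spaced under five minutes apart collapses to a single row. Whether that matches the requirement depends on how "redundant" is defined.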
Answer 5: (score: 1)
Try:
SELECT T1.* FROM EVENT_MASTER T1 WHERE EXISTS (
SELECT * FROM EVENT_MASTER T2
WHERE T2.DATA_F1=T1.DATA_F1
AND T2.DATA_F2=T1.DATA_F2
AND (T2.EVENT_TIME-T1.EVENT_TIME)<300
)
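Answer 6: (score: 1)
You would need a very nasty recursive query to do this in a purely "functional" way. I am not confident that I could construct such a query neatly, let alone make it efficient.
On the other hand, allowing side effects (i.e. a temporary table) simplifies things considerably. You can even make it fast by adding appropriate indexes on the temporary table (not shown here). Here is the actual SQL: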
CREATE GLOBAL TEMPORARY TABLE EVENT_MASTER_TMP (
EVENT_ID BIGINT NOT NULL,
EVENT_TIME BIGINT NOT NULL,
DATA_F1 VARCHAR(40),
DATA_F2 VARCHAR(40),
PRIMARY KEY (EVENT_ID)
);
INSERT INTO EVENT_MASTER_TMP
SELECT * FROM
(SELECT * FROM EVENT_MASTER ORDER BY EVENT_TIME) E
WHERE
NOT EXISTS (
SELECT *
FROM EVENT_MASTER_TMP T
WHERE
E.DATA_F1 = T.DATA_F1
AND E.DATA_F2 = T.DATA_F2
AND E.EVENT_TIME - T.EVENT_TIME <= 5*60
);
SELECT * FROM EVENT_MASTER_TMP;
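In plain English: go through the events in EVENT_TIME order and copy each one into the temporary table, unless the table already holds a row with the same DATA_F1 and DATA_F2 that is at most 5 minutes older.
Running this SQL on your test data yields: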
25327 1297824698 8604 A
25328 1297824770 8604 I
25331 1297824809 8604 1
25332 1297824811 8604 GREY
25341 1297824875 8804 GREY
25342 1297824876 8604 G
25348 1297824930 8604 YELLOW
25350 1297824940 8604
25353 1297824954 8604 B
25361 1297825003 8604 2
25364 1297825026 8604 F
Lowering the time threshold from 5*60 to, say, 233 yields this:
25327 1297824698 8604 A
25328 1297824770 8604 I
25331 1297824809 8604 1
25332 1297824811 8604 GREY
25341 1297824875 8804 GREY
25342 1297824876 8604 G
25348 1297824930 8604 YELLOW
25350 1297824940 8604
25351 1297824944 8604 A <-- 246s difference
25353 1297824954 8604 B
25361 1297825003 8604 2
25364 1297825026 8604 F
25365 1297825045 8604 GREY <-- 234s difference
25366 1297825046 8604 1 <-- 237s difference
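Both result sets above can be cross-checked outside the database. The Python sketch below is not the SQL from this answer; it is a procedural restatement of the same rule, with hypothetical names (`SAMPLE`, `dedupe`). It assumes that ties in EVENT_TIME are broken by EVENT_ID, which the SQL does not specify.

```python
import csv
import io

# Sample rows copied from the question: EVENT_ID, EVENT_TIME, DATA_F1, DATA_F2.
SAMPLE = '''\
25327,1297824698,"8604","A"
25328,1297824770,"8604","I"
25329,1297824773,"8604","A"
25330,1297824793,"8604","A"
25331,1297824809,"8604","1"
25332,1297824811,"8604","GREY"
25333,1297824812,"8604","A"
25334,1297824825,"8604","GREY"
25335,1297824831,"8604","A"
25336,1297824833,"8604","GREY"
25337,1297824838,"8604","A"
25338,1297824840,"8604","1"
25339,1297824850,"8604","A"
25340,1297824864,"8604","A"
25341,1297824875,"8804","GREY"
25342,1297824876,"8604","G"
25343,1297824877,"8604","A"
25344,1297824880,"8604","GREY"
25345,1297824895,"8604","1"
25346,1297824899,"8604","A"
25347,1297824918,"8604","GREY"
25348,1297824930,"8604","YELLOW"
25349,1297824939,"8604","GREY"
25350,1297824940,"8604",""
25351,1297824944,"8604","A"
25352,1297824945,"8604","1"
25353,1297824954,"8604","B"
25354,1297824958,"8604",""
25355,1297824964,"8604","1"
25356,1297824966,"8604","GREY"
25357,1297824974,"8604","1"
25358,1297824981,"8604","GREY"
25359,1297824983,"8604",""
25360,1297824998,"8604","GREY"
25361,1297825003,"8604","2"
25362,1297825009,"8604","G"
25363,1297825018,"8604","GREY"
25364,1297825026,"8604","F"
25365,1297825045,"8604","GREY"
25366,1297825046,"8604","1"
'''

def dedupe(rows, window=300):
    """Return the EVENT_IDs kept when a row is dropped if a kept row with
    the same (DATA_F1, DATA_F2) is at most `window` seconds older."""
    last_kept = {}  # (data_f1, data_f2) -> EVENT_TIME of the most recent kept row
    kept = []
    # Assumption: process in EVENT_TIME order, ties broken by EVENT_ID.
    for event_id, event_time, f1, f2 in sorted(rows, key=lambda r: (r[1], r[0])):
        previous = last_kept.get((f1, f2))
        if previous is None or event_time - previous > window:
            last_kept[(f1, f2)] = event_time
            kept.append(event_id)
    return kept

rows = [(int(r[0]), int(r[1]), r[2], r[3]) for r in csv.reader(io.StringIO(SAMPLE))]
kept_300 = dedupe(rows)              # 5-minute threshold
kept_233 = dedupe(rows, window=233)  # lowered threshold
print(kept_300)
print(kept_233)
```

Comparing against the most recent kept row per key is enough here, because rows are visited in time order and the most recent kept row is always the closest one.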