My table looks like this:
CREATE TABLE USER_TRANSACTIONS (
START_TIME BIGINT UNSIGNED NOT NULL,
APPLICATION_ID CHAR(64) BINARY NOT NULL,
ENTRY_POINT CHAR(255) BINARY NOT NULL,
USER_ID CHAR(64) BINARY NOT NULL,
ERROR_VIOLATION BIT(1) NOT NULL,
LATENCY_VIOLATION BIT(1) NOT NULL,
PRIMARY KEY (START_TIME, APPLICATION_ID, ENTRY_POINT, USER_ID)
)
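What I want is a summary: for each ENTRY_POINT, I want to see how many distinct users there are, and how many of those users experienced error and latency problems.
For example: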
ENTRY_POINT | TOTAL_USERS | TOTAL_ERRORS | TOTAL_LATENCY
page1 | 2 | 2 | 1
page2 | 1 | 1 | 1
I can get this result with the following query:
SELECT UT.ENTRY_POINT, COUNT(USER_ID) AS TOTAL_USERS, SUM(EXP_ERRORS) AS TOTAL_ERRORS, SUM(EXP_LATENCY) AS TOTAL_LATENCY
FROM (
SELECT ENTRY_POINT, USER_ID,
BIT_OR(ERROR_VIOLATION) AS EXP_ERRORS,
BIT_OR(LATENCY_VIOLATION) AS EXP_LATENCY
FROM user_transactions
GROUP BY ENTRY_POINT, USER_ID
) AS UT
GROUP BY UT.ENTRY_POINT;
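The nested query collapses each user's transactions into a single flag that says whether the user ever hit an error or a latency problem, but on tables with a lot of data I run into performance problems.
My question is: how can I optimize this query so that it avoids the inner subquery?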
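One way to see where the time goes, assuming this is MySQL (which the BIT_OR call and the UNSIGNED type suggest), is to prefix the query with EXPLAIN. This is only a diagnostic sketch; the plan it prints depends on the server version and on the data.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT UT.ENTRY_POINT,
       COUNT(USER_ID)   AS TOTAL_USERS,
       SUM(EXP_ERRORS)  AS TOTAL_ERRORS,
       SUM(EXP_LATENCY) AS TOTAL_LATENCY
FROM (
    -- One row per (entry point, user): did this user ever see a violation?
    SELECT ENTRY_POINT, USER_ID,
           BIT_OR(ERROR_VIOLATION)   AS EXP_ERRORS,
           BIT_OR(LATENCY_VIOLATION) AS EXP_LATENCY
    FROM user_transactions
    GROUP BY ENTRY_POINT, USER_ID
) AS UT
GROUP BY UT.ENTRY_POINT;
```

In the output, a row with select_type DERIVED corresponds to the inner subquery. "Using temporary" or "Using filesort" in the Extra column usually means the grouping goes through a temporary table or a sort rather than reading an index in order.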
Answer 0 (score: 2)
Use COUNT(DISTINCT). Here is one way to write the query:
SELECT ENTRY_POINT, COUNT(DISTINCT USER_ID),
SUM(ERROR_VIOLATION > 0) AS TOTAL_ERRORS,
SUM(LATENCY_VIOLATION > 0) AS TOTAL_LATENCY
FROM user_transactions
GROUP BY ENTRY_POINT;
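The query above counts every transaction that had a violation. If you want the number of users with errors rather than the total number of errors: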
SELECT ENTRY_POINT, COUNT(DISTINCT USER_ID),
COUNT(DISTINCT CASE WHEN ERROR_VIOLATION > 0 THEN USER_ID END) AS TOTAL_ERRORS,
COUNT(DISTINCT CASE WHEN LATENCY_VIOLATION > 0 THEN USER_ID END) AS TOTAL_LATENCY
FROM user_transactions
GROUP BY ENTRY_POINT;
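To make the difference between the two queries concrete, here is a small sketch. It assumes the table definition from the question and a few made-up rows: the application ID 'app1' and the user IDs 'alice' and 'bob' are hypothetical. The table name is written as in the CREATE TABLE statement, because table-name case sensitivity depends on the platform.

```sql
-- Hypothetical sample rows, made up for illustration.
INSERT INTO USER_TRANSACTIONS
    (START_TIME, APPLICATION_ID, ENTRY_POINT, USER_ID, ERROR_VIOLATION, LATENCY_VIOLATION)
VALUES
    (1, 'app1', 'page1', 'alice', b'1', b'1'),
    (2, 'app1', 'page1', 'alice', b'1', b'0'),
    (3, 'app1', 'page1', 'bob',   b'1', b'0'),
    (4, 'app1', 'page2', 'alice', b'1', b'1');

-- Per-transaction counting: both of alice's failing rows on page1 are counted.
SELECT ENTRY_POINT,
       COUNT(DISTINCT USER_ID)    AS TOTAL_USERS,
       SUM(ERROR_VIOLATION > 0)   AS TOTAL_ERRORS,
       SUM(LATENCY_VIOLATION > 0) AS TOTAL_LATENCY
FROM USER_TRANSACTIONS
GROUP BY ENTRY_POINT
ORDER BY ENTRY_POINT;
-- page1 | 2 | 3 | 1
-- page2 | 1 | 1 | 1

-- Per-user counting: each user is counted at most once per entry point.
SELECT ENTRY_POINT,
       COUNT(DISTINCT USER_ID) AS TOTAL_USERS,
       COUNT(DISTINCT CASE WHEN ERROR_VIOLATION > 0 THEN USER_ID END)   AS TOTAL_ERRORS,
       COUNT(DISTINCT CASE WHEN LATENCY_VIOLATION > 0 THEN USER_ID END) AS TOTAL_LATENCY
FROM USER_TRANSACTIONS
GROUP BY ENTRY_POINT
ORDER BY ENTRY_POINT;
-- page1 | 2 | 2 | 1
-- page2 | 1 | 1 | 1
```

On this sample the per-user form reproduces the expected output in the question. The CASE expression yields NULL for rows without a violation, and COUNT(DISTINCT ...) ignores NULLs.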
Answer 1 (score: 1)
Couldn't you just use something like this:
SELECT
ENTRY_POINT
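,COUNT(USER_ID) AS TOTAL_USERS -- counts rows, not distinct users
,SUM(ERROR_VIOLATION) AS TOTAL_ERRORS -- EXP_ERRORS was only an alias inside the subquery
,SUM(LATENCY_VIOLATION) AS TOTAL_LATENCY -- EXP_LATENCY likewise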
FROM user_transactions
GROUP BY ENTRY_POINT