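Hi everyone, I've been tinkering with this query and I can't get it to return results in a reasonable execution time.
Here is the situation:
I have three tables.
Table 1 is called rowsall: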
1 id int(11)
2 masterCaseId varchar(50)
3 RowNum int(11)
4 fullCaseNumber varchar(50)
5 rowKtavNameFull varchar(250)
6 DateOpen varchar(50)
7 DateProccess varchar(50)
8 rowStatus varchar(50)
9 rowCourt varchar(100)
10 rowProcedure varchar(50)
11 rowCaseType varchar(50)
12 rowIntrest varchar(50)
13 rowDetailsGen varchar(250)
14 rowTypeTeanot varchar(50)
15 rowHisayon varchar(50)
16 rowAmount varchar(50)
17 rowZacautPtor varchar(50)
18 rowZacautApproove varchar(50)
19 rowStatIravon varchar(50)
20 rowDateClose varchar(50)
21 rowCloseReason varchar(50)
22 rowResultTaken varchar(50)
23 rowOldFile varchar(50)
24 rowOpenedInCourse varchar(50)
25 rowGniza varchar(50)
26 rowReasonDeposit varchar(50)
27 rowTypeJudgeType varchar(50)
28 rowJudgeTypeDate
29 rowJudgeTypeName varchar(50)
30 rowGishurType varchar(50)
31 rowGishurDetails varchar(250)
Total rows: 13001, size 11.7mb
Indexes:
PRIMARY           BTREE  Yes  No  id                13001  A  No
RowNum            BTREE  No   No  RowNum               12  A  No
                                  rowStatus            12  A  No
                                  rowResultTaken       12  A  No
rowJudgeTypeName  BTREE  No   No  rowJudgeTypeName   1083  A  No
masterCaseId      BTREE  No   No  masterCaseId      13001  A  No
RowNum_2          BTREE  No   No  rowJudgeTypeName   1857  A  No
                                  RowNum             1857  A  No
fullCaseNumber    BTREE  No   No  fullCaseNumber      203  A  No
Table 2 is called casses_rows:
1 id int(11)
2 caseFullNum varchar(50)
3 statusCrawl varchar(50)
4 courtPlace text
5 rowsNum int(11)
6 caseJudge varchar(50)
7 caseFullName text
8 whenCrawled datetime
9 yearVal varchar(5)
10 monthVal varchar(5)
11 caseVal int(11)
Total rows: ~23,846, size 4.8mb
Indexes:
PRIMARY BTREE Yes No id 26302 A No
Table 3 is called casedocs:
1 id int(11)
2 caseNum varchar(20)
3 DocTitle varchar(250)
4 DocDateStr varchar(20)
5 KeyWords text
6 content text
7 DocDateParsed timestamp
Total rows: ~1,163,669, size 4.1g
Indexes:
PRIMARY BTREE Yes No id 895132 A No
caseNum BTREE No No caseNum 895132 A No
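My goal:
I need to join these tables to get most of the columns from table 1, plus one column from table 2 and one column from table 3 (NULL if there is no match).
My query is: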
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM (SELECT * FROM `rowsall` WHERE `rowJudgeTypeName` LIKE '%@value1%' AND `RowNum` ='1' ) A
INNER JOIN ( SELECT `id`,`caseFullName` FROM `casses_rows` ) B
ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ORDER BY NULL ) C
ON A.`fullCaseNumber` = C.`caseNum`
The results are what I want, but the problem is that the query takes 1 minute to return them...
Here is the EXPLAIN output:
id  select_type  table        type   possible_keys  key      key_len  ref   rows    Extra
1   PRIMARY      <derived2>   ALL    NULL           NULL     NULL     NULL  121
1   PRIMARY      <derived3>   ALL    NULL           NULL     NULL     NULL  24185   Using where; Using join buffer
1   PRIMARY      <derived4>   ALL    NULL           NULL     NULL     NULL  343438
4   DERIVED      casedocs     index  NULL           caseNum  62       NULL  768024  Using index
3   DERIVED      casses_rows  ALL    NULL           NULL     NULL     NULL  29872
2   DERIVED      rowsall      ref    RowNum         RowNum   4              6500    Using where
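As you can see, I group table 3 so that the join does not create duplicate rows in the result. In effect, the third join only tests whether a document exists for the case (NULL if there is none).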
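To make the duplicate-row problem concrete, here is a small sketch using Python's built-in sqlite3 module rather than MySQL. The tables are hypothetical miniatures of the three in the question (names borrowed, types simplified, data invented): case A-100 has two documents and case B-200 has none. A plain LEFT JOIN returns one row per matching document, while joining against casedocs grouped by caseNum keeps one row per case and still yields NULL when there is no document.

```python
import sqlite3

# Hypothetical miniature versions of the question's tables (invented data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall (id INTEGER PRIMARY KEY, masterCaseId INTEGER, fullCaseNumber TEXT);
CREATE TABLE casses_rows (id INTEGER PRIMARY KEY, caseFullName TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
INSERT INTO casses_rows VALUES (1, 'Case one'), (2, 'Case two');
INSERT INTO rowsall VALUES (10, 1, 'A-100'), (20, 2, 'B-200');
-- A-100 has two documents, B-200 has none
INSERT INTO casedocs VALUES (100, 'A-100'), (101, 'A-100');
""")

# Plain LEFT JOIN: one output row per matching document.
plain = conn.execute("""
SELECT A.id, C.caseNum
FROM rowsall A
JOIN casses_rows B ON A.masterCaseId = B.id
LEFT JOIN casedocs C ON A.fullCaseNumber = C.caseNum
ORDER BY A.id
""").fetchall()

# Grouping casedocs first leaves at most one row per caseNum to join against.
grouped = conn.execute("""
SELECT A.id, C.caseNum
FROM rowsall A
JOIN casses_rows B ON A.masterCaseId = B.id
LEFT JOIN (SELECT caseNum FROM casedocs GROUP BY caseNum) C
  ON A.fullCaseNumber = C.caseNum
ORDER BY A.id
""").fetchall()

print(plain)    # [(10, 'A-100'), (10, 'A-100'), (20, None)]
print(grouped)  # [(10, 'A-100'), (20, None)]
```

This only illustrates the result shape; it says nothing about MySQL's execution time.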
More information:
Any idea why executing the third join takes so long?
Solved! Thanks to @Turophile and @Joel Coehoorn, the new test result is 0.004 seconds!
Here is the final query:
SELECT DISTINCT A.`id` AS idRowCase, C.`caseNum` AS isPaperAva,
A.`rowCaseType`, A.`fullCaseNumber`, A.`rowProcedure`, B.`caseFullName`,
A.`rowCourt`, A.`rowAmount`, A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest`,
A.`DateOpen`, A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` C ON A.`fullCaseNumber` = C.`caseNum`
WHERE A.`rowJudgeTypeName` LIKE '%@value1%'
AND A.`RowNum` = '1'
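The final query replaces the grouped derived table with DISTINCT on a plain join. The SQLite sketch below (invented data, a trimmed column list, the LIKE/RowNum filter omitted) suggests why the two give the same rows here: the only column taken from casedocs is the join key caseNum, so the extra rows produced by several documents are identical and DISTINCT folds them back into one. If another casedocs column (for example DocTitle) were selected, those rows would differ and DISTINCT would no longer collapse them.

```python
import sqlite3

# Hypothetical miniature tables: A-100 has two documents, B-200 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall (id INTEGER PRIMARY KEY, masterCaseId INTEGER, fullCaseNumber TEXT);
CREATE TABLE casses_rows (id INTEGER PRIMARY KEY, caseFullName TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
INSERT INTO casses_rows VALUES (1, 'Case one'), (2, 'Case two');
INSERT INTO rowsall VALUES (10, 1, 'A-100'), (20, 2, 'B-200');
INSERT INTO casedocs VALUES (100, 'A-100'), (101, 'A-100');
""")

# Shape of the final query: plain joins, duplicates removed by DISTINCT.
final = conn.execute("""
SELECT DISTINCT A.id AS idRowCase, C.caseNum AS isPaperAva
FROM rowsall A
INNER JOIN casses_rows B ON A.masterCaseId = B.id
LEFT JOIN casedocs C ON A.fullCaseNumber = C.caseNum
ORDER BY idRowCase
""").fetchall()

# Shape of the original query: casedocs grouped before the join.
grouped = conn.execute("""
SELECT A.id AS idRowCase, C.caseNum AS isPaperAva
FROM rowsall A
INNER JOIN casses_rows B ON A.masterCaseId = B.id
LEFT JOIN (SELECT caseNum FROM casedocs GROUP BY caseNum) C
  ON A.fullCaseNumber = C.caseNum
ORDER BY idRowCase
""").fetchall()

print(final)  # [(10, 'A-100'), (20, None)]
print(final == grouped)  # True
```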
Answer 0 (score: 2)
My suggestion is to avoid unnecessary sorting and grouping. So, something like this:
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
LEFT JOIN `casedocs` AS C
ON A.`fullCaseNumber` = C.`caseNum`
WHERE `rowJudgeTypeName` LIKE '%@value1%'
AND `RowNum` ='1'
(This may return different results, i.e. multiple rows per case, if caseNum is not unique.)
You could also turn the LEFT JOIN into a subselect:
SELECT
A.`id` AS idRowCase,
A.`fullCaseNumber` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`,
A.`rowStatus`,
A.`rowIntrest`,
A.`DateOpen` ,
A.`DateProccess`,
A.`rowDateClose`,
A.`rowJudgeTypeDate`
FROM `rowsall` AS A
INNER JOIN `casses_rows` AS B
ON A.`masterCaseId` = B.`id`
WHERE `rowJudgeTypeName` LIKE '%@value1%'
AND `RowNum` ='1'
AND A.`fullCaseNumber` in (SELECT `caseNum` FROM `casedocs` )
But that suggests the casedocs table is somewhat redundant here: is it really needed?
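One thing worth checking before adopting the IN form: it filters rather than flags. A case with no row in casedocs drops out of the result entirely instead of coming back with isPaperAva = NULL, which is what the question asked for. A minimal SQLite sketch with hypothetical toy tables and invented data:

```python
import sqlite3

# Hypothetical toy tables: A-100 has documents, B-200 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall (id INTEGER PRIMARY KEY, masterCaseId INTEGER, fullCaseNumber TEXT);
CREATE TABLE casses_rows (id INTEGER PRIMARY KEY, caseFullName TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
INSERT INTO casses_rows VALUES (1, 'Case one'), (2, 'Case two');
INSERT INTO rowsall VALUES (10, 1, 'A-100'), (20, 2, 'B-200');
INSERT INTO casedocs VALUES (100, 'A-100'), (101, 'A-100');
""")

# LEFT JOIN (deduplicated): every case is kept, NULL marks "no document".
left_join = conn.execute("""
SELECT DISTINCT A.id, C.caseNum AS isPaperAva
FROM rowsall A
JOIN casses_rows B ON A.masterCaseId = B.id
LEFT JOIN casedocs C ON A.fullCaseNumber = C.caseNum
ORDER BY A.id
""").fetchall()

# IN subselect: cases without documents are filtered out.
in_subselect = conn.execute("""
SELECT A.id, A.fullCaseNumber AS isPaperAva
FROM rowsall A
JOIN casses_rows B ON A.masterCaseId = B.id
WHERE A.fullCaseNumber IN (SELECT caseNum FROM casedocs)
ORDER BY A.id
""").fetchall()

print(left_join)     # [(10, 'A-100'), (20, None)]
print(in_subselect)  # [(10, 'A-100')]
```

So the IN version is a fit only if the caller really wants just the cases that have documents.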
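Answer 1 (score: 1)
First, the first two tables don't need subqueries at all. That is better expressed directly through the join conditions and the WHERE clause.
Also, the last join uses a subquery with a GROUP BY: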
LEFT JOIN (SELECT `caseNum`
FROM `casedocs`
GROUP BY `caseNum`
ORDER BY NULL)
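This destroys MySQL's ability to use any index when evaluating the last join. If you can rewrite it so that the tables are joined first and the GROUP BY is done in the outer query, giving the same result, it will probably perform much better, because the indexes can be used more effectively.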
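A rough way to see the indexing point locally, using SQLite instead of MySQL (so the planner and its output differ from the question's EXPLAIN, where <derived4> is scanned with type ALL and no key): when casedocs is joined directly, the planner can look up each fullCaseNumber in an index on casedocs.caseNum. The index name idx_casedocs_casenum and the stripped-down tables are hypothetical; the question's casedocs has its own caseNum index.

```python
import sqlite3

# Hypothetical stripped-down tables with an index on the join column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE rowsall (id INTEGER PRIMARY KEY, fullCaseNumber TEXT);
CREATE TABLE casedocs (id INTEGER PRIMARY KEY, caseNum TEXT);
CREATE INDEX idx_casedocs_casenum ON casedocs (caseNum);
""")

def plan(sql):
    # EXPLAIN QUERY PLAN returns 4 columns per row; the last is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

# Direct LEFT JOIN on the base table, as in the final query.
direct = plan("""
SELECT A.id, C.caseNum
FROM rowsall A
LEFT JOIN casedocs C ON A.fullCaseNumber = C.caseNum
""")

for line in direct:
    print(line)  # expect a SEARCH step on C that names idx_casedocs_casenum
```

Whether a given MySQL version can index a materialized derived table varies, so the question's own EXPLAIN remains the authoritative check there.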
SELECT
A.`id` AS idRowCase,
C.`caseNum` AS isPaperAva,
A.`rowCaseType`,
A.`fullCaseNumber`,
A.`rowProcedure`,
B.`caseFullName`,
A.`rowCourt`,
A.`rowAmount`,
A.`rowResultTaken`, A.`rowStatus`, A.`rowIntrest` ,A.`DateOpen` ,A.`DateProccess`, A.`rowDateClose`, A.`rowJudgeTypeDate`
FROM `rowsall` A
INNER JOIN `casses_rows` B ON A.`masterCaseId` = B.`id`
LEFT JOIN (SELECT `caseNum` FROM `casedocs` GROUP BY `caseNum` ) C ON C.`caseNum` = A.`fullCaseNumber`
WHERE A.`rowJudgeTypeName` LIKE '%@value1%' AND A.`RowNum` ='1'