I have the following tables:
"crawlresults"
id | url | fk_crawljobs_id
---------------------------------------------
1 | shop*com/notebooks | 1
2 | shop*com/fridges | 1
3 | website*com/lists | 2
"extractions"
id | fk_extractors_id | data | fk_crawlresults_id
---------------------------------------------------------------
1 | 1 | 123.45 | 1
2 | 2 | notebook | 1
3 | 3 | ibm.jpg | 1
4 | 1 | 44.5 | 2
5 | 2 | fridge | 2
6 | 3 | picture.jpg | 3
7 | 4 | hello | 3
8 | 4 | world | 3
9 | 5 | hi | 3
10 | 5 | my | 3
11 | 5 | friend | 3
"extractors"
id | extractorname
----------------------
1 | price
2 | article
3 | imageurl
4 | list_1
5 | list_2
I need to construct a SELECT statement that returns one column for each extractor in the extractors table.
Example:
url | price | article | imageurl
--------------------------------------------------------
shop*com/notebooks | 123.45 | notebook | ibm.jpg
shop*com/fridges | 44.5 | fridge | NULL
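I don't know how many extractors exist at the time the SELECT statement runs, so it has to be built dynamically.
Edit: I forgot to mention that my extractions may contain multiple "lists". In that case I need the following result set.
Example 2: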
url | list_1 | imageurl | list_2
--------------------------------------------------------
website*com/lists | hello | picture.jpg | NULL
website*com/lists | world | picture.jpg | NULL
website*com/lists | NULL | picture.jpg | hi
website*com/lists | NULL | picture.jpg | my
website*com/lists | NULL | picture.jpg | friend
Thanks!
Answer 0 (score: 3)
You are looking for dynamic pivot tables.
Code:
SET @sql = NULL;

-- Build one MAX(IF(...)) column per extractor name.
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(pa.extractorname = ''',
      extractorname,
      ''', p.data, NULL)) AS ',
      extractorname
    )
  ) INTO @sql
FROM extractors;

-- Wrap the generated column list in the pivot query.
SET @sql = CONCAT('SELECT c.url, ',
                  @sql,
                  ' FROM crawlresults c',
                  ' INNER JOIN extractions p on (c.id = p.fk_crawlresults_id)',
                  ' INNER JOIN extractors pa on (p.fk_extractors_id = pa.id)',
                  ' WHERE c.fk_crawljobs_id = 1',
                  ' GROUP BY c.id');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
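Basically, your original query was generating a bogus @sql variable that did not really extract the data for each extractorname. You also don't need all those joins to build @sql. All you need is each attribute name (from the extractors table) and a reference to the column that holds the desired value (data).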
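If in doubt about the structure, write out a simple pivot query for a fixed set of attributes. That makes it easy to spot the pattern you need when writing the dynamic query.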
SELECT c.url,
       MAX(IF(pa.extractorname = 'price', p.data, NULL)) AS price,
       MAX(IF(pa.extractorname = 'article', p.data, NULL)) AS article,
       MAX(IF(pa.extractorname = 'imageurl', p.data, NULL)) AS imageurl
FROM crawlresults c
LEFT JOIN extractions p on (c.id = p.fk_crawlresults_id)
LEFT JOIN extractors pa on (p.fk_extractors_id = pa.id)
WHERE c.fk_crawljobs_id = 1
GROUP BY c.id
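The same pattern (one aggregated conditional column per extractor name) can also be tried out from application code. The following is only a sketch, not the MySQL solution itself: it assumes Python's built-in sqlite3 module with an in-memory copy of the sample tables, uses CASE WHEN in place of MySQL's IF, and the helper name build_pivot_sql is made up for illustration.

```python
import sqlite3

def build_pivot_sql(extractor_names):
    """Build a pivot SELECT with one MAX(CASE ...) column per extractor name.

    Hypothetical helper; SQLite dialect (CASE WHEN instead of MySQL's IF).
    """
    columns = []
    for name in extractor_names:
        literal = name.replace("'", "''")  # escape for a SQL string literal
        alias = name.replace('"', '""')    # escape for a quoted identifier
        columns.append(
            f"MAX(CASE WHEN pa.extractorname = '{literal}' THEN p.data END) AS \"{alias}\""
        )
    return (
        "SELECT c.url, " + ", ".join(columns) +
        " FROM crawlresults c"
        " LEFT JOIN extractions p ON c.id = p.fk_crawlresults_id"
        " LEFT JOIN extractors pa ON p.fk_extractors_id = pa.id"
        " WHERE c.fk_crawljobs_id = ?"
        " GROUP BY c.id ORDER BY c.id"
    )

# In-memory copy of the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crawlresults (id INTEGER PRIMARY KEY, url TEXT, fk_crawljobs_id INTEGER);
CREATE TABLE extractors (id INTEGER PRIMARY KEY, extractorname TEXT);
CREATE TABLE extractions (id INTEGER PRIMARY KEY, fk_extractors_id INTEGER,
                          data TEXT, fk_crawlresults_id INTEGER);
INSERT INTO crawlresults VALUES
  (1, 'shop*com/notebooks', 1), (2, 'shop*com/fridges', 1), (3, 'website*com/lists', 2);
INSERT INTO extractors VALUES
  (1, 'price'), (2, 'article'), (3, 'imageurl'), (4, 'list_1'), (5, 'list_2');
INSERT INTO extractions VALUES
  (1, 1, '123.45', 1), (2, 2, 'notebook', 1), (3, 3, 'ibm.jpg', 1),
  (4, 1, '44.5', 2), (5, 2, 'fridge', 2), (6, 3, 'picture.jpg', 3),
  (7, 4, 'hello', 3), (8, 4, 'world', 3), (9, 5, 'hi', 3),
  (10, 5, 'my', 3), (11, 5, 'friend', 3);
""")

# The column list comes from the extractors table alone, no joins needed.
names = [row[0] for row in conn.execute(
    "SELECT extractorname FROM extractors ORDER BY id")]
sql = build_pivot_sql(names)
rows = conn.execute(sql, (1,)).fetchall()

for row in rows:
    print(row)
```

For crawl job 1 this yields one row per crawlresult, with None in every column whose extractor produced nothing for that url. Note that the crawl job id is passed as a bound parameter, while the extractor names have to be spliced into the SQL text, which is why they are escaped.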
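As for the rest of your query, it is fine. Just keep in mind that LEFT JOINs may be useful if some crawlresults have no extractions. Also, if your table can contain several crawlresults with the same url / fk_crawljobs_id, grouping by url is not a good idea (MAX could mix up values from several extractions).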
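Both caveats can be seen on a tiny data set. This is a sketch under assumptions: it runs on SQLite through Python's sqlite3 module rather than MySQL, uses CASE WHEN in place of IF, and the rows (a re-crawled url and a crawlresult without extractions) are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crawlresults (id INTEGER PRIMARY KEY, url TEXT, fk_crawljobs_id INTEGER);
CREATE TABLE extractors (id INTEGER PRIMARY KEY, extractorname TEXT);
CREATE TABLE extractions (id INTEGER PRIMARY KEY, fk_extractors_id INTEGER,
                          data TEXT, fk_crawlresults_id INTEGER);
INSERT INTO extractors VALUES (1, 'price');
-- Hypothetical rows: 1 and 2 share a url, 3 has no extractions at all.
INSERT INTO crawlresults VALUES
  (1, 'shop*com/notebooks', 1), (2, 'shop*com/notebooks', 1), (3, 'shop*com/empty', 1);
INSERT INTO extractions VALUES (1, 1, '123.45', 1), (2, 1, '99.00', 2);
""")

PIVOT = """
SELECT c.url,
       MAX(CASE WHEN pa.extractorname = 'price' THEN p.data END) AS price
FROM crawlresults c
{join} JOIN extractions p ON c.id = p.fk_crawlresults_id
{join} JOIN extractors pa ON p.fk_extractors_id = pa.id
WHERE c.fk_crawljobs_id = 1
GROUP BY {group_col}
ORDER BY MIN(c.id)
"""

def run(join, group_col):
    """Run the pivot with the given join type and grouping column."""
    return conn.execute(PIVOT.format(join=join, group_col=group_col)).fetchall()

inner_by_id = run("INNER", "c.id")  # drops the crawlresult without extractions
left_by_id = run("LEFT", "c.id")    # keeps it, with NULL in the price column
left_by_url = run("LEFT", "c.url")  # collapses both notebook rows into one

print(inner_by_id)
print(left_by_id)
print(left_by_url)
```

With INNER JOIN the crawlresult that has no extractions disappears from the output, while LEFT JOIN keeps it with a NULL price. Grouping by url merges the two crawlresults that share a url into one row, and MAX then keeps only one of the two prices, chosen by string comparison because data is stored as text.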