Edit/update: it turns out my server version is 5.7, so window functions are not an option for a solution.
SHOW VARIABLES LIKE 'version';
+---------------+------------+
| Variable_name | Value |
+---------------+------------+
| version | 5.7.21-log |
+---------------+------------+
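Problem description: I have a ternary relationship table between offers, skills, and profiles. The relationship carries a ranking attribute.
I also have a skill table where the skill names can be looked up. Until now I have had to run two queries:
1) Give me the top 10 skills for a given profile: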
SELECT DISTINCT ternary.id_skill, skill.name_skill, ranking_skill
FROM ternary
INNER JOIN skill ON skill.id_skill=ternary.id_skill
WHERE ternary.id_perfil= #IntNumber#
GROUP BY ternary.id_skill
ORDER BY ternary.ranking_skill DESC
LIMIT 10;
2) For a list of skill IDs, tell me whether they appear in any profile, and in how many cases:
SELECT DISTINCT ternary.id_profile, nombre_profile, COUNT(DISTINCT ternary.id_skill) AS matching
FROM ternary
INNER JOIN profile ON ternary.id_profile=profile.id_profile
WHERE ternary.id_skill= '858534430'
OR ternary.id_skill= '3213227'
OR ternary.id_skill= '3254818'
GROUP BY(ternary.id_profile)
ORDER BY matching DESC;
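In this last query I found a problem: it matches a profile when the skill appears at any position in it. Since a profile may have thousands of skills, this is misleading for what we want to achieve. What I need is to match a skill only when it is among the top 10 skills of a profile, for any profile, and only within that top 10.
So far I have basically been trying to merge the two queries, with little success: it seems I cannot partition by two columns, and even with a single column I get:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(PARTITION BY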
SELECT *
FROM
(
SELECT DISTINCT ternary.id_skill,
skill.name_skill,
ternary.ranking_skill,
ternary.id_profile,
ROW_NUMBER() OVER(PARTITION BY id_profile, id_skill ORDER BY ternary.ranking_skill DESC) rn
FROM ternary
INNER JOIN skill ON skill.id_skill=ternary.id_skill
)
WHERE rn < 11;
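I understand this operation can be called a groupwise maximum, and I have found some answers about it. I could not reproduce any of them. In case it helps, I specifically need a solution that works with mysql Ver 14.14 Distrib 5.5.60, for Linux (x86_64) using readline 5.
(I tried answers that work perfectly on other, similar databases, but they do not work in MySQL.)
Table definitions: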
CREATE TABLE `ternary` (
`id_offer` varchar(200) NOT NULL,
`id_skill` varchar(200) NOT NULL,
`id_profile` varchar(200) NOT NULL,
`ranking_skill` double NOT NULL,
PRIMARY KEY (`id_offer`,`id_skill`,`id_profile`),
KEY `id_skill` (`id_skill`),
KEY `id_profile` (`id_profile`),
CONSTRAINT `ternary_ibfk_1` FOREIGN KEY (`id_offer`) REFERENCES `offer` (`id_offer`),
CONSTRAINT `ternary_ibfk_2` FOREIGN KEY (`id_skill`) REFERENCES `skill` (`id_skill`),
CONSTRAINT `ternary_ibfk_3` FOREIGN KEY (`id_profile`) REFERENCES `profile` (`id_profile`)
)
CREATE TABLE `skill` (
`id_skill` varchar(200) NOT NULL,
`name_skill` varchar(200) DEFAULT NULL,
`date` date DEFAULT NULL,
PRIMARY KEY (`id_skill`)
)
Result of running
select * from ternary limit 10;
+------------+------------+-----------+----------------------+
| id_oferta | id_skill | id_perfil | ranking_skill |
+------------+------------+-----------+----------------------+
| 1004 | 107 | 679681082 | 0 |
| 1004 | 115 | 679681082 | 0.10846866454897801 |
| 1004 | 117 | 679681082 | 0.038003619695992294 |
| 1004 | 129 | 679681082 | 0.04987975085098989 |
| 1004 | 147 | 679681082 | 0.02771097269499438 |
| 1004 | 299 | 679681082 | 0.0522549770819894 |
| 1004 | 321 | 679681082 | 0.11955305362697576 |
| 1004 | 417 | 679681082 | 0.11321911701097703 |
| 1004 | 964 | 679681082 | 0.015043099462996949 |
| 1004 | 967 | 679681082 | 0.05304671915898924 |
+------------+------------+-----------+----------------------+
Result of query 1) described above, which gives me the top 10 for one profile:
+------------+--------------+---------------------+
| id_skill | name_skill | ranking_skill |
+------------+--------------+---------------------+
| 109 | scala | 0.3089840175329823 |
| 122 | hadoop | 0.24164146109602963 |
| 9731 | python | 0.21470443852124863 |
| 325 | java | 0.18776741594646754 |
| 114 | sql | 0.14736188208429596 |
| 101 | kafka | 0.13389337079690544 |
| 301 | bbdd | 0.13389337079690544 |
| 927 | agile | 0.13389337079690544 |
| 320 | hive | 0.1204248595095149 |
| 109 | spark | 0.1204248595095149 |
+------------+--------------+---------------------+
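Answer 0 (score: 1)
Here is an example of emulating ROW_NUMBER() without window functions: you can write a subquery in the SELECT clause.
Write the PARTITION BY columns as conditions in the subquery's WHERE clause, and use count(*) to build the row number.
It looks like this.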
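SELECT *
FROM (
    SELECT t.*,
           (
               -- 1 + number of rows in the same profile with a higher ranking
               -- (tied rows share a number, so this behaves like RANK())
               SELECT COUNT(*) + 1
               FROM ternary AS t2
               WHERE t2.id_profile = t.id_profile
                 AND t2.ranking_skill > t.ranking_skill
           ) AS rn
    FROM ternary AS t
) AS ranked
WHERE rn < 11
ORDER BY rn;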
sqlfiddle:http://sqlfiddle.com/#!9/7ee529/9
You can try this query, which may work for your case.
SELECT *
FROM
(
    SELECT DISTINCT t.id_skill,
           skill.name_skill,
           t.ranking_skill,
           t.id_profile,
           (
               -- 1 + number of rows in the same profile with a higher ranking
               SELECT COUNT(*) + 1
               FROM ternary AS t2
               WHERE t2.id_profile = t.id_profile
                 AND t2.ranking_skill > t.ranking_skill
           ) AS rn
    FROM ternary AS t
    INNER JOIN skill ON skill.id_skill = t.id_skill
) AS ranked  -- MySQL requires an alias on every derived table
WHERE rn < 11;
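Building on the same correlated-count idea, here is a rough sketch of how the two queries from the question might be combined on MySQL 5.7: for a fixed list of skill IDs, count how many of them fall within each profile's top 10. It assumes the column names from the CREATE TABLE statements (id_profile, id_skill, ranking_skill), that profile.nombre_profile exists as in the question's second query, and that a skill's position within a profile is decided by its highest ranking_skill across offers. It has not been run against the real data.

```sql
-- Sketch only: column names and the "best ranking per skill" rule are assumptions.
SELECT r.id_profile,
       p.nombre_profile,
       COUNT(*) AS matching
FROM (
    -- Best ranking of each searched skill within each profile
    -- (collapses the id_offer dimension of the ternary table).
    SELECT t.id_profile,
           t.id_skill,
           MAX(t.ranking_skill) AS best
    FROM ternary AS t
    WHERE t.id_skill IN ('858534430', '3213227', '3254818')
    GROUP BY t.id_profile, t.id_skill
) AS r
INNER JOIN profile AS p ON p.id_profile = r.id_profile
WHERE (
    -- Number of distinct skills in the same profile that rank strictly higher.
    -- Fewer than 10 such skills means the searched skill is in the top 10.
    SELECT COUNT(DISTINCT t2.id_skill)
    FROM ternary AS t2
    WHERE t2.id_profile = r.id_profile
      AND t2.ranking_skill > r.best
) < 10
GROUP BY r.id_profile, p.nombre_profile
ORDER BY matching DESC;
```

Skills with equal rankings share a position here, so a profile with ties at the cutoff can contribute more than 10 "top" skills. The correlated subquery runs once per (profile, skill) pair that survives the IN filter, so an index starting with id_profile helps.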
Answer 1 (score: 1)
To speed up the first query, change
KEY `id_profile` (`id_profile`),
to
KEY `id_profile` (`id_profile`, `id_skill`, `ranking_skill`),
Don't mix DISTINCT and GROUP BY. (GROUP BY effectively does the DISTINCT.)
Where does nombre_profile come from? (It is hard to help when column names are left dangling.)
Delay fetching skill.name_skill.
Don't pass ranking_skill out of the subquery if it is not used.
Move one of the JOINs into the subquery.
Perhaps this correctly combines the two queries:
SELECT t.id_profile,
       p.nombre_profile,
       COUNT(DISTINCT t.id_skill) AS matching
FROM
    ( -- Get the 10 ids:
      SELECT t10.id_skill
      FROM ternary AS t10
      INNER JOIN skill ON skill.id_skill = t10.id_skill
      WHERE t10.id_profile = #IntNumber#
      GROUP BY t10.id_skill
      ORDER BY MAX(t10.ranking_skill) DESC  -- aggregate keeps ONLY_FULL_GROUP_BY happy
      LIMIT 10
    ) AS ten
-- Find every profile that has any of those 10 skills:
INNER JOIN ternary AS t ON t.id_skill = ten.id_skill
INNER JOIN profile AS p ON p.id_profile = t.id_profile
GROUP BY t.id_profile, p.nombre_profile
ORDER BY matching DESC;
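The index change suggested at the start of this answer could be applied with something like the following. This is only a sketch: the index name idx_profile_skill_rank is made up, the column names are taken from the CREATE TABLE in the question, and it adds a new index instead of replacing the existing id_profile key, so the foreign key on id_profile is never left without a supporting index.

```sql
-- Hypothetical index name; columns as in the question's CREATE TABLE.
ALTER TABLE ternary
    ADD INDEX idx_profile_skill_rank (id_profile, id_skill, ranking_skill);

-- Check which index the per-profile top-10 lookup picks
-- ('679681082' is a sample profile id from the question's output).
EXPLAIN
SELECT id_skill, MAX(ranking_skill) AS best
FROM ternary
WHERE id_profile = '679681082'
GROUP BY id_skill
ORDER BY best DESC
LIMIT 10;
```

If EXPLAIN shows the new index being used, the old single-column id_profile key is probably redundant and could be dropped in a separate step.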