Question

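I have a couple of job-posting tables that I want to filter. One of them is the "master" table, which holds one-to-one information: job ID, location, and salary. The other is the "tags" table, a one-to-many table in which each job has several associated "tags" (for example education, experience, soft skills, and hard skills). Note that there are millions of job IDs, and therefore even more tags.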

"master" table

╔══════╦═══════════╦════════╗
║ id   ║ location  ║ salary ║
╠══════╬═══════════╬════════╣
║ zy3  ║ CA        ║100,000 ║
║ w1e  ║ TX        ║150,000 ║
║ sr2  ║ UT        ║200,000 ║
║ hi9  ║ NY        ║130,000 ║
╚══════╩═══════════╩════════╝

"tags" table

╔══════╦════════╗
║ id   ║ tag    ║
╠══════╬════════╣
║ zy3  ║ Python ║
║ zy3  ║ Hadoop ║
║ zy3  ║ master ║
║ w1e  ║ Hadoop ║
║ w1e  ║ BS     ║
║ w1e  ║ junior ║
║ sr2  ║ Hadoop ║
║ sr2  ║ Tech   ║
║ sr2  ║ Stats  ║
║ hi9  ║ Java   ║
║ hi9  ║ Spark  ║
║ hi9  ║ GCP    ║
║ hi9  ║ MS     ║
╚══════╩════════╝

I want to subset the "master" table so that it includes only roles that have, for example, two or more of the following tags:

Python, Hadoop, Java, Spark

So the new master table would look like this:

╔══════╦═══════════╦════════╗
║ id   ║ location  ║ salary ║
╠══════╬═══════════╬════════╣
║ zy3  ║ CA        ║100,000 ║
║ hi9  ║ NY        ║130,000 ║
╚══════╩═══════════╩════════╝

I am considering introducing another table that would hold the list of tags I am willing to accept.
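
That idea could be sketched roughly as follows. This is only an illustration: the lookup table name accepted_tags and its single tag column are assumptions, not part of the original schema, and the threshold of two matches is taken from the example above.

```sql
-- Hypothetical lookup table holding the tags to accept
-- (the name accepted_tags and its column are assumptions).
CREATE TABLE accepted_tags (tag VARCHAR(50) PRIMARY KEY);

INSERT INTO accepted_tags (tag)
VALUES ('Python'), ('Hadoop'), ('Java'), ('Spark');

-- Keep master rows whose id matches at least two distinct accepted tags.
SELECT m.*
FROM master AS m
JOIN (
    SELECT t.id
    FROM tags AS t
    JOIN accepted_tags AS a ON a.tag = t.tag
    GROUP BY t.id
    HAVING COUNT(DISTINCT t.tag) >= 2
) AS matched ON matched.id = m.id;
```

With this layout, changing the set of interesting tags means editing rows in accepted_tags rather than rewriting the query.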

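To some extent I have been able to hard-code a solution, but it is computationally demanding, especially since I am dealing with millions of rows and sometimes match against many potential tags (in this case there are only four tags of interest). Below is the code I am using.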

select *
from master t0
where (
    select count(id)
    from (
        (select id from master t1 where t1.id=t0.id and exists (select 1 from tags t2 where t1.id=t2.id and t2.tag='Python'))
        union
        (select id from master t1 where t1.id=t0.id and exists (select 1 from tags t2 where t1.id=t2.id and t2.tag='Hadoop'))
        union
        (select id from master t1 where t1.id=t0.id and exists (select 1 from tags t2 where t1.id=t2.id and t2.tag='Java'))
        union
        (select id from master t1 where t1.id=t0.id and exists (select 1 from tags t2 where t1.id=t2.id and t2.tag='Spark'))
    ) tx
) >= 2;

Answer 1

You don't need UNION; just use WHERE tag IN (<list of specified tags>), then join the result with the master table.

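SELECT m.*
FROM master AS m
JOIN (
    -- ids that carry at least two of the wanted tags
    SELECT id
    FROM tags
    -- literals use the same capitalization as the stored tags,
    -- since string comparison is case-sensitive in some databases
    WHERE tag IN ('Python', 'Hadoop', 'Java', 'Spark')
    GROUP BY id
    HAVING COUNT(*) >= 2
) AS t ON m.id = t.id;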
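
To try this approach against the sample data from the question, a self-contained script along these lines may help. The column types and the index (idx_tags_tag_id) are assumptions added for illustration. The tag literals use the same capitalization as the sample rows, because string comparison is case-sensitive in some databases. COUNT(DISTINCT tag) is used instead of COUNT(*) so that a tag accidentally stored twice for the same job is not counted twice.

```sql
-- Sample tables mirroring the question; column types are assumptions.
CREATE TABLE master (
    id       VARCHAR(10) PRIMARY KEY,
    location VARCHAR(2),
    salary   INTEGER
);
CREATE TABLE tags (
    id  VARCHAR(10),
    tag VARCHAR(50)
);

INSERT INTO master (id, location, salary) VALUES
    ('zy3', 'CA', 100000),
    ('w1e', 'TX', 150000),
    ('sr2', 'UT', 200000),
    ('hi9', 'NY', 130000);

INSERT INTO tags (id, tag) VALUES
    ('zy3', 'Python'), ('zy3', 'Hadoop'), ('zy3', 'master'),
    ('w1e', 'Hadoop'), ('w1e', 'BS'),     ('w1e', 'junior'),
    ('sr2', 'Hadoop'), ('sr2', 'Tech'),   ('sr2', 'Stats'),
    ('hi9', 'Java'),   ('hi9', 'Spark'),  ('hi9', 'GCP'),
    ('hi9', 'MS');

-- Optional index so the tag filter can avoid scanning the whole table
-- (an assumption; whether it helps should be measured on the real data).
CREATE INDEX idx_tags_tag_id ON tags (tag, id);

-- Jobs that have at least two distinct tags from the list.
SELECT m.*
FROM master AS m
JOIN (
    SELECT id
    FROM tags
    WHERE tag IN ('Python', 'Hadoop', 'Java', 'Spark')
    GROUP BY id
    HAVING COUNT(DISTINCT tag) >= 2
) AS t ON m.id = t.id;
```

On the sample data this returns the rows for zy3 (Python and Hadoop) and hi9 (Java and Spark). The jobs w1e and sr2 match only Hadoop and are filtered out.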
