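Using NOT REGEXP alongside REGEXP messes up my results

Time: 2016-12-20 12:09:23

Tags: php mysql regex

In my database I have a table of companies. This table has a field called tags, which contains something like the following: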

  

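Furniture Retail E-commerce B2C Home & Furniture Consumer Discretionary Furniture UK Manufacturer Retailer Contemporary Vintage Furniture Product Design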

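What I want to be able to do is query these tags and return companies based on whether the field contains any of the keywords or phrases a user may enter.

For example, if the user wants to find companies whose tags contain the word Retail, a query like this is generated: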

SELECT
    company.domain,
    company.company_name,
    CONCAT_WS(
        ',',
        company.business_sector,
        company.tags
    ) AS 'tags',
    GROUP_CONCAT(
        employee.employee_id SEPARATOR ','
    ) AS 'employee_ids',
    COUNT(employee.employee_id) AS 'employees'
FROM
    company
INNER JOIN employee ON company.domain = employee.domain
WHERE
    company.tags REGEXP '^Retail| Retail |Retail$'
OR company.business_sector LIKE '%Retail%'
AND company.domain NOT IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
GROUP BY
    company.domain

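This exact query returns 11424 results, which is great.

Where it falls down is when the user enters keywords that should not appear in this field.

So, let's say we don't want any Apparel; it would generate this query: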

SELECT
    company.domain,
    company.company_name,
    CONCAT_WS(
        ',',
        company.business_sector,
        company.tags
    ) AS 'tags',
    GROUP_CONCAT(
        employee.employee_id SEPARATOR ','
    ) AS 'employee_ids',
    COUNT(employee.employee_id) AS 'employees'
FROM
    company
INNER JOIN employee ON company.domain = employee.domain
WHERE
    company.tags REGEXP '^Retail| Retail |Retail$'
OR company.business_sector LIKE '%Retail%'
AND (
    company.tags NOT REGEXP '^Apparel| Apparel |Apparel$'
    AND company.business_sector NOT LIKE '%Apparel%'
)
AND company.domain NOT IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
GROUP BY
    company.domain

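This exact query returns 112 results, which definitely cannot be right, because my database does not contain 11312 companies with the keyword Apparel.

Any ideas on what I'm doing wrong?

Edit

Regarding this being a duplicate... I can amend my query, but that is not where the problem lies.

For example, let's take those 11424 results for Retail and then enter a random phrase that we know will never occur in any result; we should get the same 11424 records back: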

SELECT
    company.domain,
    company.company_name,
    CONCAT_WS(
        ',',
        company.business_sector,
        company.tags
    ) AS 'tags',
    GROUP_CONCAT(
        employee.employee_id SEPARATOR ','
    ) AS 'employee_ids',
    COUNT(employee.employee_id) AS 'employees'
FROM
    company
INNER JOIN employee ON company.domain = employee.domain
WHERE
    (
        company.tags REGEXP '^Retail| Retail |Retail$'
        OR company.business_sector LIKE '%Retail%'
    )
AND (
    company.tags NOT REGEXP '^This phrase will never occur| This phrase will never occur |This phrase will never occur$'
    AND company.business_sector NOT LIKE '%This phrase will never occur%'
)
AND company.domain NOT IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
GROUP BY
    company.domain

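Instead of 11424, I get 135 records. How come?

3 Answers:

Answer 0: (score: 0)

You really should normalize your data and store the tags in a separate table, so that you don't have to do super-complex logic like this.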
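As a rough idea of what the normalized layout could look like, here is a minimal sketch. It uses Python's sqlite3 module as a stand-in for MySQL, and the `company_tag` table, the sample rows, and the `find_companies` helper are all made up for illustration; they are not from the question. With one row per (company, tag) pair, "has Retail but not Apparel" becomes a plain EXISTS / NOT EXISTS pair with no pattern matching.

```python
import sqlite3

# Hypothetical normalized layout: one row per (company, tag) pair.
# SQLite stands in for MySQL; the company_tag table is an assumption.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (domain TEXT PRIMARY KEY, company_name TEXT);
    CREATE TABLE company_tag (domain TEXT, tag TEXT);
    INSERT INTO company VALUES
        ('a.example', 'A'), ('b.example', 'B'), ('c.example', 'C');
    INSERT INTO company_tag VALUES
        ('a.example', 'Retail'), ('a.example', 'Furniture'),
        ('b.example', 'Retail'), ('b.example', 'Apparel'),
        ('c.example', 'Furniture');
""")


def find_companies(include_tag, exclude_tag):
    """Return domains that carry include_tag and do not carry exclude_tag."""
    rows = conn.execute(
        """
        SELECT c.domain
        FROM company AS c
        WHERE EXISTS (SELECT 1 FROM company_tag AS t
                      WHERE t.domain = c.domain AND t.tag = ?)
          AND NOT EXISTS (SELECT 1 FROM company_tag AS t
                          WHERE t.domain = c.domain AND t.tag = ?)
        ORDER BY c.domain
        """,
        (include_tag, exclude_tag),
    ).fetchall()
    return [domain for (domain,) in rows]


retail_not_apparel = find_companies("Retail", "Apparel")
print(retail_not_apparel)
```

Because a company with no matching tag row simply fails the EXISTS test, excluding a tag that nobody has leaves the result unchanged.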

In the meantime, your problem is with the boolean grouping. AND takes precedence over OR, so your query should be

SELECT
    company.domain,
    company.company_name,
    CONCAT_WS(
        ',',
        company.business_sector,
        company.tags
    ) AS 'tags',
    GROUP_CONCAT(
        employee.employee_id SEPARATOR ','
    ) AS 'employee_ids',
    COUNT(employee.employee_id) AS 'employees'
FROM
    company
INNER JOIN employee ON company.domain = employee.domain
WHERE
    (company.tags REGEXP '^Retail| Retail |Retail$'
    OR company.business_sector LIKE '%Retail%')
AND company.tags NOT REGEXP '^Apparel| Apparel |Apparel$'
AND company.business_sector NOT LIKE '%Apparel%'
AND company.domain NOT IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
GROUP BY
    company.domain

Pay close attention to the placement of the parentheses.
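To see the precedence rule in isolation, here is a small sketch. It runs against SQLite through Python's sqlite3 module rather than MySQL (an assumption made so the example is self-contained; both engines bind AND more tightly than OR). The three sample rows and the `domains` helper are invented, and LIKE stands in for the REGEXP patterns to keep it short.

```python
import sqlite3

# SQLite stands in for MySQL; the rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (domain TEXT, tags TEXT, business_sector TEXT);
    INSERT INTO company VALUES
        ('a.example', 'Furniture Retail', 'Retail'),
        ('b.example', 'Retail Apparel',   'Retail'),
        ('c.example', 'Retail Apparel',   'Apparel');
""")


def domains(where_clause):
    """Return the sorted list of domains matching a WHERE clause."""
    sql = f"SELECT domain FROM company WHERE {where_clause} ORDER BY domain"
    return [domain for (domain,) in conn.execute(sql)]


# Without parentheses this parses as: A OR (B AND C).
ungrouped = domains(
    "tags LIKE '%Retail%' OR business_sector LIKE '%Retail%' "
    "AND tags NOT LIKE '%Apparel%'"
)

# With parentheses the exclusion applies to both Retail conditions.
grouped = domains(
    "(tags LIKE '%Retail%' OR business_sector LIKE '%Retail%') "
    "AND tags NOT LIKE '%Apparel%'"
)

print(ungrouped)
print(grouped)
```

In the ungrouped form, any row whose tags match Retail is returned regardless of the Apparel exclusion, because the exclusion is only attached to the business_sector condition.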

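Answer 1: (score: 0)

In cases like this you need to use (even overuse) parentheses to structure the ORs and ANDs of your WHERE clause. It is best to spell out the grouping you want in the filter expression.

Try something like this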

WHERE  (
             company.tags REGEXP '^Retail| Retail |Retail$'
          OR company.business_sector LIKE '%Retail%')
AND NOT (
             company.tags REGEXP '^Apparel| Apparel |Apparel$'
         OR company.business_sector LIKE '%Apparel%'
)
AND NOT company.domain IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
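The NOT ( ... OR ... ) form above and the NOT ... AND NOT ... form from the other answer are logically the same, and both share one catch: if either column is NULL, the comparison evaluates to NULL and the row is dropped. Whether that explains the 135-row result in the question is only a guess, since the thread never says whether those columns contain NULLs. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with invented rows and a hypothetical `domains` helper, and LIKE in place of REGEXP.

```python
import sqlite3

# SQLite stands in for MySQL; rows are made-up sample data.
# 'c.example' has a NULL business_sector on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (domain TEXT, tags TEXT, business_sector TEXT);
    INSERT INTO company VALUES
        ('a.example', 'Furniture Retail', 'Retail'),
        ('b.example', 'Retail Apparel',   'Retail'),
        ('c.example', 'Furniture Retail', NULL);
""")


def domains(where_clause):
    """Return the sorted list of domains matching a WHERE clause."""
    sql = f"SELECT domain FROM company WHERE {where_clause} ORDER BY domain"
    return [domain for (domain,) in conn.execute(sql)]


# Two spellings of the same exclusion (De Morgan's law).
not_or = domains(
    "NOT (tags LIKE '%Apparel%' OR business_sector LIKE '%Apparel%')"
)
and_nots = domains(
    "tags NOT LIKE '%Apparel%' AND business_sector NOT LIKE '%Apparel%'"
)

# Treating NULL as an empty string keeps rows with a missing sector.
null_safe = domains(
    "COALESCE(tags, '') NOT LIKE '%Apparel%' "
    "AND COALESCE(business_sector, '') NOT LIKE '%Apparel%'"
)

print(not_or, and_nots, null_safe)
```

If the real columns are nullable, wrapping them in COALESCE as in the last query is one way to keep those rows; checking for NULLs with a quick COUNT first would confirm whether this applies.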

Answer 2: (score: 0)

I found something that works perfectly now: I'm using the MATCH AGAINST full-text search method:

SELECT
    company.domain,
    company.company_name,
    CONCAT_WS(
        ',',
        company.business_sector,
        company.tags
    ) AS 'tags',
    GROUP_CONCAT(
        employee.employee_id SEPARATOR ','
    ) AS 'employee_ids',
    COUNT(employee.employee_id) AS 'employees',
    COUNT(ct_connections.id) AS 'already_connected'
FROM
    company
INNER JOIN employee ON company.domain = employee.domain
LEFT JOIN ct_connections ON employee.email = ct_connections.email
AND ct_connections.client_id = 1
WHERE
    (
        MATCH (company.tags) AGAINST ('Retail')
        OR company.business_sector LIKE '%Retail%'
    )
AND (
    NOT MATCH (company.tags) AGAINST ('Apparel')
    AND company.business_sector NOT LIKE '%Apparel%'
    AND NOT MATCH (company.tags) AGAINST ('Footwear')
    AND company.business_sector NOT LIKE '%Footwear%'
)
AND company.domain NOT IN (
    '@hotmail.com',
    '@gmail.com',
    '@aol.com'
)
GROUP BY
    company.domain