More complex queries

Question

This is my database model:

[Image: database model]

What I need:

I need to enter several terms and find the documents (document.text) that contain all of them.

Sample data:

Documents:

id:1  text:dog cat train
id:2  text:dog cat train car
id:3  text:dog cat
id:4  text:dog

Terms:

id:1 term:dog
id:2 term:cat
id:3 term:train
id:4 term:car

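Example:

I want to search for the documents that contain all of the words dog, cat and train. The result should be documents 1 and 2, but not document 3, because it has no train, and not document 4, because it has neither cat nor train.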

My first attempt was a query like this:

select document.text from document
  join document_has_term on document.iddocument = document_has_term.document_iddocument
  join term on term.idterm = document_has_term.term_idterm
where term = "kindness" and term = "horrible"

This query doesn't select any posts, but it shows what I basically want.
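Here is a minimal sketch of why that query can never match. It uses Python's built-in sqlite3 module as a stand-in for MySQL. The schema is inferred from the column names used in the answers below, because the diagram is not available, and `build_db`, `DOCS` and `TERMS` are hypothetical names that load the sample data. After the joins, each row carries exactly one term, so no single row can pass two different equality tests on that column:

```python
import sqlite3

# Sample data from the question. The schema is an assumption inferred from
# the column names used in the answers' queries.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER REFERENCES document (iddocument),
            term_idterm INTEGER REFERENCES term (idterm),
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    # Link every document to each term that occurs in its text.
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
# Every joined row holds a single term value, so it cannot equal two words.
rows = con.execute(
    """
    SELECT document.text FROM document
    JOIN document_has_term
      ON document.iddocument = document_has_term.document_iddocument
    JOIN term ON term.idterm = document_has_term.term_idterm
    WHERE term.term = 'dog' AND term.term = 'cat'
    """
).fetchall()
print(rows)  # → []
```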

Answer 1

Group by the columns you want to select, and keep only the groups that contain both terms.

function createMyObj() {

    return { 
        "4": 0,
        "6": 0,
        "8": 0,
        "10": 0,
        "12": 0,
        "20": 0,
        "100": 0
    };

}
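The grouping described in Answer 1 could look roughly like the sketch below. It is not the answerer's own code. It runs on SQLite through Python's sqlite3 module, with a schema assumed from the column names used in Answer 2 and a hypothetical `build_db` helper holding the sample data. `COUNT(DISTINCT term.term)` is used so that a duplicated link row would not be counted twice:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
wanted = ("dog", "cat", "train")
marks = ", ".join("?" for _ in wanted)  # one placeholder per search term
rows = con.execute(
    f"""
    SELECT document.iddocument, document.text FROM document
    JOIN document_has_term
      ON document.iddocument = document_has_term.document_iddocument
    JOIN term ON term.idterm = document_has_term.term_idterm
    WHERE term.term IN ({marks})
    GROUP BY document.iddocument, document.text
    HAVING COUNT(DISTINCT term.term) = ?  -- the group must contain every term
    ORDER BY document.iddocument
    """,
    (*wanted, len(wanted)),
).fetchall()
print(rows)  # → [(1, 'dog cat train'), (2, 'dog cat train car')]
```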

Answer 2

Assuming that each document can contain each term at most once, suppose you run:

SELECT document_iddocument
    FROM document_has_term
    JOIN term ON (term_idterm = idterm)
    WHERE term IN ('cat', 'dog', 'train');

You will get three rows for a document in which all three terms match, two rows for one in which two terms match, and so on.
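To check those row counts, here is a sketch on SQLite (via Python's sqlite3) with the question's sample data. The schema and the `build_db` helper are assumptions inferred from the column names in the query. Counting per document, without any filter on the count, gives:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
# Number of matching rows per document: one row per term that matched.
rows = con.execute(
    """
    SELECT document_iddocument, COUNT(*) FROM document_has_term
    JOIN term ON (term_idterm = idterm)
    WHERE term.term IN ('cat', 'dog', 'train')
    GROUP BY document_iddocument
    ORDER BY document_iddocument
    """
).fetchall()
print(rows)  # → [(1, 3), (2, 3), (3, 2), (4, 1)]
```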

So

SELECT document_iddocument
    FROM document_has_term
    JOIN term ON (term_idterm = idterm)
    WHERE term IN ('cat', 'dog', 'train')
GROUP BY document_iddocument
    HAVING COUNT(document_iddocument) = 3;

will output only the IDs of the documents that match all three terms.
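The same query generalizes to any number of terms: generate one placeholder per term and compare the count against the number of terms. This is a sketch that assumes SQLite through Python's sqlite3, a schema inferred from the query's column names, and hypothetical helpers `build_db` and `docs_with_all_terms`:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


def docs_with_all_terms(con, terms):
    """IDs of the documents linked to every term in `terms`."""
    terms = list(dict.fromkeys(terms))  # drop duplicate search words
    marks = ", ".join("?" for _ in terms)
    sql = f"""
        SELECT document_iddocument FROM document_has_term
        JOIN term ON (term_idterm = idterm)
        WHERE term.term IN ({marks})
        GROUP BY document_iddocument
        HAVING COUNT(document_iddocument) = ?
        ORDER BY document_iddocument
    """
    # Counting rows is only correct because each (document, term) pair is
    # stored once, which the assumed PRIMARY KEY enforces.
    return [row[0] for row in con.execute(sql, (*terms, len(terms)))]


con = build_db()
print(docs_with_all_terms(con, ["cat", "dog", "train"]))  # → [1, 2]
```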

At this stage the query does not even need to access the document table.

You can use it as a subselect to fetch the documents whose iddocument is in this list of IDs:

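SELECT document.text FROM document WHERE iddocument IN
( SELECT document_iddocument FROM document_has_term JOIN term ON (term_idterm = idterm)
  WHERE term IN ('cat', 'dog', 'train')
  GROUP BY document_iddocument HAVING COUNT(document_iddocument) = 3 );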
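Here is a runnable version of the combined query. It assumes SQLite via Python's sqlite3 and a hypothetical `build_db` helper that recreates the sample data with a schema inferred from the column names. An ORDER BY is added only to make the output order deterministic:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
# The inner query yields the matching IDs; the outer one fetches their text.
texts = [row[0] for row in con.execute(
    """
    SELECT document.text FROM document WHERE iddocument IN (
        SELECT document_iddocument FROM document_has_term
        JOIN term ON (term_idterm = idterm)
        WHERE term.term IN ('cat', 'dog', 'train')
        GROUP BY document_iddocument
        HAVING COUNT(document_iddocument) = 3)
    ORDER BY iddocument
    """
)]
print(texts)  # → ['dog cat train', 'dog cat train car']
```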

More complex queries

If you want to run more complex searches, you should probably look into MySQL's full-text search and its FULLTEXT indexes.

Otherwise, you need to build the query "from the outside": you define a language, such as

cat AND NOT dog

which is not SQL, and translate it into an SQL query.

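An efficient approach is to work out which component of a complex query such as "cat but not dog" is the restricting one. In this example, suppose you have 2000 records, cat appears in 100 of them, and dog appears in all but 50. Then you need to keep in mind that:

- a query that searches for the presence of a term is very efficient, because it only has to find the rows linked to that term;
- a query that searches for the absence of a term is very expensive, because every document has to be checked.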

So first run the query for cat (100 candidates), and then remove the items that do contain dog.

This approach is complex, too.
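The "presence first, absence second" idea can be expressed with NOT EXISTS. This technique is swapped in here for illustration; the answer itself gives no code for it. The sketch assumes SQLite via Python's sqlite3, a schema inferred from the column names, and hypothetical helpers `build_db` and `with_but_without`. In the sample data, every document that contains cat also contains dog, so the example excludes car instead. Whether the engine really evaluates the presence test first is up to its query planner:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


def with_but_without(con, present, absent):
    """IDs of the documents that contain `present` but not `absent`."""
    sql = """
        SELECT DISTINCT dht.document_iddocument
        FROM document_has_term AS dht
        JOIN term AS t ON t.idterm = dht.term_idterm
        WHERE t.term = ?              -- presence test: the candidate set
          AND NOT EXISTS (            -- absence test for each candidate
              SELECT 1 FROM document_has_term AS dht2
              JOIN term AS t2 ON t2.idterm = dht2.term_idterm
              WHERE dht2.document_iddocument = dht.document_iddocument
                AND t2.term = ?)
        ORDER BY dht.document_iddocument
    """
    return [row[0] for row in con.execute(sql, (present, absent))]


con = build_db()
print(with_but_without(con, "cat", "car"))  # → [1, 3]
```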

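Another possibility, less advisable for large databases, is to scan the whole document_has_term table and compute the status of each term for every document: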

SELECT document_iddocument,
    SUM(IF(term = 'cat', 1, 0)) AS has_0,
    SUM(IF(term = 'dog', 1, 0)) AS has_1
    FROM document_has_term
    LEFT JOIN term ON (term_idterm = idterm AND term.term IN 
        ('cat', 'dog'))
GROUP BY document_iddocument;
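The same scan on SQLite, as a sketch: MySQL's `IF(cond, 1, 0)` is replaced by the standard `CASE WHEN cond THEN 1 ELSE 0 END`, which both engines accept, and an ORDER BY is added so the output is deterministic. The schema and the `build_db` helper are assumptions based on the column names:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
# Links to terms outside the list get NULL from the LEFT JOIN and count as 0.
rows = con.execute(
    """
    SELECT document_iddocument,
        SUM(CASE WHEN term.term = 'cat' THEN 1 ELSE 0 END) AS has_0,
        SUM(CASE WHEN term.term = 'dog' THEN 1 ELSE 0 END) AS has_1
    FROM document_has_term
    LEFT JOIN term ON (term_idterm = idterm AND term.term IN ('cat', 'dog'))
    GROUP BY document_iddocument
    ORDER BY document_iddocument
    """
).fetchall()
print(rows)  # → [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 0, 1)]
```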

This query is generated from some external language and consists of three parts. The template

SELECT document_iddocument,
    <OTHER_FIELDS>
    FROM document_has_term
    LEFT JOIN term ON (term_idterm = idterm AND term.term IN 
        <TERM_LIST>)
GROUP BY document_iddocument;

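is fixed; then come the field list (one field per term) and the list of terms. The longer the query, the longer the lists, and the cost grows linearly.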
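Filling the template from a host language might look like the following Python sketch. `TEMPLATE` and `build_presence_query` are hypothetical names, CASE replaces MySQL's IF() so the sketch runs on SQLite, and every term is passed as a placeholder rather than pasted into the SQL. Each extra term adds one field and one list entry, which is the linear growth mentioned above:

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


# The fixed part of the query, with slots for the two variable parts.
TEMPLATE = """
SELECT document_iddocument,
    {other_fields}
    FROM document_has_term
    LEFT JOIN term ON (term_idterm = idterm AND term.term IN ({term_list}))
GROUP BY document_iddocument
ORDER BY document_iddocument
"""


def build_presence_query(terms):
    """Return (sql, params) with one has_i field per term."""
    fields = ",\n    ".join(
        f"SUM(CASE WHEN term.term = ? THEN 1 ELSE 0 END) AS has_{i}"
        for i in range(len(terms))
    )
    term_list = ", ".join("?" for _ in terms)
    sql = TEMPLATE.format(other_fields=fields, term_list=term_list)
    # Placeholders bind in textual order: the fields first, then the IN list.
    return sql, [*terms, *terms]


con = build_db()
sql, params = build_presence_query(["cat", "dog"])
rows = con.execute(sql, params).fetchall()
print(rows)  # → [(1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 0, 1)]
```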

Now you have to translate your "textual query" into a series of "it is there / it is not there" checks:

cat and not dog

becomes

(has_0) and not (has_1)
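The translation step can be sketched as a small tokenizer. It accepts only words, the keywords and/or/not, and parentheses, and it replaces each distinct word with a `has_N` name in order of first appearance. `translate` is a hypothetical helper, and treating the keywords case-insensitively is an assumption:

```python
import re

# One token: optional leading whitespace, then "(", ")" or a word of letters.
TOKEN = re.compile(r"\s*(?:(\()|(\))|([A-Za-z]+))")
KEYWORDS = {"and", "or", "not"}


def translate(query):
    """Turn e.g. 'cat and not dog' into ('(has_0) and not (has_1)', terms)."""
    query = query.rstrip()
    pos, out, terms = 0, [], []
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:  # anything else (quotes, semicolons, ...) is rejected
            raise ValueError(f"invalid input at position {pos}")
        pos = m.end()
        lparen, rparen, word = m.groups()
        if word is None:
            out.append(lparen or rparen)
        elif word.lower() in KEYWORDS:
            out.append(word.lower())
        else:
            if word not in terms:
                terms.append(word)  # first appearance decides the index
            out.append(f"(has_{terms.index(word)})")
    return " ".join(out), terms


print(translate("cat AND NOT dog"))  # → ('(has_0) and not (has_1)', ['cat', 'dog'])
```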

You can actually fold this into a HAVING clause, so build your query like this:

SELECT document.* FROM document
WHERE iddocument IN (

SELECT document_iddocument
    FROM document_has_term
    LEFT JOIN term ON (term_idterm = idterm AND term.term IN 
    ('cat', 'dog') -- list of all terms used
    )
GROUP BY document_iddocument

    HAVING
    (SUM(IF(term = 'cat', 1, 0))!=0) -- for the term "CAT"
    AND NOT                          -- from the "textual query"
    (SUM(IF(term = 'dog', 1, 0))!=0) -- for the term "DOG"
);
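Here is a sketch that runs this HAVING query on SQLite via Python's sqlite3, with CASE in place of MySQL's IF() and the terms passed as placeholders. The schema and the `build_db` helper are assumptions inferred from the column names. With the sample data, "cat and not dog" returns nothing because every document with cat also has dog, so the example uses "cat and not car":

```python
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


con = build_db()
# "cat and not car": the IN list holds all terms used, HAVING holds the logic.
texts = [row[0] for row in con.execute(
    """
    SELECT document.text FROM document
    WHERE iddocument IN (
        SELECT document_iddocument
        FROM document_has_term
        LEFT JOIN term ON (term_idterm = idterm AND term.term IN (?, ?))
        GROUP BY document_iddocument
        HAVING
            (SUM(CASE WHEN term.term = ? THEN 1 ELSE 0 END) != 0)  -- cat
            AND NOT
            (SUM(CASE WHEN term.term = ? THEN 1 ELSE 0 END) != 0)  -- car
    )
    ORDER BY iddocument
    """,
    ("cat", "car", "cat", "car"),
)]
print(texts)  # → ['dog cat train', 'dog cat']
```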


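As long as you use an SQL-like syntax for your textual query and you are careful about SQL injection, Bob's your uncle. If you are not careful to sanitize the input (allow only valid words, the keywords 'and', 'or', 'not' and parentheses, and use prepared queries with placeholders), then Bobby may well be your daddy...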
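Putting the pieces together, here is one possible sketch of the sanitizing approach. A tokenizer rejects anything other than words, and/or/not and parentheses, and every search word is bound to a placeholder, so user input never becomes SQL text. It assumes SQLite via Python's sqlite3, a schema inferred from the column names, and hypothetical helpers `build_db` and `search`. Malformed but harmless input, such as unbalanced parentheses, still reaches the database and fails there with a syntax error:

```python
import re
import sqlite3

# Sample data from the question; the schema is an assumption.
DOCS = {1: "dog cat train", 2: "dog cat train car", 3: "dog cat", 4: "dog"}
TERMS = {1: "dog", 2: "cat", 3: "train", 4: "car"}


def build_db():
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE document (iddocument INTEGER PRIMARY KEY, text TEXT);
        CREATE TABLE term (idterm INTEGER PRIMARY KEY, term TEXT UNIQUE);
        CREATE TABLE document_has_term (
            document_iddocument INTEGER, term_idterm INTEGER,
            PRIMARY KEY (document_iddocument, term_idterm));
        """
    )
    con.executemany("INSERT INTO document VALUES (?, ?)", DOCS.items())
    con.executemany("INSERT INTO term VALUES (?, ?)", TERMS.items())
    ids = {word: idterm for idterm, word in TERMS.items()}
    links = [(d, ids[w]) for d, text in DOCS.items() for w in text.split()]
    con.executemany("INSERT INTO document_has_term VALUES (?, ?)", links)
    return con


# One token: optional whitespace, then "(", ")" or a word of letters.
TOKEN = re.compile(r"\s*(?:(\()|(\))|([A-Za-z]+))")
KEYWORDS = {"and", "or", "not"}
# Presence test for one term; the term itself is bound to the "?" placeholder.
HAS_TERM = "(SUM(CASE WHEN term.term = ? THEN 1 ELSE 0 END) != 0)"


def search(con, query):
    """Return the texts of the documents matching e.g. 'cat and not car'."""
    query = query.rstrip()
    pos, having, having_params, terms = 0, [], [], []
    while pos < len(query):
        m = TOKEN.match(query, pos)
        if m is None:  # reject anything but words, keywords and parentheses
            raise ValueError(f"invalid input at position {pos}")
        pos = m.end()
        lparen, rparen, word = m.groups()
        if word is None:
            having.append(lparen or rparen)
        elif word.lower() in KEYWORDS:
            having.append(word.upper())
        else:
            having.append(HAS_TERM)
            having_params.append(word)
            if word not in terms:
                terms.append(word)
    if not terms:
        raise ValueError("the query contains no search terms")
    marks = ", ".join("?" for _ in terms)
    having_sql = " ".join(having)
    sql = f"""
        SELECT document.text FROM document WHERE iddocument IN (
            SELECT document_iddocument FROM document_has_term
            LEFT JOIN term ON (term_idterm = idterm AND term.term IN ({marks}))
            GROUP BY document_iddocument
            HAVING {having_sql})
        ORDER BY iddocument
    """
    # Placeholders bind in textual order: the IN list, then the HAVING terms.
    rows = con.execute(sql, [*terms, *having_params])
    return [row[0] for row in rows]


con = build_db()
print(search(con, "cat and not car"))  # → ['dog cat train', 'dog cat']
```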

Searching an N-to-N relationship in an SQL database

