How do I group duplicate values in a column and assign each group an ID?

Date: 2012-04-18 19:42:59

Tags: mysql sql group-by where

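My question is about using a SQL script to assign a group ID to each set of duplicate values. I have been doing this by hand, and I have realized that as the database grows (to a few thousand elements) it will take far too long.

This is my database structure: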

id  | db_question          | db_keywords           | answer_id  | db_answer                    |
------------------------------------------------------------------------------------------------
 0  | Why is Mars red?     | [why,mars,red]        | 0          | Mars is red because blah     |
 1  | How is Mars red?     | [how,mars,red]        | 0          | Mars is red because blah     |
 2  | What makes Mars red? | [what,makes,mars,red] | 0          | Mars is red because blah     |
 3  | Is Mars very rocky?  | [is,mars,rocky]       | 0          | Yes Mars is rocky blahbla    |
 4  | Does Mars have rocks?| [mars,have,rocks]     | 0          | Yes Mars is rocky blahbla    |
 5  | What is the Sun?     | [what,is,sun]         | 0          | The Sun is our solar blah    |
 6  | What is a star?      | [what,is,star]        | 0          | A star is a ball of hot blah |

Now, as you can see, one answer can have several questions, so the database contains duplicates in the db_answer column. I would like each db_answer to have a single answer_id, which repeats wherever that answer is used more than once. To illustrate, I want my database to look like this:

id  | db_question          | db_keywords           | answer_id  | db_answer                    |
------------------------------------------------------------------------------------------------
 0  | Why is Mars red?     | [why,mars,red]        | 1          | Mars is red because blah     |
 1  | How is Mars red?     | [how,mars,red]        | 1          | Mars is red because blah     |
 2  | What makes Mars red? | [what,makes,mars,red] | 1          | Mars is red because blah     |
 3  | Is Mars very rocky?  | [is,mars,rocky]       | 2          | Yes Mars is rocky blahbla    |
 4  | Does Mars have rocks?| [mars,have,rocks]     | 2          | Yes Mars is rocky blahbla    |
 5  | What is the Sun?     | [what,is,sun]         | 3          | The Sun is our solar blah    |
 6  | What is a star?      | [what,is,star]        | 4          | A star is a ball of hot blah |

I have searched extensively for a script that does this, without any luck. Just to show what I have been doing so far, this is the SQL I have been running for each group of answers I want to give an id:

UPDATE elements SET answer_id = '1' WHERE db_answer = 'Mars is red because blah'

3 Answers:

Answer 0: (score: 3)

This is quite simple with a PHP script:

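// Give every distinct answer its own sequential ID, starting at 1.
$query = mysql_query("SELECT DISTINCT db_answer FROM elements") or die(mysql_error());
$i = 1;
while ($row = mysql_fetch_row($query))
{
    // Escape the answer text so that quotes inside it cannot break the UPDATE.
    $answer = mysql_real_escape_string($row[0]);
    mysql_query("UPDATE elements SET answer_id = {$i} WHERE db_answer = '{$answer}'") or die(mysql_error());
    $i++;
}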

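However, I think it would be wise to store the answers in a separate table and keep only the answer_id in the elements table. That way you avoid repeating information unnecessarily.

Edit:

As @mdoyle suggested, I think it would be best to use four tables: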

-- answers is created first so that the foreign key in questions can reference it.
CREATE TABLE answers (
    answerID INT NOT NULL AUTO_INCREMENT,
    answer VARCHAR(128),
    PRIMARY KEY (answerID)
);

CREATE TABLE questions (
    questionID INT NOT NULL AUTO_INCREMENT,
    question VARCHAR(128),
    answerID INT,
    PRIMARY KEY (questionID),
    FOREIGN KEY (answerID) REFERENCES answers (answerID)
);

CREATE TABLE keywords (
    keywordID INT NOT NULL AUTO_INCREMENT,
    keyword VARCHAR(16),
    PRIMARY KEY (keywordID)
);

CREATE TABLE question_keywords (
    questionID INT,
    keywordID INT,
    FOREIGN KEY (questionID) REFERENCES questions (questionID),
    FOREIGN KEY (keywordID) REFERENCES keywords (keywordID)
);

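The relationship between the answers table and the questions table is one-to-many (one answer may apply to many questions), so you have two tables. This assumes that each question can have only one answer. If that is not the case, and a question may have two acceptable answers, the relationship becomes many-to-many (read on to see how to set up tables for a many-to-many relationship).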

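The relationship between the questions table and the keywords table is many-to-many (many questions may use many keywords), so you have three tables: one holding the questions (one row per question), one holding the keywords (one row per keyword), and a third that ties the two together. The question_keywords table will have multiple rows with the same questionID and multiple rows with the same keywordID. So if questionID 5 has three keywords, there will be three entries in question_keywords whose questionID is 5.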
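As a rough illustration of how the linking table is read back, the query below lists each question with its answer and its keywords. It uses only the table and column names from the CREATE TABLE statements above; GROUP_CONCAT is MySQL-specific, and the aliases are arbitrary.

```sql
-- One row per question, with its answer and a comma-separated keyword list.
SELECT q.questionID,
       q.question,
       a.answer,
       GROUP_CONCAT(k.keyword ORDER BY k.keyword SEPARATOR ',') AS keywords
FROM questions AS q
JOIN answers AS a ON a.answerID = q.answerID
-- LEFT JOIN keeps questions that have no keywords at all.
LEFT JOIN question_keywords AS qk ON qk.questionID = q.questionID
LEFT JOIN keywords AS k ON k.keywordID = qk.keywordID
GROUP BY q.questionID, q.question, a.answer;
```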

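For any one-to-one relationship, you typically just create an additional column in the same table, so you have a single table for that relationship.

Note: feel free to change the lengths of the VARCHAR columns. I chose values that are probably fine based on your example, but if questions and/or answers may be longer, you may need to increase these sizes.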
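One way to pick those lengths, sketched here on the assumption that the original data still lives in the elements table with the db_question and db_answer columns from the question, is to measure the longest existing values first:

```sql
-- Longest question and answer currently stored, in characters.
SELECT MAX(CHAR_LENGTH(db_question)) AS longest_question,
       MAX(CHAR_LENGTH(db_answer))   AS longest_answer
FROM elements;
```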


Once these tables are created, you can populate them by doing something like this:

// Keep id 0 from elements as it is; by default AUTO_INCREMENT would replace a 0 with a new value.
mysql_query("SET SESSION sql_mode = 'NO_AUTO_VALUE_ON_ZERO'") or die(mysql_error());

// The outer result set has its own variable so the lookups inside the loop cannot overwrite it.
$elements = mysql_query("SELECT * FROM elements") or die(mysql_error());
echo "About to enter while-loop<br />";
$i = 1;
while ($row = mysql_fetch_assoc($elements))
{
    echo "loop ". $i++ ."<br />";
    $answerID = -1;
    $id = (int) $row["id"];
    // Escape text values so that quotes inside them cannot break the queries.
    $question = mysql_real_escape_string($row["db_question"]);
    $answer = mysql_real_escape_string($row["db_answer"]);

    // Reuse the answer's ID if it is already stored; otherwise insert the answer.
    $querystr = "SELECT answerID FROM answers WHERE answer = '{$answer}'";
    echo "Getting answerID. query: {$querystr}<br />";
    $result = mysql_query($querystr) or die(mysql_error());
    if (!(list($answerID) = mysql_fetch_row($result)))
    {
        $querystr = "INSERT INTO answers (answer) VALUES ('{$answer}')";
        echo "Answer did not exist, inserting now. query: {$querystr}<br />";
        mysql_query($querystr) or die(mysql_error());
        $answerID = mysql_insert_id();
    }

    $querystr = "INSERT INTO questions (questionID, question, answerID) VALUES ({$id}, '{$question}', {$answerID})";
    echo "Inserting question. query: {$querystr}<br />";
    mysql_query($querystr) or die(mysql_error());

    // db_keywords looks like "[why,mars,red]": strip the brackets, then split on commas.
    $keywords = explode(",", trim($row["db_keywords"], "[]"));
    echo "keywords = ". print_r($keywords, true) ."<br />";
    foreach ($keywords as $keyword)
    {
        $keywordID = -1;
        $keyword = mysql_real_escape_string(trim($keyword));
        $querystr = "SELECT keywordID FROM keywords WHERE keyword = '{$keyword}'";
        echo "Getting keywordID. query: {$querystr}<br />";
        $result = mysql_query($querystr) or die(mysql_error());
        if (!(list($keywordID) = mysql_fetch_row($result)))
        {
            $querystr = "INSERT INTO keywords (keyword) VALUES ('{$keyword}')";
            echo "Keyword did not exist, inserting now. query: {$querystr}<br />";
            mysql_query($querystr) or die(mysql_error());
            $keywordID = mysql_insert_id();
        }

        // Link this question to the keyword.
        $querystr = "INSERT INTO question_keywords (questionID, keywordID) VALUES ({$id}, {$keywordID})";
        echo "Inserting question keyword. query: {$querystr}<br />";
        mysql_query($querystr) or die(mysql_error());
    }
}
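
If you would rather not use PHP for the answers and questions tables, the same rows can probably be produced with INSERT ... SELECT instead of the corresponding parts of the script. This is only a sketch: it assumes both target tables are still empty, and it leaves the keywords to the PHP loop, because splitting the bracketed db_keywords string is awkward in plain MySQL.

```sql
-- Keep id 0 from elements rather than letting AUTO_INCREMENT replace it.
SET SESSION sql_mode = 'NO_AUTO_VALUE_ON_ZERO';

-- One row per distinct answer; answerID is generated automatically.
INSERT INTO answers (answer)
SELECT DISTINCT db_answer
FROM elements;

-- Copy each question, keeping its original id and linking it to its answer.
INSERT INTO questions (questionID, question, answerID)
SELECT e.id, e.db_question, a.answerID
FROM elements AS e
JOIN answers AS a ON a.answer = e.db_answer;
```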

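Once this is done and you have verified that the four tables are populated correctly, you no longer need the elements table at all. Just use the four tables (questions, answers, keywords, and question_keywords).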
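
A quick way to do part of that verification, assuming the elements table is still present and contains no NULL answers, is to compare row counts. In the result, element_rows should equal question_rows, and distinct_answers should equal answer_rows.

```sql
-- Compare the source table with the new tables before dropping anything.
SELECT (SELECT COUNT(*) FROM elements)                  AS element_rows,
       (SELECT COUNT(*) FROM questions)                 AS question_rows,
       (SELECT COUNT(DISTINCT db_answer) FROM elements) AS distinct_answers,
       (SELECT COUNT(*) FROM answers)                   AS answer_rows;
```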

Answer 1: (score: 2)

Within the limits of MySQL, you can assign an ID to each answer like this:

select db_answer, min(id) as answer_id
from elements
group by db_answer

So the complete solution is to create an answer_id column in the table and then run the following:

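-- MySQL form: join the table to the grouped subquery
-- (WITH ... UPDATE is not available before MySQL 8.0).
update elements
join (
  select db_answer, min(id) as answer_id
  from elements
  group by db_answer
) as aid on elements.db_answer = aid.db_answer
set elements.answer_id = aid.answer_id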

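Answer 2: (score: 1)

What you need in order to do this in a query is the SQL Server function ROW_NUMBER(). Unfortunately, MySQL did not have this before version 8.0. However, you can emulate the function by using inline variable assignment. This article explains the logic involved: http://www.explodybits.com/2011/11/mysql-row-number/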
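
A minimal sketch of that idea, applied to the table from the question (elements, db_answer, answer_id), might look like the following. The variable name @n and the aliases d and numbered are made up for illustration. The order in which the answers receive their numbers is not guaranteed, only that each distinct answer gets a number of its own.

```sql
-- Counter that emulates ROW_NUMBER().
SET @n := 0;

-- Number each distinct answer, then copy that number to every row using the answer.
UPDATE elements AS e
JOIN (
    SELECT d.db_answer, (@n := @n + 1) AS answer_id
    FROM (SELECT DISTINCT db_answer FROM elements) AS d
) AS numbered ON numbered.db_answer = e.db_answer
SET e.answer_id = numbered.answer_id;
```

On MySQL 8.0 or later, a window function such as DENSE_RANK() OVER (ORDER BY db_answer) in the subquery should do the same job without a user variable.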