tb_content (left) and tb_word (right):
=====================================   ================================
|id|sentence |sentence_id|content_id|   |id|word|sentence_id|content_id|
=====================================   ================================
| 1|sentence1|     0     |    1     |   | 1| a  |     0     |    1     |
| 2|sentence2|     1     |    1     |   | 2| b  |     0     |    1     |
| 3|sentence5|     0     |    2     |   | 3| c  |     1     |    1     |
| 4|sentence6|     1     |    2     |   | 4| a  |     1     |    1     |
| 5|sentence7|     2     |    2     |   | 5| e  |     1     |    1     |
=====================================   | 6| f  |     0     |    2     |
                                        | 7| g  |     1     |    2     |
                                        | 8| h  |     1     |    2     |
                                        | 9| i  |     1     |    2     |
                                        |10| f  |     2     |    2     |
                                        |11| h  |     2     |    2     |
                                        |12| f  |     2     |    2     |
                                        ================================
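I need to check, for each sentence, which words it shares with the other sentences in the same content_id.
For example: take content_id = 1, which has sentence1 and sentence2. From tb_word we can see that sentence1 and sentence2 both contain the word a. If the number of occurrences of a across the two sentences is >= 2, then a is a result. So if I print the results, they must be: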
00 Array ( [0] => a [1] => b )
01 Array ( [3] => a )
10 Array ( [3] => a )
11 Array ( [0] => c [1] => a [2] => e )
Here 00 means sentence_id = 0 compared with sentence_id = 0.
First, I wrote a function, functionTotal, to count how many sentences each content_id has:
// Body of functionTotal: one entry per content_id, holding its sentence count
$total = array();
$sql = mysql_query('select content_id, count(*) as RowAmount
                    from tb_content Group By content_id') or die(mysql_error());
while ($row = mysql_fetch_array($sql)) {
    $total[] = $row['RowAmount'];
}
return $total;
From that function I get the value of $total, which I then use to check tb_word:
foreach ($total as $content_id => $totals){
    for ($x=0; $x <= ($totals-1); $x++) {
        for ($y=0; $y <= ($totals-1); $y++) {
            $shared = getShared($x, $y);
        }
    }
}
The getShared function is:
function getShared ($x, $y){
    $token = array();
    $shared = array();
    $i = 0;
    if ($x == $y) {
        // Same sentence on both sides: return every word of that sentence
        $query = mysql_query("SELECT word FROM `tb_word`
                              WHERE sentence_id ='$x' ");
        while ($row = mysql_fetch_array($query)) {
            $shared[$i] = $row['word'];
            $i++;
        }
    } else {
        // Two different sentences: keep the words counted at least twice
        $query = mysql_query("SELECT word, count(word) as jml
                              FROM `tb_word` WHERE sentence_id ='$x'
                              OR sentence_id ='$y'
                              GROUP BY word ");
        while ($row = mysql_fetch_array($query)) {
            $jml = $row['jml'];
            $token[$i] = $row['word'];
            if ($jml >= 2) {
                $shared[$i] = $token[$i];
            }
            $i++;
        }
    }
    return $shared;
}
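But the result I get is still wrong: it is still mixed across different content_id values. The result must be grouped by content_id. Sorry for my poor English and my poor explanation. cmiiw, please help me. Thanks :)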
Answer 0 (score: 1)
How about a simple SELECT content_id, word, COUNT(*) AS num_appearing FROM tb_word GROUP BY content_id, word?
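Edit: I see the complexity now. Your main problem is that getShared() is passed two sentence IDs but no content_id, so it cannot know which content is being analyzed. You also assume that the content_id and sentence_id numbers are consecutive and start at zero. My code makes no such assumption and pulls those IDs directly from the database.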
<?php
// Collect the sentence IDs that belong to each content_id
$rs = mysql_query("SELECT * FROM tb_content");
$content = array();
while ($row = mysql_fetch_assoc($rs)) {
    if (!isset($content[$row['content_id']])) $content[$row['content_id']] = array();
    $content[$row['content_id']][] = $row['sentence_id'];
}
// Compare every pair of sentences inside the same content
foreach($content as $content_id => $sentences) {
    foreach($sentences as $sentence_id) {
        foreach($sentences as $compare) {
            $shared = getShared($content_id, $sentence_id, $compare);
        }
    }
}
function getShared($cid, $s1, $s2) {
    $rs = mysql_query("SELECT `word`, COUNT(*) AS 'num' FROM `tb_word` WHERE `content_id`={$cid} AND `sentence_id` IN ({$s1}, {$s2}) GROUP BY `word`");
    $out = array();
    while ($row = mysql_fetch_assoc($rs)) {
        // Read from the fetched row, not from the result resource
        if ($row['num'] >= 2) $out[$row['word']] = $row['num'];
    }
    return $out;
}
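One thing to watch when counting rows: a word that occurs twice inside a single sentence (such as f in sentence 2 of content 2) also reaches a count of 2, and comparing a sentence with itself only returns words repeated inside it. A minimal sketch of an alternative, assuming the tb_word rows have already been fetched into a PHP array (hard-coded here from the sample table; the helper names groupWords and sharedWords are hypothetical), compares word sets with array_intersect instead of counting:

```php
<?php
// Sample rows copied from the tb_word table in the question:
// [word, sentence_id, content_id]. In real code they would come from one query.
$rows = [
    ['a', 0, 1], ['b', 0, 1], ['c', 1, 1], ['a', 1, 1], ['e', 1, 1],
    ['f', 0, 2], ['g', 1, 2], ['h', 1, 2], ['i', 1, 2],
    ['f', 2, 2], ['h', 2, 2], ['f', 2, 2],
];

// Hypothetical helper: group words as $words[content_id][sentence_id] = [word, ...].
function groupWords(array $rows): array
{
    $words = [];
    foreach ($rows as [$word, $sentenceId, $contentId]) {
        $words[$contentId][$sentenceId][] = $word;
    }
    return $words;
}

// Hypothetical helper: distinct words of sentence $s1 that also occur in sentence $s2.
function sharedWords(array $words, int $cid, int $s1, int $s2): array
{
    $a = array_unique($words[$cid][$s1] ?? []);
    $b = $words[$cid][$s2] ?? [];
    return array_values(array_intersect($a, $b));
}

$words = groupWords($rows);
$result = [];
foreach ($words as $cid => $sentences) {
    foreach (array_keys($sentences) as $s1) {
        foreach (array_keys($sentences) as $s2) {
            // Key each comparison as "s1-s2" inside its own content_id
            $result[$cid]["{$s1}-{$s2}"] = sharedWords($words, $cid, $s1, $s2);
        }
    }
}
print_r($result[1]);
```

Because everything is keyed by content_id first, results from different contents cannot mix, and the sentence IDs are taken from the data rather than assumed to start at zero.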
Answer 1 (score: 1)
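This can actually be done by the DBMS itself, in one query with two steps. First, do a self-join to prepare the combinations of sentences within the same content: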
SELECT a.content_id,
a.sentence_id AS sentence_id_1,
b.sentence_id AS sentence_id_2
FROM tb_content AS a
JOIN tb_content AS b
ON ( a.content_id = b.content_id
AND a.sentence_id <= b.sentence_id )
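The "<=" keeps same-sentence pairs such as "1-1" or "2-2", but avoids bidirectional duplicates such as "1-2" and "2-1". Next, join the result above with the words and count the occurrences. Something like this: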
SELECT s.content_id,
s.sentence_id_1,
s.sentence_id_2,
c.word,
Count(*) AS jml
FROM (SELECT a.content_id,
a.sentence_id AS sentence_id_1,
b.sentence_id AS sentence_id_2
FROM tb_content AS a
JOIN tb_content AS b
ON ( a.content_id = b.content_id
AND a.sentence_id <= b.sentence_id )) AS s
JOIN tb_word AS c
ON ( s.content_id = c.content_id
AND ( c.sentence_id = s.sentence_id_1
OR c.sentence_id = s.sentence_id_2 ) )
GROUP BY s.content_id,
s.sentence_id_1,
s.sentence_id_2,
c.word
HAVING Count(*) >= 2;
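The result of the query above gives you the content_id, sentence IDs 1 and 2, the word, and its number of occurrences (2 or more). All you need now is to collect the results into arrays, as far as I understand.
If I have missed your goal, let me know.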
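The collection step could look roughly like the sketch below. It assumes each fetched row is an associative array with the column names used in the query (content_id, sentence_id_1, sentence_id_2, word, jml). The rows here are hand-written for illustration rather than fetched from a database, and collectShared is a hypothetical helper name.

```php
<?php
// Hypothetical rows shaped like the query's output columns;
// a fetch loop over the real result set would yield rows of this form.
$rows = [
    ['content_id' => 1, 'sentence_id_1' => 0, 'sentence_id_2' => 1, 'word' => 'a', 'jml' => 2],
    ['content_id' => 2, 'sentence_id_1' => 0, 'sentence_id_2' => 2, 'word' => 'f', 'jml' => 3],
    ['content_id' => 2, 'sentence_id_1' => 1, 'sentence_id_2' => 2, 'word' => 'h', 'jml' => 2],
];

// Hypothetical helper: build $shared[content_id]["s1-s2"] = [word, ...].
function collectShared(array $rows): array
{
    $shared = [];
    foreach ($rows as $row) {
        $pair = $row['sentence_id_1'] . '-' . $row['sentence_id_2'];
        $shared[$row['content_id']][$pair][] = $row['word'];
    }
    return $shared;
}

$shared = collectShared($rows);
print_r($shared);
```

Since the join uses "<=", each pair appears only once, so the entry for "0-1" also stands for "1-0".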