MySQL update with a self-referencing query

Time: 2012-03-30 08:06:41

Tags: mysql self-join

I have a surveys table that contains (among others) the following columns:

survey_id  - unique id
user_id    - the id of the person the survey relates to
created    - datetime
ip_address - of the submission
ip_count   - the number of duplicates

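Because the record set is large, running this query on the fly is impractical, so I am trying to write an UPDATE statement that periodically stores a "cached" result in ip_count.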

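The purpose of ip_count is to show how many survey submissions with the same ip_address were received for the same user_id within a 12-month period (+/- 6 months of the created date).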

Given the following data set, this is the expected result.

survey_id   user_id   created     ip_address    ip_count   # survey_ids counted as duplicates
  1            1      01-Jan-12   123.132.123      1       # 2
  2            1      01-Apr-12   123.132.123      2       # 1, 4
  3            2      01-Jul-12   123.132.123      0       #
  4            1      01-Aug-12   123.132.123      2       # 2, 6
  6            1      01-Dec-12   123.132.123      1       # 4
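The counting rule can be applied by hand to the sample rows above. The sketch below is plain Python rather than MySQL and is only an illustration: `shift_months`, `SURVEYS`, and `duplicates_by_survey` are hypothetical names, the two-digit years are assumed to mean 2012, and the month shift clamps to the last day of the target month, which is how MySQL's `INTERVAL n MONTH` arithmetic behaves.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Return d moved by whole months, clamping the day to the month's last day."""
    year, month_index = divmod(d.year * 12 + (d.month - 1) + months, 12)
    last_day = calendar.monthrange(year, month_index + 1)[1]
    return date(year, month_index + 1, min(d.day, last_day))


# (survey_id, user_id, created, ip_address) taken from the sample data set;
# the year 2012 is an assumption.
SURVEYS = [
    (1, 1, date(2012, 1, 1), "123.132.123"),
    (2, 1, date(2012, 4, 1), "123.132.123"),
    (3, 2, date(2012, 7, 1), "123.132.123"),
    (4, 1, date(2012, 8, 1), "123.132.123"),
    (6, 1, date(2012, 12, 1), "123.132.123"),
]


def duplicates_by_survey(surveys):
    """Map each survey_id to the other survey_ids that count as its duplicates."""
    result = {}
    for survey_id, user_id, created, ip_address in surveys:
        start, end = shift_months(created, -6), shift_months(created, 6)
        result[survey_id] = [
            other_id
            for other_id, other_user, other_created, other_ip in surveys
            if other_id != survey_id            # a survey is not its own duplicate
            and other_user == user_id           # same user
            and other_ip == ip_address          # same IP address
            and start <= other_created <= end   # inside the +/- 6 month window
        ]
    return result


if __name__ == "__main__":
    for survey_id, dupes in duplicates_by_survey(SURVEYS).items():
        print(survey_id, len(dupes), dupes)
```

The length of each list is the value that ip_count is meant to cache for that survey.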

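This is the closest solution I have come up with so far, but the query fails to take the date restriction into account, and I am struggling to come up with an alternative approach.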

UPDATE surveys
JOIN(
  SELECT ip_address, created, user_id, COUNT(*) AS total
  FROM surveys  
  WHERE surveys.state IN (1, 3) # survey is marked as completed and confirmed
  GROUP BY ip_address, user_id
) AS ipCount 
  ON (
    ipCount.ip_address = surveys.ip_address
    AND ipCount.user_id = surveys.user_id
    AND ipCount.created BETWEEN (surveys.created - INTERVAL 6 MONTH) AND (surveys.created + INTERVAL 6 MONTH)
  )
SET surveys.ip_count = ipCount.total - 1 # minus 1 as this query will match on its own id.
WHERE surveys.ip_address IS NOT NULL # ignore surveys where we have no ip_address

Thanks in advance for your help :)

2 Answers:

Answer 0: (score: 2)

I don't have your table, so it is hard for me to write SQL that is guaranteed to work, but I can take a shot at it and hopefully it helps.

First, I would take the Cartesian product of surveys with itself and filter out the rows I don't want:

select s1.survey_id x, s2.survey_id y
from surveys s1, surveys s2
where s1.survey_id != s2.survey_id
  and s1.ip_address = s2.ip_address
  and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)

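This output should contain every pair of surveys that matches (according to your rules) TWICE: once with each id in the first position, and once with it in the second position.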

We can then GROUP BY on this output to get a table that gives the correct ip_count for each survey_id:

(select x, count(*) c
 from (
   select s1.survey_id x, s2.survey_id y
   from surveys s1, surveys s2
   where s1.survey_id != s2.survey_id
     and s1.ip_address = s2.ip_address
     and s2.created between (s1.created - interval 6 month) and (s1.created + interval 6 month)
 ) pairs
 group by x)
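To see the pair-then-count idea run end to end, here is a small sketch that uses Python's built-in sqlite3 module instead of MySQL, so the window is written with SQLite's date() modifiers rather than INTERVAL. The sample rows come from the question (years assumed to be 2012), the same-user condition is taken from the question's requirements rather than from the query above, and `SAMPLE_ROWS`, `PAIR_COUNT_SQL`, and `count_pairs` are illustrative names only.

```python
import sqlite3

# (survey_id, user_id, created, ip_address); ISO dates so text comparison works.
SAMPLE_ROWS = [
    (1, 1, "2012-01-01", "123.132.123"),
    (2, 1, "2012-04-01", "123.132.123"),
    (3, 2, "2012-07-01", "123.132.123"),
    (4, 1, "2012-08-01", "123.132.123"),
    (6, 1, "2012-12-01", "123.132.123"),
]

# Inner query: every matching ordered pair (x, y). Outer query: pairs per x.
PAIR_COUNT_SQL = """
SELECT x, COUNT(*) AS c
FROM (
    SELECT s1.survey_id AS x, s2.survey_id AS y
    FROM surveys AS s1, surveys AS s2
    WHERE s1.survey_id != s2.survey_id
      AND s1.ip_address = s2.ip_address
      AND s1.user_id = s2.user_id
      AND s2.created BETWEEN date(s1.created, '-6 months')
                         AND date(s1.created, '+6 months')
) AS pairs
GROUP BY x
ORDER BY x
"""


def count_pairs(rows):
    """Load rows into an in-memory table and return {survey_id: ip_count}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, user_id INTEGER,"
        " created TEXT, ip_address TEXT)"
    )
    conn.executemany("INSERT INTO surveys VALUES (?, ?, ?, ?)", rows)
    counts = dict(conn.execute(PAIR_COUNT_SQL).fetchall())
    conn.close()
    return counts


if __name__ == "__main__":
    print(count_pairs(SAMPLE_ROWS))
```

A survey with no matching pair does not appear in the grouped result at all, so an inner join against this result will never write a 0 for it.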

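We now have a table that maps each survey_id to its correct ip_count. To update the original table, we need to join it against this one and copy the values over.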

So it should look something like:

UPDATE surveys s
INNER JOIN (ABOVE QUERY) n ON s.survey_id = n.x
SET s.ip_count = n.c

There is some pseudocode in there, but I think the general idea should work.

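I have never updated a table based on the output of another query on the same table before. I tried to guess the right syntax from this question - How do I UPDATE from a SELECT in SQL Server?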

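Also, if I needed to do something like this for my own work, I would not try to do it in a single query. That would be hard to maintain and could run into memory/performance problems. I would rather have a script walk the table row by row, updating a single row in a transaction before moving on to the next one. Much slower, but easier to understand and probably lighter on the database.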
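A minimal sketch of that row-by-row approach, assuming a Python script and an SQLite stand-in for the real MySQL table (so date() modifiers replace INTERVAL). The function and constant names, the demo table, and the same-user condition taken from the question are all assumptions made for illustration, not part of the answer.

```python
import sqlite3

# Counts the duplicates of one survey; the id is bound as a parameter.
COUNT_FOR_ONE_SQL = """
SELECT COUNT(*)
FROM surveys AS s1
JOIN surveys AS s2
  ON s2.survey_id != s1.survey_id
 AND s2.user_id = s1.user_id
 AND s2.ip_address = s1.ip_address
 AND s2.created BETWEEN date(s1.created, '-6 months')
                    AND date(s1.created, '+6 months')
WHERE s1.survey_id = ?
"""


def refresh_ip_counts(conn):
    """Recompute ip_count one survey at a time, committing after each row."""
    survey_ids = [
        row[0]
        for row in conn.execute(
            "SELECT survey_id FROM surveys"
            " WHERE ip_address IS NOT NULL ORDER BY survey_id"
        )
    ]
    for survey_id in survey_ids:
        with conn:  # commits on success, rolls back if an exception is raised
            (count,) = conn.execute(COUNT_FOR_ONE_SQL, (survey_id,)).fetchone()
            conn.execute(
                "UPDATE surveys SET ip_count = ? WHERE survey_id = ?",
                (count, survey_id),
            )
    return len(survey_ids)


def make_demo_db():
    """Build an in-memory table holding the question's sample rows (year assumed 2012)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE surveys (survey_id INTEGER PRIMARY KEY, user_id INTEGER,"
        " created TEXT, ip_address TEXT, ip_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO surveys VALUES (?, ?, ?, ?, NULL)",
        [
            (1, 1, "2012-01-01", "123.132.123"),
            (2, 1, "2012-04-01", "123.132.123"),
            (3, 2, "2012-07-01", "123.132.123"),
            (4, 1, "2012-08-01", "123.132.123"),
            (6, 1, "2012-12-01", "123.132.123"),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    demo = make_demo_db()
    refresh_ip_counts(demo)
    print(demo.execute("SELECT survey_id, ip_count FROM surveys").fetchall())
```

Because each row is committed separately, a failure part-way through leaves the rows already processed with fresh counts and the rest untouched, so the script can simply be run again.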

Answer 1: (score: 2)

I made a few (very) minor tweaks to what is shown above. Thanks again!

UPDATE surveys AS s
INNER JOIN (
  SELECT x, count(*) c
  FROM (
    SELECT s1.id AS x, s2.id AS y
    FROM surveys AS s1, surveys AS s2
    WHERE s1.state IN (1, 3) # completed and verified
      AND s1.id != s2.id # dont self join
      AND s1.ip_address != "" AND s1.ip_address IS NOT NULL # not interested in blank entries
      AND s1.ip_address = s2.ip_address
      AND (s2.created BETWEEN (s1.created - INTERVAL 6 MONTH) AND (s1.created + INTERVAL 6 MONTH))
      AND s1.user_id = s2.user_id # where completed for the same user
  ) AS ipCount
  GROUP BY x
) n on s.id = n.x
SET s.ip_count = n.c
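One thing to keep in mind with an INNER JOIN update is that it only touches rows that have at least one duplicate, so a row whose duplicates have gone away keeps its old ip_count. The sketch below shows one possible variant that also writes 0, using Python's sqlite3 module and a correlated subquery. This is SQLite syntax used purely for a runnable illustration: MySQL rejects a subquery that reads from the table being updated (error 1093), which is why the statement above joins against a derived table instead. The table contents, the stale starting value of 9, and all names here are assumptions.

```python
import sqlite3

# Same filters as the statement above, but every eligible row gets a value,
# including 0 when nothing matches.
UPDATE_SQL = """
UPDATE surveys
SET ip_count = (
    SELECT COUNT(*)
    FROM surveys AS s2
    WHERE s2.id != surveys.id
      AND s2.user_id = surveys.user_id
      AND s2.ip_address = surveys.ip_address
      AND s2.created BETWEEN date(surveys.created, '-6 months')
                         AND date(surveys.created, '+6 months')
)
WHERE state IN (1, 3)
  AND ip_address IS NOT NULL
  AND ip_address != ''
"""

# (id, user_id, created, ip_address, state, ip_count); survey 3 starts with a
# deliberately stale ip_count of 9.
DEMO_ROWS = [
    (1, 1, "2012-01-01", "123.132.123", 1, 0),
    (2, 1, "2012-04-01", "123.132.123", 1, 0),
    (3, 2, "2012-07-01", "123.132.123", 1, 9),
    (4, 1, "2012-08-01", "123.132.123", 3, 0),
    (6, 1, "2012-12-01", "123.132.123", 1, 0),
]


def run_demo():
    """Apply the update to the demo rows and return {id: ip_count}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE surveys (id INTEGER PRIMARY KEY, user_id INTEGER,"
        " created TEXT, ip_address TEXT, state INTEGER, ip_count INTEGER)"
    )
    conn.executemany("INSERT INTO surveys VALUES (?, ?, ?, ?, ?, ?)", DEMO_ROWS)
    with conn:  # run the whole update in one transaction
        conn.execute(UPDATE_SQL)
    result = dict(conn.execute("SELECT id, ip_count FROM surveys ORDER BY id"))
    conn.close()
    return result


if __name__ == "__main__":
    print(run_demo())
```

In MySQL itself, a comparable effect could probably be had by resetting ip_count to 0 before running the join update, or by switching to a LEFT JOIN with COALESCE(n.c, 0); neither has been tested against the real table.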