MySQL query consolidating information from 3 tables (with many hurdles)

Date: 2013-07-05 19:31:27

Tags: mysql

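Background: In an experiment, bees have number tags glued to their backs, and their choices are recorded in the lab. There are not enough number tags (2 digits and a few color options), so the tags have to be reused. However, a tag can only be reused after the bee carrying it has died. So in the data we occasionally see the same bee identifier again, and the only way to tell whether it belongs to the same bee is to check another table to see whether that bee has died.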

Table: the bees' choices

CREATE TABLE `exp8` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `bee_id` varchar(255) DEFAULT NULL,
  `date_time` datetime DEFAULT NULL,
  `choice` varchar(255) DEFAULT NULL,
  `hover_duration` int(11) DEFAULT NULL,
  `antennate_duration` int(11) DEFAULT NULL,
  `land_duration` int(11) DEFAULT NULL,
  `landing_position` varchar(255) DEFAULT NULL,
  `remarks` longtext,
  `validity` int(11) DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=264;

INSERT INTO `exp8` (`id`, `bee_id`, `date_time`, `choice`, `hover_duration`, `antennate_duration`, `land_duration`, `landing_position`, `remarks`, `validity`)
VALUES
    (1,NULL,'2013-05-14 15:38:31','right',1,0,0,NULL,NULL,1),
    (2,NULL,'2013-05-18 10:27:15','left',1,0,0,NULL,NULL,1),
    (3,'G5','2013-05-18 11:44:44','left',0,0,4,'yellow',NULL,1),
    (4,'G5','2013-06-01 10:00:00','left',0,0,4,'yellow',NULL,1);

Tag birth and death times

CREATE TABLE `tags` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `bee_id` varchar(255) DEFAULT NULL,
  `tag_date` date DEFAULT NULL,
  `colony_id` int(11) DEFAULT NULL,
  `events` varchar(255) DEFAULT NULL,
  `worker_age` varchar(255) DEFAULT NULL,
  `tagged_by` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=406;

INSERT INTO `tags` (`id`, `bee_id`, `tag_date`, `colony_id`, `events`, `worker_age`, `tagged_by`)
VALUES
    (1,'G5','2013-05-08',1,'birth','Adult','ET'),
    (2,'G5','2013-05-20',NULL,'death','Adult','ET'),
    (3,'G5','2013-05-29',1,'birth','Adult','ET');

Stimuli shown in the lab

CREATE TABLE `stimuli_schedule` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `left_side` varchar(255) DEFAULT NULL,
  `right_side` varchar(255) DEFAULT NULL,
  `start_datetime` datetime DEFAULT NULL,
  `scheduled` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=50;

INSERT INTO `stimuli_schedule` (`id`, `left_side`, `right_side`, `start_datetime`, `scheduled`)
VALUES
    (1,'LS1','LS2','2013-05-14 12:00:00',NULL),
    (2,'LS2','LS1','2013-05-15 11:44:00',NULL),
    (3,'LS1','LS2','2013-05-30 11:09:00',NULL);

The desired output looks like this:

bee_id     CHOICE_DATETIME     LEFT_SIDE     RIGHT_SIDE     CHOICE
===================================================================
NULL       2013-05-14 15:38:31     LS1          LS2           right
G5         2013-05-18 10:27:15     LS2          LS1           left
G5         2013-06-01 10:00:00     LS1          LS2           left

Thanks to generous help from @GordonLinoff and @jcsanyi, I have two related MySQL queries that each achieve part of the solution:

This one shows each bee's choices, assuming bee IDs are unique:

select bee_id, count(case when choice="left" then 1 else NULL end) as leftCount, count(case when choice="right" then 1 else NULL end) as rightCount
  from exp8 e
  left join stimuli_schedule ss on ss.start_datetime <= e.date_time
  left join stimuli_schedule ss2 on ss2.start_datetime <= e.date_time
  where (bee_id IS NOT NULL) AND (ss2.left_side IN ('LA1','HS1') AND ss2.right_side IN('HS1','LA1'))
  group by bee_id

This one shows each bee's lifespan and distinguishes reused tags:

select t.bee_id, (case when t.death_date is null then 'Alive' else 'Dead' end) as status, 
        t.tag_date, t.death_date, (case when t.death_date is not null then timediff(t.death_date,t.tag_date) else timediff(NOW(),t.tag_date) end) as age
from (select t.*,
             (select t2.tag_date
              from tags t2
              where t2.bee_id = t.bee_id and
                    t2.events = 'death' and
                    t2.tag_date >= t.tag_date
              order by t2.tag_date
              limit 1
             ) as death_date
      from tags t
      where t.events = 'birth'
     ) t
group by t.bee_id, t.tag_date;

I can't manage to combine the two queries to produce the desired output. Here is my attempt:

select t.bee_id, count(case when choice="left" then 1 else NULL end) as leftCount,
       count(case when choice="right" then 1 else NULL end) as rightCount, 
       (case when t.death_date is null then 'Alive' else 'Dead' end) as status, 
       t.tag_date, t.death_date, 
       (case when t.death_date is not null 
             then timediff(t.death_date,t.tag_date) 
             else timediff(NOW(),t.tag_date) end) as "age (hours)"
from exp8 e, (select t.*,
             (select t2.tag_date
              from tags t2
              where t2.bee_id = t.bee_id and
                    t2.events = 'death' and
                    t2.tag_date >= t.tag_date
              order by t2.tag_date
              limit 1
             ) as death_date
      from tags t
      where t.events = 'birth'
     ) t
left join stimuli_schedule ss on ss.start_datetime <= e.date_time
left join stimuli_schedule ss2 on ss2.start_datetime <= e.date_time
where (e.bee_id IS NOT NULL)
group by t.bee_id, t.tag_date;

For reasons I don't understand, the "left join ... on ... <= e.date_time" part causes an "Unknown column" error.

Any help is greatly appreciated!

2 answers:

Answer 0 (score: 1)

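As written, the LEFT JOIN operators apply to the derived table t, not to exp8 as you evidently intended, so e is not visible in their ON clauses. That is what you get from mixing two different join syntaxes (comma joins and explicit JOIN). I'd guess you also want to join t to exp8 on bee_id.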
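
Putting both suggestions together (explicit JOIN syntax throughout, and t joined to exp8 on bee_id plus the date range of each tag "life"), a combined query might look like the sketch below. It is not the answerer's code, and it rests on some assumptions. It recreates trimmed copies of the three tables with the question's sample rows so it runs on its own in a scratch database. It sticks to SQL that both MySQL 8 and SQLite accept. The view name `choice_context` is hypothetical. As in the lifespan query above, a tag's life runs from a 'birth' row to the first 'death' row on or after it. The stimuli in force at a choice are taken from the schedule row with the latest start_datetime at or before that choice.

```sql
-- Trimmed copies of the question's tables (only the columns used below),
-- loaded with the question's sample rows. Intended for a scratch database.
CREATE TABLE exp8 (
  id        INTEGER PRIMARY KEY,
  bee_id    VARCHAR(255),
  date_time DATETIME,
  choice    VARCHAR(255)
);
CREATE TABLE tags (
  id       INTEGER PRIMARY KEY,
  bee_id   VARCHAR(255),
  tag_date DATE,
  events   VARCHAR(255)
);
CREATE TABLE stimuli_schedule (
  id             INTEGER PRIMARY KEY,
  left_side      VARCHAR(255),
  right_side     VARCHAR(255),
  start_datetime DATETIME
);
INSERT INTO exp8 VALUES
  (1, NULL, '2013-05-14 15:38:31', 'right'),
  (2, NULL, '2013-05-18 10:27:15', 'left'),
  (3, 'G5', '2013-05-18 11:44:44', 'left'),
  (4, 'G5', '2013-06-01 10:00:00', 'left');
INSERT INTO tags VALUES
  (1, 'G5', '2013-05-08', 'birth'),
  (2, 'G5', '2013-05-20', 'death'),
  (3, 'G5', '2013-05-29', 'birth');
INSERT INTO stimuli_schedule VALUES
  (1, 'LS1', 'LS2', '2013-05-14 12:00:00'),
  (2, 'LS2', 'LS1', '2013-05-15 11:44:00'),
  (3, 'LS1', 'LS2', '2013-05-30 11:09:00');

-- One row per choice. Only explicit JOINs are used, so every ON clause can see e.
CREATE VIEW choice_context AS
SELECT e.bee_id,
       e.date_time AS choice_datetime,
       ss.left_side,
       ss.right_side,
       e.choice,
       t.tag_date  AS birth_date,
       t.death_date
  FROM exp8 e
  -- t: one row per life of a tag (a birth plus the first death on or after it)
  LEFT JOIN (SELECT b.bee_id,
                    b.tag_date,
                    (SELECT MIN(d.tag_date)
                       FROM tags d
                      WHERE d.bee_id = b.bee_id
                        AND d.events = 'death'
                        AND d.tag_date >= b.tag_date) AS death_date
               FROM tags b
              WHERE b.events = 'birth') t
         ON t.bee_id = e.bee_id
        AND t.tag_date <= e.date_time
        -- tag dates carry no time of day, so a choice on the day of death falls outside that life
        AND (t.death_date IS NULL OR t.death_date >= e.date_time)
  -- the schedule row in force: latest start at or before the choice
  LEFT JOIN stimuli_schedule ss
         ON ss.start_datetime = (SELECT MAX(s2.start_datetime)
                                   FROM stimuli_schedule s2
                                  WHERE s2.start_datetime <= e.date_time);

SELECT * FROM choice_context ORDER BY choice_datetime;

-- Per-life totals, which is what the combined attempt in the question was after
SELECT bee_id, birth_date, death_date,
       CASE WHEN death_date IS NULL THEN 'Alive' ELSE 'Dead' END AS status,
       SUM(CASE WHEN choice = 'left'  THEN 1 ELSE 0 END) AS leftCount,
       SUM(CASE WHEN choice = 'right' THEN 1 ELSE 0 END) AS rightCount
  FROM choice_context
 WHERE bee_id IS NOT NULL
 GROUP BY bee_id, birth_date, death_date;
```

On the sample rows the view returns all four choices, each with the stimuli in force at that moment. The two G5 choices fall into different lives (born 2013-05-08 and 2013-05-29), which is the distinction the per-life totals rely on. An age column could be added with MySQL's DATEDIFF; it is left out here to keep the sketch portable.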

Answer 1 (score: 0)

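Your problem lies more in the database design. Behaviors are attributed to a bee, and that bee needs to be uniquely identified. So you need a primary key for bees, and you record behaviors against that bee ID rather than against the tag.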

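The trick is that when you process a tag, you need to determine which bee currently carries it. That is easily done with a table listing the currently deployed tags. When a bee dies and its tag is reassigned or retired, the list of active tags is updated accordingly.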

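If you see where I'm going: the choices you are making at the data-analysis stage are overly complicated because they try to emulate a missing primary key and apply it, unnecessarily, to your behavior entries. Fix the design and your data analysis will be many times faster and your queries much simpler.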
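
A minimal sketch of the design this answer describes. The table and column names (`bees`, `bee_pk`, `active_tags`, `choices`) are hypothetical, and the sample data replays the question's G5 history. It sticks to SQL that both MySQL 8 and SQLite accept and is meant for a scratch database.

```sql
-- One row per physical bee; this surrogate key is what behavior rows point to.
CREATE TABLE bees (
  bee_pk     INTEGER PRIMARY KEY,
  colony_id  INTEGER,
  birth_date DATE,
  death_date DATE                    -- NULL while the bee is alive
);

-- Tags currently on a living bee: each tag code maps to at most one bee.
CREATE TABLE active_tags (
  tag_code VARCHAR(10) PRIMARY KEY,  -- e.g. 'G5'
  bee_pk   INTEGER NOT NULL UNIQUE,
  FOREIGN KEY (bee_pk) REFERENCES bees (bee_pk)
);

-- Behavior rows reference the bee, not the reusable tag code.
CREATE TABLE choices (
  id        INTEGER PRIMARY KEY,
  bee_pk    INTEGER NOT NULL,
  date_time DATETIME,
  choice    VARCHAR(10),
  FOREIGN KEY (bee_pk) REFERENCES bees (bee_pk)
);

-- Tag G5 goes on bee 1.
INSERT INTO bees VALUES (1, 1, '2013-05-08', NULL);
INSERT INTO active_tags VALUES ('G5', 1);

-- At data entry, the observed tag is resolved to whichever bee wears it now.
INSERT INTO choices (id, bee_pk, date_time, choice)
SELECT 1, bee_pk, '2013-05-18 11:44:44', 'left' FROM active_tags WHERE tag_code = 'G5';

-- Bee 1 dies; G5 is reassigned to a newly tagged bee 2.
UPDATE bees SET death_date = '2013-05-20' WHERE bee_pk = 1;
INSERT INTO bees VALUES (2, 1, '2013-05-29', NULL);
UPDATE active_tags SET bee_pk = 2 WHERE tag_code = 'G5';

INSERT INTO choices (id, bee_pk, date_time, choice)
SELECT 2, bee_pk, '2013-06-01 10:00:00', 'left' FROM active_tags WHERE tag_code = 'G5';

-- Analysis is now a plain GROUP BY on the bee's key, with no tag-reuse logic.
SELECT b.bee_pk, b.birth_date, b.death_date,
       SUM(CASE WHEN c.choice = 'left'  THEN 1 ELSE 0 END) AS leftCount,
       SUM(CASE WHEN c.choice = 'right' THEN 1 ELSE 0 END) AS rightCount
  FROM bees b
  LEFT JOIN choices c ON c.bee_pk = b.bee_pk
 GROUP BY b.bee_pk, b.birth_date, b.death_date;
```

The design choice here is to resolve tag to bee once, when a choice is entered, so the tag code never has to be reinterpreted during analysis. If the tag worn at the time also matters for auditing, it could be stored alongside bee_pk in choices.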