Question

I would like to know how to turn several array values into column names that hold TRUE/FALSE values. Here is a concrete example:

What I have is repeated rows that differ only in the last column:

DATE        ID    Species   Illness         Tag
20180101    001   Dog       Asthma          Mucus
20180101    001   Dog       Asthma          Noisy
20180101    001   Dog       Asthma          Respiratory
20180102    002   Cat       Osteoarthritis  Locomotor
20180102    002   Cat       Osteoarthritis  Limp
...
20180131    003   Bird      Avian Pox       Itchy

What I would like to get is this:

DATE        ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
20180101    001   Dog       Asthma       TRUE   TRUE  ... FALSE FALSE
20180102    002   Cat       Osteoarth.   FALSE  FALSE ... TRUE  FALSE
...
20180131    003   Bird      Avian Pox    FALSE  FALSE ... FALSE TRUE

I tried the crosstab function with only a few of the tags, but it gave me a "function does not exist" error:

 select * 
 from crosstab (
   'select c.id, tg."name"  
    FROM taggings t 
    join consultations c
      on c.id=t.taggable_id
    join tags tg 
      on t.tag_id=tg.id
    group by c.id, tg."name"'
 ) as final_result(dermatological BOOLEAN, behaviour BOOLEAN)

By the way, I have about 350 tags, so this is not the most practical function :/

Edit: I eventually added the tablefunc extension and tried crosstab() again, but got the following error:

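Query execution failed. Reason: SQL Error [22023]: ERROR: invalid source data SQL statement. Detail: The provided SQL must return 3 columns: rowid, category, and values.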
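The requirement named in that error can be met along these lines. This is only a sketch under assumptions: it reuses the taggings/consultations/tags schema from the query above, assumes c.id is an integer, and uses the two tag names from the earlier attempt. It also swaps in the two-argument form of crosstab(), which matches categories to output columns by the order of the second query rather than by whatever happens to be present for each row.

```sql
-- Enable the extension once per database (requires sufficient privileges).
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- First argument: source query returning row id, category, value.
-- Second argument: the categories, in the order of the output columns.
SELECT *
FROM crosstab(
  $$SELECT c.id, tg."name", true
    FROM taggings t
    JOIN consultations c ON c.id = t.taggable_id
    JOIN tags tg ON t.tag_id = tg.id
    GROUP BY c.id, tg."name"
    ORDER BY 1, 2$$,
  $$VALUES ('dermatological'::text), ('behaviour'::text)$$
) AS final_result(id int4, dermatological boolean, behaviour boolean);
```

Tags that a consultation does not have come back as NULL rather than FALSE in this form.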

I will try to find a solution and post an update here, but in the meantime, if anyone knows how to solve this, please share :) Thanks!

After a few days of reading and trying the suggested solutions, this is what worked for me:

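What I did was build three separate tables and then join the first and the third to get the information I needed, with each tag added as a column holding 1/0 depending on whether the tag exists for a given ID. One more edit: I did not actually need the date, so I based the tables on the consultation ID.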

Table 1: A table with all the columns you need, grouped by ID, together with every tag that an ID has.

ID    Species   Age      Illness         Tag
001   Dog        2       Asthma          Mucus
001   Dog        2       Asthma          Noisy
001   Dog        2       Asthma          Respiratory
002   Cat        5       Osteoarthritis  Locomotor
002   Cat        5       Osteoarthritis  Limp
...
003   Bird       1       Avian Pox       Itchy

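Table 2: Take the Cartesian product of all consultations with the list of all distinct tags, and order the result for the crosstab() function. (The crosstab function needs three columns: ID, tag, and value.)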

with consultations_tags as
    ( /* put the Table 1 query here */ ),
tag_list as
    (select tags."name"
    from tags
    join taggings t on t.tag_id = tags.id
    join consultations c on c.id = t.taggable_id
    group by 1), -- gets the list of all possible tags in the DB
cartesian_consultations_tags as
    (select consultations_tags.id, tag_list.name,
     case when tag_list.name = consultations_tags.tag_name then 1
     else 0  -- "case" yields 1/0 depending on whether the tag is present in an ID
     end as tag_exists
    from
    consultations_tags
    cross join 
    tag_list)
select cartesian_consultations_tags.id, cartesian_consultations_tags.name, 
SUM(cartesian_consultations_tags.tag_exists) -- for me, the values were duplicated, and so were tags
from cartesian_consultations_tags
group by 1, 2
order by 1, 2

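Note: the order of the tags is very important here, because you are the one who names the columns in the crosstab function. It does not match tags to columns by name; it only passes the values along by position, so if you get the naming order wrong, the values will not line up with the right columns.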

Table 3: The crosstab of the second table, that is, the pivot of the Cartesian product table (Table 2 in this case).

SELECT * 
FROM crosstab(' COPY THE TABLE 2 ') -- if you have conditions such as where species = 'Dogs', double the apostrophes inside the string: where species = ''Dogs''
AS ct(id int4, "Itchy" int8,
"Limp" int8,
"Locomotor" int8,
"Mucus" int8,
"Noisy" int8) -- your tag list. You can prepare it in Excel so that every tag is in double quotes and has its data type. The data type of the tag columns has to match the data type of the "value" column in Table 2
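Instead of preparing the column list in a spreadsheet, it can also be generated by a query. The following is a sketch, assuming the tag names live in tags."name" and that only tags actually used in taggings should appear, as in the tag_list CTE above; the output column name column_definitions is made up for illustration.

```sql
-- Builds the text that goes inside AS ct( ... ): one column per tag,
-- sorted by tag name so it lines up with the ORDER BY 1, 2 of Table 2.
SELECT 'id int4, '
       || string_agg(format('%I int8', tag_name), ', ' ORDER BY tag_name)
       AS column_definitions
FROM (
  SELECT DISTINCT tg."name" AS tag_name
  FROM tags tg
  JOIN taggings t ON t.tag_id = tg.id
) AS tag_list;
```

The %I placeholder of format() quotes each name as an identifier where needed, so mixed-case tags such as Itchy keep their double quotes. The resulting string still has to be pasted into the crosstab query by hand or used to assemble the statement dynamically.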

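Finally, the table I wanted is the join of Tables 1 and 3, so I get the information I need for each consultation ID plus a list of tag columns holding 0/1 depending on whether the tag is present in that consultation.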

with table1 as ( Copy the query of table1),
table3 as ( Copy the query of table3)
select *
from table1
join table3 on 
table1.id=table3.id 
order by 1

The final table looks like this:

ID    Species   Illness      Mucus  Noisy ... Limp  Itchy 
001   Dog       Asthma       1      1     ... 0     0
002   Cat       Osteoarth.   0      0     ... 1     0
...
003   Bird      Avian Pox    0      0     ... 0     1

Answer 1

I did some experimenting, and this is what I came up with.

# Reading the data into a table

SELECT * INTO crosstab_test FROM 
(VALUES (20180101,'001','Dog','Asthma','Mucus'),
(20180101,'001','Dog','Asthma','Noisy'),
(20180101,'001','Dog','Asthma','Respiratory'),
(20180102,'002','Cat','Osteoarthritis','Locomotor'),
(20180102,'002','Cat','Osteoarthritis','Limp'),
(20180131, '003', 'Bird', 'Avian Pox','Itchy')) as a (date, id, species, illness, tag);

SELECT DISTINCT date, id, species, illness, mucus, noisy, locomotor, respiratory,  limp, itchy 
FROM 
(SELECT "date", id, species, illness
FROM crosstab_test) a
INNER JOIN             
(SELECT * FROM crosstab(
'SELECT id, tag, ''TRUE'' FROM crosstab_test ORDER BY 1,2,3',
'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1')
as tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text)) b
USING(id)
ORDER BY 1;


   date   | id  | species |    illness     | mucus | noisy | locomotor | respiratory | limp | itchy
----------+-----+---------+----------------+-------+-------+-----------+-------------+------+-------
 20180101 | 001 | Dog     | Asthma         | TRUE  | TRUE  |           | TRUE        |      |
 20180102 | 002 | Cat     | Osteoarthritis |       |       | TRUE      |             | TRUE |
 20180131 | 003 | Bird    | Avian Pox      |       |       |           |             |      | TRUE
(3 rows)

If you do not care about the order of the columns, you can simply use SELECT DISTINCT * ...

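Replacing the NULLs with FALSE would be rather tedious given the 350 tags you mentioned, so I suggest leaving them as they are. If you do want them, you can use SELECT DISTINCT date, id, species, illness, COALESCE(mucus, 'FALSE'), COALESCE(noisy, 'FALSE'), ...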

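The bitter pill you have to swallow, however, is that all 350 tags must be specified as columns of type text in the AS clause of the crosstab statement, as in tabelle (id text, Itchy text, Limp text, Locomotor text, Mucus text, Noisy text, Respiratory text). Be sure to list them in the order determined by 'SELECT DISTINCT tag FROM crosstab_test ORDER BY 1', the second query in the crosstab statement.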
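If real TRUE/FALSE values without NULLs matter, one alternative is plain conditional aggregation with bool_or instead of crosstab. This is a different technique from the one above, sketched here against the same crosstab_test table; it still needs one line per tag, so for 350 tags the list would have to be generated.

```sql
-- One row per consultation; every flag is a boolean.
-- bool_or is TRUE if any row in the group has the given tag, FALSE otherwise
-- (assuming tag itself is never NULL).
SELECT date, id, species, illness,
       bool_or(tag = 'Mucus')       AS mucus,
       bool_or(tag = 'Noisy')       AS noisy,
       bool_or(tag = 'Respiratory') AS respiratory,
       bool_or(tag = 'Locomotor')   AS locomotor,
       bool_or(tag = 'Limp')        AS limp,
       bool_or(tag = 'Itchy')       AS itchy
FROM crosstab_test
GROUP BY date, id, species, illness
ORDER BY id;
```

This version does not need the tablefunc extension, and the column order no longer depends on the sort order of the tags.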

I hope this is what you were looking for.

Answer 2

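Depending on how you display the query results, you could consider a different approach: returning the true/false flags for all tags in a single JSONB column instead of 350 dynamic columns.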

I am not sure I understood your data model correctly, but from what I gathered, I think it looks something like this:

create table tags (id int, tag text);
create table consultations (id int, species text, illness text);
create table taggings (taggable_id int, tag_id int);

insert into tags 
  (id, tag)
values
  (1, 'Mucus'),
  (2, 'Noisy'),
  (3, 'Limp'),
  (4, 'Itchy'),
  (5, 'Locomotor'),
  (6, 'Respiratory');

insert into consultations
  (id, species, illness)
values 
  (1, 'Dog', 'Asthma'),
  (2, 'Cat', 'Osteoarthritis'),
  (3, 'Bird', 'Avian Pox');

insert into taggings 
  (taggable_id, tag_id)
values 
  (1, 1), (1, 2), (1, 6), -- the dog
  (2, 5), (2, 3), -- the cat 
  (3, 4); -- the bird

You can then get a single JSON column with this query:

select c.id, c.species, c.illness, 
       (select jsonb_object_agg(t.tag, tg.taggable_id is not null)
        from tags t 
          left join taggings tg 
                 on tg.tag_id = t.id 
                and tg.taggable_id = c.id) as tags
from consultations c;

With the sample data above, the query returns:

id | species | illness        | tags                                                                                                    
---+---------+----------------+---------------------------------------------------------------------------------------------------------
 1 | Dog     | Asthma         | {"Limp": false, "Itchy": false, "Mucus": true, "Noisy": true, "Locomotor": false, "Respiratory": true}  
 2 | Cat     | Osteoarthritis | {"Limp": true, "Itchy": false, "Mucus": false, "Noisy": false, "Locomotor": true, "Respiratory": false} 
 3 | Bird    | Avian Pox      | {"Limp": false, "Itchy": true, "Mucus": false, "Noisy": false, "Locomotor": false, "Respiratory": false}

Another way to write the query is with a lateral join:

select c.id, c.species, c.illness, ti.tags
from consultations c
  left join lateral (
    select jsonb_object_agg(t.tag, tg.taggable_Id is not null) as tags
    from tags t 
      left join taggings tg on tg.tag_id = t.id and tg.taggable_id = c.id
  ) as ti on true
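Individual flags can still be pulled out of the JSONB column when needed. The sketch below builds on the lateral-join query and the sample tables above; the choice of the Mucus and Itchy tags is only for illustration.

```sql
-- ->> returns the flag as text ('true' / 'false'), which is then cast to boolean.
SELECT c.id, c.species, c.illness,
       (ti.tags ->> 'Mucus')::boolean AS mucus
FROM consultations c
  LEFT JOIN LATERAL (
    SELECT jsonb_object_agg(t.tag, tg.taggable_id IS NOT NULL) AS tags
    FROM tags t
      LEFT JOIN taggings tg ON tg.tag_id = t.id AND tg.taggable_id = c.id
  ) AS ti ON true
WHERE (ti.tags ->> 'Itchy')::boolean;  -- keep only consultations tagged Itchy
```

With the sample data this should leave only the Bird consultation, with mucus reported as false.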

Pivot in PostgreSQL with TRUE/FALSE flags
