Question

I have a database laid out like this:

    ID     |    Subject     |      Value
 ---------------------------------------------
     1            Subj1            Val1
     1            Subj2            Val2
     2            Subj1            Val3
     2            Subj5            Val4

And so on. There are thousands of IDs and tens of thousands of subjects.

I would like to find the transpose of this matrix. How can I do that?

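If the subjects were a small, static set, I could use CASE statements as in this solution: SQLITE - transposing rows into columns properly. In my case, however, the subjects are a large, dynamic set, so CASE will not work unless I build the SQL dynamically in some application outside the database. In my case there is no "application"; as I mention in the comments below, I am looking for a pure SQL solution.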

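Here is the kicker: I am using SQLite, which lacks the PIVOT statement that would make this easier. I do not know whether there is still a way to do this, and I have not been taught many types of joins. For a smaller database I would try a series of left joins, but since my result has so many columns, I do not know how.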

How can I convert the data to this form?

      ID      |     Subj1       |        Subj2      |    Subj3     |       etc.
   --------------------------------------------------------------------
      1              Val1                  Val2            0        
      2              Val3                   0              0

Answer 1

You can use conditional aggregation to pivot your data:

select id,
    max(case when Subject = 'Subj1' then Value end) as Subj1,
    max(case when Subject = 'Subj2' then Value end) as Subj2,
    max(case when Subject = 'Subj3' then Value end) as Subj3,
    . . . 
from your_table
group by id;

Note that if multiple rows share the same ID and the same Subject, only the maximum Value among them is returned for that cell.
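As a minimal sketch of the query above, here it is applied to the sample rows from the question. The table name your_table and the TEXT column types are assumptions, and the coalesce(..., 0) wrapper is an addition that only mirrors the 0 placeholders in the question's desired output. The last statement is one possible way to avoid typing each column by hand: it builds the same statement as text from the distinct subjects. It does not execute that text; running it is a separate step.

```sql
-- Sample data from the question (table name and column types are assumed).
CREATE TABLE your_table (id INTEGER, Subject TEXT, Value TEXT);
INSERT INTO your_table VALUES
  (1, 'Subj1', 'Val1'),
  (1, 'Subj2', 'Val2'),
  (2, 'Subj1', 'Val3'),
  (2, 'Subj5', 'Val4');

-- Hand-written conditional aggregation, one expression per subject.
-- coalesce() replaces the NULL of a missing cell with 0.
SELECT id,
       coalesce(max(CASE WHEN Subject = 'Subj1' THEN Value END), 0) AS Subj1,
       coalesce(max(CASE WHEN Subject = 'Subj2' THEN Value END), 0) AS Subj2,
       coalesce(max(CASE WHEN Subject = 'Subj5' THEN Value END), 0) AS Subj5
FROM your_table
GROUP BY id;
-- Rows returned:
--   1 | Val1 | Val2 | 0
--   2 | Val3 | 0    | Val4

-- Generate the text of an equivalent statement from the data itself.
-- quote() turns each subject into a safely escaped SQL string literal;
-- the column alias is double-quoted with embedded quotes doubled.
SELECT 'SELECT id, '
    || group_concat(
         'coalesce(max(CASE WHEN Subject = ' || quote(Subject)
         || ' THEN Value END), 0) AS "' || replace(Subject, '"', '""') || '"',
         ', ')
    || ' FROM your_table GROUP BY id;' AS pivot_sql
FROM (SELECT DISTINCT Subject FROM your_table ORDER BY Subject);
```

The generated statement is returned as a single text value in the pivot_sql column. SQLite does not guarantee the order in which group_concat() joins its inputs, so the column order of the generated statement may vary.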

Answer 2

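SQLite (as of 3.31 at least) has no PIVOT, CROSSTAB, or any similar feature out of the box. More generally, it has no means of generating columns dynamically (the only means is to write the CASE expressions that define the columns by hand). But there is the pivot_vtab virtual table extension, which implements pivot table functionality. I guess it is a kind of proof-of-concept implementation, but it can still be useful depending on the circumstances.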

Here is the detailed description from the repo (edited to fit the snippet):

CREATE VIRTUAL TABLE pivot USING pivot_vtab(
  --
  -- Pivot table row key query. Defines first column of the pivot table.
  -- [...]
  -- The first column name in this query will become the name of the pivot table key 
  -- column. The value of the [...] key column is provided to the pivot query as ?1.
  --
 (SELECT id r_id -- pivot table key
    FROM r), 
  --
  -- Pivot table column definition query. Defines second+ column(s) of the pivot
  -- table. This query should return pivot table column key/name pairs.
  -- [...]
  -- The first column of this query is the pivot column key, and is provided
  -- to the pivot query as ?2. The second column of this query is used to name the  
  -- pivot table columns. This column is required to return unique values.
  -- 
  -- Changes to this query can only be propagated by dropping and 
  -- re-creating the virtual table
  --
 (SELECT id c_id,   -- pivot column key - can be referenced in pivot query as ?2
         name       -- pivot column name
    FROM c),    
  --
  -- Pivot query. This query should define a single value in the pivot table when
  -- filtered by the pivot table row key (?1) and a column key (?2)
  --
 (SELECT val FROM x WHERE r_id = ?1 AND c_id = ?2)
);

While playing with it, I ran into a few issues:

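If the CREATE VIRTUAL TABLE ... statement is invalid, it may crash with a segmentation fault (for example, on a missing table)
The only error message you get from it is vtable constructor failed: to_be_table
It needs a separate table with unique pivot column id-name pairs (mind differences in character case), otherwise the pivot is not created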

For the example I will use this GitHub Gist with a CSV with salary data.

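Here I prepare the data: import the CSV into a :memory: SQLite3 database, load the extension, and clean the data (I load the records of FAANG software engineers into a temporary table).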

.mode csv 
.import salaries.csv salary_import

.load ./pivot_vtab
.headers on
.mode column

CREATE TABLE temp.salary AS
WITH clean AS (
  SELECT 
    Employer employer, 
    lower(trim("Job Title")) title, 
    replace("Annual Base Pay", ',', '') base_pay
  FROM salary_import
)
SELECT employer, title, round(avg(base_pay)) avg_base_pay, COUNT(*) count
FROM clean
WHERE
  employer IN ('Facebook', 'Amazon', 'Apple', 'Netflix', 'Google')
  AND title LIKE '%software engineer%'
GROUP BY 1, 2;

Here is the actual pivot creation (it is a temporary table in my example, but it can be persistent as well).

CREATE VIRTUAL TABLE temp.pivot USING pivot_vtab(
  (SELECT employer FROM temp.salary GROUP BY employer),
  (SELECT title, title FROM temp.salary GROUP BY title),   
  (
    SELECT avg_base_pay
    FROM temp.salary
    WHERE employer = ?1 AND title = ?2
  )
);
SELECT * FROM temp.pivot;

To run it, save these files in one directory:

salary.sql (the two snippets above combined)
salaries.csv (the CSV from the Gist)
pivot_vtab.c (the extension code)

Then run it like this (run chmod o+w . first if you use Docker with user namespaces):

$ docker run --rm -it -v $PWD:/tmp/build -w /tmp/build ubuntu:focal
# apt update
# apt install -y --no-install-recommends gcc sqlite3 libsqlite3-dev
# sqlite3 --version
3.31.1 2020-01-27 19:55:54 3bfa9cc...
# gcc -g -O3 -fPIC -shared pivot_vtab.c -o pivot_vtab.so
# cat salary.sql | sqlite3 --bail
employer    data engineer/software engineer  senior software engineer  software eng...
----------  -------------------------------  ------------------------  ------------...
Amazon                                                                 130000.0    ...
Apple       145000.0                         155500.0                  122667.0    ...
Facebook                                     182000.0                  166690.0    ...
Google                                       158267.0                  131465.0    ...
Netflix                                      340000.0                              ...
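For the table in the question, the same pattern might look like the sketch below. It is untested and makes several assumptions: the compiled pivot_vtab extension sits in the current directory, the data lives in a temporary table named temp.subject_value (a hypothetical name) with columns id, Subject, and Value, and the subject names are already unique as column names. The virtual table name subject_pivot is also made up for illustration.

```sql
.load ./pivot_vtab

-- Sample data from the question in a temporary table (hypothetical name).
CREATE TABLE temp.subject_value (id INTEGER, Subject TEXT, Value TEXT);
INSERT INTO temp.subject_value VALUES
  (1, 'Subj1', 'Val1'),
  (1, 'Subj2', 'Val2'),
  (2, 'Subj1', 'Val3'),
  (2, 'Subj5', 'Val4');

CREATE VIRTUAL TABLE temp.subject_pivot USING pivot_vtab(
  -- Row key query: one pivot row per id, passed to the pivot query as ?1.
  (SELECT id FROM temp.subject_value GROUP BY id),
  -- Column query: subject as both the column key (?2) and the column name.
  (SELECT Subject, Subject FROM temp.subject_value GROUP BY Subject),
  -- Pivot query: the single value for a given row key and column key.
  (
    SELECT Value
    FROM temp.subject_value
    WHERE id = ?1 AND Subject = ?2
  )
);
SELECT * FROM temp.subject_pivot;
```

As the description from the repo notes, changes to the column query only take effect after dropping and re-creating the virtual table, so new subjects would not appear as columns automatically.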
