Build a tree-like hash (YAML) from a complex database extraction (SQL)?

Time: 2015-06-06 14:22:14

Tags: perl hash

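Introduction

Given the tables listed at the end of this question, I would like an algorithm or a simple solution that returns a nested tree from a description written in YAML. Using the YAML format is an optional requirement. What I actually need as output is an array of ordered hashes that may or may not contain nested ordered hashes or arrays of ordered hashes.

In short, I am talking about a tree-like structure.

To make the question easier to follow, I will work through a simple example that covers all my needs. It is in fact the example I am using to implement the algorithm.

Because my knowledge of Perl is limited, I decided to ask this question in parallel with my own investigation. I do not want to go down the wrong path, which is why I am asking for help.

I am currently focusing on the DBI module. I looked at other modules such as DBIx::Tree::NestedSet, but I do not think they are what I need.

So, let's get into the details of my example.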

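Example

The initial idea is to write a program that takes a description and outputs the extracted data.

This input description follows simple rules:

  • query is the data we are looking for. It can contain the following keys:
    • sql is the SQL query
    • hide hides columns from the final output. Use it when a column is needed by a subquery but is not wanted in the result.
    • subquery is a nested query executed for each row of the parent query
    • bind names the parent columns whose values are bound to the query placeholders
    • hash tells the program to group the results as a hash of hashes instead of an array of hashes. It can be passed directly to DBI::selectall_hashref. If this field is omitted, the output is an array of ordered hashes.
    • key is the name of the key under which the subquery results are listed, at the same level as the parent's columns. We will see later that the key name can mask a result column.
    • list tells the program to return the results as an array. Only one column can be listed: list: name displays the list of names.
  • connect is the DBI connection string
  • format is the output format. It can be XML, YAML or JSON. I am mainly focusing on YAML because it is easy to convert. When omitted, the default output is YAML.
  • indent is the number of spaces per indentation level. The values tabs and tab are also supported. (The example description below spells this key ident.)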
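The hash and list rules above can be sketched on in-memory rows, without a database. This is only an illustration under assumptions: the helper names rows_to_hash and rows_to_list are hypothetical, hash is reduced to a single key column, and the key column is dropped from each inner hash to match the expected output shown later (DBI's selectall_hashref keeps it).

```perl
use strict;
use warnings;

# Rows as a fetch step might return them: one hash per result row.
my @rows = (
    { name => 'water',        calories => 0,  potassium => '3 mg'   },
    { name => 'orange juice', calories => 45, potassium => '200 mg' },
);

# "hash: name" -> hash of hashes keyed by the named column.
# The key column is removed from each inner hash (an assumption).
sub rows_to_hash {
    my ($rows, $key_column) = @_;
    my %by_key;
    for my $row (@$rows) {
        my %rest = %$row;                      # copy, leave the input intact
        my $key  = delete $rest{$key_column};
        $by_key{$key} = \%rest;
    }
    return \%by_key;
}

# "list: name" -> flat array holding the values of one column.
sub rows_to_list {
    my ($rows, $column) = @_;
    return [ map { $_->{$column} } @$rows ];
}

my $brevage = rows_to_hash(\@rows, 'name');
my $names   = rows_to_list(\@rows, 'name');

print join(', ', sort keys %$brevage), "\n";   # orange juice, water
print join(', ', @$names), "\n";               # water, orange juice
```

Note that the plain hashes used here do not remember insertion order, so the ordering requirement still has to be handled separately.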

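In addition, Perl hashes are unordered. Here the order of the output keys matters: they should appear in the same order as the columns in the SQL query.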
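One dependency-free way to keep that order is to carry the column list next to the hash and iterate over the list instead of over the keys. The sketch below assumes the column names arrive in SELECT order (as DBI reports them in $sth->{NAME}); the sample values are taken from the people table. A tied hash such as Tie::IxHash is an alternative that keeps the order inside the hash itself.

```perl
use strict;
use warnings;

# Column names in SELECT order, plus one row of values.
my @columns = qw(nationality smoke house_id pet_id);
my @row     = ('Norvegian', 'Dunhill', 4, 3);

# A plain hash gives lookup by name but no guaranteed key order ...
my %record;
@record{@columns} = @row;

# ... so walk the column list rather than "keys %record".
my @lines = map { "$_: $record{$_}" } @columns;
print "$_\n" for @lines;
```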

For now I am simply using the YAML module :(

In summary, in the end we will just execute this command:

$ cat desc.yml | ./fetch > data.yml

The description desc.yml is as follows:

---
connect: "dbi:SQLite:dbname=einstein-puzzle.sqlite"
ident: 4
query:
   - sql: SELECT * from people
     hide:
        - pet_id
        - house_id
        - id
     subquery:
        - key: brevage
          bind: id
          sql: |
                SELECT name, calories, potassium FROM drink
                LEFT JOIN people_has_drink ON drink.id = people_has_drink.id_drink
                WHERE people_has_drink.id_people = ?
          hash:
             - name
        - key: house
          sql: SELECT color as paint, size, id from house WHERE id = ?
          hide: id
          bind: house_id
          subquery:
             - key: color
               sql: SELECT color AS name, ral, hex from color WHERE short LIKE ?
               bind: paint
        - key: pet
          sql: SELECT name from pet WHERE id = ?
          bind: pet_id
          list: name

Expected output

From the description above, the output data would be:

---
- nationality: Norvegian
  smoke: Dunhill
  brevage:
      orange juice:
          calories: 45
          potassium: 200 mg
      water:
          calories: 0
          potassium: 3 mg
  house:
      color:
          name: Zinc yellow
          ral:  RAL 1018
          hex:  '#F8F32B'
      paint: yellow
      size: small
  pet:
      - cats
- nationality: Brit
  smoke: Pall Mall
  brevage:
      milk:
          calories: 42
          potassium: 150 mg
  house:
      color:
          name: Vermilion
          ral:  RAL 2002
          hex:  '#CB2821'
      paint: red
      size: big
  pet:
      - birds
      - phasmatodea

Where I am

I have not fully implemented the nested queries yet. My current state is the following:

#!/usr/bin/env perl
use 5.010;
use strict;
use warnings;
use DBI;
use YAML;
use Data::Dumper;
use Tie::IxHash;

# Read configuration and database connection
my %yml = %{ Load(do { local $/; <DATA>}) };
my $dbh = DBI->connect($yml{connect});

# Fill the bind values of the first query with command-line information
my %bind;
for(@ARGV) {
    next unless /--(\w+)=(.*)/;
    $bind{$1} = $2;
}

my $q0 = $yml{query}[0];
if ($q0->{bind} and keys %bind > 0) {
    $q0->{bind_values} = arrayref($q0->{bind});
    $q0->{bind_values}[$_] = $bind{$q0->{bind}[$_]} foreach (0 .. @{$q0->{bind}} - 1);
}

# Fetch all data from the database recursively
my $out = fetch($q0);

sub fetch {
    # Process the query only if one was given
    my $query = shift;
    return unless $query;

    $query->{bind_values} = [] unless ref $query->{bind_values} eq 'ARRAY';
    # Execute SQL query
    my $sth = $dbh->prepare($query->{sql});
    $sth->execute(@{$query->{bind_values}});
    my @columns = @{$sth->{NAME}}; 

    # Fetch all the current level's data and preserve columns order 
    my @return;
    for my $row (@{$sth->fetchall_arrayref()}) { 
        my %data;
        tie %data, 'Tie::IxHash';
        $data{$columns[$_]} = $row->[$_] for (0 .. $#columns);
        for my $subquery (@{ $query->{subquery} }) { 
            my @bind;
            push @bind, $data{$_} for (@{ arrayref($subquery->{bind}) });
            $subquery->{bind_values} = \@bind;
            my $sub = fetch($subquery);

            # Present output as a list 
            if ($subquery->{list}) {
                #if ( map ( $query->{list} eq $_ , keys $sub ) )
                my @list;
                for (@$sub) {
                    push @list, $_->{$subquery->{list}};
                }
                $sub = \@list;
            }

            if ($subquery->{key}) {
                $data{$subquery->{key}} = $sub;
            } else {
                die "[Error] Key is missing for query '$subquery->{sql}'";
            }
        }

        # Remove unwanted columns from the output
        if ($query->{hide}) {
            delete $data{$_} for( @{ arrayref($query->{hide}) } );
        }        

        push @return, \%data;
    }

    \@return;    
}

DumpYaml($out);

sub arrayref {
   my $ref = shift;
   return (ref $ref ne 'ARRAY') ? [$ref] : $ref; 
}

sub DumpYaml {
    # I am not happy with this current dumper. I cannot specify the indent and it does 
    # not preserve the extraction order
    print Dump shift;
}

__DATA__
---
connect: "dbi:SQLite:dbname=einstein-puzzle.sqlite"
ident: 4
query:
   - sql: SELECT * from people
     hide:
        - pet_id
        - house_id
        - id
     subquery:
        - key: brevage
          bind: id
          sql: |
                SELECT name, calories, potassium FROM drink
                LEFT JOIN people_has_drink ON drink.id = people_has_drink.id_drink
                WHERE people_has_drink.id_people = ?       
          hash:
             - name
        - key: house
          sql: SELECT color as paint, size, id from house WHERE id = ?
          hide: id
          bind: house_id
          subquery:
             - key: color
               sql: SELECT short, ral, hex from color WHERE short LIKE ?
               bind: paint
        - key: pet
          sql: SELECT name from pet WHERE id = ?
          bind: pet_id
          list: name  

And this is the output I get:

---
- brevage:
    - calories: 0
      name: water
      potassium: 3 mg
    - calories: 45
      name: orange juice
      potassium: 200 mg
  house:
    - color:
        - hex: '#F8F32B'
          ral: RAL 1018
          short: yellow
      paint: yellow
      size: small
  nationality: Norvegian
  pet:
    - cats
  smoke: Dunhill
- brevage:
    - calories: 42
      name: milk
      potassium: 150 mg
  house:
    - color:
        - hex: '#CB2821'
          ral: RAL 2002
          short: red
      paint: red
      size: big
  nationality: Brit
  pet:
    - birds
    - phasmatodea
  smoke: Pall Mall
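The keys above come out sorted alphabetically rather than in column order, and the indent is fixed. YAML.pm documents the globals $YAML::SortKeys and $YAML::Indent, which may already be enough when combined with the tied hashes. As a dependency-free alternative, here is a minimal sketch of an emitter that keeps key order and takes the indent width as a parameter. It is not YAML.pm's API: omap, scalar_text and emit are hypothetical helpers, the quoting rule is deliberately crude (values such as yes or null are not handled), and nested sequence items are written with the dash on its own line.

```perl
use strict;
use warnings;

# Ordered mapping node: remembers the insertion order of its keys.
sub omap {
    my @pairs = @_;
    my (@order, %value);
    while (@pairs) {
        my ($key, $val) = splice @pairs, 0, 2;
        push @order, $key;
        $value{$key} = $val;
    }
    return { order => \@order, value => \%value };
}

# Quote anything that is not a plain word, number or phrase (e.g. "#F8F32B").
sub scalar_text {
    my $text = shift;
    return '~' unless defined $text;           # SQL NULL
    return $text if $text =~ /^[A-Za-z0-9][A-Za-z0-9 ._-]*$/;
    (my $quoted = $text) =~ s/'/''/g;
    return "'$quoted'";
}

# Emit a node as block-style YAML text, $width spaces per nesting level.
sub emit {
    my ($node, $width, $depth) = @_;
    my $pad = ' ' x ($width * $depth);
    my $out = '';
    if (ref($node) eq 'HASH') {                # ordered mapping from omap()
        for my $key (@{ $node->{order} }) {
            my $child = $node->{value}{$key};
            $out .= ref($child)
                ? "${pad}${key}:\n" . emit($child, $width, $depth + 1)
                : "${pad}${key}: " . scalar_text($child) . "\n";
        }
    }
    else {                                     # sequence (array reference)
        for my $item (@$node) {
            $out .= ref($item)
                ? "${pad}-\n" . emit($item, $width, $depth + 1)
                : "${pad}- " . scalar_text($item) . "\n";
        }
    }
    return $out;
}

my $doc = [
    omap(
        nationality => 'Norvegian',
        smoke       => 'Dunhill',
        house       => omap(paint => 'yellow', hex => '#F8F32B'),
        pet         => ['cats'],
    ),
];
print "---\n", emit($doc, 4, 0);
```

With a width of 4 the keys appear as nationality, smoke, house, pet, and the hex value is quoted so that the leading # is not read as a comment.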

Database

My test database is an SQLite db with the following tables:

Table people
.----+-------------+----------+--------+-----------.
| id | nationality | house_id | pet_id | smoke     |
+----+-------------+----------+--------+-----------+
|  1 | Norvegian   |        4 |      3 | Dunhill   |
|  2 | Brit        |        1 |      2 | Pall Mall |
'----+-------------+----------+--------+-----------'
Table drink
.----+--------------+----------+-----------.
| id | name         | calories | potassium |
+----+--------------+----------+-----------+
|  1 | tea          |        1 | 18 mg     |
|  2 | coffee       |        0 | 49 mg     |
|  3 | milk         |       42 | 150 mg    |
|  4 | beer         |       43 | 27 mg     |
|  5 | water        |        0 | 3 mg      |
|  6 | orange juice |       45 | 200 mg    |
'----+--------------+----------+-----------'
Table people_has_drink
.-----------+----------.
| id_people | id_drink |
+-----------+----------+
|         1 |        5 |
|         1 |        6 |
|         2 |        3 |
'-----------+----------'
Table house
+----+--------+--------+
| id | color  |  size  |
+----+--------+--------+
|  1 | red    | big    |
|  2 | green  | small  |
|  3 | white  | middle |
|  4 | yellow | small  |
|  5 | blue   | huge   |
+----+--------+--------+
Table color
.--------+-------------+----------+---------.
| short  |    color    |   ral    |   hex   |
+--------+-------------+----------+---------+
| red    | Vermilion   | RAL 2002 | #CB2821 |
| green  | Pale green  | RAL 6021 | #89AC76 |
| white  | Light grey  | RAL 7035 | #D7D7D7 |
| yellow | Zinc yellow | RAL 1018 | #F8F32B |
| blue   | Capri blue  | RAL 5019 | #1B5583 |
'--------+-------------+----------+---------'
Table pet
+----+-------------+
| id |    name     |
+----+-------------+
|  1 | dogs        |
|  2 | birds       |
|  3 | cats        |
|  4 | horses      |
|  5 | fishes      |
|  2 | phasmatodea |
+----+-------------+

Database data

If you would like to use the same data as I do, here is everything you need:

BEGIN TRANSACTION;
CREATE TABLE "pet" (
    `id`    INTEGER,
    `name`  TEXT
);
INSERT INTO `pet` VALUES (1,'dogs');
INSERT INTO `pet` VALUES (2,'birds');
INSERT INTO `pet` VALUES (3,'cats');
INSERT INTO `pet` VALUES (4,'horses');
INSERT INTO `pet` VALUES (5,'fishes');
INSERT INTO `pet` VALUES (2,'phasmatodea');
CREATE TABLE `people_has_drink` (
    `id_people` INTEGER NOT NULL,
    `id_drink`  INTEGER NOT NULL,
    PRIMARY KEY(id_people,id_drink)
);
INSERT INTO `people_has_drink` VALUES (1,5);
INSERT INTO `people_has_drink` VALUES (1,6);
INSERT INTO `people_has_drink` VALUES (2,3);
CREATE TABLE "people" (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `nationality`   VARCHAR(45),
    `house_id`  INT,
    `pet_id`    INT,
    `smoke` VARCHAR(45)
);
INSERT INTO `people` VALUES (1,'Norvegian',4,3,'Dunhill');
INSERT INTO `people` VALUES (2,'Brit',1,2,'Pall Mall');
CREATE TABLE "house" (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `color` TEXT,
    `size`  TEXT
);
INSERT INTO `house` VALUES (1,'red','big');
INSERT INTO `house` VALUES (2,'green','small');
INSERT INTO `house` VALUES (3,'white','middle');
INSERT INTO `house` VALUES (4,'yellow','small');
INSERT INTO `house` VALUES (5,'blue','huge');
CREATE TABLE `drink` (
    `id`    INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
    `name`  TEXT,
    `calories`  INTEGER,
    `potassium` TEXT
);
INSERT INTO `drink` VALUES (1,'tea',1,'18 mg');
INSERT INTO `drink` VALUES (2,'coffee',0,'49 mg');
INSERT INTO `drink` VALUES (3,'milk',42,'150 mg');
INSERT INTO `drink` VALUES (4,'beer',43,'27 mg');
INSERT INTO `drink` VALUES (5,'water',0,'3 mg');
INSERT INTO `drink` VALUES (6,'orange juice',45,'200 mg');
CREATE TABLE `color` (
    `short` TEXT UNIQUE,
    `color` TEXT,
    `ral`   TEXT,
    `hex`   TEXT,
    PRIMARY KEY(short)
);
INSERT INTO `color` VALUES ('red','Vermilion','RAL 2002','#CB2821');
INSERT INTO `color` VALUES ('green','Pale green','RAL 6021','#89AC76');
INSERT INTO `color` VALUES ('white','Light grey','RAL 7035','#D7D7D7');
INSERT INTO `color` VALUES ('yellow','Zinc yellow','RAL 1018','#F8F32B');
INSERT INTO `color` VALUES ('blue','Capri blue','RAL 5019','#1B5583');
COMMIT;

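1 answer:

Answer 0 (score: 1)

"Is my implementation good?"

That is quite a broad question, and the answer probably depends on what you want from your code. For example:

Does it work? Does it have all the features you need? Does it do what you want? Does it work for the whole range of inputs you expect to give it (and for inputs you don't expect)? If you are not sure, write some tests.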
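As a starting point, a test file could exercise the small helpers first. The sketch below uses the core Test::More module and a copy of the arrayref helper from the question's script; the third case simply documents what the helper currently does when a key such as bind is missing, which may or may not be the behavior you want.

```perl
use strict;
use warnings;
use Test::More;

# Copy of the arrayref() helper from the script under test.
sub arrayref {
    my $ref = shift;
    return (ref $ref ne 'ARRAY') ? [$ref] : $ref;
}

is_deeply(arrayref('id'), ['id'], 'a scalar is wrapped in an array ref');
is_deeply(arrayref(['a', 'b']), ['a', 'b'], 'an array ref is returned unchanged');
is_deeply(arrayref(undef), [undef], 'undef becomes a one-element list');

done_testing();
```

Saved under a t/ directory, a file like this can be run with prove.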

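Is it fast enough? If not, which parts are slow? Use Devel::NYTProf to find them.

If it does work, you might also want to turn your code into a module rather than just a script, so that you can reuse it.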
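A minimal sketch of what such a module might look like follows. Everything in it is an assumption made for illustration: the package name TreeFetch is hypothetical, and the query runner is injected as a code reference so that the recursion can be tried without a database. In real use that code reference would wrap DBI, and the package would live in its own .pm file ending with a true value. Key order is not preserved here because plain hashes are used.

```perl
use strict;
use warnings;

package TreeFetch;    # hypothetical module name

sub new {
    my ($class, %args) = @_;
    # run_query: code ref taking ($sql, @bind), returning an array ref of row hashes
    return bless { run_query => $args{run_query} }, $class;
}

# Run a query, then run each subquery once per row and attach it under its key.
sub fetch {
    my ($self, $query, @bind) = @_;
    my @result;
    for my $row (@{ $self->{run_query}->($query->{sql}, @bind) }) {
        my %data = %$row;
        for my $sub (@{ $query->{subquery} || [] }) {
            my @sub_bind = map { $data{$_} } _list($sub->{bind});
            $data{ $sub->{key} } = $self->fetch($sub, @sub_bind);
        }
        delete @data{ _list($query->{hide}) };   # drop hidden columns last
        push @result, \%data;
    }
    return \@result;
}

# Accept a scalar, an array ref or nothing; always return a list.
sub _list {
    my $value = shift;
    return () unless defined $value;
    return ref($value) eq 'ARRAY' ? @$value : ($value);
}

package main;

# Fake data source standing in for DBI: "sql" is just a table name here.
my %tables = (
    people => [ { id => 1, nationality => 'Norvegian', pet_id => 3 } ],
    pet    => [ { id => 3, name => 'cats' } ],
);
my $fetcher = TreeFetch->new(run_query => sub {
    my ($sql, @bind) = @_;
    my $rows = $tables{$sql};
    return $rows unless @bind;
    return [ grep { $_->{id} == $bind[0] } @$rows ];
});

my $tree = $fetcher->fetch({
    sql      => 'people',
    hide     => ['id', 'pet_id'],
    subquery => [ { key => 'pet', sql => 'pet', bind => 'pet_id' } ],
});
print $tree->[0]{pet}[0]{name}, "\n";   # cats
```

Injecting the query runner also makes the module easy to test, since the tests can pass canned rows instead of opening a connection.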

  

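"If not (I assume I am doing it wrong), which modules should I use to get the desired behavior?"

This sounds very much like what DBIx::Class (also known as DBIC) does when you ask it to prefetch. It will build a data structure of objects for you.

If you need to respond dynamically to arbitrary databases and YAML descriptions, that is not what DBIC is designed for. It is probably possible, but it would likely have you creating packages dynamically, which is not easy.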