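I'm relatively new to Postgres, coming from a MySQL background. I'm using Postgres 9.3.4 on Windows x64.
We are supplied with data in multiple fixed-length text files. The first digit of each line is a number from 1 to 4 that indicates the record type of the data in that line. The lines are grouped sequentially, so a group always starts with one line of type 1, followed by zero or more lines of the other types.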
data_x.txt
---------------------
1data01
2data02
4data03
4data04
1data05
1data06
3data07
To import this into Postgres, I used the following SQL commands:
CREATE TABLE data_raw (
raw_data TEXT
);
COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file
ALTER TABLE data_raw
ADD COLUMN indicator integer;
UPDATE data_raw SET
indicator = CAST(substr(raw_data, 1, 1) AS integer),
raw_data = substr(raw_data, 2);
Then I create a table for each of the 4 record types:
-- Postgres requires AS here (CREATE TABLE ... SELECT without AS is MySQL syntax)
CREATE TABLE table_1 AS SELECT raw_data FROM data_raw WHERE indicator = 1;
CREATE TABLE table_2 AS SELECT raw_data FROM data_raw WHERE indicator = 2;
CREATE TABLE table_3 AS SELECT raw_data FROM data_raw WHERE indicator = 3;
CREATE TABLE table_4 AS SELECT raw_data FROM data_raw WHERE indicator = 4;
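What I need to do, but am not sure how, is add an "id" column that is shared by each group starting with an indicator of 1. We will get weekly updates, so I need to be able to specify the initial ID for each batch. So if this batch starts at id = 225, I would want the following tables from the sample data: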
table_1
id | raw_data
--------------------
225 | data01
226 | data05
227 | data06
table_2
id | raw_data
--------------------
225 | data02
table_3
id | raw_data
--------------------
227 | data07
table_4
id | raw_data
--------------------
225 | data03
225 | data04
Answer 0 (score: 1)
Try something like this to generate an id for each data group:
SELECT sum(case when indicator = 1 then 1 else 0 end ) over(order by /*something to define the order*/) as id_base
from data_raw
This generates an id_base for each data group, counting up from 1. If you need to start from a specific ID, add that ID minus 1 to id_base.
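As a rough sketch of how the running sum behaves on the sample data from the question: the CTE name sample and the line_no column are assumptions made for illustration (line_no stands in for whatever column defines the file order), and 225 is the example starting ID from the question.

```sql
-- Hypothetical stand-in for data_raw: line_no is an assumed ordering column.
WITH sample (line_no, indicator, raw_data) AS (
    VALUES (1, 1, 'data01'),
           (2, 2, 'data02'),
           (3, 4, 'data03'),
           (4, 4, 'data04'),
           (5, 1, 'data05'),
           (6, 1, 'data06'),
           (7, 3, 'data07')
)
SELECT raw_data,
       indicator,
       -- Running count of type-1 rows seen so far; it starts at 1,
       -- so adding (225 - 1) makes the first group's id equal to 225.
       (225 - 1) + sum(CASE WHEN indicator = 1 THEN 1 ELSE 0 END)
                       OVER (ORDER BY line_no) AS id
FROM sample
ORDER BY line_no;
```

With this input the ids come out as 225 for data01 through data04, 226 for data05, and 227 for data06 and data07, which matches the tables the question asks for.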
Answer 1 (score: 0)
1) Add a PK.
2) Create a dictionary table for the types...
CREATE TABLE "public"."data_raw"(
"id" Serial NOT NULL,
"id_type" Integer NOT NULL,
"raw_data" Text
);
CREATE TABLE "public"."data_raw_type"(
"id" Serial NOT NULL,
"name" Character varying(30));
Answer 2 (score: 0)
Thanks for the input.
Using Igor's suggestion, I came up with the following solution:
CREATE TABLE data_raw (
raw_data TEXT
);
COPY data_raw FROM 'C:\path\data_x.txt' ...; -- Repeated for each file
ALTER TABLE data_raw
ADD COLUMN pk_id serial,
ADD COLUMN id integer,
ADD COLUMN indicator integer;
UPDATE data_raw SET
indicator = CAST(substr(raw_data, 1, 1) AS integer),
raw_data = substr(raw_data, 2);
CREATE TABLE id_base AS
SELECT
pk_id,
sum(CASE WHEN indicator = 1 THEN 1 ELSE 0 END) OVER (ORDER BY pk_id) AS rec_id
FROM data_raw;
CREATE INDEX id_base_pk ON id_base USING btree(pk_id);
UPDATE data_raw r SET
id = (SELECT rec_id FROM id_base b WHERE b.pk_id = r.pk_id);
DROP TABLE id_base;
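To finish the job described in the question, the per-type tables could then be built from data_raw with the batch offset applied. This is only a sketch: it assumes data_raw has the id, indicator and pk_id columns filled in as above, that the target tables do not exist yet, and that pk_id follows the line order of the files (a serial column added after COPY tends to do so on a freshly loaded table, but I would not treat that as guaranteed). The value 225 is the example starting ID from the question.

```sql
-- id counts groups from 1, so adding (225 - 1) starts this batch at 225.
CREATE TABLE table_1 AS
SELECT id + (225 - 1) AS id, raw_data
FROM data_raw
WHERE indicator = 1
ORDER BY pk_id;

CREATE TABLE table_2 AS
SELECT id + (225 - 1) AS id, raw_data
FROM data_raw
WHERE indicator = 2
ORDER BY pk_id;

CREATE TABLE table_3 AS
SELECT id + (225 - 1) AS id, raw_data
FROM data_raw
WHERE indicator = 3
ORDER BY pk_id;

CREATE TABLE table_4 AS
SELECT id + (225 - 1) AS id, raw_data
FROM data_raw
WHERE indicator = 4
ORDER BY pk_id;
```

For the later weekly batches, where the tables already exist, the same SELECT statements could feed INSERT INTO table_n (id, raw_data) instead of CREATE TABLE ... AS, with 225 replaced by that batch's starting ID.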