Loading a matrix from a file into PostgreSQL tables

Time: 2012-05-02 04:03:37

Tags: sql postgresql matrix file-io postgresql-copy

I have a file, schools.txt, that looks like this:

Alabama

Air University
Alabama A&M University
Alabama State University
Concordia College-Selma
Faulkner University
Huntingdon College
Jacksonville State University
Judson College
Miles College
Oakwood College
Samford University
Southeastern Bible College
Southern Christian University
Spring Hill College
Stillman College
Talladega College
University of North Alabama
University of South Alabama
University of West Alabama

Alaska

Alaska Bible College
Alaska Pacific University
Sheldon Jackson College
University of Alaska - Anchorage
University of Alaska - Fairbanks
University of Alaska - Southeast

Arizona

American Indian College of the Assemblies of God
Arizona State University
Arizona State University East
Arizona State University West
DeVry University-Phoenix
Embry-Riddle Aeronautical University
Grand Canyon University
Northcentral University
Northern Arizona University

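...and so on. Here Alabama, Alaska and Arizona are locations, and everything else is a university. I want to load the locations into a table called Location and the universities into a table called University, where the Id of the Location table is referenced as a FK from the University table, like so: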

CREATE TABLE Location (
    Id          SERIAL PRIMARY KEY,
    Name        TEXT
);

CREATE TABLE University (
    Id          SERIAL PRIMARY KEY,
    Location    INTEGER REFERENCES Location (Id) NOT NULL,
    Name        TEXT
);

So what I would like to do in Postgres is something like this:

for (int i = 0; i < universities.size(); i++) {
    // each entry in the universities vector is a tuple whose first entry is the country/state
    // and whose second entry is a vector of the universities as Strings
    Vector tuple = (Vector) universities.get(i);
    // insert into location table
    String state = (String) tuple.get(0);
    Vector u = (Vector) tuple.get(1);
    for (int j = 0; j < u.size(); j++) {
        // insert into university table with i as FK to location table
    }
}
Does anyone know how to do this?

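2 answers:

Answer 0 (score: 1)

Here is a pure SQL solution.

Use COPY to import the file into a temporary table, then a single DML statement with data-modifying CTEs (requires PostgreSQL 9.1 or later) to do the rest. Both steps should be fast:

A temporary table with a single text column (dropped automatically at the end of the session):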

CREATE TEMP TABLE tmp (txt text);

Import the data from the file:

COPY tmp FROM '/path/to/file.txt';

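If you are doing this from a remote client, use the psql meta-command \copy instead.

My solution depends on the data format shown in the question, namely that there is an empty line before and after each city (the location, i.e. the state in the sample data). I assume the import file contains actual empty strings on those lines. Make sure there is a leading line with an empty string before the first city, to avoid a special case.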
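A minimal sketch of how the leading empty line could be guaranteed before running COPY, assuming the file is small enough to read into memory and uses Unix line endings. The helper name `ensure_leading_blank` is hypothetical and not part of the answer:

```python
def ensure_leading_blank(text):
    """Prepend an empty line unless the text already starts with one."""
    # Assumes "\n" line endings; a file starting with "\r\n" would get an extra line.
    return text if text.startswith("\n") else "\n" + text


# Usage on in-memory text; writing the result back to disk is left to the caller.
fixed = ensure_leading_blank("Alabama\n\nAir University\n")
```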

Rows are inserted in file order. I rely on that in the window functions below, which run without an explicit sort.

WITH x AS (
    SELECT txt
          ,row_number() OVER () AS rn
          ,lead(txt) OVER () = '' AND
            lag(txt) OVER () = '' AS city
    FROM   tmp                -- don't remove empty rows just yet
    ), y AS (
    SELECT txt, city
          ,sum(city::int) OVER w AS id
    FROM   x
    WHERE  txt <> ''          -- remove empty rows now
    WINDOW w AS (ORDER BY rn)
    ), l AS (
    INSERT INTO location (id, name)
    SELECT id, txt
    FROM   y
    WHERE  city
    ), u AS (
    INSERT INTO university (location, name)
    SELECT id, txt
    FROM   y
    WHERE  NOT city
    )
SELECT setval('location_id_seq', max(id))
FROM   y;

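Voilà.

  • CTE x marks cities, based on empty-string values in the rows before and after each row.

  • CTE y adds a running count of cities (id), which gives a perfectly valid id for each city and its universities.

  • CTEs l and u do the inserts, which is now easy.

  • The final SELECT sets the next value of the sequence attached to location.id. We have not used the sequence so far, so we have to set it to the current maximum id; otherwise future INSERTs into location would fail with duplicate key errors.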
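The flagging and numbering done by CTEs x and y can be traced outside the database. The following Python sketch is only an illustration of that logic, not part of the answer. The function name `classify` and the shortened sample list are made up for the example, and a missing neighbouring line is treated as "not a city", whereas SQL would yield NULL there:

```python
def classify(lines):
    """Return (id, is_city, txt) for every non-empty line, in input order."""
    rows = []
    city_count = 0
    for i, txt in enumerate(lines):
        prev_txt = lines[i - 1] if i > 0 else None               # lag(txt)
        next_txt = lines[i + 1] if i + 1 < len(lines) else None  # lead(txt)
        # CTE x: a city has an empty line directly before and after it.
        # A missing neighbour counts as "not a city" here (SQL gives NULL).
        is_city = prev_txt == "" and next_txt == ""
        if txt == "":                 # CTE y: WHERE txt <> ''
            continue
        city_count += int(is_city)    # CTE y: sum(city::int) OVER (ORDER BY rn)
        rows.append((city_count, is_city, txt))
    return rows


# Shortened sample with the leading empty line the answer asks for.
sample = ["", "Alabama", "", "Air University", "Alabama A&M University",
          "", "Alaska", "", "Alaska Bible College"]
rows = classify(sample)
locations = [(id_, txt) for id_, is_city, txt in rows if is_city]         # CTE l
universities = [(id_, txt) for id_, is_city, txt in rows if not is_city]  # CTE u
```

Each university row carries the running count of the most recent city, which is why the same number can serve as location.id and as the foreign key in university.location.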

Answer 1 (score: 1)

Converting the raw content into a table is the safest way... then you can upload it using COPY.

BEGIN { bl=0; body=0; header=""; } 
$0 == "" && body==1 && header!="" { header=""; body=0; bl=1; next; }
$0 == "" && body==0 { bl=1; next; }
$0 != "" && header=="" { header=$0; bl=0; next; }
$0 != "" && bl==1 && header!="" { body=1; print header "," $0 }

An AWK script along those lines will turn your file into a table, which you can then upload with a straightforward psql copy statement:

COPY university_data_file_table FROM awk-mashed-file;

Then you can split that table into separate tables:

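-- assign a generated id to each distinct country so the join below can select it
CREATE TABLE country AS SELECT row_number() OVER (ORDER BY country) AS id, country FROM (SELECT DISTINCT country FROM university_data_file_table) d;
CREATE TABLE university AS SELECT country.id, udft.university FROM country, university_data_file_table udft WHERE udft.country = country.country;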

Something like this is easy to script with psql. As I said, you have to do the initial conversion first.
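For readers without AWK at hand, the same initial conversion can be sketched in Python. This is an illustration under assumptions rather than the answer's method: the function names `to_rows` and `to_csv` are hypothetical, the input is assumed to alternate between a one-line state block and a block of universities separated by blank lines, and the `csv` module is used so that names containing commas are quoted:

```python
import csv
import io
from itertools import groupby


def to_rows(lines):
    """Pair each state with its universities as (state, university) tuples."""
    # Collect runs of consecutive non-blank lines; blank lines only separate them.
    blocks = [list(group)
              for nonblank, group in groupby(lines, key=lambda s: s.strip() != "")
              if nonblank]
    rows = []
    # Blocks are assumed to alternate: [state], [university, ...], [state], ...
    for header, body in zip(blocks[0::2], blocks[1::2]):
        rows.extend((header[0].strip(), name.strip()) for name in body)
    return rows


def to_csv(rows):
    """Render the rows as CSV text, quoting fields only where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


sample = ["Alabama", "", "Air University", "Judson College",
          "", "Alaska", "", "Alaska Bible College"]
csv_text = to_csv(to_rows(sample))
```

The resulting text should be loadable with COPY in CSV format into a two-column staging table such as the one assumed above.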