I have a file, schools.txt, that looks like this:
Alabama

Air University
Alabama A&M University
Alabama State University
Concordia College-Selma
Faulkner University
Huntingdon College
Jacksonville State University
Judson College
Miles College
Oakwood College
Samford University
Southeastern Bible College
Southern Christian University
Spring Hill College
Stillman College
Talladega College
University of North Alabama
University of South Alabama
University of West Alabama

Alaska

Alaska Bible College
Alaska Pacific University
Sheldon Jackson College
University of Alaska - Anchorage
University of Alaska - Fairbanks
University of Alaska - Southeast

Arizona

American Indian College of the Assemblies of God
Arizona State University
Arizona State University East
Arizona State University West
DeVry University-Phoenix
Embry-Riddle Aeronautical University
Grand Canyon University
Northcentral University
Northern Arizona University
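...and so on. Here Alabama, Alaska and Arizona are locations, and everything else is a university. What I want to do is load the locations into a table called Location and the universities into a table called University, where the Id of the Location table is a FK in the University table, like so: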
CREATE TABLE Location (
Id SERIAL PRIMARY KEY,
Name TEXT
);
CREATE TABLE University (
Id SERIAL PRIMARY KEY,
Location INTEGER REFERENCES Location (Id) NOT NULL,
Name TEXT
);
So what I would like to do in Postgres is something like this:
for (int i = 0; i < universities.size(); i++) {
    // each entry in the universities vector is a tuple with the first entry being the country/state
    // and the second entry being a vector of the universities as Strings
    Vector tuple = (Vector) universities.get(i);
    // insert into location table
    String state = (String) tuple.get(0);
    Vector u = (Vector) tuple.get(1);
    for (int j = 0; j < u.size(); j++) {
        // insert into university table with i as FK to location table
    }
}
Does anyone know how to do this?
Answer 0 (score: 1)
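This is a pure SQL solution.
Use COPY to import the file into a temporary table, then a single DML statement with data-modifying CTEs (requires PostgreSQL 9.1 or later) to do the rest. Both steps should be fast.
Temporary table with a single text column (dropped automatically at the end of the session):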
CREATE TEMP TABLE tmp (txt text);
Import the data from the file:
COPY tmp FROM '/path/to/file.txt';
If you are doing this from a remote client, use the psql meta-command \copy instead.
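My solution depends on the data format shown in the question, i.e. there is an empty line before and after each city (location). I assume the import file contains actual empty strings on those lines. Make sure there is a leading line with an empty string before the first city, to avoid a special case.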
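If the file does not already start with an empty line, it can be patched before running COPY. A minimal Python sketch, assuming the whole file fits in memory; the helper name with_leading_blank is made up for illustration and is not part of the SQL solution:

```python
def with_leading_blank(lines):
    """Return the lines without line endings, with an empty first line added if missing."""
    cleaned = [line.rstrip("\r\n") for line in lines]
    if not cleaned or cleaned[0] != "":
        # The first location needs an empty line before it, like all the others.
        cleaned.insert(0, "")
    return cleaned


if __name__ == "__main__":
    sample = ["Alabama\n", "\n", "Air University\n"]
    for line in with_leading_blank(sample):
        print(repr(line))
```

Writing the returned lines back out, one per line, gives a file in the shape the query below expects.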
The rows are inserted in file order. I rely on that for the window functions below, which run without any ORDER BY.
WITH x AS (
   SELECT txt
         ,row_number() OVER () AS rn
         ,lead(txt) OVER () = '' AND
          lag(txt)  OVER () = '' AS city
   FROM   tmp                     -- don't remove empty rows just yet
   ), y AS (
   SELECT txt, city
         ,sum(city::int) OVER w AS id
   FROM   x
   WHERE  txt <> ''               -- remove empty rows now
   WINDOW w AS (ORDER BY rn)
   ), l AS (
   INSERT INTO location (id, name)
   SELECT id, txt
   FROM   y
   WHERE  city
   ), u AS (
   INSERT INTO university (location, name)  -- no table alias here: INSERT does not accept a bare alias
   SELECT id, txt
   FROM   y
   WHERE  NOT city
   )
SELECT setval('location_id_seq', max(id))
FROM   y;
Voilà.
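CTE x marks cities based on the empty-string values in the rows immediately before and after each row.
CTE y adds a running count of cities (id), which gives every city and its unis a perfectly valid id.
CTEs l and u do the inserts, which is easy now.
The final SELECT sets the next value of the sequence attached to location.id. We have not used the sequence so far, so we have to set it to the current maximum, or future INSERTs into location would run into duplicate key errors.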
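To see what CTEs x and y compute without needing a database, here is a rough Python emulation of the same logic. It is only a sketch for tracing the idea, not part of the SQL solution: the function name mark_and_number and the sample lines are made up, and SQL NULL (what lag/lead return at the first and last row) is simply treated as false here.

```python
def mark_and_number(lines):
    """Emulate CTEs x and y: flag cities, then number them with a running sum."""
    flagged = []
    for i, txt in enumerate(lines):
        prev_txt = lines[i - 1] if i > 0 else None               # lag(txt)
        next_txt = lines[i + 1] if i + 1 < len(lines) else None  # lead(txt)
        city = prev_txt == "" and next_txt == ""                 # None stands in for NULL
        flagged.append((txt, city))

    result = []
    location_id = 0
    for txt, city in flagged:
        if txt == "":
            continue                  # WHERE txt <> ''
        location_id += int(city)      # sum(city::int) OVER (ORDER BY rn)
        result.append((location_id, txt, city))
    return result


if __name__ == "__main__":
    sample = ["", "Alabama", "", "Air University", "Alabama A&M University",
              "", "Alaska", "", "Alaska Bible College"]
    for row in mark_and_number(sample):
        print(row)
```

Rows with city set correspond to what CTE l inserts into location; the remaining rows carry the id of the city above them, which is what CTE u inserts into university.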
Answer 1 (score: 1)
Converting the raw content into tabular form first is the safest way... then you can upload it with COPY.
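# Emit one "location<TAB>university" line per university (tab is COPY's default delimiter)
BEGIN { bl=0; body=0; header=""; }
# blank line after a block of universities: the current location is finished
$0 == "" && body==1 && header!="" { header=""; body=0; bl=1; next; }
# any other blank line: just remember that we saw one
$0 == "" && body==0 { bl=1; next; }
# first non-empty line after a reset is the location name
$0 != "" && header=="" { header=$0; bl=0; next; }
# non-empty line after the location and a blank line: a university
$0 != "" && bl==1 && header!="" { body=1; print header "\t" $0 }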
Something like this in AWK will turn your file into a table, which you can then upload with a plain COPY statement from psql:
COPY university_data_file_table FROM '/path/to/awk-mashed-file';
You can then split that table into separate tables:
CREATE TABLE country (id serial PRIMARY KEY, country text);
INSERT INTO country (country) SELECT DISTINCT country FROM university_data_file_table;
CREATE TABLE university AS SELECT country.id AS country_id, udft.university FROM country JOIN university_data_file_table udft ON udft.country = country.country;
Something like this is easy to script with psql. As I said, you do have to do the initial conversion first.
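For anyone who would rather not use AWK for the initial conversion, the same state machine can be sketched in Python. This is an assumption-laden illustration, not the answer's own code: the function name to_rows is made up, and the output is tab-separated because tab is the default delimiter of COPY's text format.

```python
def to_rows(lines):
    """Turn blank-line-delimited input into (location, university) pairs."""
    header = ""          # current location, "" while none is active
    in_body = False      # True once a university was emitted for this location
    seen_blank = False   # True if a blank line was seen since the location line
    rows = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if in_body:
                # A blank line after the universities ends the current group.
                header, in_body = "", False
            seen_blank = True
            continue
        if header == "":
            # First non-empty line of a group is the location name.
            header, seen_blank = line, False
        elif seen_blank:
            in_body = True
            rows.append((header, line))
    return rows


if __name__ == "__main__":
    sample = ["", "Alabama", "", "Air University", "Alabama A&M University",
              "", "Alaska", "", "Alaska Bible College"]
    for location, university in to_rows(sample):
        print(location + "\t" + university)
```

Like the AWK version, it relies on a blank line between a location and its universities and on another blank line before the next location.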