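What is the fastest way to load data from flat files into a MySQL database and then create the relationships between the tables via foreign keys?
For example, I have a flat file in the following format: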
[INDIVIDUAL] [POP] [MARKER] [GENOTYPE]
"INDIVIDUAL1", "CEU", "rs55555","AA"
"INDIVIDUAL1", "CEU", "rs535454","GA"
"INDIVIDUAL1", "CEU", "rs555566","AT"
"INDIVIDUAL1", "CEU", "rs12345","TT"
...
"INDIVIDUAL2", "JPT", "rs55555","AT"
I need to load it into four tables:
IND (id,fk_pop,name)
POP (id,population)
MARKER (id,rsid)
GENOTYPE (id,fk_ind,fk_rsid,call)
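Specifically, how do I populate the foreign keys in a way that scales? The numbers are on the order of 1,000+ individuals, each with 1 million+ genotypes.
Answer 0: (score: 9)
I would take a multi-step approach to this.
Answer 1: (score: 4)
There is a simpler way.
First, make sure there is a UNIQUE constraint on each column that should hold distinct values (name, population, rsid).
Then use something like the following: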
-- Load the distinct populations; only the second field is kept.
LOAD DATA INFILE 'data.txt' IGNORE INTO TABLE POP FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
(@name, population, @rsid, @call);
-- Load the distinct markers; only the third field is kept.
LOAD DATA INFILE 'data.txt' IGNORE INTO TABLE MARKER FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
(@name, @population, rsid, @call);
-- Load the individuals and resolve fk_pop from the population name.
LOAD DATA INFILE 'data.txt' IGNORE INTO TABLE IND FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
(name, @population, @rsid, @call)
SET fk_pop = (SELECT id FROM POP WHERE population = @population);
-- Load the genotypes and resolve both foreign keys by name.
-- CALL is a reserved word in MySQL, so the column name is quoted with backticks.
LOAD DATA INFILE 'data.txt' IGNORE INTO TABLE GENOTYPE FIELDS TERMINATED BY ','
ENCLOSED BY '"' LINES TERMINATED BY '\n' IGNORE 1 LINES
(@name, @population, @rsid, `call`)
SET fk_ind = (SELECT id FROM IND WHERE name = @name),
fk_rsid = (SELECT id FROM MARKER WHERE rsid = @rsid);
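Note that @ denotes a user variable rather than a column name. In the first two LOAD DATA statements, the variables are used only to discard fields. In the last two, they are used to look up the foreign keys.
Be warned that this may not be very fast :).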
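To illustrate why the UNIQUE constraint matters in this answer, here is a minimal sketch that runs without a MySQL server. It uses Python's built-in sqlite3 module as a stand-in, and SQLite's INSERT OR IGNORE plays the role of the IGNORE keyword in LOAD DATA. The sample rows and the function name load_distinct are assumptions made for the example, not part of the original answer.

```python
import sqlite3

# Sample rows in the question's layout: individual, population, marker, genotype.
ROWS = [
    ("INDIVIDUAL1", "CEU", "rs55555", "AA"),
    ("INDIVIDUAL1", "CEU", "rs535454", "GA"),
    ("INDIVIDUAL2", "JPT", "rs55555", "AT"),
]


def load_distinct(rows):
    """Insert every row's population and marker; UNIQUE drops the repeats."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE POP (id INTEGER PRIMARY KEY, population TEXT UNIQUE)")
    conn.execute("CREATE TABLE MARKER (id INTEGER PRIMARY KEY, rsid TEXT UNIQUE)")
    # INSERT OR IGNORE is SQLite's counterpart to MySQL's IGNORE keyword:
    # a row that would violate the UNIQUE constraint is skipped, not an error.
    conn.executemany("INSERT OR IGNORE INTO POP (population) VALUES (?)",
                     [(r[1],) for r in rows])
    conn.executemany("INSERT OR IGNORE INTO MARKER (rsid) VALUES (?)",
                     [(r[2],) for r in rows])
    pops = [p for (p,) in conn.execute("SELECT population FROM POP ORDER BY id")]
    markers = [m for (m,) in conn.execute("SELECT rsid FROM MARKER ORDER BY id")]
    conn.close()
    return pops, markers
```

Without the UNIQUE constraint, every line of the file would create a new POP and MARKER row, and the later lookups by name would match more than one id.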
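Answer 2: (score: 0)
You can start with the base tables that have no foreign keys. Then, as you insert data into the other tables, you look up the IDs.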
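The order described above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 module instead of MySQL so that it runs standalone, the helper names id_map and load are hypothetical, and the lookups are held in in-memory dictionaries, which is plausible for about 1,000 individuals and a million or so markers but is not something the answer specifies.

```python
import sqlite3

ROWS = [
    ("INDIVIDUAL1", "CEU", "rs55555", "AA"),
    ("INDIVIDUAL1", "CEU", "rs535454", "GA"),
    ("INDIVIDUAL2", "JPT", "rs55555", "AT"),
]

SCHEMA = """
CREATE TABLE POP (id INTEGER PRIMARY KEY, population TEXT UNIQUE);
CREATE TABLE MARKER (id INTEGER PRIMARY KEY, rsid TEXT UNIQUE);
CREATE TABLE IND (id INTEGER PRIMARY KEY,
                  fk_pop INTEGER REFERENCES POP(id), name TEXT UNIQUE);
CREATE TABLE GENOTYPE (id INTEGER PRIMARY KEY,
                       fk_ind INTEGER REFERENCES IND(id),
                       fk_rsid INTEGER REFERENCES MARKER(id), "call" TEXT);
"""


def id_map(conn, table, column):
    """Return {value: id} for a lookup table, read once after it is loaded."""
    query = f"SELECT id, {column} FROM {table}"
    return {value: row_id for row_id, value in conn.execute(query)}


def load(rows):
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Step 1: the base tables, which carry no foreign keys.
    conn.executemany("INSERT OR IGNORE INTO POP (population) VALUES (?)",
                     [(r[1],) for r in rows])
    conn.executemany("INSERT OR IGNORE INTO MARKER (rsid) VALUES (?)",
                     [(r[2],) for r in rows])
    pop_ids = id_map(conn, "POP", "population")
    marker_ids = id_map(conn, "MARKER", "rsid")
    # Step 2: IND, with fk_pop looked up from the population name.
    conn.executemany("INSERT OR IGNORE INTO IND (fk_pop, name) VALUES (?, ?)",
                     [(pop_ids[r[1]], r[0]) for r in rows])
    ind_ids = id_map(conn, "IND", "name")
    # Step 3: GENOTYPE, with both foreign keys looked up.
    conn.executemany(
        'INSERT INTO GENOTYPE (fk_ind, fk_rsid, "call") VALUES (?, ?, ?)',
        [(ind_ids[r[0]], marker_ids[r[2]], r[3]) for r in rows])
    conn.commit()
    return conn
```

Joining the four tables back together on the foreign keys should reproduce the original flat rows, which is a convenient sanity check after a load.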
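Another idea is to replace the identifiers in the flat file (INDIVIDUAL1, CEU, etc.) with GUIDs and then use those directly as IDs and foreign keys (I notice this is tagged performance, and this may not give the best performance).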
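One possible reading of the GUID idea, sketched in Python: derive each GUID deterministically from the natural key, so that the same name always yields the same ID and no lookup query is needed when writing the child rows. The use of name-based UUIDs (uuid.uuid5), the namespace string and the function names guid and to_table_rows are all assumptions for illustration; the answer does not say how the GUIDs would be generated.

```python
import uuid

# Hypothetical namespace for this data set; any fixed UUID would do.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "genotypes.example")


def guid(kind, value):
    """Derive a stable GUID from a table name and a natural key."""
    return str(uuid.uuid5(NAMESPACE, f"{kind}:{value}"))


def to_table_rows(row):
    """Split one flat-file row into rows for POP, MARKER, IND and GENOTYPE."""
    name, population, rsid, call = row
    pop_id = guid("POP", population)
    marker_id = guid("MARKER", rsid)
    ind_id = guid("IND", name)
    return {
        "POP": (pop_id, population),
        "MARKER": (marker_id, rsid),
        "IND": (ind_id, pop_id, name),
        # The foreign keys are computed, not looked up in the database.
        "GENOTYPE": (guid("GENOTYPE", f"{name}/{rsid}"), ind_id, marker_id, call),
    }
```

The trade-off the answer hints at still applies: 36-character GUID keys are much wider than integer keys, which matters for a GENOTYPE table with on the order of a billion rows.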