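I am trying to come up with a reasonable SQLite database schema for the following situation.
I have a database of publications. Each publication has a number of authors, and each author can belong to a number of different institutes. Several authors may, however, share an institute. For example, a publication has the authors Anne, Bert and Carl. Anne belongs to institutes A and B, Bert to B and C, and Carl to C, D and E.
The number of authors per publication should be variable, and so should the number of institutes any particular author can belong to. Each institute should be represented only once, though.
Right now, I am considering the following schema: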
create table publications (
    id integer primary key autoincrement,
    ...
);
create table publication_institutes (
    id integer primary key autoincrement,
    publication_id integer references publications(id),
    ...
);
create table publication_authors (
    id integer primary key autoincrement,
    publication_id integer references publications(id),
    ...
);
create table publication_author_institute (
    id integer primary key autoincrement,
    institute_id integer references publication_institutes(id),
    author_id integer references publication_authors(id)
);
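This seems suboptimal. publication_author_institute no longer references the id in publications, but putting it in there explicitly does not seem right either.
Is there a clean solution for this situation?
Thanks in advance!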
Answer 0 (score: 0)
I'd go with a schema that has Authors, Publications, and Institutes, and then many-to-many tables for the relationships of Authors-to-Institutes and Authors-to-Publications.
For example:
author
    id  name  other info (lastname, title, etc)
    1   Anne  ...
    2   Bert  ...
    3   Carl  ...
institute
    id  name  other info (address, city, state, etc)
    1   A     ...
    2   B     ...
    3   C     ...
    4   D     ...
    5   E     ...
publication
    id  title           journal         year  issue
    1   How to Publish  Weekly Journal  2011  5
    2   What I Do       Monthly Mag     2013  1
    3   How Are We      Ranger Rick     2014  9
author_affiliation
    id  author_id  inst_id
    1   1          1
    2   1          2
    3   2          2
    4   2          3
    5   3          3
    6   3          4
    7   3          5
(for this table, you could also use (author_id, inst_id) as a composite PK
and drop the id column, as in the next table)
publication_author (here I'm using the composite PK (pub_id, auth_id))
    pub_id  auth_id  position
    1       1        1
    1       2        2
    2       1        2
    2       3        1
    3       3        1
(position indicates the order of authorship, which is important)
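To make the layout above concrete, here is a minimal sketch of those five tables as SQLite DDL, run through Python's built-in sqlite3 module. The column types, the NOT NULL / UNIQUE constraints, and the create_db helper are my assumptions; the tables above only give column names.

```python
import sqlite3

# Assumed types and constraints; only the table and column names come from the answer.
SCHEMA = """
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE institute (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- each institute is stored only once
);
CREATE TABLE publication (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    journal TEXT,
    year    INTEGER,
    issue   INTEGER
);
CREATE TABLE author_affiliation (      -- many-to-many: author <-> institute
    author_id INTEGER NOT NULL REFERENCES author(id),
    inst_id   INTEGER NOT NULL REFERENCES institute(id),
    PRIMARY KEY (author_id, inst_id)
);
CREATE TABLE publication_author (      -- many-to-many: publication <-> author
    pub_id   INTEGER NOT NULL REFERENCES publication(id),
    auth_id  INTEGER NOT NULL REFERENCES author(id),
    position INTEGER NOT NULL,         -- order of authorship
    PRIMARY KEY (pub_id, auth_id)
);
"""


def create_db(path=":memory:"):
    """Open a database (in memory by default) and create the tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


conn = create_db()
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Here the composite primary keys replace the surrogate id column in both link tables, which also stops the same author/institute or publication/author pair from being entered twice.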
In the long run, this makes the database easier to maintain: a change in affiliation only requires adding or deleting a record in the author_affiliation table, with no changes to the institute or author entries. For a new publication you do have to enter a record in both publication and publication_author (one row per author), but that avoids having to deal with columns that hold multiple values (like an 'authors' column in the publication table).
Likewise, if you want to query for all publications by an author, you can query publication_author with joins to the publication and author tables.
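As a rough illustration of that maintenance point, the sketch below moves Bert from institute C to institute D by touching only author_affiliation. The scenario itself, the column types, and the institutes_of helper are hypothetical; the rows are the sample data from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE institute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE author_affiliation (
    author_id INTEGER NOT NULL REFERENCES author(id),
    inst_id   INTEGER NOT NULL REFERENCES institute(id),
    PRIMARY KEY (author_id, inst_id)
);
INSERT INTO author VALUES (1, 'Anne'), (2, 'Bert'), (3, 'Carl');
INSERT INTO institute VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO author_affiliation VALUES
    (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (3, 5);
""")


def institutes_of(conn, author_name):
    """Return the names of the institutes an author is affiliated with."""
    rows = conn.execute(
        """SELECT i.name
           FROM author_affiliation AS aa
           JOIN author AS a ON a.id = aa.author_id
           JOIN institute AS i ON i.id = aa.inst_id
           WHERE a.name = ?
           ORDER BY i.name""",
        (author_name,))
    return [r[0] for r in rows]


before = institutes_of(conn, "Bert")   # ['B', 'C']

# Hypothetical change: Bert (id 2) leaves C (id 3) and joins D (id 4).
with conn:  # both statements are committed together, or rolled back together
    conn.execute(
        "DELETE FROM author_affiliation WHERE author_id = 2 AND inst_id = 3")
    conn.execute(
        "INSERT INTO author_affiliation (author_id, inst_id) VALUES (2, 4)")

after = institutes_of(conn, "Bert")    # ['B', 'D']
print(before, after)
```

The author and institute tables are not modified at all, and Carl's affiliation with C is unaffected even though Bert's was removed.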
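A query along those lines might look like the sketch below. The publications_by helper and the column types are assumptions of mine; the rows are the sample data from the tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE publication (
    id INTEGER PRIMARY KEY, title TEXT NOT NULL,
    journal TEXT, year INTEGER, issue INTEGER
);
CREATE TABLE publication_author (
    pub_id   INTEGER NOT NULL REFERENCES publication(id),
    auth_id  INTEGER NOT NULL REFERENCES author(id),
    position INTEGER NOT NULL,
    PRIMARY KEY (pub_id, auth_id)
);
INSERT INTO author VALUES (1, 'Anne'), (2, 'Bert'), (3, 'Carl');
INSERT INTO publication VALUES
    (1, 'How to Publish', 'Weekly Journal', 2011, 5),
    (2, 'What I Do',      'Monthly Mag',    2013, 1),
    (3, 'How Are We',     'Ranger Rick',    2014, 9);
INSERT INTO publication_author VALUES
    (1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 3, 1), (3, 3, 1);
""")


def publications_by(conn, author_name):
    """Return (title, year, position) for every publication of one author."""
    rows = conn.execute(
        """SELECT p.title, p.year, pa.position
           FROM publication_author AS pa
           JOIN publication AS p ON p.id = pa.pub_id
           JOIN author AS a ON a.id = pa.auth_id
           WHERE a.name = ?
           ORDER BY p.year""",
        (author_name,))
    return rows.fetchall()


print(publications_by(conn, "Anne"))
# [('How to Publish', 2011, 1), ('What I Do', 2013, 2)]
```

The position column comes along for free, so the same query also tells you whether the author was first author on each paper.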
It's the schema I'd use for reducing redundancy and giving flexibility in querying later on.
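As one more example of that flexibility, the institutes behind a publication can be derived through its authors, with each institute listed once. This is only a sketch: the institutes_for_publication helper is hypothetical, and it reports the authors' affiliations as currently stored, not the affiliations at the time of publication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE institute (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE author_affiliation (
    author_id INTEGER NOT NULL,
    inst_id   INTEGER NOT NULL REFERENCES institute(id),
    PRIMARY KEY (author_id, inst_id)
);
CREATE TABLE publication_author (
    pub_id   INTEGER NOT NULL,
    auth_id  INTEGER NOT NULL,
    position INTEGER NOT NULL,
    PRIMARY KEY (pub_id, auth_id)
);
INSERT INTO institute VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO author_affiliation VALUES
    (1, 1), (1, 2), (2, 2), (2, 3), (3, 3), (3, 4), (3, 5);
INSERT INTO publication_author VALUES
    (1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 3, 1), (3, 3, 1);
""")


def institutes_for_publication(conn, pub_id):
    """Return each institute linked to a publication via its authors, once."""
    rows = conn.execute(
        """SELECT DISTINCT i.name
           FROM publication_author AS pa
           JOIN author_affiliation AS aa ON aa.author_id = pa.auth_id
           JOIN institute AS i ON i.id = aa.inst_id
           WHERE pa.pub_id = ?
           ORDER BY i.name""",
        (pub_id,))
    return [r[0] for r in rows]


# Publication 1 is by Anne (A, B) and Bert (B, C); B shows up only once.
print(institutes_for_publication(conn, 1))
# ['A', 'B', 'C']
```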