Serializing many-to-many relationships with Peewee and Marshmallow

Asked: 2018-01-25 22:01:26

Tags: python postgresql peewee marshmallow

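I have a PostgreSQL database with a many-to-many relationship between users and tags, stored in the following tables:

  • social_user: user information
  • tag: tag information
  • user_tag: the many-to-many relationship between social_user and tag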
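
For reference, the three tables can be sketched roughly as follows. This is only an illustration: the column names are taken from the Peewee models shown later in the question, but the column types are assumptions, and an in-memory SQLite database stands in for PostgreSQL so that the snippet runs on its own.

```python
import sqlite3

# Assumed DDL: column names follow the Peewee models in the question,
# column types are guesses, and SQLite stands in for PostgreSQL.
SCHEMA = """
CREATE TABLE social_user (
    id            INTEGER PRIMARY KEY,
    handle        TEXT,
    local_id      TEXT NOT NULL,
    platform_slug TEXT NOT NULL
);
CREATE TABLE tag (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    type  TEXT NOT NULL
);
CREATE TABLE user_tag (
    social_user_id INTEGER NOT NULL REFERENCES social_user (id),
    tag_id         INTEGER NOT NULL REFERENCES tag (id),
    PRIMARY KEY (social_user_id, tag_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were just created.
table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(table_names)
```

The composite primary key on user_tag mirrors the CompositeKey used in the User_Tag model, so a given user and tag can be linked at most once.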

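I'm trying to build a simple API to access the data in this database using Flask, Peewee, and Marshmallow. We can ignore Flask for now, but I'm trying to create a schema for social_user that lets me dump the result of a single query returning one or more users along with their respective tags. I'm looking for a response similar to the following: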

{
    "id": "[ID]",
    "handle": "[HANDLE]",
    "local_id": "[LOCAL_ID]",
    "platform_slug": "[PLATFORM_SLUG]",
    "tags": [
        {
            "id": "[ID]",
            "title": "[TITLE]",
            "tag_type": "[TAG_TYPE]"
        },
        {
            "id": "[ID]",
            "title": "[TITLE]",
            "tag_type": "[TAG_TYPE]"
        }
    ]
}

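I've managed to pull the tags into the social_user schema by running a second query inside a @post_dump wrapper function. However, this feels like a hack, and it also seems like it would be slow for a large number of users (update: it is very slow; I tested it on 369 users). I imagine there's something I can do with Marshmallow's fields.Nested field type. Is there a better way to serialize this relationship with only one Peewee query? My code is below: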

# just so you are aware of my namespaces
import json
from pprint import pprint
import marshmallow as marsh
import peewee as pw
from playhouse import postgres_ext

Peewee Models

db = postgres_ext.PostgresqlExtDatabase(
    register_hstore = False,
    **json.load(open('postgres.json'))
)

class Base_Model(pw.Model):
    class Meta:
        database = db

class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user_id = pw.ForeignKeyField(Social_User)
    tag_id = pw.ForeignKeyField(Tag)

    class Meta:
        primary_key = pw.CompositeKey('social_user_id', 'tag_id')
        db_table = 'user_tag'

Marshmallow Schemas

class Tag_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    title = marsh.fields.Str(required = True)
    tag_type = marsh.fields.Str(required = True, default = 'descriptive')

class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

    def _get_tags(self, user_id):
        query = Tag.select().join(User_Tag).where(User_Tag.social_user_id == user_id)
        tags, errors = tags_schema.dump(query)
        return tags

    @marsh.post_dump(pass_many = True)
    def post_dump(self, data, many):
        if many:
            for datum in data:
                datum['tags'] = self._get_tags(datum['id']) if datum['id'] else []
        else:
            data['tags'] = self._get_tags(data['id'])
        return data

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)
tags_schema = Tag_Schema(many = True)

Here are some tests demonstrating the functionality:

db.connect()
query = Social_User.get(Social_User.id == 825)
result, errors = user_schema.dump(query)
db.close()
pprint(result)
{'handle': 'test',
 'id': 825,
 'local_id': 'test',
 'platform_slug': 'tw',
 'tags': [{'id': 20, 'tag_type': 'descriptive', 'title': 'this'},
          {'id': 21, 'tag_type': 'descriptive', 'title': 'that'}]}
db.connect()
query = Social_User.select().where(Social_User.platform_slug == 'tw')
result, errors = users_schema.dump(query)
db.close()
pprint(result)
[{'handle': 'test',
  'id': 825,
  'local_id': 'test',
  'platform_slug': 'tw',
  'tags': [{'id': 20, 'tag_type': 'descriptive', 'title': 'this'},
           {'id': 21, 'tag_type': 'descriptive', 'title': 'that'}]},
 {'handle': 'test2',
  'id': 826,
  'local_id': 'test2',
  'platform_slug': 'tw',
  'tags': []}]
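
The slowdown described in the question comes from issuing one tag query per serialized user, on top of the query that fetches the users. The sketch below reproduces that access pattern with the standard-library sqlite3 module rather than Peewee, purely to count the queries. The `run` helper, the trimmed-down tables, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE social_user (id INTEGER PRIMARY KEY, handle TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE user_tag (social_user_id INTEGER, tag_id INTEGER,
                       PRIMARY KEY (social_user_id, tag_id));
INSERT INTO social_user VALUES (825, 'test'), (826, 'test2');
INSERT INTO tag VALUES (20, 'this'), (21, 'that');
INSERT INTO user_tag VALUES (825, 20), (825, 21);
""")

statements = []  # every SQL statement sent through run()


def run(sql, params=()):
    """Hypothetical helper: execute a query and remember that it was issued."""
    statements.append(sql)
    return conn.execute(sql, params).fetchall()


# Same shape as the @post_dump approach: one query for the users,
# then one additional tag query for each user.
users = run("SELECT id, handle FROM social_user ORDER BY id")
tags_by_user = {}
for user_id, _handle in users:
    tags_by_user[user_id] = run(
        "SELECT t.id, t.title FROM tag AS t "
        "JOIN user_tag AS ut ON ut.tag_id = t.id "
        "WHERE ut.social_user_id = ? ORDER BY t.id",
        (user_id,),
    )

query_count = len(statements)
print(query_count)  # one users query plus one tag query per user
```

With 369 users this pattern would issue 370 queries, which is why a single joined query is attractive.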

1 answer:

Answer 0 (score: 1)

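It looks like this can be done by using a ManyToManyField in the Peewee models and setting through_model manually. A ManyToManyField lets you add a field to a model that relates two tables to each other. Normally it creates the relationship table (the through_model) automatically, but you can also set it by hand.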

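I'm using the 3.0 alpha of Peewee, but I'm sure many people are on the current stable release, so I'll include both versions. We'll use a DeferredThroughModel object and a ManyToManyField; in Peewee 2.x these live in the "playhouse" extensions, while in 3.x they are part of the main peewee module. We'll also remove the @post_dump wrapper function: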

Peewee Models

# Peewee 2.x
# from playhouse import fields
# User_Tag_Proxy = fields.DeferredThroughModel()

# Peewee 3.x
User_Tag_Proxy = pw.DeferredThroughModel()

class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()
    # Peewee 2.x
    # tags = fields.ManyToManyField(Tag, related_name = 'users', through_model = User_Tag_Proxy)

    # Peewee 3.x
    tags = pw.ManyToManyField(Tag, backref = 'users', through_model = User_Tag_Proxy)

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user = pw.ForeignKeyField(Social_User, db_column = 'social_user_id')
    tag = pw.ForeignKeyField(Tag, db_column = 'tag_id')

    class Meta:
        primary_key = pw.CompositeKey('social_user', 'tag')
        db_table = 'user_tag'

User_Tag_Proxy.set_model(User_Tag)

Marshmallow Schemas

class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)

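In practice, this produces exactly the same output as the @post_dump wrapper function. Unfortunately, although this seems to be the "correct" way to solve the problem, it is actually slightly slower.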

- UPDATE -

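I managed to accomplish the same thing in 1/100th of the time. It's a bit of a hack and could use some cleanup, but it works! Rather than changing the models, I adjusted how the data is collected and processed before it is passed to the schema for serialization.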

Peewee Models

class Tag(Base_Model):
    title = pw.CharField()
    tag_type = pw.CharField(db_column = 'type')

    class Meta:
        db_table = 'tag'

class Social_User(Base_Model):
    handle = pw.CharField(null = True)
    local_id = pw.CharField()
    platform_slug = pw.CharField()

    class Meta:
        db_table = 'social_user'

class User_Tag(Base_Model):
    social_user = pw.ForeignKeyField(Social_User, db_column = 'social_user_id')
    tag = pw.ForeignKeyField(Tag, db_column = 'tag_id')

    class Meta:
        primary_key = pw.CompositeKey('social_user', 'tag')
        db_table = 'user_tag'

Marshmallow Schema

class Social_User_Schema(marsh.Schema):
    id = marsh.fields.Int(dump_only = True)
    local_id = marsh.fields.Str(required = True)
    handle = marsh.fields.Str()
    platform_slug = marsh.fields.Str(required = True)
    tags = marsh.fields.Nested(Tag_Schema, many = True, dump_only = True)

user_schema = Social_User_Schema()
users_schema = Social_User_Schema(many = True)

Query

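For the new query, we LEFT OUTER join the three tables (Social_User, User_Tag, and Tag), with Social_User as our source of truth, so that every user is returned whether or not they have any tags. The join returns each user once per tag they have, so we need to collapse the duplicates by iterating over the rows and using a dictionary to store one object per user. On each of these new Social_User objects we add a tags list, to which we append the Tag objects.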

db.connect()
query = (Social_User.select(User_Tag, Social_User, Tag)
    .join(User_Tag, pw.JOIN.LEFT_OUTER)
    .join(Tag, pw.JOIN.LEFT_OUTER)
    .order_by(Social_User.id))

# maps user id -> Social_User object with a populated tags list
users = {}
for result in query:
    user_id = result.id
    if user_id not in users:
        # creates a new Social_User object matching the user data
        users[user_id] = Social_User(**result.__data__)
        users[user_id].tags = []
    try:
        # extracts the associated tag
        users[user_id].tags.append(result.user_tag.tag)
    except AttributeError:
        pass

result, errors = users_schema.dump(users.values())
db.close()
pprint(result)
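
The same join-then-group idea can be tried without Peewee or Marshmallow. The following is a minimal sketch using the standard-library sqlite3 module with an in-memory database standing in for PostgreSQL; the column types, the column aliases (tag_id, tag_title, tag_type), and the sample rows are assumptions for illustration, and plain dictionaries take the place of model instances and schema output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allow access to columns by name
conn.executescript("""
CREATE TABLE social_user (id INTEGER PRIMARY KEY, handle TEXT,
                          local_id TEXT, platform_slug TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, title TEXT, type TEXT);
CREATE TABLE user_tag (social_user_id INTEGER, tag_id INTEGER,
                       PRIMARY KEY (social_user_id, tag_id));
INSERT INTO social_user VALUES (825, 'test', 'test', 'tw'),
                               (826, 'test2', 'test2', 'tw');
INSERT INTO tag VALUES (20, 'this', 'descriptive'),
                       (21, 'that', 'descriptive');
INSERT INTO user_tag VALUES (825, 20), (825, 21);
""")

# One query: LEFT OUTER joins keep users that have no tags at all.
rows = conn.execute("""
    SELECT u.id, u.handle, u.local_id, u.platform_slug,
           t.id AS tag_id, t.title AS tag_title, t.type AS tag_type
    FROM social_user AS u
    LEFT OUTER JOIN user_tag AS ut ON ut.social_user_id = u.id
    LEFT OUTER JOIN tag AS t ON t.id = ut.tag_id
    ORDER BY u.id, t.id
""").fetchall()

# A user appears once per tag, so group the rows by user id.
users = {}
for row in rows:
    user = users.setdefault(row["id"], {
        "id": row["id"],
        "handle": row["handle"],
        "local_id": row["local_id"],
        "platform_slug": row["platform_slug"],
        "tags": [],
    })
    # Users without tags yield a single row whose tag columns are NULL.
    if row["tag_id"] is not None:
        user["tags"].append({
            "id": row["tag_id"],
            "title": row["tag_title"],
            "tag_type": row["tag_type"],
        })

result = list(users.values())
print(result)
```

Checking the tag columns for NULL plays the role of the try/except AttributeError in the Peewee version: both skip the placeholder row that a LEFT OUTER join produces for a user with no tags.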