Writing JSON with dict properties to Google Cloud Datastore

Time: 2018-11-09 22:57:25

Tags: python-2.7 google-cloud-datastore google-cloud-dataflow

Using Apache Beam (Python 2.7 SDK), I am trying to write a JSON file to Google Cloud Datastore as entities.

Sample JSON:

{
"CustId": "005056B81111",
"Name": "John Smith", 
"Phone": "827188111",
"Email": "john@xxx.com", 
"addresses": [
    {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
    {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
]
}

I wrote an Apache Beam pipeline with three main steps:

  1. beam.io.ReadFromText(input_file_path)

  2. beam.ParDo(CreateEntities())

  3. WriteToDatastore(PROJECT)

In step 2, I convert the JSON object (a dict) into an entity:

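# Imports are not shown in the question; these are assumed from the Beam 2.x (Python 2.7) Datastore examples.
import json

import apache_beam as beam
from google.cloud.proto.datastore.v1 import entity_pb2
from googledatastore import helper as datastore_helper


class CreateEntities(beam.DoFn):
  def process(self, element):
    # Each element is one line of the input file, expected to hold a JSON object.
    element = element.encode('ascii','ignore')
    element = json.loads(element)
    Id = element.pop('CustId')
    entity = entity_pb2.Entity()
    datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
    datastore_helper.add_properties(entity, element)
    return [entity]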
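
The parsing half of process can be tried without Beam or Datastore. The sketch below is only an illustration, not part of the original pipeline: parse_customer is a hypothetical helper name, the sample line is a shortened variant of the JSON above with a non-ASCII city name added, and it assumes each input line holds one complete JSON object, since ReadFromText emits one element per line. It also suggests that encode('ascii', 'ignore') silently drops non-ASCII characters, whereas json.loads can take the text as read.

```python
import json


def parse_customer(line):
    """Split one JSON line into (key id, remaining properties).

    Hypothetical helper for illustration; assumes one JSON object per line.
    """
    record = json.loads(line)
    cust_id = record.pop('CustId')
    return cust_id, record


# One record per line, as ReadFromText would deliver it (shortened sample,
# with a non-ASCII character added to the city name for the demonstration).
line = (u'{"CustId": "005056B81111", "Name": "John Smith", '
        u'"addresses": [{"type": "Billing", "city": "Malm\u00f6"}]}')

cust_id, props = parse_customer(line)

# The same line after encode('ascii', 'ignore'): the non-ASCII character is lost.
lossy_line = line.encode('ascii', 'ignore').decode('ascii')
lossy_id, lossy_props = parse_customer(lossy_line)
```

If the input file is pretty-printed across several lines like the sample above, each line would arrive as a separate element and json.loads would likely fail, so the file may need to be rewritten as one object per line first.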

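This works fine for basic properties. However, it fails because addresses itself contains dict objects. I read a similar post, but did not find exact code for converting a dict to an entity.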

Below is my attempt to set the addresses element to an entity, but it does not work:

element['addresses'] = entity_pb2.Entity()

2 Answers:

Answer 0: (score: 2)

Are you trying to store it as a repeated structured property?

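In Dataflow, an ndb.StructuredProperty appears with its keys flattened, and for a repeated structured property each individual property inside the structured object becomes an array. So I think you need to write it like this: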

datastore_helper.add_properties(entity, {
    ...
    "addresses.type": ["Billing", "Shipping"],
    "addresses.streetAddress": ["Street 7", "Street 6"],
    "addresses.city": ["Malmo", "Stockholm"],
    "addresses.postalCode": ["CR0 4UZ", "YYT IKO"],
})
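
The flattened form above can be built from the parsed JSON rather than typed by hand. The following is a minimal sketch under stated assumptions: flatten_repeated is a hypothetical helper name, not part of any Datastore library, and filling a missing field with None to keep the arrays aligned is a choice made here for illustration.

```python
def flatten_repeated(name, items):
    """Turn a list of dicts into {"name.field": [value_1, value_2, ...]}.

    Hypothetical helper; fields missing from an item are filled with None
    so that every array keeps one slot per item.
    """
    # Collect field names in first-seen order across all items.
    fields = []
    for item in items:
        for field in item:
            if field not in fields:
                fields.append(field)
    return {
        '{}.{}'.format(name, field): [item.get(field) for item in items]
        for field in fields
    }


element = {
    "Name": "John Smith",
    "addresses": [
        {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
        {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"},
    ],
}

# Replace the nested list with its flattened, dotted-key equivalent.
addresses = element.pop("addresses")
element.update(flatten_repeated("addresses", addresses))
```

After this step element holds only scalar values and lists of scalars, which is the shape passed to add_properties in the answer's example.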

Alternatively, if you are trying to save it as an ndb.JsonProperty, you can do this:

datastore_helper.add_properties(entity, {
    ...
    "addresses": json.dumps(element['addresses']),
})
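
As a rough illustration of what the JSON-string approach stores, the sketch below serializes the nested list and reads it back. It uses only the standard json module; the variable names are made up for the example, and sort_keys is an optional choice made here so the stored string is stable between runs.

```python
import json

element = {
    "Name": "John Smith",
    "addresses": [
        {"type": "Billing", "city": "Malmo"},
        {"type": "Shipping", "city": "Stockholm"},
    ],
}

# Copy the properties and replace the nested list with a single JSON string.
properties = dict(element)
properties["addresses"] = json.dumps(element["addresses"], sort_keys=True)

# A reader of the stored value has to decode it again to get the list back.
restored = json.loads(properties["addresses"])
```

The trade-off is that the addresses become one opaque string, so individual fields such as city can no longer be filtered on in a query.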

Answer 1: (score: 0)

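I know this is an old question, but I ran into a similar problem (although with Python 3.6 and NDB) and wrote a function that converts every dict nested inside a dict into an Entity. It uses recursion to visit all nodes that need converting: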

from google.cloud import datastore  # assumed import; the answer does not show it


def dict_to_entity(data):

    # the data can be a dict or a list, and they are iterated over differently
    # also create a new object to store the child objects
    if type(data) == dict:
        childiterator = data.items()
        new_data = {}
    elif type(data) == list:
        childiterator = enumerate(data)
        new_data = []
    else:
        return

    for i, child in childiterator:

        # if the child is a dict or a list, continue drilling...
        if type(child) in [dict, list]:
            new_child = dict_to_entity(child)
        else:
            new_child = child

        # add the child data to the new object
        if type(data) == dict:
            new_data[i] = new_child
        else:
            new_data.append(new_child)

    # convert the new object to Entity if needed
    if type(data) == dict:
        child_entity = datastore.Entity()
        child_entity.update(new_data)
        return child_entity
    else:
        return new_data
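
The same recursion can be exercised without a Datastore client by swapping in a stand-in class. This is a compact variant written for illustration, not the answer's exact code: FakeEntity and the entity_factory parameter are hypothetical, and the sketch relies on datastore.Entity in google-cloud-datastore being a dict subclass that can be created without a key. In real use one would presumably pass datastore.Entity as the factory.

```python
class FakeEntity(dict):
    """Stand-in for datastore.Entity so the sketch runs without the library."""


def dict_to_entity(data, entity_factory=FakeEntity):
    """Recursively convert every dict (at any depth) into an entity object."""
    if isinstance(data, dict):
        entity = entity_factory()
        entity.update({
            key: dict_to_entity(value, entity_factory)
            for key, value in data.items()
        })
        return entity
    if isinstance(data, list):
        # Lists stay lists; only their elements are converted.
        return [dict_to_entity(item, entity_factory) for item in data]
    # Scalars (strings, numbers, None, ...) are returned unchanged.
    return data


customer = {
    "Name": "John Smith",
    "addresses": [
        {"type": "Billing", "city": "Malmo"},
        {"type": "Shipping", "city": "Stockholm"},
    ],
}

converted = dict_to_entity(customer)
```

Unlike the version above, this variant returns scalars unchanged instead of None when called on a non-container value, which makes the top-level call slightly more forgiving.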