Using Apache Beam (Python 2.7 SDK), I am trying to write JSON files to Google Cloud Datastore as entities.
Sample JSON:
{
    "CustId": "005056B81111",
    "Name": "John Smith",
    "Phone": "827188111",
    "Email": "john@xxx.com",
    "addresses": [
        {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
        {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
    ]
}
I wrote an Apache Beam pipeline that consists of three main steps:
beam.io.ReadFromText(input_file_path)
beam.ParDo(CreateEntities())
WriteToDatastore(PROJECT)
In step 2, I convert the JSON object (a dict) into an entity:
class CreateEntities(beam.DoFn):
    def process(self, element):
        element = element.encode('ascii', 'ignore')
        element = json.loads(element)
        Id = element.pop('CustId')
        entity = entity_pb2.Entity()
        datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
        datastore_helper.add_properties(entity, element)
        return [entity]
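This works fine for basic properties. However, it fails on addresses, because that property holds dict objects (a list of them) rather than primitive values. I have read a similar post, but it did not give exact code for converting a dict to an entity.
I tried setting the addresses element to an entity as shown below, but that does not work: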
element['addresses'] = entity_pb2.Entity()
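Answer 0 (score: 2)
Are you trying to store it as a repeated structured property? An ndb.StructuredProperty appears in Dataflow with its keys flattened, and for a repeated structured property, each individual property inside the structured property object becomes an array. So I think you need to write it like this: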
datastore_helper.add_properties(entity, {
    ...
    "addresses.type": ["Billing", "Shipping"],
    "addresses.streetAddress": ["Street 7", "Street 6"],
    "addresses.city": ["Malmo", "Stockholm"],
    "addresses.postalCode": ["CR0 4UZ", "YYT IKO"],
})
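The flattened layout above can be produced from the original list of address dicts rather than typed by hand. Here is a minimal sketch in plain Python. The helper name flatten_repeated is hypothetical, and padding a missing field with None so the arrays stay aligned is my assumption; I have not checked how datastore_helper.add_properties treats None values.

```python
def flatten_repeated(name, items):
    """Turn a list of dicts into {'name.field': [values, ...]} columns.

    Hypothetical helper, not part of Beam or the Datastore client.
    """
    # Collect field names in first-seen order so the output is deterministic.
    fields = []
    for item in items:
        for field in item:
            if field not in fields:
                fields.append(field)
    # One list per field. A field missing from an item becomes None
    # (an assumption) so that position i always refers to item i.
    return {
        "{}.{}".format(name, field): [item.get(field) for item in items]
        for field in fields
    }


addresses = [
    {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
    {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"},
]
flat = flatten_repeated("addresses", addresses)
```

The resulting dict could then be merged with the remaining basic properties before being passed to datastore_helper.add_properties.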
Alternatively, if you are trying to save it as an ndb.JsonProperty, you can do the following:
datastore_helper.add_properties(entity, {
    ...
    "addresses": json.dumps(element['addresses']),
})
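To make the JSON-string variant concrete, here is a small sketch using only the standard library. It shows the nested list being replaced by a single string and then recovered when read back. The variable names properties and restored are illustrative, and the Datastore write itself is left out.

```python
import json

element = {
    "Name": "John Smith",
    "addresses": [
        {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
        {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"},
    ],
}

# Copy the top-level dict so the parsed input is left untouched,
# then replace the nested list with its JSON text.
properties = dict(element)
properties["addresses"] = json.dumps(element["addresses"])

# Reading the entity back later: decode the string to get the list again.
restored = json.loads(properties["addresses"])
```

The trade-off is that the addresses are stored as one opaque string, so individual fields such as city cannot be filtered on in a query.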
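Answer 1 (score: 0)
I know this is an old question, but I ran into a similar problem (albeit with Python 3.6 and NDB) and wrote a function that converts every dict nested inside a dict into an Entity. It uses recursion to walk through all the nodes that need converting: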
from google.cloud import datastore  # assumed source of datastore.Entity


def dict_to_entity(data):
    # the data can be a dict or a list, and they are iterated over differently
    # also create a new object to store the child objects
    if type(data) == dict:
        childiterator = data.items()
        new_data = {}
    elif type(data) == list:
        childiterator = enumerate(data)
        new_data = []
    else:
        return
    for i, child in childiterator:
        # if the child is a dict or a list, continue drilling...
        if type(child) in [dict, list]:
            new_child = dict_to_entity(child)
        else:
            new_child = child
        # add the child data to the new object
        if type(data) == dict:
            new_data[i] = new_child
        else:
            new_data.append(new_child)
    # convert the new object to Entity if needed
    if type(data) == dict:
        child_entity = datastore.Entity()
        child_entity.update(new_data)
        return child_entity
    else:
        return new_data
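To see what shape the recursion produces without installing the Datastore client, here is a compact sketch of the same idea. StandInEntity is a hypothetical dict subclass used only so the example runs offline, and convert is an illustrative name. One difference from the function above: a scalar passed at the top level is returned unchanged here rather than as None.

```python
class StandInEntity(dict):
    """Hypothetical stand-in for datastore.Entity (illustration only)."""


def convert(data, entity_factory=StandInEntity):
    # A dict becomes an entity whose values are converted recursively.
    if isinstance(data, dict):
        entity = entity_factory()
        entity.update({key: convert(value, entity_factory) for key, value in data.items()})
        return entity
    # A list stays a list, but each item is converted.
    if isinstance(data, list):
        return [convert(item, entity_factory) for item in data]
    # Scalars (str, int, ...) pass through unchanged.
    return data


customer = {
    "Name": "John Smith",
    "addresses": [
        {"type": "Billing", "city": "Malmo"},
        {"type": "Shipping", "city": "Stockholm"},
    ],
}
result = convert(customer)
```

Here result is a StandInEntity whose addresses value is a list of two StandInEntity objects. Passing datastore.Entity as entity_factory should give the same structure with real entities, though I have not run that against Datastore.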