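I have now spent at least two hours trying to get this to work. I have seen many different questions on SO and in Google Groups, but none of the answers seem to apply to my case.
Question: how do I bulk upload data (such as the CSV file below) to the datastore so that it creates entities whose key_name is defined in the CSV file (the same result as using the add function below)?
Here is my model: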
from google.appengine.ext import db


class RegisteredDomain(db.Model):
    """
    Domain object class. It has no fields because its existence is
    proof that it has been registered. Individual registered domains
    can be found using keys.
    """
    pass
Here is how I normally add and look up domains:
from google.appengine.ext.db import Key


def add(domains):
    """
    Add domains. This function accepts a single domain string or a
    list of domain strings and adds them to the database. The domain(s)
    must be valid unicode strings (a ValueError is thrown if the domain
    strings are not valid).
    """
    if not isinstance(domains, list):
        domains = [domains]
    cleaned_domains = []
    for domain in domains:
        clean_domain_ = clean_domain(domain)
        # Called for its side effect: raises ValueError on an invalid domain.
        is_valid_domain(clean_domain_)
        cleaned_domains.append(clean_domain_)
    domains = cleaned_domains
    db.put([RegisteredDomain(key_name=make_key(domain)) for domain in domains])


def get(domains):
    """
    Get domains. This function accepts a single domain string or a list
    of domain strings and queries the database for them. It returns a
    dictionary mapping each domain name to its RegisteredDomain object,
    or to None if the entity was not found.
    """
    if not isinstance(domains, list):
        domains = [domains]
    entities = db.get([Key.from_path('RegisteredDomain', make_key(domain)) for domain in domains])
    return dict(zip(domains, entities))
Note: in the code above, make_key simply lowercases the domain name and prepends a 'd'.
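For readers following along, here is a minimal sketch of what make_key might look like, based only on the one-line description above. The body is an assumption, not the original helper.

```python
def make_key(domain):
    """Hypothetical reconstruction: lowercase the domain and prepend 'd'."""
    # The 'd' prefix keeps the key name from starting with a digit.
    return "d" + domain.lower()


if __name__ == "__main__":
    for name in ["Google.com", "11st.com"]:
        print(make_key(name))
```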
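So that's the setup. Now I am going crazy trying to upload some RegisteredDomain entities from a CSV file. Here is the CSV file (note that the leading 'd' is there because key names may not start with a digit):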
key
dgoogle.com
dgoogle11.com
dfacebook.com
dcool.com
duuuuuuu.com
dsdsdsds.com
dffffooo.com
dgmail.com
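Before uploading, it may help to sanity-check the file. The sketch below assumes the layout shown above (a single `key` column) and the constraint mentioned earlier, that a key name may not start with a digit. The function name `invalid_key_names` is made up for illustration.

```python
import csv
import io


def invalid_key_names(csv_text):
    """Return the key names that are empty or start with a digit."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames != ["key"]:
        raise ValueError("expected a single 'key' column")
    bad = []
    for row in reader:
        name = row["key"]
        if not name or name[0].isdigit():
            bad.append(name)
    return bad


if __name__ == "__main__":
    sample = "key\ndgoogle.com\ndgoogle11.com\ndfacebook.com\n"
    print(invalid_key_names(sample))
```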
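I can't auto-generate the bulkloader yaml file because App Engine still hasn't updated my datastore statistics (it has been a day plus a few hours). So this (and many similar permutations, mostly changing the import_transform bit) is what I came up with: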
python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.api.datastore
- import: google.appengine.ext.db
- import: utils
- import: bulk_helper
transformers:
- kind: RegisteredDomain
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: key
      export_transform: bulk_helper.key_to_reverse_str
      import_template: transform.create_foreign_key('RegisteredDomain')
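Now, for some reason, when I try to upload, it reports that everything went fine, that x entities were transferred, and so on, but nothing is updated in the datastore (as I can see from the admin console). This is how I upload: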
appcfg.py upload_data --application=domain-sandwich --kind=RegisteredDomain --config_file=bulk.yaml --url=http://domain-sandwich.appspot.com/remote_api --filename=data.csv
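Finally, this is what my datastore viewer looks like: (screenshot not reproduced here)
Note: I am doing this on both the dev server and App Engine (whichever works...).
Thanks for your help!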
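Answer 0 (score: 0)
The problem is a bug in the App Engine bulkloader (or in the datastore API). I have filed several issues about it (issue 1, issue 2, issue 3, issue 4), but here is the text of the bulkloader bug report for future reference: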
VERSION:
release: "1.5.2"
timestamp: 1308730906
api_versions: ['1']
The bulkloader does not import models that have no properties. For example:
class MetaObject(db.Model):
    """
    Property-less object. Identified by application set key.
    """
    pass
Inside the application, you can work with these entities like this:
db.put([MetaObject(key_name=make_key(obj)) for obj in objs])
db.get([Key.from_path('MetaObject', make_key(obj)) for obj in objs])
db.delete([Key.from_path('MetaObject', make_key(obj)) for obj in objs])
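The problem shows up when I try to import data with the bulkloader. After reading through the bulkloader code, the bug turns out to be in the EncodeContent method (lines 1400-1406):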
1365   def EncodeContent(self, rows, loader=None):
1366     """Encodes row data to the wire format.
1367
1368     Args:
1369       rows: A list of pairs of a line number and a list of column values.
1370       loader: Used for dependency injection.
1371
1372     Returns:
1373       A list of datastore.Entity instances.
1374
1375     Raises:
1376       ConfigurationError: if no loader is defined for self.kind
1377     """
1378     if not loader:
1379       try:
1380         loader = Loader.RegisteredLoader(self.kind)
1381       except KeyError:
1382         logger.error('No Loader defined for kind %s.' % self.kind)
1383         raise ConfigurationError('No Loader defined for kind %s.' % self.kind)
1384     entities = []
1385     for line_number, values in rows:
1386       key = loader.generate_key(line_number, values)
1387       if isinstance(key, datastore.Key):
1388         parent = key.parent()
1389         key = key.name()
1390       else:
1391         parent = None
1392       entity = loader.create_entity(values, key_name=key, parent=parent)
1393
1394       def ToEntity(entity):
1395         if isinstance(entity, db.Model):
1396           return entity._populate_entity()
1397         else:
1398           return entity
1399
1400       if not entity:
1401
1402         continue
1403       if isinstance(entity, list):
1404         entities.extend(map(ToEntity, entity))
1405       elif entity:
1406         entities.append(ToEntity(entity))
1407
1408     return entities
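The datastore Entity object subclasses dict without overriding __nonzero__ or __len__ (I will file an issue for this as well). As a result, an Entity that contains no properties but does have a key is not truthy: "if not entity" evaluates to True even when the key is set, so the entity is never appended to entities.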
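The behavior is easy to reproduce without the SDK. The sketch below uses a stand-in class, `FakeEntity`, which is not the real datastore.Entity. It only mimics the relevant trait (a dict subclass that carries a key as an attribute), and `collect` is a simplified imitation of the check at lines 1400-1406.

```python
class FakeEntity(dict):
    """Stand-in for datastore.Entity: a dict of properties plus a key."""

    def __init__(self, key_name):
        dict.__init__(self)
        self.key_name = key_name


def collect(candidates):
    """Simplified imitation of the bulkloader loop: skip anything falsy."""
    kept = []
    for entity in candidates:
        if not entity:  # an empty dict is falsy, key or no key
            continue
        kept.append(entity)
    return kept


empty = FakeEntity("dgoogle.com")      # key set, no properties
with_prop = FakeEntity("dgmail.com")   # key set, one property
with_prop["registered"] = True

if __name__ == "__main__":
    print(bool(empty), bool(with_prop))
    print([e.key_name for e in collect([empty, with_prop])])
```

The property-less entity is dropped silently, which matches the symptom in the question: the upload reports success but nothing lands in the datastore.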
Here is a diff that fixes this both in the bulkloader and by overriding __nonzero__ in Entity (either change on its own is enough):
--- bulkloader.py 2011-08-27 18:21:36.000000000 +0200
+++ bulkloader_fixed.py 2011-08-27 18:22:48.000000000 +0200
@@ -1397,12 +1397,9 @@
         else:
           return entity
-      if not entity:
-
-        continue
       if isinstance(entity, list):
         entities.extend(map(ToEntity, entity))
-      elif entity:
+      else:
         entities.append(ToEntity(entity))
     return entities
--- datastore.py 2011-08-27 18:41:16.000000000 +0200
+++ datastore_fixed.py 2011-08-27 18:40:50.000000000 +0200
@@ -644,6 +644,13 @@
     self.__key = Key._FromPb(ref)
+  def __nonzero__(self):
+    if len(self):
+      return True
+    if self.__key:
+      return True
+    return False
+
   def app(self):
     """Returns the name of the application that created this entity, a
     string or None if not set.
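To show why either change is sufficient, here is a small SDK-free sketch of both ideas. `FixedEntity` and `collect_fixed` are hypothetical stand-ins, not the real datastore.Entity or EncodeContent. The loop also differs slightly from the diff: it still skips None explicitly, which is my own variation rather than something in the patch.

```python
class FixedEntity(dict):
    """Stand-in entity that is truthy when it has properties or a key."""

    def __init__(self, key_name=None):
        dict.__init__(self)
        self.key_name = key_name

    def __bool__(self):
        # Always return a real bool; returning None here raises TypeError.
        return len(self) > 0 or self.key_name is not None

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def collect_fixed(candidates):
    """Loop that no longer relies on truthiness to decide what to keep."""
    kept = []
    for entity in candidates:
        if entity is None:  # assumption: only None means "skip this row"
            continue
        if isinstance(entity, list):
            kept.extend(entity)
        else:
            kept.append(entity)
    return kept


if __name__ == "__main__":
    print(bool(FixedEntity("dgoogle.com")), bool(FixedEntity()))
    print(len(collect_fixed([None, {}, [{}, {}]])))
```

With the truthiness override, the original "if not entity" check keeps keyed entities. With the loop change, even a plain empty dict survives.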
Bug reports filed:
Issue 1: http://code.google.com/p/googleappengine/issues/detail?id=5712
Issue 2: http://code.google.com/p/googleappengine/issues/detail?id=5713
Issue 3: http://code.google.com/p/googleappengine/issues/detail?id=5714
Issue 4: http://code.google.com/p/googleappengine/issues/detail?id=5715