AppEngine bulkloader: uploading entities with key_name set

Time: 2011-08-27 03:01:38

Tags: google-app-engine google-cloud-datastore

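I have now spent at least two hours trying to get this to work. I have seen a lot of different questions on SO and in Google Groups, but none of the answers seem to work for me.

Question: How do I bulk upload data (as in the CSV file below) to the datastore so that it creates entities whose key_name is defined in the CSV file (the same result as using the add function below)?

Here is my model: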

from google.appengine.ext import db
from google.appengine.ext.db import Key


class RegisteredDomain(db.Model):
    """
    Domain object class. It has no fields because its existence is
    proof that it has been registered. Individual registered domains
    can be found using keys.
    """
    pass

Here is how I normally add/remove domains, etc.:

def add(domains):
    """
    Add domains. This function accepts a single domain string or a
    list of domain strings and adds them to the database. The domain(s)
    must be valid unicode strings (a ValueError is thrown if the domain
    strings are not valid).
    """
    if not isinstance(domains, list):
        domains = [domains]

    cleaned_domains = []
    for domain in domains:
        clean_domain_ = clean_domain(domain)
        is_valid_domain(clean_domain_)
        cleaned_domains.append(clean_domain_)

    domains = cleaned_domains

    db.put([RegisteredDomain(key_name=make_key(domain)) for domain in domains])


def get(domains):
    """
    Get domains. This function accepts a single domain string or a list
    of domain strings and queries the database for them. It returns a
    dictionary containing the domain name and RegisteredDomain object or
    None if the entity was not found.
    """
    if not isinstance(domains, list):
        domains = [domains]

    entities = db.get([Key.from_path('RegisteredDomain', make_key(domain)) for domain in domains])
    return dict(zip(domains, entities))

Note: in the code above, make_key just lowercases the domain name and prepends a 'd'.
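
For reference, a helper with the behaviour described in the note could look roughly like the sketch below. The real make_key is not shown in the question, so this body is an assumption based only on that one-sentence description.

```python
def make_key(domain):
    """Hypothetical stand-in for the make_key helper described above.

    Lowercases the domain and prepends 'd', so the resulting key name
    never starts with a digit.
    """
    return 'd' + domain.lower()


if __name__ == '__main__':
    for name in ['Google.com', '123.com']:
        print(make_key(name))
```

Applied to the domains in the question, this would produce key names in the same shape as the rows of the CSV file (dgoogle.com, dgmail.com, and so on).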

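So that's that. Now I'm going crazy trying to upload some RegisteredDomain entities from a CSV file. Here is the CSV file (note that the first character, 'd', is there because key names may not begin with a number):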

key
dgoogle.com
dgoogle11.com
dfacebook.com
dcool.com
duuuuuuu.com
dsdsdsds.com
dffffooo.com
dgmail.com

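I can't autogenerate the bulkloader YAML file because App Engine still hasn't updated my datastore statistics (after one day plus a few hours). So this (and many similar permutations) is what I came up with (mostly changing the import_transform bit):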

python_preamble:
- import: google.appengine.ext.bulkload.transform
- import: google.appengine.api.datastore
- import: google.appengine.ext.db
- import: utils
- import: bulk_helper

transformers:
- kind: RegisteredDomain
  connector: csv
  connector_options:
    encoding: utf-8
  property_map:
    - property: __key__
      external_name: key
      export_transform: bulk_helper.key_to_reverse_str
      import_template: transform.create_foreign_key('RegisteredDomain')

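Now for some reason, when I try to upload, it reports that everything is fine, x entities have been transferred, and so on, but nothing is updated in the datastore (as far as I can see from the admin console). Here is how I am uploading: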

appcfg.py upload_data --application=domain-sandwich --kind=RegisteredDomain --config_file=bulk.yaml --url=http://domain-sandwich.appspot.com/remote_api --filename=data.csv 

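Finally, this is what my datastore viewer looks like: Datastore Viewer

Note: I am doing this on both the development server and on App Engine (whatever works...).

Thanks for your help!

1 Answer:

Answer 0: (score: 0)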

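The problem is a bug in the App Engine bulkloader (or in the datastore API). I have filed several issues about it (issue 1, issue 2, issue 3, issue 4), but here is the text of the bulkloader bug report for future reference: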

VERSION:
release: "1.5.2"
timestamp: 1308730906
api_versions: ['1']

The bulkloader does not import models that have no properties. For example:

class MetaObject(db.Model):
    """
    Property-less object. Identified by application set key.
    """
    pass

In the application you can work with these entities like this:

db.put([MetaObject(key_name=make_key(obj)) for obj in objs])
db.get([Key.from_path('MetaObject', make_key(obj)) for obj in objs])
db.delete([Key.from_path('MetaObject', make_key(obj)) for obj in objs])

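The problem appears when I try to import data with the bulkloader. After looking through the bulkloader code, the bug turns out to be in the EncodeContent method (lines 1400-1406):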

1365   def EncodeContent(self, rows, loader=None):
1366     """Encodes row data to the wire format.
1367
1368     Args:
1369       rows: A list of pairs of a line number and a list of column values.
1370       loader: Used for dependency injection.
1371
1372     Returns:
1373       A list of datastore.Entity instances.
1374
1375     Raises:
1376       ConfigurationError: if no loader is defined for self.kind
1377     """
1378     if not loader:
1379       try:
1380         loader = Loader.RegisteredLoader(self.kind)
1381       except KeyError:
1382         logger.error('No Loader defined for kind %s.' % self.kind)
1383         raise ConfigurationError('No Loader defined for kind %s.' % self.kind)
1384     entities = []
1385     for line_number, values in rows:
1386       key = loader.generate_key(line_number, values)
1387       if isinstance(key, datastore.Key):
1388         parent = key.parent()
1389         key = key.name()
1390       else:
1391         parent = None
1392       entity = loader.create_entity(values, key_name=key, parent=parent)
1393
1394       def ToEntity(entity):
1395         if isinstance(entity, db.Model):
1396           return entity._populate_entity()
1397         else:
1398           return entity
1399
1400       if not entity:
1401
1402         continue
1403       if isinstance(entity, list):
1404         entities.extend(map(ToEntity, entity))
1405       elif entity:
1406         entities.append(ToEntity(entity))
1407
1408     return entities

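Because the datastore Entity object subclasses dict without overriding the __nonzero__ or __len__ methods (I will file an issue for this as well), an Entity that contains no properties but does have a key does not evaluate to True ("if not entity" is true even when a key is set), and so it is never appended to entities.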
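
The truthiness behaviour described above can be reproduced without the SDK. The sketch below uses a hypothetical FakeEntity class as a stand-in for datastore.Entity; it is not the SDK class and only mimics the one relevant trait, namely being a dict subclass that carries a key name.

```python
class FakeEntity(dict):
    """Stand-in for datastore.Entity: a dict subclass that also has a key.

    This is NOT the SDK class; it only mimics the relevant behaviour.
    """

    def __init__(self, key_name):
        dict.__init__(self)
        self.key_name = key_name


entity = FakeEntity(key_name='dgoogle.com')

# No properties were set, so the underlying dict is empty and the object
# is falsy, even though it carries a perfectly good key name.
print(bool(entity))  # False
print(not entity)    # True, so an "if not entity: continue" check skips it

# As soon as one property is present, the same object becomes truthy.
entity['registered'] = True
print(bool(entity))  # True
```

This is why only property-less kinds are affected: any model with at least one populated property passes the check.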

Here is a diff that fixes this in the bulkloader and, alternatively, by overriding __nonzero__ in Entity (either one works):

--- bulkloader.py       2011-08-27 18:21:36.000000000 +0200
+++ bulkloader_fixed.py 2011-08-27 18:22:48.000000000 +0200
@@ -1397,12 +1397,9 @@
         else:
           return entity

-      if not entity:
-
-        continue
       if isinstance(entity, list):
         entities.extend(map(ToEntity, entity))
-      elif entity:
+      else:
         entities.append(ToEntity(entity))

     return entities
--- datastore.py        2011-08-27 18:41:16.000000000 +0200
+++ datastore_fixed.py  2011-08-27 18:40:50.000000000 +0200
@@ -644,6 +644,13 @@

     self.__key = Key._FromPb(ref)

+  def __nonzero__(self):
+      if len(self):
+          return True
+      if self.__key:
+          return True
+      return False
+
   def app(self):
     """Returns the name of the application that created this entity, a
     string or None if not set.
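
A possible variant of the bulkloader change, sketched here outside the SDK, is to keep a skip check but test for None explicitly instead of relying on truthiness. This is not what the diff above does (it removes the check entirely). FakeEntity and collect_entities are hypothetical names, and the sketch assumes that None is the only value a loader uses to signal "skip this row".

```python
class FakeEntity(dict):
    """Stand-in for datastore.Entity (a dict subclass with a key name)."""

    def __init__(self, key_name):
        dict.__init__(self)
        self.key_name = key_name


def collect_entities(created):
    """Mimic the tail of EncodeContent, skipping only explicit None results.

    `created` holds values as a loader's create_entity might return them:
    None (skip the row), a single entity, or a list of entities.
    """
    entities = []
    for entity in created:
        if entity is None:
            # Only a missing entity is skipped; an empty one is kept.
            continue
        if isinstance(entity, list):
            entities.extend(entity)
        else:
            entities.append(entity)
    return entities


if __name__ == '__main__':
    rows = [FakeEntity('dgoogle.com'), None, [FakeEntity('dcool.com')]]
    print([e.key_name for e in collect_entities(rows)])
```

With this check, property-less entities that carry only a key are still collected, while rows for which no entity was created are dropped.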

Bug reports filed:

Issue 1: http://code.google.com/p/googleappengine/issues/detail?id=5712

Issue 2: http://code.google.com/p/googleappengine/issues/detail?id=5713

Issue 3: http://code.google.com/p/googleappengine/issues/detail?id=5714

Issue 4: http://code.google.com/p/googleappengine/issues/detail?id=5715