如何优化递归函数

时间:2017-07-22 18:44:06

标签: python

我目前有一个函数应该更新每个字典键,其中值为旧的_id是新uuid的值。

此功能按预期工作,但效率不高,使用我当前的数据集执行时间太长。

发生了什么:

  1. 每个dict['_id']['$oid']都必须更改为新的uuid。
  2. 与旧值匹配的每个$oid必须更改为新的uuid。
  3. 问题是每个字典中都有很多$oid个键,我想确保它在正常工作时更有效

    截至目前,脚本需要45-55秒才能使用我当前的数据集执行,并且它正在对递归函数recursive_id_replace进行4500万次调用

    如何优化当前函数以使此脚本更快地运行?

    数据作为字典传递到此函数中,如下所示:

    file_data = {
        'some_file_name.json': [{...}],
        # ...
    }
    

    以下是数据集中某个词典的示例:

    [
      {
        "_id": {
          "$oid": "4e0286ed2f1d40f78a037c41"
        },
        "code": "HOME",
        "name": "Home Address",
        "description": "Home Address",
        "entity_id": {
          "$oid": "58f4eb19736c128b640d5844"
        },
        "is_master": true,
        "is_active": true,
        "company_id": {
          "$oid": "591082232801952100d9f0c7"
        },
        "created_at": {
          "$date": "2017-05-08T14:35:19.958Z"
        },
        "created_by": {
          "$oid": "590b7fd32801952100d9f051"
        },
        "updated_at": {
          "$date": "2017-05-08T14:35:19.958Z"
        },
        "updated_by": {
          "$oid": "590b7fd32801952100d9f051"
        }
      },
      {
        "_id": {
          "$oid": "01c593700e704f29be1e3e23"
        },
        "code": "MAIL",
        "name": "Mailing Address",
        "description": "Mailing Address",
        "entity_id": {
          "$oid": "58f4eb1b736c128b640d5845"
        },
        "is_master": true,
        "is_active": true,
        "company_id": {
          "$oid": "591082232801952100d9f0c7"
        },
        "created_at": {
          "$date": "2017-05-08T14:35:19.980Z"
        },
        "created_by": {
          "$oid": "590b7fd32801952100d9f051"
        },
        "updated_at": {
          "$date": "2017-05-08T14:35:19.980Z"
        },
        "updated_by": {
          "$oid": "590b7fd32801952100d9f051"
        }
      }
    ]
    

    继承人的职能:

    def id_replace(_ids: set, uuids: list, file_data: dict) -> dict:
        """ Replaces all old _id's with new uuid. 
            checks all data keys for old referenced _id and updates it to new uuid value.
        """
    
        def recursive_id_replace(prev_val, new_val, data: Any):
            """
            Replaces _ids recursively.
            """
            data_type = type(data)
    
            if data_type == list:
                for item in data:
                    recursive_id_replace(prev_val, new_val, item)
    
            if data_type == dict:
                for key, val in data.items():
    
                    val_type = type(val)
    
                    if key == '$oid' and val == prev_val:
                        data[key] = new_val
    
                    elif val_type == dict:
                        recursive_id_replace(prev_val, new_val, val)
    
                    elif val_type == list:
                        for item in val:
                            recursive_id_replace(prev_val, new_val, item)
    
        for i, _id in enumerate(_ids):
            recursive_id_replace(_id, uuids[i], file_data)
    
        return file_data
    

1 个答案:

答案 0 :(得分:1)

这是一种涉及使用json然后进行替换的数据字符串化的技术。

In [415]: old_uid = "590b7fd32801952100d9f051"

In [416]: new_uid = "test123"

In [417]: json.loads(json.dumps(data).replace('"$oid": "%s"' %old_uid, '"$oid": "%s"' %new_uid))
Out[417]: 
[{'_id': {'$oid': '4e0286ed2f1d40f78a037c41'},
  'code': 'HOME',
  'company_id': {'$oid': '591082232801952100d9f0c7'},
  'created_at': {'$date': '2017-05-08T14:35:19.958Z'},
  'created_by': {'$oid': 'test123'},
  'description': 'Home Address',
  'entity_id': {'$oid': '58f4eb19736c128b640d5844'},
  'is_active': True,
  'is_master': True,
  'name': 'Home Address',
  'updated_at': {'$date': '2017-05-08T14:35:19.958Z'},
  'updated_by': {'$oid': 'test123'}},
 {'_id': {'$oid': '01c593700e704f29be1e3e23'},
  'code': 'MAIL',
  'company_id': {'$oid': '591082232801952100d9f0c7'},
  'created_at': {'$date': '2017-05-08T14:35:19.980Z'},
  'created_by': {'$oid': 'test123'},
  'description': 'Mailing Address',
  'entity_id': {'$oid': '58f4eb1b736c128b640d5845'},
  'is_active': True,
  'is_master': True,
  'name': 'Mailing Address',
  'updated_at': {'$date': '2017-05-08T14:35:19.980Z'},
  'updated_by': {'$oid': 'test123'}}]

请记住,只适用于列表和词典。其他数据结构将被隐式转换或抛出错误。

更多方法here