How can I clean up deeply nested dictionary access?

Time: 2013-01-01 11:16:57

Tags: python refactoring nested

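This is a snippet from inside a Python class used in a web scraping project. I want to iterate over a dictionary of newly scraped data, compare it at every level against a previously scraped index, and add the values that need updating to another deeply nested dict for later processing. What strategies can I use to clean this up and still get the same result?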

self.new_stats[tour] = {}
parsed_stats = parse_stat_year(CURRENT_STAT_YEAR, self.scraped_stats_index[tour])

for pname, stats_by_year in parsed_stats.items():
  if pname in self.raw_players_with_stats[tour]:
    player = self.raw_players_with_stats[tour][pname]

    if 'stats' in player:
      for y, stats_by_cat in stats_by_year.items():
        if str(y) in player['stats']:
          for cat, stat in stats_by_cat.items():
            if cat in player['stats'][str(y)]:
              for prop, val in stat.items():
                if (not prop in player['stats'][str(y)][cat]) or (player['stats'][str(y)][cat][prop] != val):
                  self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
            else:
              self.new_stats[tour].setdefault(pname,{}).setdefault(y,{})[cat] = stat
        else:
          self.new_stats[tour].setdefault(pname,{})[y] = stats_by_cat
    else:
      self.new_stats[tour][pname] = stats_by_year

  elif pname in self.new_player_urls[tour]:
    self.new_stats[tour][pname] = stats_by_year

1 answer:

Answer 0 (score: 2)

I would start with unit tests, to make sure the code still works after every refactoring iteration.
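As a rough illustration of that first step, here is a minimal sketch of tests that pin down the innermost comparison from the question before anything else is changed. The helper `changed_props` is hypothetical (it is not in the original code); it only isolates the "property is missing or its value differs" check so that it can be tested on its own.

```python
import unittest


def changed_props(new_stat, old_stat):
    """Hypothetical helper: props in new_stat that are missing from,
    or have a different value in, old_stat."""
    return {
        prop: val
        for prop, val in new_stat.items()
        if prop not in old_stat or old_stat[prop] != val
    }


class ChangedPropsTest(unittest.TestCase):
    def test_unchanged_values_are_skipped(self):
        self.assertEqual(changed_props({"avg": 1.7}, {"avg": 1.7}), {})

    def test_changed_value_is_reported(self):
        self.assertEqual(changed_props({"rank": 2}, {"rank": 3}), {"rank": 2})

    def test_missing_prop_is_reported(self):
        self.assertEqual(changed_props({"rank": 2}, {}), {"rank": 2})
```

Saved in a module, tests like these can be run with `python -m unittest` after each small refactoring step.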

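I would use meaningful data structures and methods so that the code is more self-describing. If you don't want to roll out a separate data-holder class, you may find namedtuple useful.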
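One way namedtuple might help here, sketched under assumptions: flatten the nested dicts into a single dict keyed by a named tuple, so the comparison becomes one flat lookup. The names `StatKey`, `flatten_stats` and `changed_stats` are made up for illustration, and the input shape (player, then year, then category, then property) is inferred from the question's loops. Years are normalized with `str()` because the question looks them up via `str(y)`.

```python
from collections import namedtuple

# Hypothetical key type: one entry per individual stat value.
StatKey = namedtuple("StatKey", ["player", "year", "category", "prop"])


def flatten_stats(stats_by_player):
    """Turn {player: {year: {category: {prop: val}}}} into {StatKey: val}."""
    flat = {}
    for player, by_year in stats_by_player.items():
        for year, by_category in by_year.items():
            for category, stat in by_category.items():
                for prop, val in stat.items():
                    flat[StatKey(player, str(year), category, prop)] = val
    return flat


def changed_stats(new_flat, old_flat):
    """Entries of new_flat that are missing from or differ in old_flat."""
    return {
        key: val
        for key, val in new_flat.items()
        if key not in old_flat or old_flat[key] != val
    }
```

The nesting is still there, but it is confined to one small function, and fields such as `key.category` read better than positional indexing.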

Finally, I would break this big, ugly if...for...else block into small, meaningful pieces, like this:

# instead of this original code...

for pname, stats_by_year in parsed_stats.items():
  if pname in self.raw_players_with_stats[tour]:
    #...
  elif pname in self.new_player_urls[tour]:
    self.new_stats[tour][pname] = stats_by_year

# you get something like this

for player_name, stats_by_year in parsed_stats.items():
  if self.has_raw_player(player_name):
    self.process_new_raw_player(player_name, stats_by_year)
  elif self.is_player_new(player_name):
    self.insert_new_stat_for_player(player_name, stats_by_year)

This is easier to read, test, and understand.
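The answer does not show what a helper such as `process_new_raw_player` would contain. One possible shape, offered only as a sketch, is a recursive comparison that replaces the fixed levels of `if ... in` / `for` nesting. The function name `diff_nested` is an assumption, and it differs from the question's code in two ways: it compares keys as they are (the question looks up years with `str(y)`, so keys would need normalizing first), and it recurses into dicts at any depth rather than stopping at the property level.

```python
def diff_nested(new, old):
    """Return the parts of `new` that are missing from, or differ from, `old`.

    Both arguments are (possibly nested) dicts. Unchanged branches are
    left out of the result entirely.
    """
    changes = {}
    for key, value in new.items():
        if key not in old:
            # Whole branch is new: keep it as-is.
            changes[key] = value
        elif isinstance(value, dict) and isinstance(old[key], dict):
            # Both sides are dicts: compare one level deeper.
            sub_changes = diff_nested(value, old[key])
            if sub_changes:
                changes[key] = sub_changes
        elif value != old[key]:
            changes[key] = value
    return changes
```

With something like this, the per-player work could reduce to a single call on the new stats and the player's stored `'stats'` dict, with the result stored only when it is non-empty.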

And if you have some spare time, I would invest it in reading Clean Code by Robert Martin. It will definitely pay off!

Edit

Clean up the long, hard-to-read one-liner

#...
self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
#...

so that it looks like this:

def insert_new_stat(self, tour, pname, y, cat, prop, val):
  player_stat = self.new_stats[tour].setdefault(pname, {})
  y_param = player_stat.setdefault(y, {}) # what is y??
  category_stats = ...
  prop_stats = ...
  ... = val

Your code will certainly be longer and more verbose this way, but explicit is better than implicit.
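For reference, a runnable version of the `insert_new_stat` outline might look like the sketch below. The class `StatsCollector` is a hypothetical stand-in for the scraper class in the question, and the intermediate variable names are guesses at what the elided lines were meant to hold. Unlike the question's code, which initializes `self.new_stats[tour]` beforehand, this version also uses `setdefault` for the tour so that it works on its own.

```python
class StatsCollector:
    """Hypothetical stand-in for the scraper class in the question."""

    def __init__(self):
        self.new_stats = {}

    def insert_new_stat(self, tour, pname, year, cat, prop, val):
        # Each step creates the next level only if it does not exist yet.
        tour_stats = self.new_stats.setdefault(tour, {})
        player_stats = tour_stats.setdefault(pname, {})
        year_stats = player_stats.setdefault(year, {})
        category_stats = year_stats.setdefault(cat, {})
        category_stats[prop] = val
```

Naming each level also answers the "what is y??" comment: a parameter called `year` documents itself.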