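This is a snippet from inside a Python class used in a web scraping project. I want to iterate over a dictionary of newly scraped data, compare it at each level against an index of previously scraped data, and add the values that need updating to another deeply nested dict for later processing. What strategies can I use to clean this up and still get the same result?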
    self.new_stats[tour] = {}
    parsed_stats = parse_stat_year(CURRENT_STAT_YEAR, self.scraped_stats_index[tour])
    for pname, stats_by_year in parsed_stats.items():
        if pname in self.raw_players_with_stats[tour]:
            player = self.raw_players_with_stats[tour][pname]
            if 'stats' in player:
                for y, stats_by_cat in stats_by_year.items():
                    if str(y) in player['stats']:
                        for cat, stat in stats_by_cat.items():
                            if cat in player['stats'][str(y)]:
                                for prop, val in stat.items():
                                    if (not prop in player['stats'][str(y)][cat]) or (player['stats'][str(y)][cat][prop] != val):
                                        self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
                            else:
                                self.new_stats[tour].setdefault(pname,{}).setdefault(y,{})[cat] = stat
                    else:
                        self.new_stats[tour].setdefault(pname,{})[y] = stats_by_cat
            else:
                self.new_stats[tour][pname] = stats_by_year
        elif pname in self.new_player_urls[tour]:
            self.new_stats[tour][pname] = stats_by_year
Answer 0 (score: 2)
I would start with unit tests, to make sure the code still works after each refactoring iteration.
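As a rough illustration of that first step, the sketch below lifts the question's loop into a standalone function and pins down its current behaviour with a few tests. The function name `collect_new_stats`, its parameters, and the sample players are assumptions made for this example; the logic is meant to behave the same as the original loop for a single tour.

```python
import unittest


def collect_new_stats(parsed_stats, raw_players, new_player_urls):
    """The question's loop for one tour, lifted into a function (hypothetical name)."""
    new_stats = {}
    for pname, stats_by_year in parsed_stats.items():
        if pname in raw_players:
            player = raw_players[pname]
            if 'stats' in player:
                for y, stats_by_cat in stats_by_year.items():
                    if str(y) in player['stats']:
                        for cat, stat in stats_by_cat.items():
                            if cat in player['stats'][str(y)]:
                                old = player['stats'][str(y)][cat]
                                for prop, val in stat.items():
                                    if prop not in old or old[prop] != val:
                                        new_stats.setdefault(pname, {}).setdefault(y, {}).setdefault(cat, {})[prop] = val
                            else:
                                new_stats.setdefault(pname, {}).setdefault(y, {})[cat] = stat
                    else:
                        new_stats.setdefault(pname, {})[y] = stats_by_cat
            else:
                new_stats[pname] = stats_by_year
        elif pname in new_player_urls:
            new_stats[pname] = stats_by_year
    return new_stats


class CollectNewStatsTest(unittest.TestCase):
    # Made-up sample data: 'alice' has stored stats, 'bob' has none yet.
    RAW = {'alice': {'stats': {'2015': {'putting': {'avg': 1.7, 'rank': 3}}}},
           'bob': {}}

    def test_only_changed_props_are_reported(self):
        parsed = {'alice': {2015: {'putting': {'avg': 1.7, 'rank': 2}}}}
        self.assertEqual(collect_new_stats(parsed, self.RAW, set()),
                         {'alice': {2015: {'putting': {'rank': 2}}}})

    def test_unchanged_player_is_omitted(self):
        parsed = {'alice': {2015: {'putting': {'avg': 1.7, 'rank': 3}}}}
        self.assertEqual(collect_new_stats(parsed, self.RAW, set()), {})

    def test_missing_levels_are_copied_whole(self):
        parsed = {'alice': {2016: {'driving': {'avg': 290}}},   # new year
                  'bob': {2015: {'putting': {'avg': 1.8}}},     # no 'stats' key
                  'carol': {2015: {'putting': {'avg': 1.9}}},   # brand-new player
                  'dave': {2015: {}}}                           # unknown, ignored
        self.assertEqual(collect_new_stats(parsed, self.RAW, {'carol'}),
                         {'alice': {2016: {'driving': {'avg': 290}}},
                          'bob': {2015: {'putting': {'avg': 1.8}}},
                          'carol': {2015: {'putting': {'avg': 1.9}}}})
```

Saved in a test module, this can be run with `python -m unittest`. Once the tests pass against the untouched logic, each refactoring step can be checked against the same expectations.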
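I would use meaningful data structures and methods, so that the code is more self-describing. If you don't want to roll out separate data-holder classes, you may sometimes find namedtuple very handy.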
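One way namedtuple might help here, sketched under assumptions: flatten each nested stats dict into a flat dict keyed by a small record, so the comparison becomes a single pass. `StatKey`, `flatten_stats`, and `changed_stats` are hypothetical names, and the result is a flat mapping rather than the nested dict the question builds, so this is an alternative shape and not a drop-in replacement.

```python
from collections import namedtuple

# Hypothetical key type: one record per (player, year, category, property).
StatKey = namedtuple('StatKey', ['player', 'year', 'category', 'prop'])


def flatten_stats(player_name, stats_by_year):
    """Turn {year: {category: {prop: value}}} into {StatKey: value}."""
    flat = {}
    for year, stats_by_cat in stats_by_year.items():
        for category, stat in stats_by_cat.items():
            for prop, value in stat.items():
                # str(year) mirrors the question, where stored years are strings.
                flat[StatKey(player_name, str(year), category, prop)] = value
    return flat


def changed_stats(new_flat, old_flat):
    """Keep the entries that are missing from old_flat or differ from it."""
    missing = object()  # sentinel, so a stored None still compares correctly
    return {key: value for key, value in new_flat.items()
            if old_flat.get(key, missing) != value}


# Made-up sample data for illustration.
old = flatten_stats('alice', {'2015': {'putting': {'avg': 1.7, 'rank': 3}}})
new = flatten_stats('alice', {2015: {'putting': {'avg': 1.7, 'rank': 2}}})
changes = changed_stats(new, old)
for key, value in changes.items():
    print(key.player, key.year, key.category, key.prop, value)
```

The named fields (`key.player`, `key.category`) document what each level of the original nesting meant, which is the self-describing quality mentioned above. Note that empty years or categories disappear when flattened, unlike in the original loop.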
Finally, I would break this big, ugly if...for...else block into small, meaningful pieces, like this:
    # instead of this original code...
    for pname, stats_by_year in parsed_stats.items():
        if pname in self.raw_players_with_stats[tour]:
            # ...
        elif pname in self.new_player_urls[tour]:
            self.new_stats[tour][pname] = stats_by_year

    # you get something like this
    for player_name, stats_by_year in parsed_stats.items():
        if self.has_raw_player(player_name):
            self.process_new_raw_player(player_name, stats_by_year)
        elif self.is_player_new(player_name):
            self.insert_new_stat_for_player(player_name, stats_by_year)
which is much easier to read, test, and understand.
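A minimal sketch of what those small methods could look like, assuming one object per tour. The class name `StatsUpdater`, its attributes, the extra `update` and `process_year` helpers, and the sample data are all assumptions for illustration; only the four method names in the loop come from the snippet above.

```python
class StatsUpdater:
    """Hypothetical holder for one tour's data; attribute names are assumptions."""

    def __init__(self, raw_players, new_player_urls):
        self.raw_players = raw_players          # previously scraped players
        self.new_player_urls = new_player_urls  # players seen for the first time
        self.new_stats = {}

    def update(self, parsed_stats):
        for player_name, stats_by_year in parsed_stats.items():
            if self.has_raw_player(player_name):
                self.process_new_raw_player(player_name, stats_by_year)
            elif self.is_player_new(player_name):
                self.insert_new_stat_for_player(player_name, stats_by_year)

    def has_raw_player(self, player_name):
        return player_name in self.raw_players

    def is_player_new(self, player_name):
        return player_name in self.new_player_urls

    def insert_new_stat_for_player(self, player_name, stats_by_year):
        self.new_stats[player_name] = stats_by_year

    def process_new_raw_player(self, player_name, stats_by_year):
        player = self.raw_players[player_name]
        if 'stats' not in player:
            # Nothing stored yet: everything scraped counts as new.
            self.insert_new_stat_for_player(player_name, stats_by_year)
            return
        for year, stats_by_cat in stats_by_year.items():
            self.process_year(player_name, year, stats_by_cat, player['stats'])

    def process_year(self, player_name, year, stats_by_cat, old_stats):
        if str(year) not in old_stats:
            self.new_stats.setdefault(player_name, {})[year] = stats_by_cat
            return
        old_by_cat = old_stats[str(year)]
        for cat, stat in stats_by_cat.items():
            if cat not in old_by_cat:
                self.new_stats.setdefault(player_name, {}).setdefault(year, {})[cat] = stat
                continue
            for prop, val in stat.items():
                if prop not in old_by_cat[cat] or old_by_cat[cat][prop] != val:
                    (self.new_stats.setdefault(player_name, {})
                         .setdefault(year, {})
                         .setdefault(cat, {}))[prop] = val


# Made-up sample data for illustration.
updater = StatsUpdater(
    raw_players={'alice': {'stats': {'2015': {'putting': {'avg': 1.7, 'rank': 3}}}}},
    new_player_urls={'carol'},
)
updater.update({
    'alice': {2015: {'putting': {'avg': 1.7, 'rank': 2}}},
    'carol': {2015: {'putting': {'avg': 1.9}}},
})
print(updater.new_stats)
```

The early `return` and `continue` statements replace the trailing `else` branches of the original, so each missing-level case sits next to the check that detects it and the nesting stays shallow.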
And, if you have some spare time, I would invest it in reading Clean Code by Robert Martin. It will definitely pay off!
Edit

To clean up the long, hard-to-read one-liner
    # ...
    self.new_stats[tour].setdefault(pname,{}).setdefault(y,{}).setdefault(cat,{})[prop] = val
    # ...
I would rewrite it so that it looks like this:
    def insert_new_stat(self, tour, pname, y, cat, prop, val):
        player_stat = self.new_stats[tour].setdefault(pname, {})
        y_param = player_stat.setdefault(y, {})  # what is y??
        category_stats = ...
        prop_stats = ...
        ... = val
Your code will certainly be longer and more verbose this way, though.
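For reference, here is one guess at how the elided steps might be filled in, written as a free function so it runs on its own. The parameter names, the assumption that `y` is the stat year, and the `'pga'` tour key are all hypothetical; the answer itself leaves these steps open.

```python
def insert_new_stat(new_stats, tour, player_name, year, category, prop, value):
    """Store value at new_stats[tour][player_name][year][category][prop]."""
    # Each setdefault returns the existing inner dict, or creates an empty one.
    player_stats = new_stats[tour].setdefault(player_name, {})
    year_stats = player_stats.setdefault(year, {})
    category_stats = year_stats.setdefault(category, {})
    category_stats[prop] = value


# Made-up sample data for illustration.
new_stats = {'pga': {}}
insert_new_stat(new_stats, 'pga', 'alice', 2015, 'putting', 'rank', 2)
insert_new_stat(new_stats, 'pga', 'alice', 2015, 'putting', 'avg', 1.7)
print(new_stats)
```

Naming each intermediate dict costs a few lines, but it gives every level of the nesting a label that a reader can check against the data.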