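I'm trying to encode logic for filtering a Pandas DataFrame. I want to encode the logic as a dictionary, with the subgroup names as keys and the functions that filter for each subgroup as values: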
analytics_table_mappings = {
    "Jets Fans": BaseFilter.for_jets_fans,
    "Patriots Fans": BaseFilter.for_patriots_fans,
    ...
}
My BaseFilter.for_jets_fans and BaseFilter.for_patriots_fans are static methods that contain the logic for filtering my DataFrame down to each group of fans.
However, I'd like to create a single function, BaseFilter.for_team_fans, that takes a team string argument specifying which team's fans to filter for.
My current attempt is to encode something like this:
analytics_table_mappings = {
    "Jets Fans": {"func": BaseFilter.for_team_fans, "args": "Jets"},
    "Patriots Fans": {"func": BaseFilter.for_team_fans, "args": "Patriots"},
    ...
}
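My question: is there a more elegant, less complex, and more maintainable way to do this? For context, I'm a data scientist, and this is part of a larger model that I will eventually hand over to my engineering team to maintain. They have asked me to limit the amount of domain-specific language (DSL) in order to soften the learning curve and keep the codebase maintainable. I worry that something like
"Jets Fans": {"func": BaseFilter.for_team_fans, "args": "Jets"},
"Patriots Fans": {"func": BaseFilter.for_team_fans, "args": "Patriots"},
could quickly evolve into a very complex and hard-to-manage DSL. The reason I'm encoding the filter logic at all is that the kinds of metrics we filter on, and how we filter them, are likely to change often. So rather than hard-coding them into my codebase, I've separated the filter logic into its own configurations.py file made up of dictionaries (i.e. analytics_table_mappings). I'd therefore like to keep the filter logic flexible while still making it maintainable for my engineers.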
Addition:
I also need to be able to handle cases where multiple arguments must be passed. For example:
"Jets Fans": {"func": BaseFilter.for_team_fans, "args": "Jets"},
"Patriots Fans": {"func": BaseFilter.for_team_fans, "args": "Patriots"},
"NFC Fans": {"func": BaseFilter.for_team_fans, "args": ["Bears", "Packers", ...]}
Answer 0 (score: 2)
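You could consider functools.partial, which lets you pre-bind any number of args or kwargs. (functools.partialmethod is the variant intended for defining methods inside a class body; a partialmethod object stored in a dict is not directly callable.)
from functools import partial

mappings = {
    'Jets Fans': partial(BaseFilter.for_team_fans, 'Jets'),
    'Patriots Fans': partial(BaseFilter.for_team_fans, 'Patriots'),
    'NFC Fans': partial(BaseFilter.for_team_fans, 'Bears', 'Packers'),
}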
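As a rough illustration of how a mapping of pre-bound callables might be consumed, here is a minimal sketch using functools.partial. The BaseFilter below is a hypothetical stand-in, not the class from the question: it assumes the data is passed as the first argument and the teams as a keyword argument, and it filters a plain list of dicts so the example runs without pandas.

```python
from functools import partial


class BaseFilter:
    """Hypothetical stand-in; the real class would filter a DataFrame."""

    @staticmethod
    def for_team_fans(rows, teams):
        # Keep only the rows whose "team" value is one of `teams`.
        # On a DataFrame this could be written as df[df["team"].isin(teams)].
        return [row for row in rows if row["team"] in teams]


# Each value is an ordinary callable with its teams already bound.
mappings = {
    "Jets Fans": partial(BaseFilter.for_team_fans, teams=["Jets"]),
    "NFC Fans": partial(BaseFilter.for_team_fans, teams=["Bears", "Packers"]),
}

# Made-up sample data for illustration only.
rows = [
    {"fan": "a", "team": "Jets"},
    {"fan": "b", "team": "Bears"},
    {"fan": "c", "team": "Patriots"},
]

jets = mappings["Jets Fans"](rows)
nfc = mappings["NFC Fans"](rows)
```

Every entry is called the same way, `mappings[name](rows)`, so the calling code never needs to unpack a "func"/"args" pair. A partial object also exposes its `func`, `args`, and `keywords` attributes, which may help whoever maintains the config inspect what each entry does.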
Answer 1 (score: 1)
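If BaseFilter.for_team_fans is the common base function for every entry in the analytics_table_mappings dict, you can factor it out. Since that leaves only one attribute per entry, the dict can be reduced to simple key: args pairs, for example: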
analytics_table_mappings = {
    "Jets Fans": "Jets",
    "Patriots Fans": "Patriots",
    "NFC Fans": ["Bears", "Packers", ...]
}
You could then consolidate the logic into a simple class:
class Teams:
    analytics_table_mappings = {
        "Jets Fans": "Jets",
        "Patriots Fans": "Patriots",
        "NFC Fans": ["Bears", "Packers", ...]
    }

    @classmethod
    def get_teams(cls, fan_type):
        # Unknown subgroups return an error message instead of raising
        if fan_type not in cls.analytics_table_mappings:
            return 'Invalid fan type: {}'.format(fan_type)
        teams = cls.analytics_table_mappings[fan_type]
        # Normalize a single team name to a one-element list
        if not isinstance(teams, list):
            teams = [teams]
        return [cls.for_team_fans(team) for team in teams]

    @staticmethod
    def for_team_fans(team_name):
        # your logic here
        return team_name

print(Teams().get_teams("Jets Fans"))
>> ['Jets']
print(Teams().get_teams("Patriots Fans"))
>> ['Patriots']
print(Teams().get_teams("NFC Fans"))
>> ['Bears', 'Packers', ...]
print(Teams().get_teams("Argonauts Fans"))
>> Invalid fan type: Argonauts Fans
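One possible variation on the class above, sketched under my own assumptions: the config holds only plain data, and a small lookup helper raises an exception for unknown subgroups instead of returning a message string, so a typo in a subgroup name cannot be mistaken for a result. The names ANALYTICS_TABLE_MAPPINGS and teams_for are hypothetical, and the NFC list is shortened for illustration.

```python
# Hypothetical contents of configurations.py: plain data only, no callables.
ANALYTICS_TABLE_MAPPINGS = {
    "Jets Fans": "Jets",
    "Patriots Fans": "Patriots",
    "NFC Fans": ["Bears", "Packers"],  # shortened for illustration
}


def teams_for(fan_type, mappings=ANALYTICS_TABLE_MAPPINGS):
    """Return the list of teams for a subgroup, failing loudly if unknown."""
    try:
        teams = mappings[fan_type]
    except KeyError:
        raise ValueError("Invalid fan type: {}".format(fan_type)) from None
    # Accept a bare string as shorthand for a one-team list.
    return [teams] if isinstance(teams, str) else list(teams)
```

The returned list could then be handed to a single filter function such as the question's BaseFilter.for_team_fans, which keeps the config free of anything an engineer would need to learn beyond "subgroup name maps to team names".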