Currently, the following code builds the query dynamically like this:
Code:
zip_cols = list(zip(['name', 'address'],
                    ['name_1', 'address_1']))
self.matches = self.features[
    (
        [
            reduce(
                lambda x, y: x + y,
                [self.features[a + "_" + c[0] + "_" + c[1]] for a in self._algos],
            )
            for c in zip_cols
        ][0]
        > (self.input_args.get('threshold', 0.7) * 4)
    )
    & (
        [
            reduce(
                lambda x, y: x + y,
                [self.features[a + "_" + c[0] + "_" + c[1]] for a in self._algos],
            )
            for c in zip_cols
        ][1]
        > (self.input_args.get('threshold', 0.7) * 4)
    )
].copy()
Query:
matches = features[
    (
        (
            (features['fw_name_name_1'] / 100)
            + features['sw_name_name_1']
            + features['jw_name_name_1']
            + features['co_name_name_1']
        ) > 2.8
    )
    &
    (
        (
            (features['fw_address_address_1'] / 100)
            + features['sw_address_address_1']
            + features['jw_address_address_1']
            + features['co_address_address_1']
        ) > 2.8
    )
].copy()
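However, this query only works when source_compare_names contains exactly 2 columns; it fails with 1 column or with more than 2, because the indices [0] and [1] are hard-coded.
Answer 0 (score: 0)
Given the minimal input and context, here is a starting point. The idea is that you can build each filter condition dynamically as a string, join the strings with '&', and evaluate the result.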
from functools import reduce

threshold = self.input_args.get('threshold', 0.7) * 4
# one summed score Series per column pair
column_selection = [
    reduce(lambda x, y: x + y,
           [self.features[a + "_" + c[0] + "_" + c[1]] for a in self._algos])
    for c in zip_cols
]
total_filter_list = []
# one condition per column pair, however many there are
for i in range(len(column_selection)):
    # build the filter conditions as a list of strings
    total_filter_list.append(f'(column_selection[{i}] > {threshold})')
# join the list of strings with '&', build the total filter criteria as string
total_filter_string = ' & '.join(total_filter_list)
# evaluate the filter (eval looks up column_selection in the current scope)
self.matches = self.features[eval(total_filter_string)].copy()
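
The same result can probably be reached without eval by combining the boolean masks directly. The following is a minimal sketch, not the asker's actual class: the helper name select_matches, the sample scores, and the use of len(algos) in place of the hard-coded 4 are assumptions made for illustration. It builds one mask per column pair and folds them together with &, so it works for any number of pairs.

```python
from functools import reduce
import operator

import pandas as pd


def select_matches(features, algos, zip_cols, threshold=0.7):
    """Keep rows whose summed scores exceed the cutoff for every column pair.

    Hypothetical helper: the cutoff is threshold * len(algos), which equals
    the question's `threshold * 4` when there are four algorithms.
    """
    cutoff = threshold * len(algos)
    masks = [
        # skipna=False keeps NaN propagation the same as chaining `+`
        features[[f"{a}_{left}_{right}" for a in algos]].sum(axis=1, skipna=False)
        > cutoff
        for left, right in zip_cols
    ]
    # Start from an all-True mask so an empty zip_cols keeps every row
    all_true = pd.Series(True, index=features.index)
    combined = reduce(operator.and_, masks, all_true)
    return features[combined].copy()


# Made-up sample data following the "<algo>_<col>_<col>_1" naming pattern
algos = ["fw", "sw", "jw", "co"]
zip_cols = list(zip(["name", "address"], ["name_1", "address_1"]))
scores = {"name": [0.9, 0.9, 0.5], "address": [0.9, 0.5, 0.9]}
features = pd.DataFrame({
    f"{a}_{left}_{right}": scores[left]
    for left, right in zip_cols
    for a in algos
})

matches = select_matches(features, algos, zip_cols)
print(matches.index.tolist())  # [0]
```

Only the first row passes both the name and the address check in this sample. Passing a single pair, or more than two, needs no code change, and no expression string has to be evaluated.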