Question

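I am using Python's zip function to apply a pandas DataFrame operation to multiple columns. The code works, but every generated tuple is printed to my terminal, which makes debugging difficult.

Extract from the function: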

import pandas as pd
from fuzzywuzzy import fuzz

if 'leven_dist_N' not in self.clust_df.columns:
    self.clust_df['leven_dist_N'], self.clust_df['leven_dist_NA'] = zip(
        *self.clust_df.apply(self.calcMatchRatio, axis=1))

The applied function:

def calcMatchRatio(self, row):
    # Returns (name score, joined-fields score); if either name is null,
    # the function falls through and implicitly returns None.
    if pd.notnull(row.src_name_short) and pd.notnull(row.reg_name_short):
        if pd.notnull(row.src_address_adj) and pd.notnull(row.reg_address_adj):
            return int(fuzz.ratio(row.src_name_short, row.reg_name_short)), int(fuzz.ratio(row.src_joinfields, row.reg_joinfields))
        else:
            return int(fuzz.ratio(row.src_name_short, row.reg_name_short)), int(0)
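For readers unfamiliar with the zip(*...) idiom used above, here is a minimal sketch of how it splits a sequence of score pairs into two columns. It uses plain dictionaries instead of a DataFrame, and `ratio` is a hypothetical stand-in built on `difflib.SequenceMatcher`, not the fuzzywuzzy function itself. The helper `calc_match_ratio` and the sample rows are also assumptions made for illustration; unlike the function above, the sketch returns an explicit pair when a name is missing, so that every row yields exactly two values.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in for fuzz.ratio (assumption): similarity score from 0 to 100."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def calc_match_ratio(row):
    """Hypothetical row-wise scorer that always returns a 2-tuple."""
    name_a, name_b = row.get("src_name_short"), row.get("reg_name_short")
    if name_a is None or name_b is None:
        return None, None  # explicit fallback so zip(*...) only sees pairs
    join_a, join_b = row.get("src_joinfields"), row.get("reg_joinfields")
    if join_a is None or join_b is None:
        return ratio(name_a, name_b), 0
    return ratio(name_a, name_b), ratio(join_a, join_b)


# Made-up sample rows: full data, names only, and a missing name.
rows = [
    {"src_name_short": "acme ltd", "reg_name_short": "acme ltd",
     "src_joinfields": "acme ltd london", "reg_joinfields": "acme ltd london"},
    {"src_name_short": "acme ltd", "reg_name_short": "acme ltd"},
    {"src_name_short": None, "reg_name_short": "acme ltd"},
]

# zip(*pairs) transposes a sequence of (n, na) pairs into two column tuples.
leven_dist_n, leven_dist_na = zip(*(calc_match_ratio(r) for r in rows))
print(leven_dist_n)
print(leven_dist_na)
```

Nothing in this pattern prints the input strings by itself, which suggests the tuples on the terminal come from whatever is called inside the scoring function rather than from zip or apply.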

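When the first return statement in calcMatchRatio is hit, the inputs are printed to the console without any further step on my part.

The values in the object are just strings containing company names. Example terminal output: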

('chelsea amp westminster hospital and royal marsden nfts', '')

('chelsea amp westminster hospital and royal marsden nfts harbour   yard london sw10 0xd uk', '')

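The fuzz.ratio line computes a Levenshtein-based similarity score (an integer) for the two strings in each tuple, but the strings themselves still get printed.

When I run a separate Python instance: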

python3

from fuzzywuzzy import fuzz

fuzz.ratio('lalala', 'lololo')

Output:

('lalala', 'lololo')

60

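So I think the problem lies in the fuzzywuzzy package. I really don't want to modify the code there, but I seem to remember that before I switched to the combined apply and zip approach (before refactoring), there was no such spam.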
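One way to silence the output without touching the package would be to capture stdout around the call. The sketch below uses the standard library's `contextlib.redirect_stdout`; `noisy_ratio` is a hypothetical stand-in that imitates the behaviour described above (it prints its arguments and returns a placeholder score), and `call_quietly` is a helper name invented for this example.

```python
import contextlib
import io


def noisy_ratio(a, b):
    """Hypothetical stand-in for a scorer that prints its inputs."""
    print((a, b))  # the unwanted debug output
    return 60      # placeholder score, not a real computation


def call_quietly(func, *args, **kwargs):
    """Call func while capturing anything it writes to sys.stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


score, captured = call_quietly(noisy_ratio, "lalala", "lololo")
print(score)
```

This only captures text written through Python's `sys.stdout`; output written directly by C code would not be intercepted. To check which installed file is actually being imported, and whether it contains a stray print call, inspecting the module's `__file__` attribute may help.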

Annoying terminal output

0 answers: