Question

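Below is my code. I'm trying to go through a DataFrame and store the company matches. However, the if statement always returns true and everything is saved in the DataFrame current_customers, even though only about 10 of my 150 rows have a value > 97. Below my code is a sample of my data.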

current_customers = pandas.DataFrame()
potential_customers = pandas.DataFrame()
for i in range(0, len(FDA_useful_companies_bing)):
    if combined_data['match token sort'].iloc[i] or combined_data['match ratio'].iloc[i] or combined_data['match partial ratio'].iloc[i] > 97:
        current_customers = current_customers.append(combined_data.ix[i,4::])
    else:
        potential_customers = potential_customers.append(combined_data.ix[i,4::])

Sample of my data:

Company                             City            State       ZIP     FDA Company                 FDA City            FDA State   FDA ZIP Token sort ratio              match token sort  Ratio                           match ratio    Partial Ratio            match partial ratio
NOVARTIS                            Larchwood       IA          51241   HELGET GAS PRODUCTS INC     Kansas City         MO          64116   AIR PRODUCTS  CHEMICALS INC   73                OCEANIC MEDICAL PRODUCTS INC    59             LUCAS INC                78
BOEHRINGER INGELHEIM VETMEDICA INC  Sioux Center    IA          51250   SOUTHWEST TECHNOLOGIES INC  North Kansas City   MO          64116   SOUTHWEST TECHNOLOGIES        100               SOUTHWEST TECHNOLOGIES          92             SOUTHWEST TECHNOLOGIES   100

Edit: Also, if there is a more efficient way to do this, I'd love to hear it.

Answer 1

IIUC, you can do:

current_customer = combined_data[(combined_data[['match token sort','match ratio','match partial ratio']] > 97).any(axis=1)]

potential_customer = combined_data[(combined_data[['match token sort','match ratio','match partial ratio']] <= 97).all(axis=1)]
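A minimal, self-contained sketch of the two expressions above, run on a hypothetical two-row stand-in for combined_data built from the sample rows. Only the company names and the three score columns are kept, and SCORE_COLS is a name introduced here purely for readability:

```python
import pandas as pd

# Hypothetical stand-in for combined_data, built from the two sample rows
# (only the company names and the three score columns are kept).
combined_data = pd.DataFrame({
    'Company': ['NOVARTIS', 'BOEHRINGER INGELHEIM VETMEDICA INC'],
    'FDA Company': ['HELGET GAS PRODUCTS INC', 'SOUTHWEST TECHNOLOGIES INC'],
    'match token sort': [73, 100],
    'match ratio': [59, 92],
    'match partial ratio': [78, 100],
})

SCORE_COLS = ['match token sort', 'match ratio', 'match partial ratio']

# Rows where at least one of the three scores is above 97
current_customer = combined_data[(combined_data[SCORE_COLS] > 97).any(axis=1)]

# Rows where every one of the three scores is 97 or below
potential_customer = combined_data[(combined_data[SCORE_COLS] <= 97).all(axis=1)]

print(current_customer['Company'].tolist())    # ['BOEHRINGER INGELHEIM VETMEDICA INC']
print(potential_customer['Company'].tolist())  # ['NOVARTIS']
```

Only the BOEHRINGER row has a score above 97 (100 for both token sort and partial ratio), so it is the only current customer; the NOVARTIS row (73, 59, 78) goes to the potential customers.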

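Your attempt short-circuits: any non-zero value evaluates to True, because the condition does not compare each value against the number at the end as you expected: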

if combined_data['match token sort'].iloc[i] or combined_data['match ratio'].iloc[i] or combined_data['match partial ratio'].iloc[i] > 97:

So this is equivalent to:

if some_val or another_val or last_val > 97

So here, if some_val is non-zero or another_val is non-zero, the whole statement evaluates to True.
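The grouping can also be checked without evaluating anything, by parsing the condition with the standard-library ast module. The names are the placeholders from the line above. Comparison operators bind more tightly than or, so only the last operand becomes a comparison:

```python
import ast

# Parse the shape of the condition only; nothing is evaluated.
tree = ast.parse("some_val or another_val or last_val > 97", mode="eval")
bool_op = tree.body

# A chain of `or` is parsed as a single BoolOp holding all its operands
print(type(bool_op).__name__)     # BoolOp
print(type(bool_op.op).__name__)  # Or

# Only the last operand is a comparison; the first two are bare names
print([type(v).__name__ for v in bool_op.values])  # ['Name', 'Name', 'Compare']
```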

You can see this in a simplified case:

In [83]:
x = 1    
if 5 or x > 95:
    print('True')
else:
    print('False')

This outputs:

True

With just a single comparison:

In [85]:
if 5 > 95:
    print('True')
else:
    print('False')

Output:

False

But if each value is compared against the target value:

In [87]:
x=1
if 5 > 95 or x > 95:
    print('True')
else:
    print('False')

Now it prints:

False

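But the real point here is not to loop at all. You can sub-select from your df by passing a list of the columns of interest, compare that whole sub-frame against your scalar value, and use any(axis=1) to produce a boolean mask that filters the df down to the current customers. Then invert the comparison and use all(axis=1) to find the rows where none of the columns meet the previous comparison, which filters the df down to the potential customers.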
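One practical difference between the two filters, sketched here with hypothetical scores: if a score column can contain NaN, the two masks are not exact complements, because any comparison with NaN is False. Inverting the first mask with ~ instead guarantees that every row lands in exactly one of the two frames:

```python
import numpy as np
import pandas as pd

SCORE_COLS = ['match token sort', 'match ratio', 'match partial ratio']

# Hypothetical scores; the last row is missing its token sort score (NaN).
scores = pd.DataFrame({
    'match token sort': [73, 100, np.nan],
    'match ratio': [59, 92, 50],
    'match partial ratio': [78, 100, 60],
})

mask = (scores[SCORE_COLS] > 97).any(axis=1)
current = scores[mask]

# Inverting the mask: every row that is not a current customer ends up here.
potential_inverted = scores[~mask]

# Re-running the opposite comparison: NaN <= 97 is False, so .all(axis=1)
# is False for the NaN row and that row lands in neither frame.
potential_all = scores[(scores[SCORE_COLS] <= 97).all(axis=1)]

print(len(current), len(potential_inverted), len(potential_all))  # 1 2 1
```

Without missing values the two versions select the same rows, so this only matters if the fuzzy-matching step can leave a score empty.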

Answer 2

Your problem is the if statement, as you suspected:

if combined_data['match token sort'].iloc[i] or combined_data['match ratio'].iloc[i] or combined_data['match partial ratio'].iloc[i] > 97:

You are asking whether the expression combined_data['match token sort'].iloc[i] is true. It is a number > 0, so to Python it is a truthy value, and the whole condition is therefore treated as True.
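A small sketch using the scores from the NOVARTIS sample row (73, 59 and 78, all below 97) shows what the original condition actually evaluates to. Note that or returns the first truthy operand itself rather than a bool, and the if statement then tests that value's truthiness:

```python
# Scores from the NOVARTIS sample row: token sort, ratio, partial ratio
token_sort, ratio, partial_ratio = 73, 59, 78

# The question's condition: `or` stops at the first truthy operand (73),
# so the comparison on partial_ratio is never evaluated.
buggy = token_sort or ratio or partial_ratio > 97
print(buggy)        # 73
print(bool(buggy))  # True, so the if branch runs

# Comparing every score explicitly gives the intended result.
fixed = token_sort > 97 or ratio > 97 or partial_ratio > 97
print(fixed)        # False
```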

I'll add parentheses to make it clearer how Python interprets this line of code:

if ((combined_data['match token sort'].iloc[i]) or
        (combined_data['match ratio'].iloc[i]) or
        (combined_data['match partial ratio'].iloc[i] > 97)):

Python evaluates each parenthesized expression separately, and Python considers any non-zero number to be a "truthy" value, so when used as a condition it counts as True. Here is a corrected expression:

if (combined_data['match token sort'].iloc[i] > 97 or
        combined_data['match ratio'].iloc[i] > 97 or
        combined_data['match partial ratio'].iloc[i] > 97):

Now Python evaluates each operand as the comparison you intended.
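If you do want to keep the loop, a sketch of the corrected version in current pandas could look like this. It assumes combined_data has a default integer index and uses a hypothetical two-row stand-in for it. Because .ix was removed in pandas 1.0 and DataFrame.append in pandas 2.0, rows are selected with .iloc and collected in plain lists that are turned into DataFrames once at the end:

```python
import pandas as pd

THRESHOLD = 97

# Hypothetical stand-in for combined_data, using the scores from the sample rows
combined_data = pd.DataFrame({
    'Company': ['NOVARTIS', 'BOEHRINGER INGELHEIM VETMEDICA INC'],
    'match token sort': [73, 100],
    'match ratio': [59, 92],
    'match partial ratio': [78, 100],
})

current_rows = []
potential_rows = []
for i in range(len(combined_data)):
    # The question used combined_data.ix[i, 4::]; the positional equivalent
    # is combined_data.iloc[i, 4:]. The whole row is kept here.
    row = combined_data.iloc[i]
    # Compare each score against the threshold explicitly
    if (row['match token sort'] > THRESHOLD or
            row['match ratio'] > THRESHOLD or
            row['match partial ratio'] > THRESHOLD):
        current_rows.append(row)
    else:
        potential_rows.append(row)

# Build each frame once instead of appending inside the loop
current_customers = pd.DataFrame(current_rows)
potential_customers = pd.DataFrame(potential_rows)

print(current_customers['Company'].tolist())    # ['BOEHRINGER INGELHEIM VETMEDICA INC']
print(potential_customers['Company'].tolist())  # ['NOVARTIS']
```

For 150 rows either approach is fast enough, but the mask-based version in the first answer avoids the Python-level loop entirely and scales better.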

Complex if statement returns true
