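Python data comparison

Date: 2018-09-24 15:44:11

Tags: python pandas

I am trying to run a data check that compares columns within a dataframe and returns the % difference. However, I cannot get the conditional part of the code below to run without hitting the following ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index Item Number')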

def checks(df):
    if Multi['Masterpack qty'] == Multi['SUBPACK_QTY']:
        Multi['Length Difference'] = abs((Multi['Length']-Multi['SUBPACK_LENGTH'])/((Multi['Length']+Multi['SUBPACK_LENGTH'])/2))
        Multi['Height Difference'] = abs((Multi['Height']-Multi['SUBPACK_HEIGHT'])/((Multi['Height']+Multi['SUBPACK_HEIGHT'])/2))
        Multi['Width Difference'] = abs((Multi['Width']-Multi['SUBPACK_WIDTH'])/((Multi['Width']+Multi['SUBPACK_WIDTH'])/2))
        Multi['Weight Difference'] = abs((Multi['Weight']-Multi['SUBPACK_WEIGHT'])/((Multi['Weight']+Multi['SUBPACK_WEIGHT'])/2))
    elif Multi['Masterpack qty'] == Multi['PACK_QTY']:
        Multi['Length Difference'] = abs((Multi['Length']-Multi['PACK_LENGTH'])/((Multi['Length']+Multi['PACK_LENGTH'])/2))
        Multi['Height Difference'] = abs((Multi['Height']-Multi['PACK_HEIGHT'])/((Multi['Height']+Multi['PACK_HEIGHT'])/2))
        Multi['Width Difference'] = abs((Multi['Width']-Multi['PACK_WIDTH'])/((Multi['Width']+Multi['PACK_WIDTH'])/2))
        Multi['Weight Difference'] = abs((Multi['Weight']-Multi['PACK_WEIGHT'])/((Multi['Weight']+Multi['PACK_WEIGHT'])/2))
    else:
        Multi['Length Difference'] = 'No Match'
        Multi['Height Difference'] = 'No Match'
        Multi['Width Difference'] = 'No Match'
        Multi['Weight Difference'] = 'No Match' 

Multi.apply(checks)

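1 answer:

Answer 0 (score: 1)

There are several problems with your code. The main ones are:

  1. You assume that if / abs work in a vectorized fashion. For if, this is not true: every if statement in regular Python takes a single Boolean value and does not evaluate element-wise when two Series are compared. (The built-in abs does operate element-wise on a Series; it is the if that raises the error.)
  2. Because of (1), each branch of your if / elif / else assigns to entire columns at once, so you would keep overwriting the whole column rather than setting values row by row.
  3. Your function doesn't return anything, so Multi.apply(checks) would give you nothing but None values.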
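
Point (1) can be reproduced in a few lines. The following is only a minimal sketch: the two-row frame is hypothetical sample data, and only the column names are taken from the question.

```python
import pandas as pd

# Hypothetical two-row frame; the column names follow the question.
df = pd.DataFrame({"Masterpack qty": [10, 20], "SUBPACK_QTY": [10, 5]})

# Comparing two columns gives one Boolean per row, not a single True/False.
mask = df["Masterpack qty"] == df["SUBPACK_QTY"]

try:
    if mask:  # Python asks the whole Series for a single truth value
        pass
    raised = False
except ValueError:
    raised = True  # "The truth value of a Series is ambiguous..."

# The built-in abs, by contrast, is applied element-wise to a Series.
abs_values = abs(pd.Series([-1.5, 2.0]))
```

Here mask holds one Boolean per row, which is why if cannot reduce it to a single branch decision.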

Instead, you can use np.select to specify the conditions and the corresponding values in a vectorized way. Here is an example for Length Difference:

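import numpy as np

# Row-wise Boolean conditions, checked in order; the first match wins
conds = [df['Masterpack qty'] == df['SUBPACK_QTY'], df['Masterpack qty'] == df['PACK_QTY']]
# One candidate result per condition: absolute difference relative to the mean
choices = [((df['Length'] - df['SUBPACK_LENGTH']) / ((df['Length'] + df['SUBPACK_LENGTH'])/2)).abs(),
           ((df['Length'] - df['PACK_LENGTH']) / ((df['Length'] + df['PACK_LENGTH'])/2)).abs()]

# Rows matching neither condition get the default; newer NumPy versions may
# reject a string default next to float choices, and np.nan is a numeric alternative
df['Length Difference'] = np.select(conds, choices, 'No Match')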
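
The same pattern can be extended to all four measurements with a loop. This is a sketch under some assumptions rather than the answerer's own code: the sample data and the rel_diff helper are hypothetical, the column names are taken from the question, and np.nan replaces 'No Match' as the default so that every result column stays numeric.

```python
import numpy as np
import pandas as pd

dims = ["Length", "Height", "Width", "Weight"]

# Hypothetical sample data; every dimension reuses the same three values.
data = {"Masterpack qty": [10, 20, 30],
        "SUBPACK_QTY": [10, 5, 7],
        "PACK_QTY": [3, 20, 8]}
for dim in dims:
    data[dim] = [2.0, 4.0, 1.0]
    data[f"SUBPACK_{dim.upper()}"] = [2.0, 9.0, 9.0]
    data[f"PACK_{dim.upper()}"] = [9.0, 12.0, 9.0]
df = pd.DataFrame(data)


def rel_diff(a, b):
    """Absolute difference relative to the mean of the two values (hypothetical helper)."""
    return ((a - b) / ((a + b) / 2)).abs()


# The conditions are the same for every dimension, so build them once.
conds = [df["Masterpack qty"] == df["SUBPACK_QTY"],
         df["Masterpack qty"] == df["PACK_QTY"]]

for dim in dims:
    choices = [rel_diff(df[dim], df[f"SUBPACK_{dim.upper()}"]),
               rel_diff(df[dim], df[f"PACK_{dim.upper()}"])]
    # NaN marks rows where neither quantity matches.
    df[f"{dim} Difference"] = np.select(conds, choices, default=np.nan)
```

With this sample data, the first row matches SUBPACK_QTY, the second matches PACK_QTY, and the third matches neither and is left as NaN. Rows without a match can then be found with df['Length Difference'].isna().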