Python: counting differing values between two columns

Time: 2019-10-22 07:16:27

Tags: python pandas dataframe

If I have a DataFrame:


The desired output is:

A                         B         C
0.0285714285714285        4         0.11428571
0.107142857142857         4         0.42857143
0.007142857142857         6         0.04285714
1.2                       4         5.5
1.5                       3         3

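I want to ignore the 3 similar rows, because their differences are tiny. Only the first digit after the decimal separator should be taken into account.

Can you help me?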

2 answers:

Answer 0 (score: 1)

Use np.where to check whether the difference between A*B and C is significant enough:

import numpy as np

# compute A*B - C once, then keep it only where its magnitude is at least 0.1
diff = df["A"] * df["B"] - df["C"]
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)

print(df)

          A  B         C  difference
0  0.028571  4  0.114286         0.0
1  0.107143  4  0.428571         0.0
2  0.007143  6  0.042857         0.0
3  1.200000  4  5.500000        -0.7
4  1.500000  3  3.000000         1.5
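
For a version that runs on its own, the sketch below rebuilds the sample data from the question and counts the rows whose difference survives the 0.1 threshold, which is the number the question asks for. Building the DataFrame inline and the variable name `count` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; constructing it inline is an assumption.
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

diff = df["A"] * df["B"] - df["C"]
# Keep the difference only where its magnitude is at least 0.1.
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)

# Number of rows where A*B differs noticeably from C.
count = int((df["difference"] != 0).sum())
print(f"Count: {count}")
```

With this data the first three rows differ only around the ninth decimal place, so only the last two rows are counted.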

Answer 1 (score: 1)

Edit:

Because the values in column A are objects (apparently strings), convert them to float first:

df['A'] = df['A'].astype(float)

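If that does not work because of invalid values (for example, some non-numeric strings), use pd.to_numeric with errors='coerce', which replaces the invalid values with NaN: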

df['A'] = pd.to_numeric(df['A'], errors='coerce')
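
To illustrate the difference between the two conversions, here is a small sketch on a hypothetical Series (the values in `raw` are made up for the example and are not from the question). `astype(float)` would fail on the invalid entry, while `pd.to_numeric` with `errors="coerce"` converts what it can.

```python
import pandas as pd

# Hypothetical column holding numbers as strings plus one invalid entry.
raw = pd.Series(["1.2", "1.5", "n/a"])

# raw.astype(float) would raise ValueError because of "n/a";
# errors="coerce" turns unparseable entries into NaN instead.
converted = pd.to_numeric(raw, errors="coerce")
print(converted)
```

Rows that became NaN also produce NaN in any later arithmetic, so it may be worth checking `converted.isna().sum()` before computing differences.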

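Use Series.between to build a condition and Series.mask to set the new column from it:

# multiply the columns
df['A*B'] = df["A"] * df["B"]
# subtract to get a Series of differences
diff = df['A*B'] - df['C']
# create a mask: True where the difference lies in [-0.1, 0.1]
mask = diff.between(-0.1, 0.1)

# replace the masked (small) differences with 0
df["difference"] = diff.mask(mask, 0)
print(df)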
          A  B         C       A*B  difference
0  0.028571  4  0.114286  0.114286         0.0
1  0.107143  4  0.428571  0.428571         0.0
2  0.007143  6  0.042857  0.042857         0.0
3  1.200000  4  5.500000  4.800000        -0.7
4  1.500000  3  3.000000  4.500000         1.5

print(f'Count: {(~mask).sum()}')
Count: 2
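
The question also says that only the first digit after the decimal separator should matter. One possible reading of that, which is not what either answer does, is to round the difference to one decimal place and count the rows where the rounded value is nonzero. The sketch below assumes that reading and rebuilds the sample data inline; the names `rounded` and `count` are illustrative.

```python
import pandas as pd

# Sample data copied from the question; constructing it inline is an assumption.
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

diff = df["A"] * df["B"] - df["C"]

# Round to one decimal place; tiny differences collapse to 0.
rounded = diff.round(1)

# Count rows whose rounded difference is not zero.
count = int((rounded != 0).sum())
print(f"Count: {count}")
```

Rounding and thresholding are not equivalent: a difference of 0.07 rounds to 0.1 and would be counted here, whereas `between(-0.1, 0.1)` would treat it as small. Which one is wanted depends on how the requirement is interpreted.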

If the column order matters, combine DataFrame.insert with DataFrame.pop to extract the columns (starting again from the original DataFrame):

df.insert(0, 'A*B', df.pop("A") * df.pop("B"))
diff = df['A*B'] - df['C']
mask = diff.between(-0.1, 0.1)

df["difference"] = diff.mask(mask, 0)
print(df)
        A*B         C  difference
0  0.114286  0.114286         0.0
1  0.428571  0.428571         0.0
2  0.042857  0.042857         0.0
3  4.800000  5.500000        -0.7
4  4.500000  3.000000         1.5


print(f'Count: {(~mask).sum()}')
Count: 2
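
To make the insert/pop step easier to follow, here is a minimal sketch on a two-row subset of the sample data (the reduced DataFrame is an assumption for brevity). Both `pop` calls are evaluated before `insert` runs, so columns A and B are removed first and their product is then placed at position 0.

```python
import pandas as pd

# Two rows taken from the question's data, reduced for brevity.
df = pd.DataFrame({"A": [1.2, 1.5], "B": [4, 3], "C": [5.5, 3.0]})

# pop returns each column and removes it from df;
# insert then adds the product as the first column.
df.insert(0, "A*B", df.pop("A") * df.pop("B"))

print(df.columns.tolist())
```

Because `pop` modifies the DataFrame in place, A and B are no longer available afterwards; work on a copy (`df.copy()`) if the original columns are still needed.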