If I have this dataframe:
The desired output is:
A B C
0.0285714285714285 4 0.11428571
0.107142857142857 4 0.42857143
0.007142857142857 6 0.04285714
1.2 4 5.5
1.5 3 3
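I want to ignore the 3 similar rows because the difference between them is small. Only the first digit after the decimal point should be taken into account.
Can you help me?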
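Both answers below assume a DataFrame `df` with numeric columns A, B and C, and compare A*B against C. A minimal sketch that rebuilds `df` from the table above (values copied from the question) so the answers can be run as-is:

```python
import pandas as pd

# Rebuild the example data from the question's table
df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

# The quantity both answers look at: A*B minus C
diff = df["A"] * df["B"] - df["C"]
print(diff.round(3))
```

For the first three rows the difference is only a few billionths, which is why the question calls them "similar".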
Answer 0 (score: 1)
Use np.where to check whether the difference is significant enough:
import numpy as np
#difference between A*B and C
diff = df["A"]*df["B"]-df["C"]
#keep the difference only if it is at least 0.1 in absolute value, otherwise 0
df["difference"] = np.where(diff.abs() >= 0.1, diff, 0)
print (df)
A B C difference
0 0.028571 4 0.114286 0.0
1 0.107143 4 0.428571 0.0
2 0.007143 6 0.042857 0.0
3 1.200000 4 5.500000 -0.7
4 1.500000 3 3.000000 1.5
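The question also says only the first digit after the decimal point should count. One possible reading (an assumption, not what this answer does) is to round the difference to one decimal place and treat anything that rounds to zero as insignificant. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

# Round A*B - C to one decimal, keeping only the first digit after the point
diff = (df["A"] * df["B"] - df["C"]).round(1)

# Values that round to zero (including -0.0) become a plain 0.0
df["difference"] = np.where(diff != 0, diff, 0.0)
print(df)
print("Count:", int((df["difference"] != 0).sum()))
```

Note that this is not the same rule as the 0.1 threshold: a difference of 0.06 rounds to 0.1 and would be kept here, while the threshold version sets it to 0.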
Answer 1 (score: 1)
Edit:
Because the values in column A are of object dtype (apparently strings), convert them to floats first:
import pandas as pd
df['A'] = df['A'].astype(float)
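If that does not work because of invalid values (e.g. some non-numeric strings), use pd.to_numeric with errors='coerce', which replaces the invalid values with NaN: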
df['A'] = pd.to_numeric(df['A'], errors='coerce')
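To see the difference between the two conversions, here is a small sketch on a hypothetical string column where one entry ("n/a") is not a number:

```python
import pandas as pd

# Hypothetical column with one value that cannot be parsed as a float
s = pd.Series(["0.0285714285714285", "1.2", "n/a"])

try:
    s.astype(float)  # raises because "n/a" is not a valid float
except ValueError as exc:
    print("astype failed:", exc)

# errors="coerce" turns unparseable entries into NaN instead of raising
converted = pd.to_numeric(s, errors="coerce")
print(converted)
```

Rows that end up as NaN also give NaN in A*B - C, so they may need to be dropped or filled before computing the difference.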
Use Series.mask with a condition built by Series.between to set the new column:
#multiply columns
df['A*B'] = df["A"]*df["B"]
#subtract C to get a Series of differences
diff = df['A*B'] - df['C']
#create mask
mask = diff.between(-0.1, 0.1)
df["difference"] = diff.mask(mask, 0)
print (df)
A B C A*B difference
0 0.028571 4 0.114286 0.114286 0.0
1 0.107143 4 0.428571 0.428571 0.0
2 0.007143 6 0.042857 0.042857 0.0
3 1.200000 4 5.500000 4.800000 -0.7
4 1.500000 3 3.000000 4.500000 1.5
print (f'Count: {(~mask).sum()}')
Count: 2
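Series.between is inclusive on both ends by default, so a difference of exactly ±0.1 is treated as "small" here, whereas the np.where answer keeps it. A sketch on hypothetical values that sit on the boundary:

```python
import pandas as pd

# Hypothetical differences, including values exactly at -0.1 and 0.1
diff = pd.Series([-0.1, 0.05, 0.1, -0.7, 1.5])

# between() includes both bounds by default
mask = diff.between(-0.1, 0.1)

# mask() replaces values where the condition is True and keeps the rest
result = diff.mask(mask, 0)
print(result.tolist())
print("Count:", int((~mask).sum()))
```

In recent pandas versions, `inclusive="neither"` can be passed to `between` to exclude the bounds and match the `>= 0.1` rule of the first answer.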
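If column order matters, combine DataFrame.insert with DataFrame.pop: columns A and B are removed, multiplied, and the product is inserted as the first column: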
df.insert(0, 'A*B', df.pop("A")*df.pop("B"))
diff = df['A*B'] - df['C']
mask = diff.between(-0.1, 0.1)
df["difference"] = diff.mask(mask, 0)
print (df)
A*B C difference
0 0.114286 0.114286 0.0
1 0.428571 0.428571 0.0
2 0.042857 0.042857 0.0
3 4.800000 5.500000 -0.7
4 4.500000 3.000000 1.5
print (f'Count: {(~mask).sum()}')
Count: 2
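If "ignore" means dropping the near-identical rows instead of setting their difference to 0, the inverted mask can be used for boolean indexing. A sketch, assuming the data from the question's table:

```python
import pandas as pd

df = pd.DataFrame({
    "A": [0.0285714285714285, 0.107142857142857, 0.007142857142857, 1.2, 1.5],
    "B": [4, 4, 6, 4, 3],
    "C": [0.11428571, 0.42857143, 0.04285714, 5.5, 3.0],
})

diff = df["A"] * df["B"] - df["C"]
mask = diff.between(-0.1, 0.1)

# Keep only rows whose difference lies outside [-0.1, 0.1]
significant = df.loc[~mask].assign(difference=diff[~mask])
print(significant)
```

The original index labels (3 and 4) are preserved, so the filtered rows can still be matched back to the full frame.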