Question

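I am a beginner with Python and have searched the forum for an answer to my question without success.

I have a dataframe and want to subtract the numbers in one column from the numbers in another column, then create a new column with the result.

I tried: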

df['new column']=df['column 1']-df['column 2']

My output was: TypeError: unsupported operand type(s) for -: 'str' and 'str'

I then tried converting the columns to integers before doing the subtraction, using the following line:

df['column 2']=df['column 2'].astype(int)

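My output was: ValueError: cannot convert float NaN to integer

(There are some NaN values in my dataframe.) I then tried to replace all the NaN values with empty strings using the following code: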

def remove_nan(s):
    import math
    """ remove np.nan"""
    if math.isnan(s) == True:
        s.replace( np.nan,"")
    else:
        return s

df['column 1'] = df.apply(remove_nan, axis=0)

My output was: TypeError: ("cannot convert the series to <class 'float'>", 'occurred at index ID Number')

If anyone could offer insight into where I am going wrong, I would be very grateful.

Thanks for your help.

Answer 1

Use pd.to_numeric with the parameter errors='coerce' to convert to numeric; any value that is not a number becomes NaN.

Consider the df

import pandas as pd

df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))

print(df)

   A  B
0  4  1
1  5   
2  6  3
3     4
4  8  5

After pd.to_numeric

df = df.apply(pd.to_numeric, errors='coerce')

print(df)

     A    B
0  4.0  1.0
1  5.0  NaN
2  6.0  3.0
3  NaN  4.0
4  8.0  5.0
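The columns come out as float64 because NaN is itself a float, which is also why astype(int) raised the ValueError in the question. If integer columns are wanted, one possible option (an addition to this answer, and assuming a pandas version that supports the nullable "Int64" dtype) is sketched below. The name df_int is only illustrative.

```python
import pandas as pd

# Same example frame as above: digits stored as strings, blanks for missing.
df = pd.DataFrame(dict(A=list('456 8'), B=list('1 345')))
df = df.apply(pd.to_numeric, errors='coerce')

# Nullable integer dtype (note the capital "I"); missing entries stay missing
# instead of forcing the whole column to float.
df_int = df.astype('Int64')

print(df_int.dtypes)
print(df_int.A - df_int.B)
```

This only works when every non-missing value is a whole number; otherwise staying with float64 is the simpler choice.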

Now we can do column math

df['C'] = df.A - df.B

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  NaN
2  6.0  3.0  3.0
3  NaN  4.0  NaN
4  8.0  5.0  3.0

If you want to assume missing values are zero

df['C'] = df.A.sub(df.B, fill_value=0)

print(df)

     A    B    C
0  4.0  1.0  3.0
1  5.0  NaN  5.0
2  6.0  3.0  3.0
3  NaN  4.0 -4.0
4  8.0  5.0  3.0
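Applied to the column names from the question, the same approach might look like the sketch below. The sample values are made up for illustration (the real data was not posted), and only the two columns involved are converted so that other columns such as an ID are left alone.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question: numbers stored as strings,
# some NaN, and one entry that is not a number at all.
df = pd.DataFrame({
    "column 1": ["10", "7", np.nan, "3"],
    "column 2": ["4", np.nan, "2", "x"],
})

# Convert just the two columns; anything that cannot be parsed becomes NaN.
cols = ["column 1", "column 2"]
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# Subtraction now works; rows with a missing operand give NaN.
df["new column"] = df["column 1"] - df["column 2"]
print(df)
```

As for the remove_nan attempt: df.apply(..., axis=0) passes a whole column (a Series) to the function, while math.isnan expects a single float, which is where the "cannot convert the series" error comes from.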
