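I have a csv file that I am transforming with pandas in Python 3.7. I then want to check whether certain cells contain NaN (i.e. are empty, in my case), and only in that case replace the cell's content with another value.
I select the cell using the values of other columns in the same row (the columns family_name and first_name). Here is an MWE: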
import csv
import pandas as pd
import numpy as np
df = pd.DataFrame({"family_name":["smith", "duboule", "dupont"], "first_name":["john","jean-paul", "luc"], "weight":[70, 85, pd.np.nan]})
value_to_replace = 90
if df["weight"][(df["family_name"] == "dupont") & (df["first_name"] == "luc")] == pd.np.nan:
    df["weight"][(df["family_name"] == "dupont") & (df["first_name"] == "luc")] = value_to_replace
I get the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mymac/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried adding .bool() == True as shown here, but got the same error message:
if pd.isna(df["weight"][(df["family_name"] == "dupont") & (df["first_name"] == "luc")]).bool() == True:
    df["weight"][(df["family_name"] == "dupont") & (df["first_name"] == "luc")] = value_to_replace
Answer 0 (score: 2)
Use np.where.
It works like this: np.where(condition, value if true, value if false)
df['weight'] = np.where((df.family_name == 'dupont') & (df.first_name == 'luc'), value_to_replace, df.weight)
print(df)
  family_name first_name  weight
0       smith       john    70.0
1     duboule  jean-paul    85.0
2      dupont        luc    90.0
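To make the np.where(condition, value if true, value if false) pattern easier to see on its own, here is a minimal sketch on a plain NumPy array. The array contents and the names values, mask and result are made up for illustration and do not come from the question.

```python
import numpy as np

# Hypothetical data, used only to illustrate the three-argument form.
values = np.array([10, 20, 30, 40])

# Boolean mask: True where the element should be replaced.
mask = values > 25

# Where mask is True take 0, otherwise keep the original element.
result = np.where(mask, 0, values)

print(result.tolist())  # [10, 20, 0, 0]
```

The same element-wise selection happens in the DataFrame version above: the third argument (df.weight) supplies the value that is kept wherever the condition is False.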
Edit after the OP's comment
To replace the value only when weight is NaN, add .isnull() to the condition:
df['weight'] = np.where((df.family_name == 'dupont') & (df.first_name == 'luc') & (df.weight.isnull()), value_to_replace, df.weight)
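As a rough check of why the NaN test has to go through .isnull(), the sketch below uses a slightly different frame than the question's, with an extra (hypothetical) "dupont luc" row that already has a weight. It also shows that NaN never compares equal to itself, which is why a comparison with == against NaN cannot detect missing values.

```python
import numpy as np
import pandas as pd

# NaN is not equal to anything, including itself.
nan_equals_itself = np.nan == np.nan
print(nan_equals_itself)  # False

# Hypothetical frame: two rows match the names, only one has a missing weight.
df = pd.DataFrame({
    "family_name": ["smith", "dupont", "dupont"],
    "first_name": ["john", "luc", "luc"],
    "weight": [70, 82, np.nan],
})
value_to_replace = 90

is_target = (df.family_name == "dupont") & (df.first_name == "luc")

# Replace only where the row matches AND the weight is missing.
df["weight"] = np.where(is_target & df.weight.isnull(), value_to_replace, df.weight)

print(df["weight"].tolist())  # [70.0, 82.0, 90.0]
```

The matching row that already had a weight (82) is left untouched; only the missing one is filled.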
Answer 1 (score: 1)
Drop the if statements entirely and use this:
df.loc[(df["family_name"] == "dupont") &
       (df["first_name"] == "luc") &
       (df["weight"].isnull()), "weight"] = value_to_replace
I suggest reading the pandas API documentation to understand how loc selects and edits data.
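As a small illustration of how loc both selects and edits, the sketch below rebuilds the question's frame and uses the same boolean mask once for reading and once for writing. The names mask, selected and was_missing are introduced here only for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "family_name": ["smith", "duboule", "dupont"],
    "first_name": ["john", "jean-paul", "luc"],
    "weight": [70, 85, np.nan],
})

# Row selector: a boolean Series, one entry per row.
mask = (df["family_name"] == "dupont") & (df["first_name"] == "luc")

# Reading: loc[row_selector, column_label] returns the matching cells.
selected = df.loc[mask, "weight"]
was_missing = bool(selected.isnull().all())
print(was_missing)  # True

# Writing: the same indexer on the left-hand side assigns in one step.
df.loc[mask & df["weight"].isnull(), "weight"] = 90

print(df["weight"].tolist())  # [70.0, 85.0, 90.0]
```

Because the row and column selection happen in a single loc call, the assignment writes to the original DataFrame. Chained indexing such as df["weight"][mask] = ... may instead operate on a temporary copy and can trigger a SettingWithCopyWarning.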