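I need to iterate over a list and perform a specific operation whenever a value from the list appears in one of the columns of a pandas DataFrame. I tried the following, but I get this error:
'Error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'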
import pandas as pd
people = {
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300]
}
df = pd.DataFrame(people)
check_list = ['Alex', 'John']
for column in check_list:
    if (column == df['fname']):
        df['new_column'] = df['sal'] / df['age']
    else:
        df['new_column'] = df['sal']
df
Required output:
fname  age  sal  new_column
Alex    20  100           5    <<-- sal/age
Jane    15  200         200    <<-- sal as it is
John    25  300          12    <<-- sal/age
Answer 0 (score: 4)
Use np.where together with .isin to check, row by row, whether the column value is one of the values in the list.
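import numpy as np

df['new_column'] = np.where(
    df['fname'].isin(['Alex', 'John']),  # True for rows whose fname is in the list
    df['sal'] / df['age'],               # value used where the condition is True
    df['sal']                            # value used everywhere else
)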
print(df)
  fname  age  sal  new_column
0  Alex   20  100         5.0
1  Jane   15  200       200.0
2  John   25  300        12.0
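The loop in the question fails because comparing a single string with a whole column yields one boolean per row, and `if` cannot reduce that to a single True or False. The sketch below reuses the question's data to show the difference; the names `elementwise`, `error_name` and `mask` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300],
})

# A scalar compared with a column gives a boolean Series, one entry per row.
elementwise = (df['fname'] == 'Alex')
print(elementwise.tolist())   # [True, False, False]

# Using that Series as an `if` condition raises the error from the question.
error_name = None
try:
    if elementwise:
        pass
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)             # ValueError

# .isin tests every row against the whole list at once and returns a row mask.
mask = df['fname'].isin(['Alex', 'John'])
print(mask.tolist())          # [True, False, True]
```

np.where then picks, for each row, the first value where the mask is True and the second value otherwise, so no Python-level loop is needed.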
A pure pandas version:
df['new_column'] = (df['sal'] / df['age']).where(
    df['fname'].isin(['Alex', 'John']), other=df['sal'])
print(df)
  fname  age  sal  new_column
0  Alex   20  100         5.0
1  Jane   15  200       200.0
2  John   25  300        12.0
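The same boolean mask can also drive a `.loc` assignment. This is not part of the answer above, only a sketch of one more option: start from `sal` for every row, then overwrite the matching rows. Casting to float first is an assumption made here so the ratio can be stored without changing the column's dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300],
})

mask = df['fname'].isin(['Alex', 'John'])

# Default value for every row: the salary, stored as float.
df['new_column'] = df['sal'].astype(float)

# Overwrite only the rows selected by the mask with sal / age.
df.loc[mask, 'new_column'] = df.loc[mask, 'sal'] / df.loc[mask, 'age']

print(df['new_column'].tolist())   # [5.0, 200.0, 12.0]
```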
Answer 1 (score: 1)
Try using df.apply
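import pandas as pd
people = {
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300]
}
df = pd.DataFrame(people)

def checker(item):
    # item is one row of the DataFrame, passed in as a Series
    check_list = ['Alex', 'John']
    if item["fname"] in check_list:
        return item['sal'] / item['age']
    else:
        return item['sal']

# axis=1 calls checker once per row; the result is stored in a column named "Exists"
df["Exists"] = df.apply(checker, axis=1)
df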
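As a possible variation on the apply approach, the list can be passed into the function instead of being hard-coded inside it. The sketch below uses the `args` parameter of `DataFrame.apply`; the function name `ratio_if_listed` and the `names` parameter are hypothetical, and the result is written to `new_column` to match the required output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300],
})
check_list = ['Alex', 'John']

def ratio_if_listed(row, names):
    """Return sal/age when the row's fname is in names, otherwise sal."""
    if row['fname'] in names:
        return row['sal'] / row['age']
    return row['sal']

# args supplies the extra positional argument after the row itself.
df['new_column'] = df.apply(ratio_if_listed, axis=1, args=(check_list,))

print(df['new_column'].tolist())
```

Because apply with axis=1 calls a Python function once per row, it is usually slower than the vectorized np.where version on large frames.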
Answer 2 (score: 1)
for index, row in df.iterrows():
    if row['fname'] in check_list:
        df.at[index, 'new_column'] = row['sal'] / row['age']
    else:
        df.at[index, 'new_column'] = row['sal']
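Explanation: to iterate over a DataFrame, use iterrows(). The row variable holds the values of all columns for the current row, and index is that row's index label.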
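To make the explanation concrete, here is a small sketch of what iterrows() hands back on each pass, again using the question's data. The `seen` list is only there to collect the pairs for inspection.

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['Alex', 'Jane', 'John'],
    'age': [20, 15, 25],
    'sal': [100, 200, 300],
})

seen = []
for index, row in df.iterrows():
    # index is the row label; row is a Series keyed by column name.
    seen.append((index, type(row).__name__, row['fname']))

print(seen)
# [(0, 'Series', 'Alex'), (1, 'Series', 'Jane'), (2, 'Series', 'John')]
```

The loop in this answer writes results back through df.at[index, ...] rather than through row, since row is a separate object built for each iteration.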