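I have a dataframe like the one below (df_Start).
I want to look at columns D, F, M and P and return a Result column holding the value that appears most often in each row.
The rules I want to follow are:
1) If a row is tied, with 2 IG and 2 HY, return HY in the Result column.
2) If a column contains a NaN value, ignore it and use the other available values.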
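Read for a single row, the two rules amount to roughly the sketch below. The helper name pick_rating is hypothetical, and treating a row with no usable values as a tie (so it returns HY) is an assumption that the rules above do not cover.

```python
import math
from collections import Counter

def pick_rating(values):
    """Hypothetical helper: most frequent of 'IG'/'HY' in one row, ties go to 'HY'."""
    # Rule 2: drop missing entries (None or float NaN) before counting.
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    counts = Counter(present)  # a Counter returns 0 for labels it never saw
    # Rule 1: on a tie 'HY' wins, hence >= rather than >.
    return 'HY' if counts['HY'] >= counts['IG'] else 'IG'

print(pick_rating(['IG', 'HY', 'HY', 'IG']))          # 2 vs 2 tie
print(pick_rating(['HY', 'HY', float('nan'), 'IG']))  # NaN ignored
```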
I would like the resulting dataframe to look like this (df_end):
import numpy as np
import pandas as pd

df_Start = pd.DataFrame({'P':['IG','HY','IG',np.nan,'HY'], 'M':['HY','HY','IG', np.nan,'IG'], 'F':['HY',np.nan,'HY', np.nan,'IG'],'D':['IG','IG','IG', 'HY','IG']})
df_end = pd.DataFrame({'Result':['HY','HY','IG', 'HY','IG'],'P':['IG','HY','IG',np.nan,'HY'], 'M':['HY','HY','IG', np.nan,'IG'], 'F':['HY',np.nan,'HY', np.nan,'IG'],'D':['IG','IG','IG', 'HY','IG']})
def f(x):
    frequencies = pd.Series(data=[y for y in x if pd.isnull(y)==False]).value_counts()
    a,b,c = 0,0,0
    if 'IG' in frequencies:
        b = frequencies['IG']
    if 'HY' in frequencies:
        a = frequencies['HY']
    if 'PFA' in frequencies:
        c = frequencies['PFA']
    return 'PFA' if c > 0 elif
for i, row in new_df.iterrows():
    new_df.loc[i,'result'] = f(row)
Answer 0 (score: 0)
Try this and let me know if it works:
def f(x):
    frequencies = pd.Series(data=[y for y in x if np.isnan(y)==False]).value_counts()
    a,b = frequencies['HY'],frequencies['IG']
    return 'HY' if a>=b else 'IG'

df['result'] = df.columns[['D','F','M','P']].apply(lambda x: f(x))
Right now I can't work out why the approach above doesn't work.
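Some plausible culprits, offered as a guess rather than a confirmed diagnosis: np.isnan only accepts numeric input, so it raises on strings such as 'IG', and looking up a label that never occurred in a value_counts() result raises KeyError. Separately, df.columns is the Index of column labels rather than the data, so a row-wise call would normally go through df[['D','F','M','P']].apply(f, axis=1). The sketch below checks the first two points in isolation; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# np.isnan only accepts numeric input; a string such as 'IG' raises TypeError.
try:
    np.isnan('IG')
    isnan_on_string_ok = True
except TypeError:
    isnan_on_string_ok = False

# pd.isnull accepts arbitrary objects, strings included.
isnull_on_string = pd.isnull('IG')
isnull_on_nan = pd.isnull(np.nan)

# Label lookup on a value_counts() result raises KeyError if the label never occurred.
counts = pd.Series(['HY']).value_counts()
try:
    counts['IG']
    missing_label_ok = True
except KeyError:
    missing_label_ok = False

# Series.get is one way to default a missing label to zero.
ig_count = counts.get('IG', 0)

print(isnan_on_string_ok, isnull_on_string, isnull_on_nan, missing_label_ok, ig_count)
```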
def f(x):
    # Count the non-null values in the row; pd.isnull also handles strings.
    frequencies = pd.Series(data=[y for y in x if pd.isnull(y)==False]).value_counts()
    a,b,c = 0,0,0
    # Only read a count when the label is present, to avoid a KeyError.
    if 'IG' in frequencies:
        b = frequencies['IG']
    if 'HY' in frequencies:
        a = frequencies['HY']
    if 'PFA' in frequencies:
        c = frequencies['PFA']
    if c>=1:
        return 'PFA'
    else:
        # >= sends ties to 'HY'.
        return 'HY' if a>=b else 'IG'

for i,row in df_Start.iterrows():
    df_Start.loc[i,'result'] = f(row)
The new version should work.
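For larger frames, the same rules could probably be expressed without a Python-level loop. The sketch below is an alternative and not the approach used above: it counts matches per row with boolean comparisons and picks a label with np.select. The names cols, hy, ig and pfa are illustrative, and giving PFA priority whenever it appears mirrors the function above.

```python
import numpy as np
import pandas as pd

df_Start = pd.DataFrame({'P': ['IG', 'HY', 'IG', np.nan, 'HY'],
                         'M': ['HY', 'HY', 'IG', np.nan, 'IG'],
                         'F': ['HY', np.nan, 'HY', np.nan, 'IG'],
                         'D': ['IG', 'IG', 'IG', 'HY', 'IG']})
cols = ['D', 'F', 'M', 'P']

# Comparing NaN with a string yields False, so missing cells are never counted.
hy = (df_Start[cols] == 'HY').sum(axis=1)
ig = (df_Start[cols] == 'IG').sum(axis=1)
pfa = (df_Start[cols] == 'PFA').any(axis=1)

# np.select uses the first condition that holds: any PFA, then HY on >=, else IG.
df_Start['result'] = np.select([pfa, hy >= ig], ['PFA', 'HY'], default='IG')
print(df_Start)
```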