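I have a DataFrame containing consumer email data, with both new and repeat contact emails. I need to find outliers in this data based on these conditions:
count1 > 1 and count2 > 1
count1 > 1 and count2 < 1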
I looked up function definitions and syntax in Python and wrote this function to classify the outliers:
def outlier():
    for index, row in df.iterrows():
        if([row][count1] > 1 and [row][count2] > 1):
            if(df[row][Journey] == df[row][journey_lag]):
                df[row][outlier] = Same_Property/Date/Agent/Journey
            else:
                df[row][outlier] = Same_Property/Date/Agent-Different Journey
        elif([row][count1] > 1 and [row][count2] == 1):
            if(df[row][Journey] == df[row][journey_lag]):
                df[row][outlier] = Same_Property/Date-Different_Agent/Journey
            else:
                df[row][outlier]=Same_Property/Date_Different_Agent/Journey
    return df
I want to apply this function to the DataFrame like this:
df.outlier
df.apply(outlier)
Error: I can't get the desired result.
Answer (score: 1)
When you call .apply(my_function) on a DataFrame object, pandas expects a one-argument function. With axis=0 (the default) that argument is a column of the DataFrame; with axis=1 it is a row of the DataFrame.
You need something like this:
def outlier(row):
    # with axis=1, `row` is a Series indexed by column name
    if row['count1'] > 1 and row['count2'] > 1:
        if row['Journey'] == row['journey_lag']:
            return 'Same_Property/Date/Agent/Journey'
        else:
            return 'Same_Property/Date/Agent/Different_Journey'
    elif row['count1'] > 1 and row['count2'] == 1:
        if row['Journey'] == row['journey_lag']:
            return 'Same_Property/Date/Different_Agent/Journey'
        else:
            return 'Same_Property/Date/Different_Agent/Different_Journey'
    # rows that match neither condition fall through and return None

df['outlier'] = df.apply(outlier, axis=1)
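A minimal, self-contained sketch of the row-wise approach, assuming a small hypothetical DataFrame: only the column names come from the question, and the values are made up. It also shows a vectorized alternative using numpy.select. That is a different technique from the answer's row-wise apply, and it is usually faster on large frames because it avoids one Python call per row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column names follow the question,
# the values are invented purely for illustration.
df = pd.DataFrame({
    "count1":      [2, 3, 2, 4, 1],
    "count2":      [2, 2, 1, 1, 3],
    "Journey":     ["A", "A", "B", "B", "C"],
    "journey_lag": ["A", "B", "B", "C", "C"],
})

def outlier(row):
    # With axis=1, `row` is a Series indexed by column name.
    if row["count1"] > 1 and row["count2"] > 1:
        if row["Journey"] == row["journey_lag"]:
            return "Same_Property/Date/Agent/Journey"
        return "Same_Property/Date/Agent/Different_Journey"
    elif row["count1"] > 1 and row["count2"] == 1:
        if row["Journey"] == row["journey_lag"]:
            return "Same_Property/Date/Different_Agent/Journey"
        return "Same_Property/Date/Different_Agent/Different_Journey"
    return None  # row matches neither condition

# Row-wise version: pandas calls outlier() once per row.
df["outlier"] = df.apply(outlier, axis=1)

# Vectorized alternative: build boolean masks once, then let
# numpy.select pick the label of the first True condition per row.
same_journey = df["Journey"] == df["journey_lag"]
multi = df["count1"] > 1
conditions = [
    multi & (df["count2"] > 1) & same_journey,
    multi & (df["count2"] > 1) & ~same_journey,
    multi & (df["count2"] == 1) & same_journey,
    multi & (df["count2"] == 1) & ~same_journey,
]
labels = [
    "Same_Property/Date/Agent/Journey",
    "Same_Property/Date/Agent/Different_Journey",
    "Same_Property/Date/Different_Agent/Journey",
    "Same_Property/Date/Different_Agent/Different_Journey",
]
# default=None leaves non-matching rows empty, as the row-wise version does.
df["outlier_vec"] = np.select(conditions, labels, default=None)

print(df[["outlier", "outlier_vec"]])
```

In this sample, the last row (count1 == 1) matches neither condition, so both columns hold a missing value there.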