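I have a DataFrame with several columns; one of them is the position and another is the years of service. Based on these, I want to create a new column, "Potential Life Cover Level". I wrote this function for it: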
def LifeCover(row):
    if row['Years of Service']>5:
        val = 8
    elif row['Years of Service']>2 and row['Position'] in ['Associate', 'Director', 'Director of Facilities Management', 'Director of Promise', 'Director, Head of Facilities Management']:
        val = 8
    elif row['Years of Service']>2 and row['Position'] not in ['Associate', 'Director', 'Director of Facilities Management', 'Director of Promise', 'Director, Head of Facilities Management']:
        val = 7
    else:
        val = 3
    return val

df['Potential Life Cover Level'] = df.apply(LifeCover, axis=1)
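This works, but I don't like having such a long list of positions. It turns out the list will probably need to grow as well, so it isn't practical.
I need to include/exclude any position that contains the word "Associate", "Director" or "Partner".
I managed to filter like this: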
target = ['Associate', 'Director', 'Partner']
dfhigh = df[df['Position'].apply(lambda sentence: any(word in sentence for word in target))]
dflow = df[~df['Position'].apply(lambda sentence: any(word in sentence for word in target))]
So I get one DataFrame with the higher positions and one with the lower positions.
Then I tried to use these in my function:
def LifeCover2(row):
    if row['Years of Service']>5:
        val = 8
    elif row['Years of Service']>2 and row['Position'] in dfhigh['Position']:
        val = 8
    elif row['Years of Service']>2 and row['Position'] in dflow['Position']:
        val = 7
    else:
        val = 3
    return val
But for some reason it only returns the values 8 or 3.
I also tried:
def LifeCover2(row):
    if row['Years of Service']>5:
        val = 8
    elif row['Years of Service']>2 and row['Position'].str.contains('Associate|Director|Partner'):
        val = 8
    elif row['Years of Service']>2 and (~row['Position'].str.contains('Associate|Director|Partner')):
        val = 7
    else:
        val = 3
    return val
This raises AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 69')
Answer 0 (score: 0)
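str.contains is a vectorized string operation (see the pandas documentation). That means it is a method for a pandas Series, not for a plain Python string. When you use df.apply with axis=1, your function receives one row at a time, so row['Position'] is a single str rather than a Series, and a str has no .str accessor. That is what the AttributeError is telling you.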
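If you would rather keep the row-wise function, a minimal sketch could use Python's own substring test instead of .str.contains. The names life_cover_rowwise, TARGET, demo and positions, and the sample data, are made up for illustration. The last two lines also show a likely reason the dfhigh/dflow attempt only gave 8 or 3: `in` on a Series checks the index labels, not the values.

```python
import pandas as pd

# Hypothetical keyword list, taken from the question's `target`
TARGET = ('Associate', 'Director', 'Partner')

def life_cover_rowwise(row):
    """Row-wise variant for df.apply(..., axis=1); row['Position'] is a plain str here."""
    years = row['Years of Service']
    # Plain substring test on a str, no Series-only .str accessor needed
    senior = any(word in row['Position'] for word in TARGET)
    if years > 5:
        return 8
    if years > 2:
        return 8 if senior else 7
    return 3

# Made-up sample data, for illustration only
demo = pd.DataFrame({
    'Position': ['Associate Director', 'Analyst', 'Analyst', 'Partner'],
    'Years of Service': [3, 3, 6, 1],
})
demo['Potential Life Cover Level'] = demo.apply(life_cover_rowwise, axis=1)
print(demo)

# `in` on a Series looks at the index labels (0, 1), not at the values
positions = pd.Series(['Director', 'Analyst'])
print('Director' in positions)          # membership in the index
print('Director' in positions.values)   # membership in the values
```

This keeps the original if/elif structure, but it is still a Python-level loop over rows, so it will be slower than a vectorized approach on large frames.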
I would suggest the following approach instead:
import numpy as np

# Start from the default level, then overwrite where each condition holds
df['LifeCover2'] = 3
df['LifeCover2'] = np.where(df['Years of Service']>5, 8, df['LifeCover2'])
df['LifeCover2'] = np.where((df['Years of Service']>2) &
                            (df['Position'].str.contains('Associate|Director|Partner')), 8, df['LifeCover2'])
df['LifeCover2'] = np.where((df['Years of Service']>2) &
                            (~df['Position'].str.contains('Associate|Director|Partner')), 7, df['LifeCover2'])
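The same rules can probably be written in a single pass with np.select, which takes the first condition that matches for each row. This is only a sketch on made-up data: the demo frame is hypothetical, and passing na=False (so that a missing position counts as "no match") is my assumption about how you would want missing values handled.

```python
import numpy as np
import pandas as pd

# Made-up sample data, for illustration only (note the missing position)
demo = pd.DataFrame({
    'Position': ['Associate Director', 'Analyst', 'Analyst', 'Partner', None],
    'Years of Service': [3, 3, 6, 1, 4],
})

years = demo['Years of Service']
# Vectorized substring/regex match on the whole column;
# na=False treats missing positions as non-matching (an assumption)
senior = demo['Position'].str.contains('Associate|Director|Partner', na=False)

# Conditions are checked in order; the first one that is True wins
conditions = [
    years > 5,                # long service, any position
    (years > 2) & senior,     # 3-5 years in a senior position
    years > 2,                # 3-5 years in any other position
]
choices = [8, 8, 7]

demo['LifeCover2'] = np.select(conditions, choices, default=3)
print(demo)
```

Because the conditions are evaluated in order, the third one does not need an explicit `~senior`; rows that matched the second condition have already been assigned.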