Why do I get an AttributeError when using pandas?

Time: 2018-01-01 18:20:40

Tags: python pandas dataframe apply attributeerror

How can I convert NaN values to categorical values based on a condition? I get an error when I try to convert the NaN values.

category           gender     sub-category    title
health&beauty      NaN        makeup          lipbalm
health&beauty      women      makeup          lipstick
NaN                NaN        NaN             lipgloss

My DataFrame looks like this. My function for converting the NaN values in gender to a categorical value looks like this:

def impute_gender(cols):
    category=cols[0]
    sub_category=cols[2]
    gender=cols[1]
    title=cols[3]
    if title.str.contains('Lip') and gender.isnull==True:
        return 'women'
df[['category','gender','sub_category','title']].apply(impute_gender,axis=1)

When I run the code, I get this error:

----> 7     if title.str.contains('Lip') and gender.isnull()==True:
      8         print(gender)
      9 

AttributeError: ("'str' object has no attribute 'str'", 'occurred at index category')

Full dataset: https://github.com/lakshmipriya04/py-sample

3 answers:

Answer 0 (score: 13)

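A few things to note:

  1. If you only use two columns, calling apply over four columns is wasteful.
  2. In general, calling apply is wasteful, since it is slow and gives you none of the benefits of vectorization.
  3. Inside apply you are dealing with scalars, so you cannot use the .str accessor as you would on a pd.Series. title is a plain Python str, which has no contains method either; a substring test such as "lip" in title is enough.
  4. gender.isnull is also wrong: gender is a scalar and has no isnull attribute. Use pd.isnull(gender) to test a scalar.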
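For reference, points 3 and 4 can be applied to the original function roughly as follows. This is only a sketch: the sample frame is rebuilt by hand from the three rows shown in the question, and the lower-casing and the isinstance guard are my own assumptions about the intent. The vectorized options remain preferable.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the rows shown in the question (an assumption,
# not the full dataset).
df = pd.DataFrame({
    'category': ['health&beauty', 'health&beauty', np.nan],
    'gender': [np.nan, 'women', np.nan],
    'sub-category': ['makeup', 'makeup', np.nan],
    'title': ['lipbalm', 'lipstick', 'lipgloss'],
})

def impute_gender(row):
    # With axis=1 each row is a Series of scalars, so use plain Python tests.
    gender, title = row['gender'], row['title']
    if pd.isnull(gender) and isinstance(title, str) and 'lip' in title.lower():
        return 'women'
    # Return the existing value so non-matching rows are left unchanged.
    return gender

# Only the two columns the function actually reads are passed to apply.
df['gender'] = df[['gender', 'title']].apply(impute_gender, axis=1)
print(df)
```

Note that the function returns gender on the fall-through path; without that, every non-matching row would come back as None.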
    Option 1
    np.where

    import numpy as np

    m = df.gender.isnull() & df.title.str.contains('lip')
    df['gender'] = np.where(m, 'women', df.gender)
    
    df
            category gender sub-category     title
    0  health&beauty  women       makeup   lipbalm
    1  health&beauty  women       makeup  lipstick
    2            NaN  women          NaN  lipgloss
    

    This is not only fast but also simpler. If you are worried about case sensitivity, you can make the contains check case-insensitive:

    import re
    m = df.gender.isnull() & df.title.str.contains('lip', flags=re.IGNORECASE)
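A variant worth considering, sketched here on made-up rows (the 'shampoo' row and the missing title are my own additions, not from the dataset): str.contains also accepts case=False, and na=False makes a missing title count as "no match" instead of propagating NaN into the mask.

```python
import numpy as np
import pandas as pd

# Hypothetical rows chosen to exercise mixed case, a missing title,
# and a title that does not match.
df = pd.DataFrame({
    'gender': [np.nan, 'women', np.nan, np.nan],
    'title': ['Lipbalm', 'lipstick', np.nan, 'shampoo'],
})

# case=False ignores case; na=False treats a missing title as "no match".
m = df['gender'].isnull() & df['title'].str.contains('lip', case=False, na=False)
df['gender'] = np.where(m, 'women', df['gender'])
print(df)
```

Only the first row is imputed here; the rows with a missing or non-matching title keep their NaN gender.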
    

    Option 2
    Another way is to use pd.Series.mask / pd.Series.where:

    df['gender'] = df.gender.mask(m, 'women')
    

    Or,

    df['gender'] = df.gender.where(~m, 'women')
    

    df
            category gender sub-category     title
    0  health&beauty  women       makeup   lipbalm
    1  health&beauty  women       makeup  lipstick
    2            NaN  women          NaN  lipgloss
    

    mask implicitly applies the new value to the column based on the mask provided.
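To make the relationship between the two calls concrete, here is a small self-contained check on a hand-written Series (the values are illustrative only): mask replaces entries where the condition is True, where keeps entries where the condition is True, so passing the negated mask to where should give the same result.

```python
import numpy as np
import pandas as pd

# Illustrative values, not taken from the dataset.
gender = pd.Series([np.nan, 'women', np.nan])
m = pd.Series([True, False, False])

masked = gender.mask(m, 'women')   # replace where m is True
kept = gender.where(~m, 'women')   # keep where ~m is True, replace elsewhere

# Series.equals treats NaNs in the same positions as equal.
print(masked.equals(kept))
```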

Answer 1 (score: 6)

Or simply use loc, as an Option 3 to add to @COLDSPEED's answer:
cond = (df['gender'].isnull()) & (df['title'].str.contains('lip'))
df.loc[cond, 'gender'] = 'women'


        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss

Answer 2 (score: 3)

Since we are dealing with NaN values, fillna is one possible approach :-)

df.gender=df.gender.fillna(df.title.str.contains('lip').replace(True,'women'))
df
Out[63]: 
        category gender sub-category     title
0  health&beauty  women       makeup   lipbalm
1  health&beauty  women       makeup  lipstick
2            NaN  women          NaN  lipgloss
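One caveat with this approach: replace(True, 'women') leaves False in place, so a row with a missing gender and a title that does not contain 'lip' would be filled with False. A possible workaround, sketched on made-up rows (the 'shampoo' row is my own addition), is to build the fill values with map, which sends anything not in the dictionary to NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the last title deliberately does not contain 'lip'.
df = pd.DataFrame({
    'gender': [np.nan, 'women', np.nan],
    'title': ['lipbalm', 'lipstick', 'shampoo'],
})

# map sends True to 'women' and everything else (here False) to NaN,
# so fillna has nothing to fill for non-matching rows.
fill = df['title'].str.contains('lip').map({True: 'women'})
df['gender'] = df['gender'].fillna(fill)
print(df)
```

With this variant the 'shampoo' row keeps its NaN gender instead of receiving False.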