Question

In my Python code below, why can't I call the function using the second method? If I want to be able to call the function that way, how should I change it?

import numpy as np
import pandas as pd
sales = {'Feb': [200, None, None],'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)

def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]

    if pd.isnull(Age):
        if Pclass == 215:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age


#method 1
df['z'] = df[['Feb','Mar']].apply(impute_age, axis=1)

print(df)
#method 2
df['zz'] = impute_age(df[['Feb','Mar']])

type(df[['Feb','Mar']].head(1))
Out[7]: pandas.core.frame.DataFrame

type(df[['Feb','Mar']].head(1).values[0])
Out[8]: numpy.ndarray

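Bharath Shetty's comment below helped me. The function expects an array as input, not a DataFrame. A DataFrame's columns cannot be accessed by position with df[0] and so on; doing so produces the following error: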

df[0]
Traceback (most recent call last):

  File "<ipython-input-9-9ae93f22b889>", line 1, in <module>
    df[0]

  File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
    return self._getitem_column(key)

  File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
    return self._get_item_cache(key)

  File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
    values = self._data.get(item)

  File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)

  File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)

  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)

  File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)

KeyError: 0
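
The KeyError above can be reproduced in a few lines. This is a minimal sketch using the same sample data: df[0] is a lookup by column label, and no column is labelled 0 here, whereas positional access goes through .iloc. The variable names are only illustrative.

```python
import pandas as pd

sales = {'Feb': [200, None, None], 'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)

# df[0] searches for a column *labelled* 0; the labels here are 'Feb' and 'Mar'.
try:
    df[0]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Positional access uses .iloc instead of plain brackets.
first_col = df.iloc[:, 0]  # the 'Feb' column as a Series
first_row = df.iloc[0]     # the first row as a Series
```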

Answer 1

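Applying a function across the columns is not the same as passing the DataFrame to that function. apply is essentially a simplified for loop: it sends one row at a time to the function, collects the results, and returns a Series.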
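
To make that concrete, here is a rough sketch of the equivalence, not pandas' actual implementation. The helper first_or_default is hypothetical and deliberately simpler than impute_age; it only shows that apply with axis=1 hands the function one row (a Series) at a time.

```python
import pandas as pd

sales = {'Feb': [200, None, None], 'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)

def first_or_default(row):
    # Hypothetical helper: `row` is a Series holding a single row's values.
    return 24 if pd.isnull(row.iloc[0]) else row.iloc[0]

# Row-wise apply: the function is called once per row.
via_apply = df[['Feb', 'Mar']].apply(first_or_default, axis=1)

# Roughly the same thing written as an explicit loop over the rows.
results = [first_or_default(row) for _, row in df[['Feb', 'Mar']].iterrows()]
via_loop = pd.Series(results, index=df.index)
```

Passing the whole DataFrame to the function directly skips this loop, so the function receives a two-dimensional object rather than a single row.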

To make your second method work, use iterrows inside the function and collect the results in an initially empty list, i.e.

def impute_age2(cols):
    k = []
    for _,i in cols.iterrows():
        # Use .iloc for positional access; i[0] on a string-labelled row is deprecated
        Age = i.iloc[0]
        Pclass = i.iloc[1]
        if pd.isnull(Age):
            if Pclass == 215:
                k.append(37)
            elif Pclass == 2:
                k.append(29)
            else:
                k.append(24)
        else:
            k.append(Age)
    return k

df['zz'] = impute_age2(df[['Feb','Mar']])

Output:

     Feb  Mar     zz   
0  200.0  140  200.0
1    NaN  215   37.0 
2    NaN   95   24.0
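
As an alternative that is not part of the original answer, the same rule can be written without any explicit loop. This is a sketch only: the name impute_age_vectorized is hypothetical, and it assumes the first column holds the value to fill and the second column holds the class, as in the question.

```python
import numpy as np
import pandas as pd

sales = {'Feb': [200, None, None], 'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)

def impute_age_vectorized(frame):
    # Hypothetical vectorized variant of impute_age; takes a two-column DataFrame.
    age = frame.iloc[:, 0]
    pclass = frame.iloc[:, 1]
    # Pick a fill value per row: 37 if the class is 215, 29 if it is 2, else 24.
    fill = np.select([pclass == 215, pclass == 2], [37, 29], default=24)
    # Keep existing values and use the fill value only where age is missing.
    return age.where(age.notna(), fill)

df['zz'] = impute_age_vectorized(df[['Feb', 'Mar']])
```

This version accepts the DataFrame directly, so it can be called in the "method 2" style from the question.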
