In my Python code below, why can't I call the function using the second method? If I want to be able to call the function that way, how should I change my function?
import numpy as np
import pandas as pd
sales = {'Feb': [200, None, None],'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)
def impute_age(cols):
    Age = cols[0]
    Pclass = cols[1]
    if pd.isnull(Age):
        if Pclass == 215:
            return 37
        elif Pclass == 2:
            return 29
        else:
            return 24
    else:
        return Age
#method 1
df['z'] = df[['Feb','Mar']].apply(impute_age, axis=1)
print(df)
#method 2
df['zz'] = impute_age(df[['Feb','Mar']])
type(df[['Feb','Mar']].head(1))
Out[7]: pandas.core.frame.DataFrame
type(df[['Feb','Mar']].head(1).values[0])
Out[8]: numpy.ndarray
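Bharath Shetty's comment below helped me. The function expects an array as input, not a DataFrame. The columns of a DataFrame cannot be accessed with df[0] ..., which produces the following error: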
df[0]
Traceback (most recent call last):
File "<ipython-input-9-9ae93f22b889>", line 1, in <module>
df[0]
File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\frame.py", line 2062, in __getitem__
return self._getitem_column(key)
File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\frame.py", line 2069, in _getitem_column
return self._get_item_cache(key)
File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\generic.py", line 1534, in _get_item_cache
values = self._data.get(item)
File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\internals.py", line 3590, in get
loc = self.items.get_loc(item)
File "C:\Users\nikhi\AppData\Local\conda\conda\envs\tensorflowspyder\lib\site-packages\pandas\core\indexes\base.py", line 2395, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5239)
File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5085)
File "pandas\_libs\hashtable_class_helper.pxi", line 1207, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20405)
File "pandas\_libs\hashtable_class_helper.pxi", line 1215, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20359)
KeyError: 0
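The KeyError arises because df[0] looks for a column labeled 0, while the columns here are labeled 'Feb' and 'Mar'. A minimal sketch of the difference between label-based and position-based access, using the same data as above (the names `raised`, `first_col` and `first_row` are only for illustration):

```python
import pandas as pd

sales = {'Feb': [200, None, None], 'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)

# df[key] selects a column by label; there is no column labeled 0.
try:
    df[0]
    raised = False
except KeyError:
    raised = True

# Position-based access goes through .iloc instead.
first_col = df.iloc[:, 0]   # the 'Feb' column as a Series
first_row = df.iloc[0]      # the first row as a Series indexed by column name

print(raised)               # True
print(first_col.name)       # Feb
```

A single row, such as `first_row`, is a Series, which is the kind of object the function receives when it is called through apply with axis=1.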
Answer 0 (score: 2)
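Applying a function across the columns is not the same as passing the DataFrame to that function. With axis=1, apply is essentially a simplified for loop: it sends one row at a time to the function, collects the results, and returns a Series.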
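As a rough sketch of that idea (not how pandas implements apply internally), a row-wise apply behaves much like the explicit loop below. The helper `apply_rowwise` and the toy function `first_or_default` are made-up names for illustration.

```python
import pandas as pd

def apply_rowwise(frame, func):
    """Hypothetical stand-in for frame.apply(func, axis=1)."""
    results = []
    for _, row in frame.iterrows():   # row is a Series indexed by column name
        results.append(func(row))     # func sees one row at a time
    return pd.Series(results, index=frame.index)

def first_or_default(row):
    # Toy row function: keep 'Feb' if present, otherwise fall back to 24.
    return 24 if pd.isnull(row['Feb']) else row['Feb']

df = pd.DataFrame({'Feb': [200, None, None], 'Mar': [140, 215, 95]})
looped = apply_rowwise(df, first_or_default)
applied = df.apply(first_or_default, axis=1)

# Compare the values produced by the explicit loop and by apply.
print(looped.tolist() == applied.tolist())
```

In both cases the function is handed one row at a time, never the whole DataFrame.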
To make your second method work, use iterrows inside the function and collect the results in an initially empty list, i.e.
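def impute_age2(cols):
    k = []  # collects one result per row
    for _, i in cols.iterrows():
        # .iloc reads by position; plain i[0] is deprecated for label-indexed rows
        Age = i.iloc[0]
        Pclass = i.iloc[1]
        if pd.isnull(Age):
            if Pclass == 215:
                k.append(37)
            elif Pclass == 2:
                k.append(29)
            else:
                k.append(24)
        else:
            k.append(Age)
    return k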
df['zz'] = impute_age2(df[['Feb','Mar']])
Output:
     Feb  Mar     zz
0  200.0  140  200.0
1    NaN  215   37.0
2    NaN   95   24.0
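If the goal is only to be able to pass the whole DataFrame to the function, a loop-free variant is also possible. The following is a sketch of an alternative, not the approach from the answer above: it assumes a two-column DataFrame, uses numpy.select with the same values as impute_age, and the name `impute_age_vectorized` is made up for illustration.

```python
import numpy as np
import pandas as pd

def impute_age_vectorized(cols):
    """Hypothetical loop-free version; expects a two-column DataFrame."""
    age = cols.iloc[:, 0]
    pclass = cols.iloc[:, 1]
    # Fill value chosen from the second column: 37 for 215, 29 for 2, else 24.
    fill = np.select([pclass == 215, pclass == 2], [37, 29], default=24)
    # Keep existing values; use the fill value only where the first column is missing.
    return age.where(age.notnull(), fill)

sales = {'Feb': [200, None, None], 'Mar': [140, 215, 95]}
df = pd.DataFrame.from_dict(sales)
df['zz'] = impute_age_vectorized(df[['Feb', 'Mar']])
print(df)
```

For the sample data this gives zz values of 200.0, 37.0 and 24.0, matching the output above.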