When I try to run the code below, I get a KeyError traceback from remove_na_scores. I tried changing the axis in the apply() call afterwards, but nothing has worked.

import pandas as pd
import pprint

shows = pd.read_csv('/Users/WilliamStevens/Downloads/netflix_shows.csv')
pprint.pprint(shows.head())
shows.info()

shows_df = shows.groupby(['ratingDescription']).mean()
print(shows_df)

missing_user_scores = shows[shows['user rating score'].isnull()]
mean_scores = shows.groupby(['ratingDescription'])['user rating score'].mean()

def remove_na_scores(row):
    if pd.isnull(row['user rating score']):
        return mean_scores[row['rating']]
    else:
        return row['user rating score']

shows['user rating score'] = shows.apply(remove_na_scores)
print(shows['user rating score'])
The full traceback is below:

Traceback (most recent call last):
  File "netflix.py", line 28, in <module>
    shows['user rating score'] = shows.apply(remove_na_scores)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4163, in apply
    return self._apply_standard(f, axis, reduce=reduce)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/frame.py", line 4259, in _apply_standard
    results[i] = func(v)
  File "netflix.py", line 23, in remove_na_scores
    if pd.isnull(row['user rating score']):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3567)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3250)
  File "pandas/index.pyx", line 163, in pandas.index.IndexEngine.get_loc (pandas/index.c:4373)
KeyError: ('user rating score', 'occurred at index title')
Answer 0 (score: 0)
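The error occurs on this line:

    if pd.isnull(row['user rating score']):

This happens because remove_na_scores is applied along the wrong axis. By default, apply() uses axis=0 and passes each column to the function as a Series indexed by the row labels, so 'user rating score' is not a valid key. The "occurred at index title" part of the message only means that title was the first column it tried. Adding , axis=1 to shows.apply should fix the problem in the traceback you posted: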
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.rand(4, 2), columns=['title', 'user rating score'])
>>> df.apply(lambda x: x['user rating score'])
...
KeyError: ('user rating score', 'occurred at index title')
>>> df.apply(lambda x: x['user rating score'], axis=1)
0    0.083195
1    0.243666
2    0.572457
3    0.885327
dtype: float64
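To make the axis difference concrete, here is a small sketch that records which labels the function sees on each axis. The toy frame and the record helper are made up for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy frame: column names mirror the question, values are invented.
df = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'user rating score': [80.0, None, 95.0],
})

axis0 = {}  # what the function sees with the default axis=0
axis1 = {}  # what the function sees with axis=1

def record(store):
    """Hypothetical helper: remember the labels of each Series passed in."""
    def inner(series):
        store[series.name] = list(series.index)
        return None
    return inner

# axis=0: one call per column, each Series is labelled by the row index.
df.apply(record(axis0))

# axis=1: one call per row, each Series is labelled by the column names.
df.apply(record(axis1), axis=1)

print(axis0)
print(axis1)
```

Only with axis=1 does 'user rating score' appear among the labels, which is why row['user rating score'] works there and raises KeyError otherwise.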
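Since you mention that specifying the axis did not work, I suspect you get a different error after that. If you describe the new problem I can edit my post, but this should resolve the issue as described.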
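In the meantime, here is a rough sketch of how the whole fill step might look with axis=1. It runs on an invented stand-in for the CSV, and it assumes the lookup should use the same ratingDescription column that mean_scores was grouped by, rather than row['rating']. That is only a guess at the follow-up error, not something the question confirms. The group labels 'low' and 'high' are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for netflix_shows.csv; all values are invented.
shows = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'ratingDescription': ['low', 'low', 'high', 'high'],
    'user rating score': [80.0, None, 60.0, None],
})

# Mean score per group; NaN values are skipped by mean().
mean_scores = shows.groupby('ratingDescription')['user rating score'].mean()

def remove_na_scores(row):
    if pd.isnull(row['user rating score']):
        # Assumption: index mean_scores by the column used in the groupby.
        return mean_scores[row['ratingDescription']]
    return row['user rating score']

# axis=1 passes each row, so the column names are valid keys.
filled_apply = shows.apply(remove_na_scores, axis=1)

# Alternative without apply: broadcast each group's mean back onto its rows,
# then fill only the missing entries.
group_means = shows.groupby('ratingDescription')['user rating score'].transform('mean')
filled_vectorized = shows['user rating score'].fillna(group_means)

print(filled_apply.tolist())
print(filled_vectorized.tolist())
```

Both variants should give the same result. The transform version avoids a Python-level call per row, which tends to matter on larger frames.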