I want to draw a box plot from two features, 'date_submitted' and 'final_result'. I have checked the documentation, but I still get the error shown below.
I used the following code:
import pandas as pd
import numpy as np
df = pd.read_csv('/home/user/Documents/MOOC dataset cleaned/assessments.csv')
df.boxplot(column ='date_submitted', by='final_result')
Here is the description of my dataset:
       date_submitted  date_registration  date_unregistration     sum_click  \
count    28785.000000       28785.000000         28785.000000  28785.000000
mean        26.414139         -69.139552           321.657426      2.660066
std         15.890933          49.305239           188.935462      5.177789
min        -11.000000        -322.000000          -317.000000      1.000000
25%         18.000000        -100.000000           130.000000      1.000000
50%         24.000000         -56.000000           445.000000      1.000000
75%         30.000000         -29.000000           445.000000      3.000000
max        241.000000         124.000000           445.000000    511.000000

       num_of_prev_attempts      age_band        region  highest_education  \
count          28785.000000  28785.000000  25559.000000       28785.000000
mean               0.121278      1.693660      5.041981           1.280111
std                0.420666      0.474206      3.689341           0.769604
min                0.000000      0.000000      0.000000           0.000000
25%                0.000000      1.000000      2.000000           1.000000
50%                0.000000      2.000000      5.000000           1.000000
75%                0.000000      2.000000      9.000000           2.000000
max                6.000000      2.000000     11.000000           4.000000

       studied_credits         score  final_result
count     28785.000000  28785.000000  28785.000000
mean         78.691506     75.453431      1.029703
std          40.617665     19.968919      0.884043
min          30.000000      0.000000      0.000000
25%          60.000000     68.000000      0.000000
50%          60.000000     82.000000      1.000000
75%         120.000000     87.000000      2.000000
max         655.000000    100.000000      2.000000
Error traceback:
Traceback (most recent call last):
  File "/home/user/Documents/outliers.py", line 6, in <module>
    df.boxplot(column ='date_submitted', by='final_result')
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 5516, in boxplot
    **kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 2689, in boxplot
    return_type=return_type)
  File "/usr/lib/python2.7/dist-packages/pandas/tools/plotting.py", line 3077, in _grouped_plot_by_column
    grouped = data.groupby(by)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 3436, in groupby
    sort=sort, group_keys=group_keys, squeeze=squeeze)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1311, in groupby
    return klass(obj, by, **kwds)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 418, in __init__
    level=level, sort=sort)
  File "/usr/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2264, in _get_grouper
    in_axis, name, gpr = True, gpr, obj[gpr]
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: 'final_result'
[Finished in 0.292s]
What is causing this KeyError?
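
A KeyError raised from get_loc generally means the label is not present in df.columns exactly as written. Since a 'final_result' column does appear in the description, one possible cause (an assumption, not something the post confirms) is that the name in the loaded file differs slightly, for example by a leading space, or that the script reads a different file than the one described. The sketch below uses a small hypothetical inline CSV in place of assessments.csv to show how such a mismatch can be made visible and cleaned up; the data and the leading space are invented for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for assessments.csv: the header of the second column
# carries a leading space, one way 'final_result' can appear to be missing.
csv_text = "date_submitted, final_result\n18,0\n24,1\n30,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# repr() makes stray whitespace in the column names visible.
print([repr(name) for name in df.columns])
has_key_before = 'final_result' in df.columns

# Strip surrounding whitespace from every column name, then look again.
df.columns = df.columns.str.strip()
has_key_after = 'final_result' in df.columns

# Grouping by the cleaned name now succeeds; the traceback shows that
# boxplot(by=...) goes through the same groupby lookup.
group_sizes = df.groupby('final_result')['date_submitted'].count()
print(group_sizes)
```

If the printed names in the real script already match 'final_result' exactly, this is not the cause, and it is worth checking that the file being read is the one the description was produced from.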