I have a dataset with three columns named CLASS, DURATION, and GENDER.
import pandas as pd
data = pd.read_csv('dataset.csv')
CLASS = data['CLASS']
DURATION = data['DURATION']
GENDER = data['GENDER']
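CLASS contains 5 types of entries: blank, 1, 2, 3, 4. DURATION contains -1 (which stands for some semantic value) or some positive integer. GENDER contains M or F. I can select the entries in CLASS by GENDER, like this: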
CLASS[GENDER=='M']
But I cannot select the entries in CLASS where DURATION is -1, like this:
CLASS[DURATION=='-1']
Why? This is the error I get:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-56-604aed5ebca4> in <module>()
----> 1 CLASS[DURATION=='-1']
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
621 key = com._apply_if_callable(key, self)
622 try:
--> 623 result = self.index.get_value(self, key)
624
625 if not is_scalar(result):
c:\users\h473\appdata\local\programs\python\python35\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
2558 try:
2559 return self._engine.get_value(s, k,
-> 2560 tz=getattr(series.dtype, 'tz', None))
2561 except KeyError as e1:
2562 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: False
Answer 0 (score: 0)
I can't reproduce this, but you could try:
import pandas as pd
data = pd.read_csv('dataset.csv')
CLASS = data['CLASS']
DURATION = data['DURATION']
GENDER = data['GENDER']
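# fill the NaN values so the column can be cast to int
DURATION = DURATION.fillna(0)
# cast to int inside the brackets, then compare to build a boolean mask
# (use == -1 instead of > 0 to pick the -1 rows)
CLASS[DURATION.astype(int) > 0]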
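The KeyError: False in the traceback suggests that the expression inside the brackets ended up as a single False rather than a boolean Series, which would fit a type mismatch between the column and the string '-1'. Since the file itself is not shown, this is only a guess. A minimal sketch for checking it: inspect the column's dtype, coerce it to numbers, and compare against the integer -1. The in-memory frame is made-up stand-in data for dataset.csv, and assuming DURATION was read as strings is an assumption for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for dataset.csv; the real file is not shown in the question.
data = pd.DataFrame({
    'CLASS': [1, 2, None, 4],
    'DURATION': ['-1', '5', '-1', None],  # assumed here to have been read as strings
    'GENDER': ['M', 'F', 'F', 'M'],
})

# Inspect how the column was parsed before comparing against anything.
print(data['DURATION'].dtype)

# Coerce to numbers; values that cannot be parsed (and missing values) become NaN.
duration = pd.to_numeric(data['DURATION'], errors='coerce')

# Compare against the integer -1 to get a boolean mask, then index CLASS with it.
selected = data['CLASS'][duration == -1]
print(selected)
```

Coercing with pd.to_numeric avoids having to pick a fill value for missing entries, because NaN never compares equal to -1.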
Answer 1 (score: 0)
It may be better not to split the columns into separate Series in the first place, and instead try this on the DataFrame:
import pandas as pd
data = pd.read_csv('dataset.csv')
data.loc[data['GENDER'] == 'M', 'CLASS']
data.loc[data['DURATION'] == -1, 'CLASS']
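As a self-contained illustration of the DataFrame approach, the sketch below swaps dataset.csv for a small made-up frame (the values are hypothetical) and assumes DURATION holds integers. It runs the same two selections and then combines both conditions with &, which needs parentheses around each comparison.

```python
import pandas as pd

# Hypothetical data standing in for dataset.csv; DURATION is assumed to be integer-typed.
data = pd.DataFrame({
    'CLASS': [1, 2, 3, 4],
    'DURATION': [-1, 10, -1, 7],
    'GENDER': ['M', 'F', 'F', 'M'],
})

# Rows where GENDER is 'M', keeping only the CLASS column.
by_gender = data.loc[data['GENDER'] == 'M', 'CLASS']

# Rows where DURATION is the integer -1 (not the string '-1').
by_duration = data.loc[data['DURATION'] == -1, 'CLASS']

# Both conditions at once; each comparison is parenthesised because & binds tighter than ==.
both = data.loc[(data['GENDER'] == 'M') & (data['DURATION'] == -1), 'CLASS']

print(by_gender.tolist(), by_duration.tolist(), both.tolist())
```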