Question

I have the following table:

Sample data:

e_id e_ctry e_grp_id e_loc_nbr e_loc_id e_sal
=============================================
111  03     65       889        03      10000
131  01     67       009        09      8000
152  02     12       545        09      17000
155  04     55       778        09      33000
115  04     55       778        09      33000
156  04     55       778        09      33000 
177  03     65       889        03      14000
122  03     65       889        03      14000
141  03     65       889        03      17000
171  03     65       889        03      14000

I tried the following code:

d_tbl = self.emp_d[['e_id','e_ctry','e_grp_id','e_loc_nbr','e_loc_id','e_sal']].drop_duplicates()


def e_c_rslt(self):
    e_c_data = self.d_tbl[(self.d_tbl['e_loc_id']==1) ][['e_id','e_ctry','e_grp_id','e_loc_nbr','e_sal']]
    e_c_grpd = e_c_data.groupby(['e_id','e_grp_id','e_ctry']).e_sal.nunique().reset_index()
    rslt_ac9b=e_c_grpd[e_c_grpd.e_sal>15000]

But I keep getting the following error message:

e_c_grpd = e_c_data.groupby(['e_id','e_grp_id','e_ctry']).e_sal.nunique().reset_index()

File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 2866, in nunique
    res = out if ids[0] != -1 else out[1:]
IndexError: index 0 is out of bounds for axis 0 with size 0

What am I doing wrong?

Expected output:

e_id e_ctry e_grp_id  e_sal
===========================
111  03     65       10000
177  03     65       14000
122  03     65       14000
141  03     65       17000
171  03     65       14000

The requirement is to collect ['e_id', 'e_ctry', 'e_grp_id'] where e_sal > 15000 and where there are distinct 'e_ctry', 'e_grp_id' values for the same e_sal.

Update 1:

After running print(e_c_data) I get:

Empty DataFrame
Columns: [e_id,e_ctry,e_grp_id,e_loc_nbr,e_sal]
Index: []

Answer 1

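The error message is telling you that the object you are indexing into has size 0; in other words, it is empty. Why would it be empty? Well, you could add print calls to find out where that happens, or we can look at your frame: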

e_id e_ctry e_grp_id e_loc_nbr e_loc_id e_sal
=============================================
111  03     65       889        03      10000
131  01     67       009        09      8000

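Assuming this is representative, note that the values in your e_loc_id column start with 0. If they were integers, they would not: integers are displayed without leading zeros. That means you must have strings: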

In [13]: df = pd.DataFrame({"A": [1,2], "B": ['01', '02']})

In [14]: df
Out[14]: 
   A   B
0  1  01
1  2  02

In [15]: df.dtypes
Out[15]: 
A     int64
B    object
dtype: object

But if your e_loc_id values are strings, then this comparison will never succeed:

self.d_tbl['e_loc_id']==1

So e_c_data is empty.
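
To make the point concrete, here is a minimal sketch of the mismatch and two possible ways around it. The frame below is a hypothetical stand-in for d_tbl built from a few rows of the sample data, and filtering on "03" is only an assumption, since the question does not say which e_loc_id value is actually wanted (the sample contains no 01 at all).

```python
import pandas as pd

# Hypothetical stand-in for d_tbl; values copied from the sample data,
# with e_loc_id stored as strings so the leading zeros survive.
d_tbl = pd.DataFrame({
    "e_id": [111, 131, 152],
    "e_loc_id": ["03", "09", "09"],
    "e_sal": [10000, 8000, 17000],
})

# Comparing a string column to the integer 1 matches no rows.
empty = d_tbl[d_tbl["e_loc_id"] == 1]
print(empty.empty)  # True

# Option 1: compare against a string, keeping the leading zero.
by_string = d_tbl[d_tbl["e_loc_id"] == "03"]

# Option 2: convert the column to integers first, then compare to an int.
by_int = d_tbl[d_tbl["e_loc_id"].astype(int) == 3]
```

Either option selects the same rows here. Comparing against the string avoids a conversion; converting to integers may be preferable if the leading zeros carry no meaning in your data.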
