Smogn error: positional indexers are out-of-bounds

Asked: 2021-02-23 04:33:05

Tags: python-3.x oversampling

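I am trying to apply the Synthetic Minority Over-Sampling Technique for Regression with Gaussian Noise (SMOGN) to my dataset, but I get an IndexError that I don't understand: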

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_list_axis(self, key, axis)
   1473         try:
-> 1474             return self.obj._take_with_is_copy(key, axis=axis)
   1475         except IndexError as err:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in _take_with_is_copy(self, indices, axis)
   3598         """
-> 3599         result = self.take(indices=indices, axis=axis)
   3600         # Maybe set copy if we didn't actually change the index.

~\Anaconda3\lib\site-packages\pandas\core\generic.py in take(self, indices, axis, is_copy, **kwargs)
   3585         new_data = self._mgr.take(
-> 3586             indices, axis=self._get_block_manager_axis(axis), verify=True
   3587         )

~\Anaconda3\lib\site-packages\pandas\core\internals\managers.py in take(self, indexer, axis, verify, convert)
   1466         if convert:
-> 1467             indexer = maybe_convert_indices(indexer, n)
   1468 

~\Anaconda3\lib\site-packages\pandas\core\indexers.py in maybe_convert_indices(indices, n)
    264     if mask.any():
--> 265         raise IndexError("indices are out-of-bounds")
    266     return indices

IndexError: indices are out-of-bounds

The above exception was the direct cause of the following exception:

IndexError                                Traceback (most recent call last)
<ipython-input-7-e354cd403458> in <module>
      6 dummy_df_smogn = smogn.smoter(
      7     data = dummy_df,
----> 8     y = 'estimated_delivered_days'
      9 )
     10 

~\Anaconda3\lib\site-packages\smogn\smoter.py in smoter(data, y, k, pert, samp_method, under_samp, drop_na_col, drop_na_row, replace, rel_thres, rel_method, rel_xtrm_type, rel_coef, rel_ctrl_pts_rg)
    243                 perc = s_perc[i],
    244                 pert = pert,
--> 245                 k = k
    246             )
    247 

~\Anaconda3\lib\site-packages\smogn\over_sampling.py in over_sampling(data, index, perc, pert, k)
     69 
     70     ## subset original dataframe by bump classification index
---> 71     data = data.iloc[index]
     72 
     73     ## store dimensions of data subset

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
    893 
    894             maybe_callable = com.apply_if_callable(key, self.obj)
--> 895             return self._getitem_axis(maybe_callable, axis=axis)
    896 
    897     def _is_scalar_access(self, key: Tuple):

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
   1490         # a list of integers
   1491         elif is_list_like_indexer(key):
-> 1492             return self._get_list_axis(key, axis=axis)
   1493 
   1494         # a single integer

~\Anaconda3\lib\site-packages\pandas\core\indexing.py in _get_list_axis(self, key, axis)
   1475         except IndexError as err:
   1476             # re-raise with different error message
-> 1477             raise IndexError("positional indexers are out-of-bounds") from err
   1478 
   1479     def _getitem_axis(self, key, axis: int):

IndexError: positional indexers are out-of-bounds
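The final frames show that the failure comes from `data.iloc[index]` inside smogn's `over_sampling`, so at least one entry in `index` is not a valid row position. The sketch below reproduces the same pandas message on a tiny, hypothetical frame (not the real dummy_df), assuming the positions were taken from index labels that no longer run from 0 to n-1, for example after rows were filtered out earlier. Whether that is what happened here is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for dummy_df: the index labels (10, 11, 12)
# do not match the row positions (0, 1, 2).
df = pd.DataFrame({"estimated_delivered_days": [3, 7, 21]}, index=[10, 11, 12])

# Index labels collected from the frame.
labels = list(df.index)

try:
    # .iloc is purely positional, so label 10 is read as "row 10",
    # which does not exist in a 3-row frame.
    df.iloc[labels]
    error_message = None
except IndexError as err:
    error_message = str(err)

print(error_message)
```

Checking `dummy_df.index` (for instance, whether it is a plain 0..n-1 range) is a quick way to see if this situation applies.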

I am using the smogn package, and this is the code I am running:

import smogn

# Select columns for the model
x = dummy_df[dummy_df.columns.difference(['estimated_delivered_days'])]
y = dummy_df.estimated_delivered_days

## conduct smogn
dummy_df_smogn = smogn.smoter(
    data = dummy_df, 
    y = 'estimated_delivered_days'
)

where dummy_df has shape (112819, 30) and the following dtypes:

payment_sequential               float64
payment_type                       int32
payment_installments             float64
payment_value                    float64
customer_unique_id                 int32
customer_zip_code_prefix           int64
customer_city                      int32
customer_state                     int32
order_item_id                    float64
product_id                         int32
seller_id                          int32
price                            float64
freight_value                    float64
product_weight_g                 float64
product_length_cm                float64
product_height_cm                float64
product_width_cm                 float64
product_category_name_english      int32
seller_zip_code_prefix           float64
seller_city                        int32
seller_state                       int32
order_delivered_month              int64
estimated_delivery_month           int64
estimated_delivered_days           int64
shipping_limit_carrier_days        int64
late_to_customer                   int32
late_to_carrier                    int32
major_holidays                     int64
weight_by_volume                 float64
seller_customer_dist_miles       float64

I would appreciate some help understanding the error and how to resolve it.
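One thing that may be worth trying, as a sketch rather than a confirmed fix: if dummy_df's index is not a contiguous 0..n-1 range (an assumption, since the index is not shown above), resetting it before calling `smogn.smoter` makes index labels and row positions coincide, so a positional lookup such as `data.iloc[index]` cannot run past the end of the frame. The helper name `with_positional_index` and the small frame are hypothetical and only for illustration.

```python
import pandas as pd


def with_positional_index(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df whose index is 0..n-1 (the old index is discarded)."""
    return df.reset_index(drop=True)


# Hypothetical stand-in for dummy_df after some rows were filtered out.
filtered = pd.DataFrame({"estimated_delivered_days": [3, 7, 21]}, index=[10, 11, 12])
clean = with_positional_index(filtered)

# Labels and positions now agree, so a label-as-position lookup succeeds.
subset = clean.iloc[list(clean.index)]
print(list(clean.index))

# With smogn installed, the call from the question would then become:
# dummy_df_smogn = smogn.smoter(
#     data=with_positional_index(dummy_df),
#     y='estimated_delivered_days'
# )
```

`reset_index(drop=True)` returns a new frame and leaves the original untouched, so the earlier `x` and `y` selections are not affected.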

0 answers:

No answers yet