Removing rows from a pandas DataFrame with an unsorted index

Time: 2020-08-28 19:46:48

Tags: python pandas numpy

This is what my data looks like:

print(len(y_train),len(index_1))
index_1 = pd.DataFrame(data=index_1)
print("y_train: ")
print(y_train)
print("index_1: ")
print(index_1)

Output:

1348 555
y_train: 
1677    1
1519    0
1114    0
690     1
1012    1
       ..
1893    1
1844    0
1027    1
1649    1
1789    1
Name: Team 1 Win, Length: 1348, dtype: int64
index_1: 
        0
0       0
1       2
2       6
3       7
4       8
..    ...
550  1335
551  1341
552  1342
553  1344
554  1346

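I want to remove a number of rows (index_1) from a pandas DataFrame (y_train). The values in the index_1 DataFrame are the rows I want to remove. The problem is that the index of y_train is not in order, so when the first item of index_1 is 0, I want it to remove the first row of y_train (i.e. the one with index 1677), not the row whose index label is 0. Here is my attempt: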

y_train_short = y_train.drop(index_1)

And this is what I get:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-57-49f2cce7bac0> in <module>
     22 print(index_1)
     23 print(index_1)
---> 24 y_train_short = y_train.drop(index_1)
     25 
     26 

~/miniconda3/lib/python3.7/site-packages/pandas/core/series.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   4137             level=level,
   4138             inplace=inplace,
-> 4139             errors=errors,
   4140         )
   4141 

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
   3934         for axis, labels in axes.items():
   3935             if labels is not None:
-> 3936                 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
   3937 
   3938         if inplace:

~/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py in _drop_axis(self, labels, axis, level, errors)
   3968                 new_axis = axis.drop(labels, level=level, errors=errors)
   3969             else:
-> 3970                 new_axis = axis.drop(labels, errors=errors)
   3971             result = self.reindex(**{axis_name: new_axis})
   3972 

~/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py in drop(self, labels, errors)
   5016         if mask.any():
   5017             if errors != "ignore":
-> 5018                 raise KeyError(f"{labels[mask]} not found in axis")
   5019             indexer = indexer[~mask]
   5020         return self.delete(indexer)

KeyError: '[0] not found in axis'

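Regardless of the fact that index 0 does not exist in y_train, I imagine that even if it did exist, this would not do what I want. So how do I remove the correct rows from this DataFrame?

2 answers:

Answer 0: (score: 1)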

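Note that y_train.iloc[index_1[0]] retrieves the rows of y_train that occupy the indicated integer positions.

When you run y_train.iloc[index_1[0]].index, you get the index labels of those rows.

So to drop these rows, you can run: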

y_train.drop(y_train.iloc[index_1[0]].index, inplace=True)
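
As a rough illustration of the steps above, here is a minimal sketch on a small made-up Series. The data values and the names positions, labels_to_drop and y_train_short are assumptions for the example only; the labels mimic the shuffled index from the question. It converts the positions to a NumPy array before passing them to iloc, and it returns a new object rather than using inplace=True.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: the index labels are shuffled
y_train = pd.Series(
    [1, 0, 0, 1, 1],
    index=[1677, 1519, 1114, 690, 1012],
    name="Team 1 Win",
)
# Integer positions (not labels) of the rows to remove
index_1 = pd.DataFrame(data=[0, 2, 3])

# Step 1: translate positions into the index labels found at those positions
positions = index_1[0].to_numpy()
labels_to_drop = y_train.iloc[positions].index

# Step 2: drop by label, which is what drop() expects
y_train_short = y_train.drop(labels_to_drop)
print(y_train_short)
```

One caveat: drop works on labels, so if the index contained duplicate labels, every row sharing a dropped label would be removed, not only the row at the given position.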

Answer 1: (score: 0)

You can use isin on the index:

# set index to start from 0
y_train = y_train.reset_index(drop=True)

# do simple filter
y_train = y_train[~y_train.index.isin(index_1[0])].copy()
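
A minimal sketch of this approach on made-up data follows (the values and the names y_reset, y_filtered, keep and y_kept are assumptions for illustration). Note that reset_index(drop=True) discards the original labels such as 1677. The second half shows a possible variant, not taken from the answer, that uses a NumPy boolean mask over positions and so keeps the original labels.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data with shuffled index labels
y_train = pd.Series(
    [1, 0, 0, 1, 1],
    index=[1677, 1519, 1114, 690, 1012],
    name="Team 1 Win",
)
index_1 = pd.DataFrame(data=[0, 2, 3])  # integer positions to remove

# Approach from the answer: after the reset, labels equal positions
y_reset = y_train.reset_index(drop=True)
y_filtered = y_reset[~y_reset.index.isin(index_1[0])].copy()
print(y_filtered)

# Variant (an assumption, not from the answer): mask by position,
# which leaves the original index labels untouched
keep = np.ones(len(y_train), dtype=bool)
keep[index_1[0].to_numpy()] = False
y_kept = y_train[keep]
print(y_kept)
```

Which form is preferable depends on whether the original labels are still needed afterwards, for example to align y_train with a feature matrix that shares the same index.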