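I'm trying to delete some rows from a huge dataset in Pandas. I decided to use iterrows() to find the indexes to delete (since I know that deleting while iterating is a bad idea). Right now it looks like this: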
list_to_delete = []
rows_to_delete = {}
for index, row in train.iterrows():
    if <some conditions>:
        list_to_delete.append(int(index))
        rows_to_delete[int(index)] = row
train = train.drop([train.index[i] for i in list_to_delete])
This gives me the following error:
Traceback (most recent call last):
  File "C:/Users/patka/PycharmProjects/PARSER/getStatistics.py", line 115, in <module>
    train = train.drop([train.index[i] for i in list_to_delete])
  File "C:/Users/patka/PycharmProjects/PARSER/getStatistics.py", line 115, in <listcomp>
    train = train.drop([train.index[i] for i in list_to_delete])
  File "C:\Users\patka\PycharmProjects\PARSER\venv\lib\site-packages\pandas\core\indexes\base.py", line 3958, in __getitem__
    return getitem(key)
IndexError: index 25378 is out of bounds for axis 0 with size 25378
How is that possible?
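One possible explanation, sketched on a toy frame: iterrows() yields index labels, while train.index[i] looks up by position. If the labels do not run from 0 to len(train) - 1 (for example after earlier filtering, or with an index that starts at 1), a label can be larger than the last valid position. The frame, the "priority" values, and the "Minor" condition below are hypothetical stand-ins for the real data and for <some conditions>.

```python
import pandas as pd

# Hypothetical toy frame: labels 1..4 do not line up with positions 0..3.
train = pd.DataFrame(
    {"priority": ["Major", "Minor", "Major", "Minor"]},
    index=[1, 2, 3, 4],
)

# iterrows() yields (label, row) pairs, so this list holds labels: [2, 4].
list_to_delete = [
    index for index, row in train.iterrows() if row["priority"] == "Minor"
]

# train.index[i] is a positional lookup; position 4 does not exist here.
try:
    train.index[4]
    raised = False
except IndexError:
    raised = True

# drop() already expects labels, so the list can be passed directly.
cleaned = train.drop(list_to_delete)

# A boolean mask avoids the Python-level loop entirely.
cleaned_mask = train[train["priority"] != "Minor"]
```

If the real condition can be written as a column expression, the mask version is likely to be much faster on a large frame than iterating row by row.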
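Before that, I made a copy of the dataset and tried to drop the selected rows from that copy (using inplace=True) while iterating over the original. Unfortunately, I got an error saying that a NoneType object has no attribute 'drop'.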
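A minimal sketch of what may have happened there, assuming the result of drop(..., inplace=True) was assigned back to the variable (that code is not shown in the post, so this is a guess). With inplace=True, drop() modifies the frame and returns None. The frame and names below are hypothetical.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
original = pd.DataFrame({"status": ["Resolved", "Open", "Resolved"]})
copy = original.copy()

# With inplace=True, drop() changes `copy` itself and returns None.
result = copy.drop([1], inplace=True)

# Writing `copy = copy.drop([1], inplace=True)` would therefore rebind
# `copy` to None, and the next `copy.drop(...)` call would fail with
# AttributeError: 'NoneType' object has no attribute 'drop'.
```

Either use inplace=True without reassigning, or reassign without inplace=True; combining the two is what would produce a None.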
Any help would be greatly appreciated. A sample row of mine looks like this:
resolution Done
priority Major
created 2000-07-04T13:13:52.000+0200
status Resolved
Team XBee
changelog {'Team" : {'from':...