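I've been working on an algorithm in Python that parses data from Excel with pandas and tries to drop any data with missing values, basically any row that has a NaN (capitalized like that) in any of its columns.
Here is my code: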
import numpy as np
import pandas as pd
import math as math
import shutil as shutil
from random import seed
from random import random
randNum = int(random() * 100)
shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')
debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'
debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()
for key, value in debt_copy_read.iteritems():
    debt_copy_read.drop(key, axis = 0)
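The expected result is that every row with a NaN value in any column gets dropped. The actual result is that I keep getting an error when I run the code: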
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-20-3083af5a3e02> in <module>
1 for key, value in debt_copy_read.iteritems():
----> 2 debt_copy_read.drop(key, axis = 0)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3938 index=index, columns=columns,
3939 level=level, inplace=inplace,
-> 3940 errors=errors)
3941
3942 @rewrite_axis_style_signature('mapper', [('copy', True),
~\Anaconda3\lib\site-packages\pandas\core\generic.py in drop(self, labels, axis, index, columns, level, inplace, errors)
3778 for axis, labels in axes.items():
3779 if labels is not None:
-> 3780 obj = obj._drop_axis(labels, axis, level=level, errors=errors)
3781
3782 if inplace:
~\Anaconda3\lib\site-packages\pandas\core\generic.py in _drop_axis(self, labels, axis, level, errors)
3810 new_axis = axis.drop(labels, level=level, errors=errors)
3811 else:
-> 3812 new_axis = axis.drop(labels, errors=errors)
3813 result = self.reindex(**{axis_name: new_axis})
3814
~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in drop(self, labels, errors)
4963 if errors != 'ignore':
4964 raise KeyError(
-> 4965 '{} not found in axis'.format(labels[mask]))
4966 indexer = indexer[~mask]
4967 return self.delete(indexer)
KeyError: "['Date'] not found in axis"
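I'm trying to iterate over data on the US debt, with one column holding a "Date" variable and the other holding "Debt". Any advice on what went wrong or how to fix it is appreciated. The data is organized like this: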
Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,12354041054846.90
2010-02-05T14:30:00Z,12345510656150.00
2010-02-09T14:30:00Z,12349467132738.40
2010-02-11T14:30:00Z,12349324464284.20
2010-02-16T14:30:00Z,12384358013736.30
2010-02-17T14:30:00Z,12386495535882.20
2010-02-18T14:30:00Z,12401448666808.30
Answer 0 (score: 1)
You don't need to iterate over the rows to drop the ones with NaN values. You can call the dropna() method of pandas.DataFrame directly. For more details, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
import numpy as np
import pandas as pd
import math as math
import shutil as shutil
from random import seed
from random import random
randNum = int(random() * 100)
shutil.copy('unsorted/daily/fed_debt_data.csv', 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv')
debt_copy = 'unsorted/daily/fed_debt_data' + str(randNum) + '.csv'
debt_copy_read = pd.read_csv(debt_copy, names = ["Date", "Debt"])
debt_copy_read.head()
# dropna() returns a new DataFrame, so assign the result to keep it
debt_copy_read = debt_copy_read.dropna()
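To make the difference from the question's loop concrete, here is a minimal, self-contained sketch. It assumes data in the same Date,Debt layout as the question; the in-memory sample (with one Debt value deliberately left blank) and the variable names are made up for illustration. It shows that iterating a DataFrame by items yields column labels, which is why drop(key, axis = 0) went looking for a row labelled 'Date', and that dropna() hands back a new DataFrame rather than changing the original. It uses items() because iteritems() was removed in pandas 2.0.

```python
import io

import pandas as pd

# Hypothetical sample in the question's layout, with one Debt value left blank.
csv_text = """Date,Debt
2010-02-01T14:30:00Z,12349463585067.40
2010-02-03T14:30:00Z,
2010-02-05T14:30:00Z,12345510656150.00
"""

# The sample already has a header row, so pandas can read the column names
# from it; passing names= as well would turn that header into a data row.
debt = pd.read_csv(io.StringIO(csv_text))

# Iterating with items() yields (column label, Series) pairs, so the keys
# are "Date" and "Debt", not row labels.
column_labels = [label for label, _ in debt.items()]

# dropna() returns a new DataFrame; the original is left unchanged.
cleaned = debt.dropna()

# subset= limits the NaN check to the listed columns.
cleaned_debt_only = debt.dropna(subset=["Debt"])

print(column_labels)
print(cleaned)
```

If the original frame is no longer needed, assigning the result back to the same name (debt = debt.dropna()) keeps the code short.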
Answer 1 (score: 0)
You can try:
debt_copy_read.dropna()
to drop the rows that contain NaN.
If pandas reformatted your Debt column, you can use the following to reformat it:
# debt_copy is the file path (a str); the DataFrame is debt_copy_read
debt_copy_read.dropna()
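It isn't clear from this answer what kind of reformatting is meant. As one possible reading, the sketch below assumes the Debt column was read as text, or is displayed in scientific notation, and uses pd.to_numeric plus a format string to get plain two-decimal values back. The sample frame and variable names are hypothetical.

```python
import pandas as pd

# Hypothetical frame where Debt was read as text and one entry is not a number.
debt = pd.DataFrame(
    {
        "Date": ["2010-02-01T14:30:00Z", "2010-02-03T14:30:00Z"],
        "Debt": ["12349463585067.40", "n/a"],
    }
)

# errors="coerce" turns entries that cannot be parsed into NaN,
# so dropna() can remove them afterwards.
debt["Debt"] = pd.to_numeric(debt["Debt"], errors="coerce")
debt = debt.dropna(subset=["Debt"])

# Render the floats with two decimals instead of scientific notation.
formatted = debt["Debt"].map("{:.2f}".format)
print(formatted.tolist())
```

This only changes how the values are rendered as strings; the Debt column itself stays numeric for any later calculations.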