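I have a large dataset with many NaN values spread across several columns.
I tried the following code, but it did not remove the NaN values from the dataset: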
df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"])
df['Deviation from Partisanship'].unique()
Output:
array([nan, 'Vote for opposing party', 'Vote for own party'], dtype=object)
This clearly shows that some NaN values are still present. How can I remove them?
Answer 0 (score: 2)
dropna returns a new DataFrame rather than modifying df, so you need to assign the result to a new DataFrame:
df2 = df.dropna(subset=["Deviation from Partisanship"])
Or drop the rows in place:
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
You can find more information in the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html
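To see why the original call appeared to do nothing, here is a minimal sketch using a small made-up DataFrame in place of sec3_data.xlsx (the data and the names col and cleaned are assumptions for illustration only). It shows that df is unchanged after dropna, while the returned DataFrame has the NaN row removed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the spreadsheet in the question.
col = "Deviation from Partisanship"
df = pd.DataFrame({col: [np.nan, "Vote for opposing party", "Vote for own party"]})

# By default dropna returns a new DataFrame; df itself is left unchanged.
cleaned = df.dropna(subset=[col])

rows_before = len(df)                                # df still holds the NaN row
rows_after = len(cleaned)                            # the returned frame does not
missing_in_df = int(df[col].isna().sum())
missing_in_cleaned = int(cleaned[col].isna().sum())
print(rows_before, rows_after, missing_in_df, missing_in_cleaned)
```

Counting missing values with isna().sum() is just one way to check; calling unique() on the cleaned column, as in the question, works as well.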
Answer 1 (score: 1)
You need to write it as
df = df.dropna(subset=["Deviation from Partisanship"])
or
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
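The two forms should leave you with the same rows. A small sketch with made-up data (the frame and the names reassigned and in_place are hypothetical, not from the question) compares them:

```python
import numpy as np
import pandas as pd

col = "Deviation from Partisanship"
# Made-up data standing in for the real spreadsheet.
original = pd.DataFrame({col: [np.nan, "Vote for opposing party", "Vote for own party"]})

# Form 1: reassign the returned DataFrame.
reassigned = original.copy()
reassigned = reassigned.dropna(subset=[col])

# Form 2: modify the DataFrame in place (the return value is not used).
in_place = original.copy()
in_place.dropna(subset=[col], inplace=True)

same_result = reassigned.equals(in_place)
print(same_result)
```

Pick one form rather than combining them: in pandas 1.x and 2.x, as far as I know, a call with inplace=True returns None, so writing df = df.dropna(..., inplace=True) would replace df with None.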
Answer 2 (score: 0)
import pandas as pd

# Method 1: drop the rows in place
df = pd.read_excel('sec3_data.xlsx')
df.dropna(subset=["Deviation from Partisanship"], inplace=True)
df['Deviation from Partisanship'].unique()
# Method 2: assign the result to a new DataFrame
df = pd.read_excel('sec3_data.xlsx')
df2 = df.dropna(subset=["Deviation from Partisanship"])
df2['Deviation from Partisanship'].unique()