When I run the following code in a Jupyter Notebook:
columns = ['nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']
for col in columns:
    # needed for any missing values set to '-99'
    df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
    # calculate the mean of the column
    column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
    mean = round(np.mean(column_temp))
    # then apply the mean to all NaNs
    df[col].fillna(mean, inplace=True)
I get the following error:
AttributeError                            Traceback (most recent call last)
<ipython-input-56-f8a0a0f314e6> in <module>()
      3 for col in columns:
      4     # needed for any missing values set to '-99'
----> 5     df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
      6     # calculate the mean of the column
      7     column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]

/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
-> 4376             return object.__getattribute__(self, name)
   4377
   4378     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'tolist'
When I run the same code in PyCharm it works without errors, and everything I have read suggests it should work. Am I missing something?
I have created a Minimal, Complete, and Verifiable example below:
import numpy as np
import pandas as pd
import os
import math
# get the path to the current working directory
cwd = os.getcwd()
# then add the name of the Excel file, including its extension to get its relative path
# Note: make sure the Excel file is stored inside the cwd
file_path = cwd + "/data.xlsx"
# Copy the database to file
df = pd.read_excel(file_path)
columns = ['nkill', 'nkillus', 'nkillter', 'nwound', 'nwoundus', 'nwoundte', 'propvalue', 'nperps', 'nperpcap', 'iyear', 'imonth', 'iday']
for col in columns:
    # needed for any missing values set to '-99'
    df[col] = [np.nan if (x < 0) else x for x in df[col].tolist()]
    # calculate the mean of the column
    column_temp = [0 if math.isnan(x) else x for x in df[col].tolist()]
    mean = round(np.mean(column_temp))
    # then apply the mean to all NaNs
    df[col].fillna(mean, inplace=True)
Answer 0 (score: 1)
You have an XY Problem. You have described what you are trying to achieve in your comments, but your approach is not a good fit for Pandas.
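As an aside on the error message itself: one way `df[col]` can come back as a DataFrame rather than a Series is when the column label is duplicated. Whether that is what happened in the notebook session is an assumption, since the question does not show the notebook's earlier cells. A minimal sketch with a hypothetical frame `df_dup`:

```python
import pandas as pd

# Hypothetical frame in which the label 'nkill' appears twice
df_dup = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                      columns=['nkill', 'nkill', 'iyear'])

# Selecting a duplicated label returns every matching column
selected = df_dup['nkill']
print(type(selected).__name__)       # DataFrame
print(hasattr(selected, 'tolist'))   # False

# List the labels that occur more than once
dup_labels = df_dup.columns[df_dup.columns.duplicated()].tolist()
print(dup_labels)                    # ['nkill']
```

If `dup_labels` is non-empty for your own frame, that would account for the `AttributeError`.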
for loops and list
With Pandas, you should avoid explicit for loops and conversion to a Python list. Pandas is built on NumPy arrays, which support vectorised column-wise operations.
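To illustrate the difference, here is a small sketch on a hypothetical Series `s` (not data from the question) that replaces negative values with NaN both ways:

```python
import numpy as np
import pandas as pd

# Hypothetical column using -99 as a missing-value marker
s = pd.Series([3.0, -99.0, 5.0])

# Element by element, via a Python list
looped = pd.Series([np.nan if x < 0 else x for x in s.tolist()])

# Vectorised: the comparison and the replacement act on the whole column
vectorised = s.mask(s < 0)

print(looped.equals(vectorised))  # True
```

Both give the same result; the vectorised form is shorter and avoids the round trip through a Python list.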
So let's look at what your loop is meant to do:
for col in columns:
    # values less than 0 set to NaN
    # calculate the mean of the column with 0 for NaN
    # then apply the mean to all NaNs
You can achieve each of these steps with Pandas methods.
apply + pd.to_numeric + mask + fillna
You can define a function mean_update and use pd.DataFrame.apply to apply it to each series:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, np.nan],
                   'B': ['hello', 4, 5, np.nan],
                   'C': [-1.5, 3, np.nan, np.nan]})

def mean_update(s):
    s_num = pd.to_numeric(s, errors='coerce')  # convert to numeric; non-numeric values become NaN
    s_num = s_num.mask(s_num < 0)              # replace values less than 0 with NaN
    s_mean = s_num.fillna(0).mean()            # calculate mean, counting NaN as 0
    return s_num.fillna(s_mean)                # replace NaN with mean

df = df.apply(mean_update)                     # apply to each series
print(df)
     A     B     C
0  1.0  2.25  0.75
1  1.0  4.00  3.00
2  3.0  5.00  0.75
3  1.0  2.25  0.75
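To limit this to the columns listed in the question rather than the whole frame, one option is to select them first and assign the result back. The sketch below uses a hypothetical stand-in frame `df_q` with only two of those columns, because the contents of data.xlsx are not shown:

```python
import numpy as np
import pandas as pd

def mean_update(s):
    s_num = pd.to_numeric(s, errors='coerce')  # non-numeric values become NaN
    s_num = s_num.mask(s_num < 0)              # negative markers such as -99 become NaN
    s_mean = s_num.fillna(0).mean()            # mean, counting NaN as 0
    return s_num.fillna(s_mean)                # fill NaN with that mean

# Hypothetical stand-in for the Excel data
df_q = pd.DataFrame({'nkill': [2, -99, 4, 0],
                     'nwound': [10, 5, -99, -99],
                     'city': ['a', 'b', 'c', 'd']})
columns = ['nkill', 'nwound']

# Only the selected columns are transformed; 'city' is left untouched
df_q[columns] = df_q[columns].apply(mean_update)
print(df_q)
```

The loop in the question also rounds the mean before filling. If you need that, round s_mean inside mean_update before the final fillna.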