Question

Here is my code.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingClassifier
from yellowbrick.features.importances import FeatureImportances


# First, let’s load the data:
# read the data
df = pd.read_csv('C:\\path_here\\test.csv')

# handle zip codes in a special way
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
df.dtypes


# work with ONLY records that have SOME office area; don't want to train on non-officearea records
df1 = df[df['officearea']!=0]
df1.shape
list(df1)


df1 = df1.fillna(0)
df1.shape               



# Specify the features of interest
features = ['block','zipcode','policeprct','healthcenterdistrict','healtharea','sanitboro','sanitdistrict']

# Extract the instances and target
X = df1[features]
y = df1.officearea


# Create a new matplotlib figure
fig = plt.figure()
ax = fig.add_subplot()

viz = FeatureImportances(GradientBoostingClassifier(), ax=ax)
viz.fit(X, y)
viz.poof()

I'm getting this error from something in the dataframe:

Traceback (most recent call last):

  File "<ipython-input-402-0e8d46c0d89f>", line 5, in <module>
    viz.fit(X, y)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\yellowbrick\features\importances.py", line 136, in fit
    super(FeatureImportances, self).fit(X, y, **kwargs)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\yellowbrick\base.py", line 311, in fit
    self.estimator.fit(X, y)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\sklearn\ensemble\gradient_boosting.py", line 1395, in fit
    X, y = check_X_y(X, y, accept_sparse=['csr', 'csc', 'coo'], dtype=DTYPE)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 756, in check_X_y
    estimator=estimator)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\sklearn\utils\validation.py", line 527, in check_array
    array = np.asarray(array, dtype=dtype, order=order)

  File "C:\Users\Excel\Anaconda3\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)

ValueError: could not convert string to float: '00nan'

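I can't figure out what is causing this, and because of it everything downstream fails too. The strange thing is that when I look through the dataset, '00nan' doesn't appear anywhere. What is the simplest way forward here? Thanks for taking a look.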
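The '00nan' value is not in the CSV because it is created by the zip-code line, df['zipcode'] = df['zipcode'].astype(str).str.zfill(5): a blank zip code is read in as NaN, converting it to text gives 'nan', and zfill(5) pads that to five characters. scikit-learn then fails when it tries to turn that string into a float. A minimal reproduction, using made-up zip codes (map(str) is used in the sketch to apply Python's str to each value explicitly, giving the same 'nan' text that astype(str) produced in the question):

```python
import numpy as np
import pandas as pd

# An empty cell in the CSV is read back as the float NaN
missing = np.nan

# Turning NaN into text gives the three-character string 'nan' ...
as_text = str(missing)
# ... and zfill(5) left-pads it with zeros to five characters
padded = as_text.zfill(5)
print(as_text, padded)  # nan 00nan

# The same happens element by element on a column (hypothetical zip codes)
zips = pd.Series(["10001", np.nan, "02134"], dtype=object)
result = zips.map(str).str.zfill(5)
print(result.tolist())  # ['10001', '00nan', '02134']

# fillna(0) cannot undo this: '00nan' is an ordinary string, not a missing value
print(result.isna().any())  # False
```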

Answer 1

A combination of two things worked for me:

df = pd.read_csv('C:\\path_here\\test.csv', na_values='00nan')

df1 = df1.dropna(axis='columns')
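For reference, na_values is applied while read_csv parses the file, so it marks as missing only cells whose raw text is '00nan'. dropna(axis='columns') removes every column that contains at least one NaN (the default is how='any') and keeps all rows; if a column named in features is removed this way, df1[features] will raise a KeyError. A small sketch of the dropna behaviour on a hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: only 'sanitboro' contains a missing value
df1 = pd.DataFrame({
    "block": [1, 2, 3],
    "sanitboro": [1.0, np.nan, 3.0],
    "healtharea": [10, 20, 30],
})

# axis='columns' drops every column with at least one NaN; rows are untouched
cleaned = df1.dropna(axis="columns")
print(list(cleaned))  # ['block', 'healtharea']
print(cleaned.shape)  # (3, 2)
```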

With those two changes, the last part of the code works. Here is the full script:

# import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingClassifier
from yellowbrick.features.importances import FeatureImportances


# First, let’s load the data:
# read the data
df = pd.read_csv('C:\\path_here\\test.csv', na_values='00nan')

# handle zip codes in a special way
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
df.dtypes


# work with ONLY records that have SOME office area; don't want to train on non-officearea records
df1 = df[df['officearea']!=0]
df1.shape
list(df1)


# handle NANs
df1 = df1.dropna(axis='columns')
df1 = df1.fillna(0)
df1.shape               

# Specify the features of interest
features = ['block','zipcode','policeprct','healthcenterdistrict','healtharea','sanitboro','sanitdistrict']

# Extract the instances and target
X = df1[features]
y = df1.officearea


# Create a new matplotlib figure
fig = plt.figure()
ax = fig.add_subplot()

viz = FeatureImportances(GradientBoostingClassifier(), ax=ax)
viz.fit(X, y)
viz.poof()
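Since the question notes that '00nan' can't be found by looking at the data, it can help to check X for values that will not convert to numbers before calling viz.fit(X, y). This is only a sketch: non_numeric_values is a hypothetical helper (not part of pandas, scikit-learn or Yellowbrick), and pd.to_numeric(errors='coerce') merely approximates the float conversion scikit-learn performs, so the two may disagree on unusual strings.

```python
import pandas as pd


def non_numeric_values(X: pd.DataFrame) -> dict:
    """Map each column name to the sorted distinct values that fail numeric parsing.

    Cells that are already missing (NaN) are skipped; only values that become
    NaN because they cannot be parsed are reported.
    """
    report = {}
    for col in X.columns:
        converted = pd.to_numeric(X[col], errors="coerce")
        failed = X[col][converted.isna() & X[col].notna()]
        if not failed.empty:
            report[col] = sorted(set(failed.astype(str)))
    return report


# Hypothetical feature frame mimicking the question's X
X = pd.DataFrame({
    "block": [1, 2, 3],
    "zipcode": ["10001", "00nan", "02134"],
})
print(non_numeric_values(X))  # {'zipcode': ['00nan']}
```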

Thanks to 格雷格莱特 for pushing me in the right direction!
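A different approach from the fix above, sketched under the assumption that zip codes should stay as five-character text: read the column as strings and apply .str.zfill directly, since the pandas .str methods pass missing values through as NaN instead of turning them into 'nan'. That way '00nan' is never created. The CSV contents are made up, and dropping rows without a zip code is just one possible way to handle them.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for test.csv; the second zip code is blank
csv_text = "zipcode,officearea\n10001,500\n,250\n2134,0\n"

# Reading zipcode as text keeps leading zeros and leaves blank cells as NaN
df = pd.read_csv(io.StringIO(csv_text), dtype={"zipcode": str})

# .str.zfill pads the present values and leaves NaN as NaN
df["zipcode"] = df["zipcode"].str.zfill(5)

# Missing zip codes are now real NaNs that can be handled deliberately,
# here by dropping those rows before training
df = df.dropna(subset=["zipcode"])
print(df["zipcode"].tolist())  # ['10001', '02134']
```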

Python error: ValueError: could not convert string to float: '00nan'
