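For my current project, I plan to run scikit-learn's stochastic gradient boosting algorithm on a set of CSV files containing numeric data.
When the script reaches the line sgbr.fit(X_train, y_train), I get ValueError: could not convert string to float: and the message gives no details about which cells could not be converted.
So far I have tried to fix this with a conversion via pd.to_numeric(df.column.str, errors='coerce'), but because that moved the data from a pandas DataFrame to a numpy array, it led to the follow-up error AttributeError: 'numpy.ndarray' object has no attribute 'drop'.
Does anyone know why the ValueError appears without any further indication of where the problem is?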
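To narrow down where the unconvertible values sit, one option is to coerce each column and compare the result against the original. The following is only a minimal sketch: the helper name find_non_numeric_cells and the sample data are hypothetical, and it assumes the DataFrame has a unique index. If the value after the colon in the error message is an empty string, the message would look exactly as if nothing had been reported.

```python
import pandas as pd


def find_non_numeric_cells(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per cell that holds a value which cannot be parsed as a number."""
    # Coerce column by column; values that cannot be parsed become NaN.
    coerced = df.apply(pd.to_numeric, errors="coerce")
    # A cell is a problem if it was not missing before coercion but is NaN afterwards.
    bad = coerced.isna() & df.notna()
    records = [
        {"row": idx, "column": col, "value": df.at[idx, col]}
        for col in df.columns
        for idx in df.index[bad[col].to_numpy()]
    ]
    return pd.DataFrame(records, columns=["row", "column", "value"])


# Hypothetical sample data, not the real CSV.
sample = pd.DataFrame(
    {"Revenue": [1.5, "", 3.0], "Employees": [10, 20, "n.a."]},
    index=["A", "B", "C"],
)
report = find_non_numeric_cells(sample)
print(report)
```

Running the helper on the features before calling fit should list every row label and column that holds a string, including empty strings.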
The CSV file looks like this:
The relevant part of the code looks like this:
import pandas as pd
# Load CSV and fill empty cells (note: this fills them with empty strings)
Germany = pd.read_csv('./Germany_filtered.csv', index_col=0)
Germany = Germany.fillna("")
# Split the features (X) from the dependent variable (y)
X = Germany.drop('Status', axis='columns')
y = Germany['Status']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
from sklearn.ensemble import GradientBoostingRegressor
# Instantiate sgbr
sgbr = GradientBoostingRegressor(max_depth=4, n_estimators=200, subsample=0.9,
                                 max_features=0.75, random_state=2)
# Fit sgbr to the training set
sgbr.fit(X_train, y_train)
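
One possible reading of the code above is that fillna("") itself puts empty strings into columns that were numeric, which fit then cannot convert. The sketch below uses a small hypothetical frame in place of Germany_filtered.csv. It applies pd.to_numeric column by column through DataFrame.apply, so the result stays a DataFrame and drop keeps working. Filling the remaining gaps with the column median is an assumption made for illustration, not a recommendation for this dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Germany_filtered.csv.
raw = pd.DataFrame(
    {
        "Revenue": [1.5, np.nan, 3.0],
        "Employees": [10.0, 20.0, np.nan],
        "Status": [1, 0, 1],
    },
    index=["A", "B", "C"],
)

# Filling gaps with "" turns the float columns that had gaps into object columns.
filled = raw.fillna("")
print(filled.dtypes)

# Coerce every column back to numbers; the result is still a DataFrame.
numeric = filled.apply(pd.to_numeric, errors="coerce")

# drop() works because no conversion to a numpy array has taken place.
X = numeric.drop("Status", axis="columns")
y = numeric["Status"]

# Assumption: replace the remaining NaN values with each column's median.
X = X.fillna(X.median())
print(X)
```

After this step X contains only floats, so the split and fit calls from the original code can be applied to it unchanged.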