ValueError: could not convert string to float: Sklearn and pandas error

Time: 2020-09-13 22:57:36

Tags: python pandas scikit-learn sklearn-pandas

I am trying to build a classification model. I am using SGDClassifier().

My df has two columns: [full_text, label]

Below is my script:

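import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df_scraped = pd.read_csv('data/labeled_tweets.csv')
df_public = pd.read_csv('data/public_data_labeled.csv')

df_scraped.drop_duplicates(inplace = True)
df_scraped.drop('id', axis = 'columns', inplace = True)
df_public.drop_duplicates(inplace = True)
df = pd.concat([df_scraped, df_public])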

for index, row in df.iterrows():
    text = row['full_text']
    text = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", text).split())
    df.at[index,'full_text'] = text

df['label'] = df.label.map({'Offensive': 1, 'Non-offensive': 0})

X_train, X_test, y_train, y_test = train_test_split(df['full_text'],
                                                    df['label'],
                                                    random_state=99)

print (X_train['full_text'].head(3))

print('Number of rows in the total set: {}'.format(df.shape[0]))
print('Number of rows in the training set: {}'.format(X_train.shape[0]))
print('Number of rows in the test set: {}'.format(X_test.shape[0]))

count_vector = CountVectorizer(stop_words = 'english', lowercase = True)
training_data = count_vector.fit_transform(X_train)
testing_data = count_vector.transform(X_test)

# Dict for parameters
param_grid = {
    'alpha' : [0.095, 0.0002, 0.0003],
    'max_iter' : [2500, 3000, 4000]
}

print(X_train[0])

### label encode the categorical values and convert them to numbers
le = LabelEncoder()
le.fit(X_train[1].astype(str))
X_train[1] = le.transform(X_train[1].astype(str))
X_test[1] = le.transform(X_test[1].astype(str))

### train the model
clf_sgd = SGDClassifier()
clf_sgd.fit(X_train, y_train)

When I run this script I get the error KeyError: 'full_text'

The above exception was the direct cause of the following exception:
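A minimal sketch of where the KeyError likely comes from, using made-up toy rows rather than the CSV files in the script. When `train_test_split` is given the single column `df['full_text']`, the returned `X_train` is a pandas Series, so there is no 'full_text' column left to select from it. The toy DataFrame and the `raised` flag are assumptions added purely for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real data (hypothetical rows, not from the CSV files)
df = pd.DataFrame({
    "full_text": ["you are awful", "have a nice day", "what a jerk", "lovely weather"],
    "label": [1, 0, 1, 0],
})

# Same call shape as in the script: a single column goes in, a Series comes out
X_train, X_test, y_train, y_test = train_test_split(
    df["full_text"], df["label"], random_state=99
)

print(type(X_train).__name__)

# Selecting a column name on a Series looks the name up in the index,
# which here only contains integer row labels
try:
    X_train["full_text"]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Calling head() on the Series directly works
print(X_train.head(3))
```

Under these assumptions, replacing `X_train['full_text'].head(3)` with `X_train.head(3)` should avoid that particular error.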

I don't understand why this is happening. I am using an encoder to convert the strings to floats so that they can be used in the model.
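On the string-to-float conversion itself: `LabelEncoder` maps each distinct string to a single integer and is meant for target labels rather than free text. A hedged sketch, assuming the goal is bag-of-words features, is to fit the classifier on the `CountVectorizer` output (what the script calls `training_data`) instead of on the raw `X_train` strings. The toy texts and labels below are invented for illustration, and `random_state=99` is set only to make the run repeatable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data standing in for the tweets and their 0/1 labels
train_texts = ["you are awful", "have a nice day", "what a jerk", "lovely weather today"]
train_labels = [1, 0, 1, 0]
test_texts = ["awful jerk", "nice weather"]

# Turn raw text into a sparse matrix of token counts
count_vector = CountVectorizer(stop_words="english", lowercase=True)
training_data = count_vector.fit_transform(train_texts)
testing_data = count_vector.transform(test_texts)

# Fit on the numeric matrix, not on the strings themselves
clf_sgd = SGDClassifier(random_state=99)
clf_sgd.fit(training_data, train_labels)

predictions = clf_sgd.predict(testing_data)
print(predictions)
```

In the original script this would correspond to calling `clf_sgd.fit(training_data, y_train)` and dropping the `LabelEncoder` step, assuming `y_train` contains no missing values after the label mapping.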

Any help would be greatly appreciated. Thanks.

0 answers:

No answers