SMOTENC: could not convert string to float

Time: 2019-10-03 07:47:04

Tags: python oversampling smote

My data is imbalanced: 80% of the rows are M and 20% are F. Here is a sample of the data:

NAME     COUNTRY  HEIGHT  HANDPHONE  TYPE      GENDER
NOVI     USA      160     samsung    SM-G610F  F
JOHN     JAPAN    181     vivo       1718      M
RICHARD  UK       175     samsung    SM-G532G  M
ANTHONY  UK       179     OPPO       F1fw      M
SAMUEL   UK       185     Iphone     8 plus    M
BUNGA    KOREA    170     Iphone     6s        F

So I want to use SMOTENC to oversample the data until the M:F ratio is 50%:50%. I have tried the following script:

import numpy as np
import pandas as pd
import scipy.stats as stats
import sklearn
import keras
import imblearn
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')

# Load the data and encode the target: M -> 0, F -> 1
df = pd.read_excel('Data for oversampling.xlsx')
Data = df
Data['GENDER'] = Data['GENDER'].replace({'M': 0, 'F': 1})
sns.countplot(x='GENDER', data=Data)
y = Data.GENDER
x = Data.drop('GENDER', axis=1)

# Only columns 0 and 3 are declared categorical here
from imblearn.over_sampling import SMOTENC
smote_nc = SMOTENC(categorical_features=[0, 3], random_state=0)
x_resampled, y_resampled = smote_nc.fit_resample(x, y)

Running it fails with the error in the title: could not convert string to float. Can anyone help?

1 Answer:

Answer 0 (score: 0)

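In your dataset, every feature except feature 2 (HEIGHT, the only non-categorical one) is categorical. You need to update the categorical_features list so that it includes all of them.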
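
As a minimal sketch of that fix: with the column order from the sample data, the categorical positions can be derived rather than typed by hand. The names FEATURE_COLUMNS, NUMERIC_COLUMNS, categorical_indices and oversample are hypothetical helpers, not part of imbalanced-learn. The sketch also assumes x is the DataFrame left after dropping GENDER.

```python
# Feature columns of x in the order they appear after dropping GENDER
# (taken from the sample data in the question).
FEATURE_COLUMNS = ["NAME", "COUNTRY", "HEIGHT", "HANDPHONE", "TYPE"]
NUMERIC_COLUMNS = {"HEIGHT"}  # the only non-categorical feature


def categorical_indices(columns, numeric_columns):
    """Return the positions of every column that is not numeric."""
    return [i for i, name in enumerate(columns) if name not in numeric_columns]


def oversample(x, y, k_neighbors=5):
    """Balance the classes with SMOTENC, treating all non-numeric columns as categorical.

    Hypothetical helper: x is the feature DataFrame, y holds the GENDER labels.
    """
    # Imported lazily so the index computation above needs no third-party packages.
    from imblearn.over_sampling import SMOTENC

    smote_nc = SMOTENC(
        categorical_features=categorical_indices(list(x.columns), NUMERIC_COLUMNS),
        random_state=0,
        k_neighbors=k_neighbors,
    )
    return smote_nc.fit_resample(x, y)


print(categorical_indices(FEATURE_COLUMNS, NUMERIC_COLUMNS))  # [0, 1, 3, 4]
```

With the x and y from the question this would be called as x_resampled, y_resampled = oversample(x, y). SMOTENC needs more minority-class rows than k_neighbors, so a tiny sample such as the six rows shown (only two F rows) would need a smaller value, for example k_neighbors=1.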