How do I fix a vectorizer error in Python?

Date: 2019-06-05 20:04:45

Tags: python-3.x tfidfvectorizer

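Hi everyone! I have a problem with Python. Can anyone help me? I am a Python beginner.

I have a DataFrame with some data, and I am working with a string column.

Example of the column: Dataframe Column

The code is:

Load the dataset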

import pandas as pd
data = pd.read_csv("dataset.csv",sep=';',encoding='latin-1',error_bad_lines=False)

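Remove null values

# Drop the rows whose 'campo' is missing (dropna on the column alone does not remove rows from the DataFrame)
data = data.dropna(subset=['campo'])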

Strip leading and trailing whitespace

data['campo'] = data['campo'].str.lstrip()
data['campo'] = data['campo'].str.rstrip()

Remove accents

data['campo'] = data['campo'].str.replace('ú','u')
data['campo'] = data['campo'].str.replace('ó','o')
data['campo'] = data['campo'].str.replace('í','i')
data['campo'] = data['campo'].str.replace('é','e')
data['campo'] = data['campo'].str.replace('á','a')
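
The five replace calls above could also be folded into a single translation table. A minimal sketch, assuming only the lowercase vowels á, é, í, ó, ú need mapping; `ACCENT_TABLE` and the sample string are made up for illustration:

```python
# Map each accented vowel to its plain counterpart in one pass
ACCENT_TABLE = str.maketrans("áéíóú", "aeiou")

sample = "canción rápida"  # hypothetical value of one cell in 'campo'
print(sample.translate(ACCENT_TABLE))  # cancion rapida
```

On the column itself, the same table could be passed to `data['campo'].str.translate(ACCENT_TABLE)`.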

Lowercase

data['campo'] = data['campo'].str.lower()

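Remove punctuation

# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default
data['campo'] = data['campo'].str.replace(r'[^\w\s]','',regex=True)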

Tokenize

data['campo']= data['campo'].str.split()

The result so far: Preview

Remove stop words

import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')

stop_words = set(stopwords.words("spanish"))

# Function: drop the stop words from one row's token list
def remove_stops(row):
    my_list = row['campo']
    meaningful_words = [w for w in my_list if w not in stop_words]
    return meaningful_words


data['campo'] = data.apply(remove_stops, axis=1)
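
One thing that may matter at this step: the accents were already stripped from the text, while NLTK's Spanish stop-word list contains accented entries such as "más" and "también", so those would no longer match. A sketch of normalizing the stop words the same way; `strip_accents` and the short `raw_stop_words` list are hypothetical stand-ins, not part of the original code:

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks, e.g. 'más' -> 'mas'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def remove_stops(tokens, stop_words):
    """Return the tokens that are not in stop_words."""
    return [w for w in tokens if w not in stop_words]


# Hypothetical stand-in for stopwords.words("spanish"); the real list is much longer
raw_stop_words = ["el", "la", "más", "también"]
stop_words = {strip_accents(w) for w in raw_stop_words}

tokens = ["el", "perro", "tambien", "come", "mas"]
print(remove_stops(tokens, stop_words))  # ['perro', 'come']
```

Note that, unlike the five replace calls earlier, this NFD-based helper also turns "ñ" into "n", which may or may not be wanted.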

Train/test split

from sklearn import model_selection
Train_X, Test_X, Train_Y, Test_Y = model_selection.train_test_split(data['campo'],data['Target'],test_size=0.3)

Vectorizer

from sklearn.feature_extraction.text import TfidfVectorizer
Tfidf_vect = TfidfVectorizer(max_features=5000)
Tfidf_vect.fit(data['campo'])

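Then it gives me an error:

Error

AttributeError: 'list' object has no attribute 'lower'

I don't know why. I am a Python beginner, and I don't know how to solve it.

How can I solve it? Thanks, and sorry for my English!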
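
For what it is worth, the error seems to come from the tokenizing step: after `str.split()` each row of `data['campo']` is a list, while `TfidfVectorizer` by default expects raw strings and calls `.lower()` on each document. A hedged sketch of two possible ways around it; `join_tokens`, `fit_tfidf` and the sample rows are hypothetical helpers for illustration, not part of the original code:

```python
def join_tokens(token_lists):
    """Turn each list of tokens back into one space-separated string."""
    return [" ".join(tokens) for tokens in token_lists]


def fit_tfidf(documents, **vectorizer_kwargs):
    """Fit a TfidfVectorizer (scikit-learn is imported only when this is called)."""
    from sklearn.feature_extraction.text import TfidfVectorizer

    vectorizer = TfidfVectorizer(max_features=5000, **vectorizer_kwargs)
    return vectorizer.fit(documents)


# Hypothetical rows shaped like data['campo'] after str.split() and stop-word removal
tokenized = [["hola", "mundo"], ["adios", "mundo", "cruel"]]

# The default preprocessing calls doc.lower(), and a list has no such method
print(hasattr(tokenized[0], "lower"))  # False

# Option 1: give the vectorizer plain strings again
joined = join_tokens(tokenized)
print(joined)  # ['hola mundo', 'adios mundo cruel']

# Option 2 (shown as a comment, needs scikit-learn): keep the lists and
# replace the built-in analysis with a function that returns them unchanged
# fit_tfidf(tokenized, analyzer=lambda tokens: tokens)
```

With the DataFrame from the question, option 1 would amount to something like `Tfidf_vect.fit(data['campo'].str.join(' '))`.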

0 answers:

No answers