AttributeError: 'list' object has no attribute 'lower' from Tfidf_vect.fit

Time: 2019-06-23 07:03:24

Tags: python vector nlp svm tf-idf

I am trying to apply an SVM using tf-idf features, but I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Program Files\JetBrains\PyCharm 2019.1.3\helpers\pydev\_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2019.1.3\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/lam/.PyCharm2019.1/config/scratches/scratch_1.py", line 35, in <module>
    Tfidf_vect.fit(data['input'])
  File "C:\Users\lam\PycharmProjects\untitled\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1631, in fit
    X = super().fit_transform(raw_documents)
  File "C:\Users\lam\PycharmProjects\untitled\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 1058, in fit_transform
    self.fixed_vocabulary_)
  File "C:\Users\lam\PycharmProjects\untitled\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 970, in _count_vocab
    for feature in analyze(doc):
  File "C:\Users\lam\PycharmProjects\untitled\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 352, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
  File "C:\Users\lam\PycharmProjects\untitled\venv\lib\site-packages\sklearn\feature_extraction\text.py", line 256, in <lambda>
    return lambda x: strip_accents(x.lower())
AttributeError: 'list' object has no attribute 'lower'

Here is my code:

data['input']= [nltk.word_tokenize(entry) for entry in data['input']]

Train_X, Test_X, Train_Y, Test_Y = sklearn.model_selection.train_test_split(data['input'],data['Class'],test_size=0.2)

Encoder = LabelEncoder()
Train_Y = Encoder.fit_transform(Train_Y)
Test_Y = Encoder.fit_transform(Test_Y)

Tfidf_vect = TfidfVectorizer()
Tfidf_vect.fit(data['input'])


Train_X_Tfidf = Tfidf_vect.transform(Train_X)
Test_X_Tfidf = Tfidf_vect.transform(Test_X)

print(Tfidf_vect.vocabulary_)

I am using Python 3.6.0, and my dataset is in Arabic.

Thanks

1 Answer:

Answer 0: (score: 0)

The error suggests that TfidfVectorizer expects each document to be a single string, not a list of strings. It handles all of the tokenization itself (though you can plug a custom tokenizer into TfidfVectorizer if you need one).
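To illustrate both points, here is a minimal sketch. It first reproduces the error by fitting on token lists, then plugs a custom tokenizer into TfidfVectorizer. The sample sentences and the simple_tokenizer helper are hypothetical stand-ins, since the real data was not posted; in your case the tokenizer could be nltk.word_tokenize instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample documents; the real dataset was not shown in the question.
docs = ["مرحبا بالعالم", "مرحبا بكم"]

# Reproduce the problem: each "document" is a list of tokens, not a string.
tokenized = [doc.split() for doc in docs]
try:
    TfidfVectorizer().fit(tokenized)
except AttributeError as exc:
    # The default preprocessor calls .lower() on each document,
    # which fails because a list has no such method.
    error_message = str(exc)


def simple_tokenizer(text):
    """Hypothetical tokenizer: split a raw string on whitespace."""
    return text.split()


# Pass raw strings and let the vectorizer call the tokenizer itself.
# token_pattern=None only signals that the default regex is unused.
vectorizer = TfidfVectorizer(tokenizer=simple_tokenizer, token_pattern=None)
matrix = vectorizer.fit_transform(docs)
```

The key difference is who calls the tokenizer: the vectorizer receives strings and tokenizes them internally, so the lowercasing step always sees a string.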

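So I would try a simpler pipeline that drops the first line of your code (the nltk.word_tokenize call) and passes the raw strings to TfidfVectorizer. I can't be 100% sure, though, because you did not provide a sample of the actual input data that triggers the error.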
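A rough sketch of that simpler pipeline might look like the following. The texts and labels lists are made-up placeholders for data['input'] and data['Class'], and random_state is only there to make the split repeatable; everything else mirrors the structure of the code in the question, minus the tokenization line.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for data['input'] and data['Class']:
# raw, untokenized strings and their class labels.
texts = [
    "هذا فيلم رائع",
    "هذا فيلم سيء",
    "كتاب رائع جدا",
    "كتاب سيء جدا",
    "طعام رائع",
]
labels = ["pos", "neg", "pos", "neg", "pos"]

# No nltk.word_tokenize step: the documents stay plain strings.
train_x, test_x, train_y, test_y = train_test_split(
    texts, labels, test_size=0.2, random_state=0
)

# Fit on the raw strings, as in the question, then transform each split.
tfidf_vect = TfidfVectorizer()
tfidf_vect.fit(texts)

train_x_tfidf = tfidf_vect.transform(train_x)
test_x_tfidf = tfidf_vect.transform(test_x)

print(tfidf_vect.vocabulary_)
```

If this runs on your real data without the AttributeError, the tokenized lists were indeed the cause.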