ValueError when I try to run my code

Time: 2019-07-21 14:42:01

Tags: python scikit-learn nlp

I am trying to run the following code, but I get this error and don't know why. ValueError: X has 6 features per sample; expecting 2613

import sqlite3

import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df1 = pd.read_csv('train_set.csv', encoding='latin-1')
df1.columns = df1.columns.str.strip()
con = sqlite3.connect("TrainSet.db")
df1.to_sql("Table1",con)
con.close()

text = []
for i in df1['SentimentText']:
    text.append(i)
for i in range(len(text)):
    text[i] = word_tokenize(text[i].lower())

stpwrds = stopwords.words('english')
stpwrds.extend(['.',',','-','_','&','!','@','*',')','(',':','/',';'])
stpwrds = set(stpwrds)

for i in range(len(text)):
    text[i] = list(set(text[i]) - stpwrds)

lemmatizer = WordNetLemmatizer()
for i in range(len(text)):
    for j in range(len(text[i])):
        text[i][j] = lemmatizer.lemmatize(text[i][j], pos='v')

for i in range(len(text)):
    text[i] = ' '.join(text[i])

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(text)
X = matrix[:7000]
Y = np.array(df1['Sentiment'][:7000])

f='such horrible movie, never gonna watching it again!!!'
f=word_tokenize(f.lower())
for k in stpwrds:
    while k in f:
        f.remove(k)

lemmatizer = WordNetLemmatizer()
for i in range(len(f)):
    f[i] = lemmatizer.lemmatize(f[i], pos='v')

f= ' '.join(f)

g = vectorizer.fit_transform([f])
g=g.toarray()

X_train,X_test,Y_train,Y_test = train_test_split(X.toarray(),Y)  # <--- MemoryError
lr = LogisticRegression()
lr.fit(X_train,Y_train)
Y_pred = lr.predict(g)   # <--- ValueError: X has 6 features per sample; expecting 2613
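For context on the feature-count mismatch: calling `fit_transform` on the single new sentence fits a brand-new vocabulary containing only that sentence's words, while `transform` reuses the vocabulary learned from the training texts. A minimal sketch of the difference, using made-up toy texts (not the question's data set):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["what a great movie", "such a horrible movie"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(train_texts)  # learns the training vocabulary

new_text = ["never watching it again"]

# Fitting again builds a tiny vocabulary from the new text alone,
# so the feature count no longer matches the training matrix.
g_wrong = TfidfVectorizer().fit_transform(new_text)

# transform() keeps the fitted vocabulary, so shapes stay consistent.
g_right = vectorizer.transform(new_text)

print(g_wrong.shape[1] == X.shape[1])  # mismatched feature count
print(g_right.shape[1] == X.shape[1])  # matches the training features
```

Words absent from the training vocabulary simply get zero weight in `transform`, which is the behavior a classifier trained on `X` expects.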

Can someone help me fix this error? Also, I run into a MemoryError when I call train_test_split.
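On the MemoryError: densifying a large TF-IDF matrix with `.toarray()` can easily exhaust RAM, and both `train_test_split` and `LogisticRegression` accept scipy sparse matrices directly. A small sketch with a synthetic sparse matrix standing in for the TF-IDF output (the shapes here are toy values, not the question's 7000 x 2613):

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic sparse feature matrix: 200 samples, 50 features, mostly zeros.
X = sparse_random(200, 50, density=0.05, format="csr", random_state=0)
Y = np.random.RandomState(0).randint(0, 2, size=200)

# No .toarray(): the split keeps the data sparse and memory stays small.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

lr = LogisticRegression()
lr.fit(X_train, Y_train)       # LogisticRegression handles sparse input
Y_pred = lr.predict(X_test)
print(Y_pred.shape)
```

Keeping the matrix in CSR format end to end avoids materializing the dense array at all, which is usually the simplest fix for this kind of MemoryError.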

0 Answers:

No answers