I am using scikit-learn for TF-IDF, and my code ran fine until I upgraded scikit-learn to version 2.0.0. Now I get the following error:
TypeError: string indices must be integers
This happens even though I did not change anything in my code!
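import time
import numpy as np
from nltk.corpus import stopwords
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion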
print("\n[TF-IDF] Term Frequency Inverse Document Frequency Stage")
english_stop = set(stopwords.words("english"))
tfidf_para = {
    "stop_words": english_stop,
    "analyzer": "word",
    "token_pattern": r'\w{1,}',
    "sublinear_tf": True,
    "dtype": np.float32,
    "norm": "l2",
    #"min_df":5,
    #"max_df":.9,
    #"use_idf ":False,
    "smooth_idf": False
}
def get_col(col_name):
    return lambda x: x[col_name]

vectorizer = FeatureUnion([
    ("description", TfidfVectorizer(
        ngram_range=(1, 2),
        max_features=16000,
        **tfidf_para,
        use_idf=False,
        preprocessor=get_col("description"))),
    ("title", TfidfVectorizer(
        ngram_range=(1, 2),
        **tfidf_para,
        use_idf=False,
        #max_features=7000,
        preprocessor=get_col("title")))
])
start_vect=time.time()
vectorizer.fit(df.loc[df.index,:].to_dict("records"))
ready_df = vectorizer.transform(df.to_dict("records"))
tfvocab = vectorizer.get_feature_names()
print("Vectorization Runtime: %0.2f Minutes"%((time.time() - start_vect)/60))
Here is a sample of my data in dictionary format:
[{'title': 'title1',
  'description': 'description1'},
 {'title': 'title2 ',
  'description': 'description2'}]
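For reference, the same error message can be reproduced without scikit-learn at all. The sketch below is only an illustration, assuming records shaped like the sample above; the names `records`, `extract`, `extracted` and `error_message` are made up for this example. It shows that the lambda returned by `get_col` works on dict records but raises this `TypeError` as soon as it is handed a plain string, which suggests that somewhere a string, rather than a record, is reaching the preprocessor.

```python
def get_col(col_name):
    # Same helper as in the code above: returns a function that looks up one key.
    return lambda x: x[col_name]

# Records shaped like the sample data (illustrative values only).
records = [
    {"title": "title1", "description": "description1"},
    {"title": "title2 ", "description": "description2"},
]

extract = get_col("description")

# Dict records work as intended.
extracted = [extract(r) for r in records]

# A plain string (for example a single word) cannot be indexed by a key.
try:
    extract("the")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(extracted)
print(error_message)
```

The exact wording of the message may vary slightly between Python versions, but it begins with "string indices must be integers".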
Does anyone have any insight into what I am missing here? Thanks! :)