I'm working on text classification to identify dialects. I'm using sklearn with CountVectorizer, and I want to train a Naive Bayes classifier on both character-based n-grams and a word vocabulary, so I set up two CountVectorizers, one for the character n-grams and one restricted to a fixed vocabulary.
Where do I go from there? I tried:
from sklearn.feature_extraction.text import CountVectorizer

# character 2-grams and 3-grams
c = CountVectorizer(analyzer='char', ngram_range=(2, 3))
X_char = c.fit_transform(X_train)
# word counts restricted to a predefined vocabulary
v = CountVectorizer(vocabulary=vocabs)
X_vocab = v.fit_transform(X_train)
as suggested in this post: How to use bigrams + trigrams + word-marks vocabulary in countVectorizer?
But it didn't work.
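What I think I need is a way to fit both vectorizers on the same raw text and stack their outputs before handing them to the classifier. Something along these lines is the direction I'm aiming for (a rough sketch using FeatureUnion and MultinomialNB; y_train, X_test and the step names are my own placeholders, not code I have verified):

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# stack character n-gram counts and vocabulary word counts side by side
features = FeatureUnion([
    ('char_ngrams', CountVectorizer(analyzer='char', ngram_range=(2, 3))),
    ('vocab_words', CountVectorizer(vocabulary=vocabs)),
])

clf = Pipeline([
    ('features', features),
    ('nb', MultinomialNB()),
])
clf.fit(X_train, y_train)        # X_train is raw text, y_train the dialect labels
predictions = clf.predict(X_test)

As I understand it, FeatureUnion applies both vectorizers to the same documents and horizontally stacks the resulting sparse count matrices, so the Naive Bayes model sees a single combined feature matrix. Is that the right way to combine the two, or is there a better approach?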