How to concatenate two CountVectorizers?

Time: 2019-05-16 10:30:12

Tags: python concatenation text-classification n-gram countvectorizer

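I am using text classification to identify dialects. I am working with sklearn and CountVectorizer, and I want to train a naive Bayes classifier on both character-based n-grams and a word vocabulary. So I set up two CountVectorizers as follows: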


Where do I go from there? I tried:

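from sklearn.feature_extraction.text import CountVectorizer

# Character-level 2-grams and 3-grams
c = CountVectorizer(analyzer='char', ngram_range=(2, 3))
X_char = c.fit_transform(X_train)

# Word counts restricted to a fixed vocabulary
v = CountVectorizer(vocabulary=vocabs)
X_word = v.fit_transform(X_train)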

as suggested in this post: How to use bigrams + trigrams + word-marks vocabulary in countVectorizer?

But it did not work.
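For reference, two common ways to combine the output of two vectorizers are to stack their sparse matrices column-wise with scipy.sparse.hstack, or to wrap both vectorizers in sklearn.pipeline.FeatureUnion. The following is only a minimal sketch: the toy sentences, labels, and word list are hypothetical stand-ins for X_train, y_train, and vocabs, which the question does not show, and MultinomialNB is assumed as the naive Bayes variant.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, make_pipeline

# Hypothetical toy data; the question's real X_train, y_train and vocabs are not shown.
X_train = ["ich bin hier", "i bin do", "wir sind hier", "mir san do"]
y_train = ["standard", "dialect", "standard", "dialect"]
vocabs = ["bin", "hier", "do", "san"]

# Option 1: fit each vectorizer separately, then stack the matrices column-wise.
char_vec = CountVectorizer(analyzer="char", ngram_range=(2, 3))
word_vec = CountVectorizer(vocabulary=vocabs)
X_char = char_vec.fit_transform(X_train)
X_word = word_vec.fit_transform(X_train)
X_combined = hstack([X_char, X_word]).tocsr()

# Option 2: let FeatureUnion do the stacking inside a pipeline,
# so the same transformation is applied automatically at predict time.
features = FeatureUnion([
    ("char", CountVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("word", CountVectorizer(vocabulary=vocabs)),
])
model = make_pipeline(features, MultinomialNB())
model.fit(X_train, y_train)
predictions = model.predict(X_train)
```

With option 1, test data has to be passed through transform (not fit_transform) on both vectorizers and stacked in the same column order before it is given to the classifier. Option 2 handles that inside the pipeline.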

0 answers:

No answers