Getting the frequency of each n-gram term with sklearn

Date: 2016-06-20 05:17:59

Tags: python scikit-learn

I extract n-grams from a pandas DataFrame with a scikit-learn vectorizer (vect), which gives me the count matrix X_train_counts.

How can I get the frequency of each n-gram term?

1 answer:

Answer 0 (score: 1)

Posting the code I used to get the counts:

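import numpy as np

# Each column of X_train_counts holds one n-gram's counts per document;
# summing over the rows (axis=0) gives each n-gram's total count.
# Summing the sparse matrix directly avoids the dense toarray() copy.
dist = np.asarray(X_train_counts.sum(axis=0)).ravel()

# get_feature_names() was replaced by get_feature_names_out() in
# scikit-learn 1.0 and removed in 1.2.
vocab = vect.get_feature_names_out()

# Map each n-gram in the vocabulary to its frequency
ngram_freq = {}
for tag, count in zip(vocab, dist):
    ngram_freq[tag] = count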