I am using the following method to extract n-grams from a pandas DataFrame:
@Service
public class SomeFactory {
    @Autowired
    private List<Foo> foos;

    @PostConstruct
    public void init() {
        for (Foo foo : foos) {
            // do something
        }
    }
}
How can I get the frequency of each n-gram?
Answer 0 (score: 1)
Here is code for getting the counts:
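import numpy as np

# Assumes vect is a fitted CountVectorizer and X_train_counts is the
# sparse matrix returned by vect.fit_transform(...)
train_data_features = X_train_counts.toarray()
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out()
vocab = vect.get_feature_names_out()
# Sum each column to get the total count of each n-gram
dist = np.sum(train_data_features, axis=0)
ngram_freq = {}
# Map each n-gram to its total frequency
for tag, count in zip(vocab, dist):
    # print(tag, count)
    ngram_freq[tag] = count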
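
To sanity-check the counts without scikit-learn, the same idea can be sketched in plain Python with collections.Counter. This is a different technique from the vectorizer approach above, not a drop-in equivalent: the helper name ngram_counts, the whitespace tokenization, and the sample sentences are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counts(texts, n=2):
    """Count word n-grams across an iterable of strings.

    Tokenization here is a simple lowercase whitespace split (an assumption).
    """
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of length n over the token list
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample data standing in for a DataFrame text column
docs = ["the cat sat", "the cat ran", "a dog sat"]
freq = ngram_counts(docs, n=2)
print(freq.most_common(1))  # [('the cat', 2)]
```

Any iterable of strings works as input, so a text column of a DataFrame could be passed in directly. Note that CountVectorizer's default tokenizer ignores punctuation and single-character tokens, so its counts may differ from this sketch on real text.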