I have four categories of words related to the travel domain. For example:
ACCOMMODATION = {"hotel","restaurant","cafe","tea shop","lodging","coffee"}
COST = {"costly","expensive","price","inexpensive","fee","ticket"}
AMBIANCE = {"ambiance","ambience","cool","warm","hot"}
TRANSPORT = {"car","van","ride","walk","traffic","travel","road"}
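I also have a list of sentences. I want to scan each sentence and check whether any of the keywords above occur in it. If they do, tag the sentence with the matching category or categories, so that the final output looks like this: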
sentence1 [tab] ACCOMMODATION,COST
sentence2 [tab] ACCOMMODATION
sentence3 [tab] TRANSPORT
What is the most efficient way to achieve this?
Thanks in advance.
Answer 0 (score: 1)
First, split the sentence into words:
Stream<String> words = Arrays.stream(sentence.split("\\s"));
Create a HashSet for each category:
Set<String> transportWords = new HashSet<>(Arrays.asList("car","van","ride","walk","traffic","travel","road"));
Set<String> costWords = new HashSet<>(Arrays.asList("costly","expensive","price","inexpensive","fee","ticket"));
And map them to their categories:
Map<Set<String>, Category> map = new HashMap<>();
map.put(transportWords, Category.TRANSPORT);
map.put(costWords, Category.COST);
Then iterate over the words of the sentence and check whether each one belongs to a category:
Set<Category> categories = Arrays.stream(sentence.split("\\s"))
    .map(s -> {
        for (Set<String> keywords : map.keySet()) {
            if (keywords.contains(s)) {
                return Optional.of(map.get(keywords));
            }
        }
        return Optional.<Category>empty();
    })
    .filter(Optional::isPresent)
    .map(Optional::get)
    .collect(Collectors.toSet());
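The steps above can be pulled together into one runnable sketch that also produces the tab-separated output from the question. Note that it is a variation on the answer, not the same code: instead of looping over every keyword set for each word, it builds an inverted index (keyword to categories), so each word costs one hash lookup. The names SentenceTagger, Category, INDEX, register, tag and format are made up for illustration. It also assumes that matching should be case-insensitive and that punctuation should be ignored, which the question does not specify.

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical class and member names, chosen for this example only.
class SentenceTagger {
    enum Category { ACCOMMODATION, COST, AMBIANCE, TRANSPORT }

    // Inverted index: keyword -> every category that lists it.
    private static final Map<String, Set<Category>> INDEX = new HashMap<>();

    static {
        // Multi-word keywords such as "tea shop" are registered, but the
        // single-token lookup in tag() will never match them.
        register(Category.ACCOMMODATION, "hotel", "restaurant", "cafe", "tea shop", "lodging", "coffee");
        register(Category.COST, "costly", "expensive", "price", "inexpensive", "fee", "ticket");
        register(Category.AMBIANCE, "ambiance", "ambience", "cool", "warm", "hot");
        register(Category.TRANSPORT, "car", "van", "ride", "walk", "traffic", "travel", "road");
    }

    private static void register(Category category, String... keywords) {
        for (String keyword : keywords) {
            INDEX.computeIfAbsent(keyword, k -> EnumSet.noneOf(Category.class)).add(category);
        }
    }

    // Returns the categories whose keywords occur in the sentence.
    static Set<Category> tag(String sentence) {
        Set<Category> found = EnumSet.noneOf(Category.class);
        // Assumption: lower-case the text and split on runs of non-letters,
        // so "Hotel," and "hotel" are treated as the same word.
        for (String word : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            Set<Category> hit = INDEX.get(word);
            if (hit != null) {
                found.addAll(hit);
            }
        }
        return found;
    }

    // Formats one output line: sentence, a tab, then comma-separated categories.
    static String format(String sentence) {
        String labels = tag(sentence).stream()
                .map(Category::name)
                .collect(Collectors.joining(","));
        return sentence + "\t" + labels;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "The hotel was expensive",
                "We took a van on a bumpy road");
        sentences.forEach(s -> System.out.println(format(s)));
    }
}
```

Because the result is an EnumSet, the categories are listed in enum declaration order, so the first sentence is tagged ACCOMMODATION,COST. If multi-word keywords like "tea shop" need to match, a separate substring or n-gram check would have to be added on top of this.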