Grouping words based on a hierarchy in R

Asked: 2013-09-09 20:20:04

Tags: string r hierarchy

I would like to derive a hierarchy from my vector, for example:

# Start (in reality these will not be right next to each other)

words <- c("hello-world", "hello", "string", "sub-string", "custom-fields", 
           "custom", "hi-hat", "hat") 

# Result

highlevel <- c("hello-world", "sub-string", "custom-fields", "hi-hat")
lowerlevel <- c("hello", "string", "custom", "hat") 

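In reality I will be dealing with large data, so I am looking for an efficient way to group these words. If possible, I would also like the two levels to be linked in some way. The goal is to search for the higher-level words first and, when none of them is found, look for the lower-level words.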

Any ideas?

1 answer:

Answer 0 (score: 2)

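# Flag the words that contain a hyphen, a dot, or a digit (the high-level ones).
# A logical index is used because words[-g] returns an empty vector when grep() finds no match.
is_high <- grepl('[-.[:digit:]]', words)

highlevel <- words[is_high]
lowlevel <- words[!is_high]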
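
The question also asks for the two levels to be linked and for a lookup that tries the high-level words first. One possible sketch follows, assuming that a hyphen is what marks a high-level word and that a low-level word is linked to a high-level word when it appears as one of its hyphen-separated parts. The names `links` and `find_level` are hypothetical and are not part of the original answer.

```r
words <- c("hello-world", "hello", "string", "sub-string", "custom-fields",
           "custom", "hi-hat", "hat")

# Assumption: a literal hyphen marks a high-level (compound) word.
is_high   <- grepl("-", words, fixed = TRUE)
highlevel <- words[is_high]
lowlevel  <- words[!is_high]

# Link each high-level word to the low-level words among its parts.
parts <- strsplit(highlevel, "-", fixed = TRUE)
links <- lapply(parts, function(p) intersect(p, lowlevel))
names(links) <- highlevel

# Hypothetical helper: look for high-level words in `text` first and
# fall back to low-level words only when no high-level word is found.
find_level <- function(text, high, low) {
  hit <- high[vapply(high, grepl, logical(1), x = text, fixed = TRUE)]
  if (length(hit) > 0) return(list(level = "high", words = hit))
  hit <- low[vapply(low, grepl, logical(1), x = text, fixed = TRUE)]
  list(level = "low", words = hit)
}

links[["hi-hat"]]
find_level("a hello-world example", highlevel, lowlevel)
find_level("just a string", highlevel, lowlevel)
```

Note that this matches plain substrings, so "hat" would also be found inside "that"; if whole-word matching matters, the text would need to be tokenized first. Using `fixed = TRUE` skips the regex engine, which may help on large inputs, but that is worth measuring on the real data.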