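Representing a word table in R

Time: 2016-03-07 22:51:01

Tags: r data-mining text-mining word sentencecase

I have some sentences, and I want to split each one into its words so that every sentence becomes a row vector. However, the words are being repeated to fill each row out to the length of the longest sentence, which I do not want. I want each sentence's row vector to contain its words only once, no matter how long the sentence is.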

This is what I currently get: [image]

This is what I want: [image]

That is, without the repetition.
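The repetition described above is what base R's recycling rule produces when vectors of unequal length are bound together by row. The following is only a minimal sketch of how that can happen; the shortened input and the use of `strsplit` with `rbind` are assumptions, not the asker's actual code.

```r
# Shortened sample input (an assumption, not the asker's full data)
sentence <- c("case sweden", "meeting minutes ht board meeting")

# Split each sentence into its words; the result is a list of character vectors
words <- strsplit(sentence, " ", fixed = TRUE)

# rbind() recycles the shorter vector until it reaches the length of the
# longest one, which is where the unwanted repeats come from.
# The recycling warning is suppressed here only to keep the output quiet.
m <- suppressWarnings(do.call(rbind, words))

# Words of the first sentence, repeated to fill five columns
m[1, ]
```

The first row comes out as the two words of the first sentence repeated until all five columns are filled.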

1 answer:

Answer 0: (score: 2)

A solution from rawr:

Alternatively:

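# Example sentences, one string per row of the result
sentence <- c("case sweden", "meeting minutes ht board meeting st march now also attachment added agenda today s board meeting", "draft meeting minutes board meeting final meeting minutes ht board meeting rd april")
# Read each sentence as one whitespace-separated record; fill = TRUE adds blank fields to short rows instead of repeating words
dd <- read.table(text = paste(sentence, collapse = '\n'), fill = TRUE)
# Put each original sentence next to its words (columns V1, V2, ...)
test <- cbind(sentence, dd)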
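The same table can also be built without going through `read.table`, by padding every word vector to a common length before binding. This is only a sketch under assumptions: the shortened input and the names `words`, `n`, `padded` and `m` are illustrative and not part of the original answer.

```r
# Shortened sample input (an assumption, for illustration only)
sentence <- c("case sweden", "meeting minutes ht board meeting")

# Split each sentence into its words
words <- strsplit(sentence, " ", fixed = TRUE)

# Number of words in the longest sentence
n <- max(lengths(words))

# Extending a vector with `length<-` pads it with NA instead of recycling it
padded <- lapply(words, `length<-`, n)

# Every element now has the same length, so rbind() has nothing to recycle
m <- do.call(rbind, padded)

# One row per sentence: the sentence itself, then its words in columns X1, X2, ...
test <- data.frame(sentence, m, stringsAsFactors = FALSE)
test
```

Here the unused cells hold NA rather than blank fields, which makes them easy to pick out later with `is.na()`.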

Thanks.