Converting sentences into words in R

Asked: 2013-06-04 15:07:26

Tags: r

I have a data frame with two columns, like this (input):

Id  Comment
xc545   Ronald is a great person 
g6548   Hero worship is bad

I need the output in this form (result):

Id  Words 
xc545   Ronald
xc545   is
xc545   a
xc545   great
xc545   person
g6548   Hero
g6548   worship
g6548   is
g6548   bad

I need an R statement that does this.

Here is what I tried:

result<-lapply(input,function(x)strsplit(x[2]," "))

However, this returns only one record.

3 answers:

Answer 0 (score: 9)

Assuming DF is your data.frame, something like this may work:

> List <- strsplit(DF$Comment, " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
     Id   Words
1 xc545  Ronald
2 xc545      is
3 xc545       a
4 xc545   great
5 xc545  person
6 g6548    Hero
7 g6548 worship
8 g6548      is
9 g6548     bad

Note that my answer only works when each pair of words is separated by exactly one plain space.
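
If the comments may contain repeated spaces, tabs, or leading/trailing whitespace, one possible variation is to trim each string and split on a whitespace regular expression instead. This is only a sketch: the sample data below is made up to mirror the question (with irregular spacing added on purpose), and `as.character()` is included on the assumption that `Comment` might be stored as a factor.

```r
# Hypothetical sample data mirroring the question, with irregular spacing
DF <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald  is a great person ", "Hero worship\tis bad"),
  stringsAsFactors = FALSE
)

# Trim both ends, then split on any run of whitespace characters
List <- strsplit(trimws(as.character(DF$Comment)), "[[:space:]]+")

# Repeat each Id once per word found in its comment
result <- data.frame(
  Id = rep(DF$Id, lengths(List)),
  Words = unlist(List),
  stringsAsFactors = FALSE
)
```

`trimws()` and `lengths()` need R 3.2.0 or later; on older versions, `sapply(List, length)` as in the answer above does the same job as `lengths(List)`.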

Answer 1 (score: 3)

A data.table solution, inspired by this one:

library(data.table)
dt = data.table(df)
dt[,c(Words=strsplit(Comment, " ", fixed = TRUE)), by = Id]
      Id      V1
1: xc545  Ronald
2: xc545      is
3: xc545       a
4: xc545   great
5: xc545  person
6: g6548    Hero
7: g6548 worship
8: g6548      is
9: g6548     bad
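
In the output above the new column is called V1 rather than Words. If a named column is wanted, one option (a sketch, assuming the data.table package is installed and that `Comment` is a character column) is to unlist the split result inside `.()`:

```r
library(data.table)

# Hypothetical sample data mirroring the question
dt <- data.table(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald is a great person", "Hero worship is bad")
)

# For each Id, strsplit() returns a one-element list; unlist() flattens it
# into a character vector, and .(Words = ...) names the resulting column
res <- dt[, .(Words = unlist(strsplit(Comment, " ", fixed = TRUE))), by = Id]
```

Here `res` has the columns Id and Words, with one row per word.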

Answer 2 (score: 3)

Using scan, tapply, and stack:

d <- read.table(text='Id  Comment
xc545   "Ronald is a great person"
g6548   "Hero worship is bad"', header=TRUE, as.is=TRUE)

stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))
#    values   ind
# 1    Hero g6548
# 2 worship g6548
# 3      is g6548
# 4     bad g6548
# 5  Ronald xc545
# 6      is xc545
# 7       a xc545
# 8   great xc545
# 9  person xc545
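
The result above lists g6548 first because tapply groups by the sorted Id values, and the columns are named values and ind. If the original row order and the Id/Words column names from the question are wanted, a possible adjustment is sketched below. The explicit factor levels, `simplify = FALSE`, and `quiet = TRUE` are additions made here for illustration, not part of the answer above.

```r
# Hypothetical sample data mirroring the question
d <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald is a great person", "Hero worship is bad"),
  stringsAsFactors = FALSE
)

# Fix the factor levels in order of first appearance so tapply keeps that order
ids <- factor(d$Id, levels = unique(d$Id))

# simplify = FALSE always returns a list, even if every comment has one word
words <- tapply(d$Comment, ids,
                function(x) scan(text = x, what = "", quiet = TRUE),
                simplify = FALSE)

stacked <- stack(words)

# Rename and reorder the columns to match the requested output
res <- data.frame(
  Id = as.character(stacked$ind),
  Words = as.character(stacked$values),
  stringsAsFactors = FALSE
)
```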