I have a data frame in the following column form (input):
Id Comment
xc545 Ronald is a great person
g6548 Hero worship is bad
I need the output in this form (result):
Id Words
xc545 Ronald
xc545 is
xc545 a
xc545 great
xc545 person
g6548 Hero
g6548 worship
g6548 is
g6548 bad
I need an R statement that does this.
Here is my attempt:
result<-lapply(input,function(x)strsplit(x[2]," "))
However, this returns only one record.
Answer 0 (score: 9)
Assuming DF is your data.frame, one possibility is:
> List <- strsplit(DF$Comment, " ")
> data.frame(Id=rep(DF$Id, sapply(List, length)), Words=unlist(List))
Id Words
1 xc545 Ronald
2 xc545 is
3 xc545 a
4 xc545 great
5 xc545 person
6 g6548 Hero
7 g6548 worship
8 g6548 is
9 g6548 bad
Note that my answer only works if each pair of words is separated by exactly one plain space.
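If the comments may contain repeated spaces or tabs, one possible variation (a minimal sketch, not part of the original answer) is to split on a whitespace regular expression instead of a single space. The sample data below is made up to mirror the question, with irregular spacing added to the second row; trimws() and lengths() assume R 3.2.0 or later.

```r
# Hypothetical sample data mirroring the question;
# the second comment contains a double space and a tab on purpose
DF <- data.frame(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald is a great person", "Hero  worship is\tbad"),
  stringsAsFactors = FALSE
)

# Trim leading/trailing whitespace, then split on runs of whitespace
List <- strsplit(trimws(DF$Comment), "\\s+")

# Repeat each Id once per word found in its comment
result <- data.frame(
  Id = rep(DF$Id, lengths(List)),
  Words = unlist(List),
  stringsAsFactors = FALSE
)

print(result)
```

If Comment is stored as a factor, wrapping it in as.character() before strsplit() should be enough, since strsplit() only accepts character input.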
Answer 1 (score: 3)
A data.table solution, inspired by this one:
library(data.table)
dt = data.table(df)
dt[,c(Words=strsplit(Comment, " ", fixed = TRUE)), by = Id]
Id V1
1: xc545 Ronald
2: xc545 is
3: xc545 a
4: xc545 great
5: xc545 person
6: g6548 Hero
7: g6548 worship
8: g6548 is
9: g6548 bad
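The result column above comes out as V1 rather than Words. A possible way to get the requested column name, sketched here under the assumption of a reasonably recent data.table version, is to name the column inside .() and unlist the split result. The table is rebuilt inline from the question's data so the example runs on its own.

```r
library(data.table)

# Sample data taken from the question
dt <- data.table(
  Id = c("xc545", "g6548"),
  Comment = c("Ronald is a great person", "Hero worship is bad")
)

# Split each comment on single spaces and name the resulting column Words;
# by = Id keeps the groups in order of first appearance
res <- dt[, .(Words = unlist(strsplit(Comment, " ", fixed = TRUE))), by = Id]

print(res)
```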
Answer 2 (score: 3)
Using scan, tapply and stack:
d <- read.table(text='Id Comment
xc545 "Ronald is a great person"
g6548 "Hero worship is bad"', header=TRUE, as.is=TRUE)
stack(tapply(d$Comment, d$Id, function(x) scan(text=x, what='')))
# values ind
# 1 Hero g6548
# 2 worship g6548
# 3 is g6548
# 4 bad g6548
# 5 Ronald xc545
# 6 is xc545
# 7 a xc545
# 8 great xc545
# 9 person xc545