I have a data frame like the following: a ->
id text time username
1 "hello x" 10 "me"
2 "foo and y" 5 "you"
3 "nothing" 15 "everyone"
4 "x,y,foo" 0 "know"
The correct output should be:
a2 ->
id text time username keywordtag
1 "hello x" 10 "me" x
2 "foo and y" 5 "you" foo,y
3 "nothing" 15 "everyone" 0
4 "x,y,foo" 0 "know" x,y,foo
Any tips on how to do this would be greatly appreciated!
Answer 0: (score: 0)
Another idea is the dreaded loop... maybe it's not so bad if you preallocate?
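df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"), stringsAsFactors = FALSE)
terms <- c("foo", "x", "y")
# Preallocate the result so the loop does not grow the vector on each pass
newcol <- character(nrow(df1))
for (i in seq_len(nrow(df1))) {
  # Split on spaces and commas, then keep only the tokens that are exact keywords
  words <- unlist(strsplit(df1$text[i], "[ ,]+"))
  newcol[i] <- paste(words[words %in% terms], collapse = ",")
}
df1$keywordtag <- newcol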
Answer 1: (score: 0)
Here is another way, using apply and sapply:
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
terms <- c('foo', 'x', 'y')
df1$keywordtag <- apply(sapply(terms, grepl, df1$text), 1, function(x) paste(terms[x], collapse=','))
df1
# text keywordtag
# 1 hello x x
# 2 foo and y foo,y
# 3 nothing
# 4 x,y,foo foo,x,y
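Two details of the output above differ from what the question asked for: rows with no keyword get an empty string rather than 0, and grepl matches substrings, so a word like "you" would be tagged with y. The following is a minimal sketch of one way to handle both. The helper name tag_keywords and its none argument are hypothetical, not part of either answer, and it assumes the keywords contain no regex metacharacters.

```r
df1 <- data.frame(
  text = c("hello x", "foo and y", "nothing", "x,y,foo"),
  stringsAsFactors = FALSE
)
terms <- c("foo", "x", "y")

# Hypothetical helper (an illustration, not from the answers above).
# Assumes the keywords contain no regex metacharacters.
tag_keywords <- function(text, terms, none = "0") {
  # One logical column per keyword; \\b restricts matches to whole words
  hits <- vapply(
    terms,
    function(term) grepl(paste0("\\b", term, "\\b"), text),
    logical(length(text))
  )
  # vapply returns a plain vector for a single input row, so force a matrix
  hits <- matrix(hits, nrow = length(text))
  tags <- apply(hits, 1, function(row) paste(terms[row], collapse = ","))
  # Rows with no keyword get the placeholder used in the question
  tags[tags == ""] <- none
  tags
}

df1$keywordtag <- tag_keywords(df1$text, terms)
df1
```

The tags come out in the order of terms, not in the order the keywords appear in the text, so reorder terms if a particular order matters.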