Question

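I'm new to R and have been struggling with this problem. I want to create a new column that checks whether any of a set of words ("foo", "x", "y") appears in the column 'text', and then writes the matching words into the new column.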

I have a data frame that looks like this: a ->

 id     text        time   username
 1     "hello x"     10     "me"
 2     "foo and y"   5      "you"
 3     "nothing"     15     "everyone"
 4     "x,y,foo"     0      "know"

The correct output should be:

a2 ->

 id     text        time   username        keywordtag
 1     "hello x"     10     "me"          x
 2     "foo and y"   5      "you"         foo,y
 3     "nothing"     15     "everyone"    0 
 4     "x,y,foo"     0      "know"        x,y,foo

Any tips on how to do this would be greatly appreciated!

Answer 1

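Another idea is the dreaded loop... maybe it's not so bad if you preallocate?

options(stringsAsFactors = FALSE)
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
newcol <- character(nrow(df1))  # preallocate the result vector
for (i in seq_len(nrow(df1))) {
  # split on spaces or commas, then keep tokens that are exactly foo, x, or y
  words <- unlist(strsplit(df1$text[i], "[ ,]+"))
  newcol[i] <- paste(grep("^(foo|x|y)$", words, value = TRUE), collapse = ",")
}
df1$keywordtag <- newcol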
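The same idea can probably be written without an explicit loop. The following is only a sketch, assuming the keywords should match as whole words and should be listed in the order they appear in the text; the variable names `pattern`, `matches`, and `keywordtag` are chosen here for illustration.

```r
text <- c("hello x", "foo and y", "nothing", "x,y,foo")

# \\b marks a word boundary, so "x" will not match inside a longer word.
pattern <- "\\b(foo|x|y)\\b"

# gregexpr finds every match per string; regmatches extracts the matched text.
matches <- regmatches(text, gregexpr(pattern, text, perl = TRUE))

# Collapse each element's matches into one comma-separated string
# (rows with no match become the empty string).
keywordtag <- vapply(matches, paste, character(1), collapse = ",")
keywordtag
```

Because the matches are taken from the text itself, the last row keeps the original order "x,y,foo".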

Answer 2

Here is another way, using apply and sapply:

df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"))
terms <- c('foo', 'x', 'y')
df1$keywordtag <- apply(sapply(terms, grepl, df1$text), 1, function(x) paste(terms[x], collapse=','))
df1
#        text keywordtag
# 1   hello x          x
# 2 foo and y      foo,y
# 3   nothing           
# 4   x,y,foo    foo,x,y
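The question's desired output shows 0 for rows with no keyword, while the code above leaves those rows empty. A possible variation is sketched below; it assumes that whole-word matching is wanted (plain grepl would also find "x" inside a word such as "text") and that the placeholder should be the string "0". The names `hits` and `tags` are illustrative.

```r
df1 <- data.frame(text = c("hello x", "foo and y", "nothing", "x,y,foo"),
                  stringsAsFactors = FALSE)
terms <- c("foo", "x", "y")

# One logical column per term; \\b restricts each match to a whole word.
hits <- sapply(terms, function(term) {
  grepl(paste0("\\b", term, "\\b"), df1$text, perl = TRUE)
})

# For each row, join the terms that were found.
tags <- apply(hits, 1, function(row) paste(terms[row], collapse = ","))

# Assumption: rows without any keyword get the string "0".
df1$keywordtag <- ifelse(tags == "", "0", tags)
df1
```

Note that the tags follow the order of `terms`, so the last row reads "foo,x,y" rather than "x,y,foo".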

R: create a variable based on presence in a data frame
