Question

I have a data frame with two columns:

'V1'     'V2'
joe      hi, my name is *joe*
anne     i was talking to *jake* the other day...
steve    *anne* should have the answer
steve    *joe* and I will talk later

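I want to take the list of names in column 1 and search for them in column 2.

(The asterisks only mark where a name sits inside the longer string.)

What I really want is this: for each entry in the first column, if it can also be found in the second column, print that row.

I tried this: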

for (i in dft[1]) if (i == dft[2]) print(i)

The idea is to count how many times each name appears in each column, and end up with something like:

V1    V2    V3
joe   1     2
anne  1     1
jake  0     1
steve 2     0

Any ideas?

Answer 1

Assuming you want to count how many times each element of the first column appears in each column, you can do the following:

dat <- data.frame(V1=c("joe", "ann", "steve", "steve"),
                  V2=c("hi, my name is *joe*", 
                       "i was talking to *jake* the other day...", 
                       "*anne* should have the answer",
                       "*joe* and I will talk later"), 
                  stringsAsFactors=FALSE)

t(sapply(dat$V1, function(x) cbind(length(grep(x, dat$V1)), length(grep(x, dat$V2)))))

#      [,1] [,2]
#joe      1    2
#ann      1    1
#steve    2    0
#steve    2    0

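sapply applies a function to each element of column V1. Here, the function counts how many times the element matches in column V1 and in column V2, and combines the two counts with cbind. sapply then simplifies the results to a matrix. Finally, t transposes that matrix into the format you asked for. Note that grep treats each name as a regular expression and matches substrings, which is why "ann" is counted as found in "*anne*".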
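A variation on the call above, offered as a sketch rather than as the answerer's code: it counts each distinct name once via unique(), uses exact equality for column V1, and uses fixed = TRUE (literal, non-regex matching) for column V2. The column labels in_V1 and in_V2 are made-up names for illustration, and the data uses the question's spelling "anne".

```r
dat <- data.frame(
  V1 = c("joe", "anne", "steve", "steve"),
  V2 = c("hi, my name is *joe*",
         "i was talking to *jake* the other day...",
         "*anne* should have the answer",
         "*joe* and I will talk later"),
  stringsAsFactors = FALSE
)

# Work on distinct names so that "steve" yields a single row.
nms <- unique(dat$V1)

counts <- t(sapply(nms, function(x) {
  c(in_V1 = sum(dat$V1 == x),                      # exact matches in column V1
    in_V2 = sum(grepl(x, dat$V2, fixed = TRUE)))   # rows of V2 containing the name
}))

counts
```

Names that occur only in the text column, such as jake, are not counted here because the loop runs over column V1 only.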

Answer 2

Unfortunately, grep is not vectorized over its first argument (the pattern), so you have to call it through mapply.

dat <- data.frame(V1=c("joe","anny"), V2=c("hi, my name is joe","blah anne"), stringsAsFactors=FALSE)
# Row by row: does V1[i] occur in V2[i]?
mapply(FUN=function(x,y) grepl(x,y), x=dat$V1, y=dat$V2)

This gives you a logical vector, which you can use for subsetting, or pass to sum if you only need a count for display.
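As a minimal sketch of both uses, assuming the same two-row data frame: the logical vector selects the rows whose own V1 name occurs in their own V2 text, and sum counts them. The names hit, matched, and n_hits are illustrative, and fixed = TRUE is an added assumption so that names are matched literally rather than as regular expressions.

```r
dat <- data.frame(V1 = c("joe", "anny"),
                  V2 = c("hi, my name is joe", "blah anne"),
                  stringsAsFactors = FALSE)

# One TRUE/FALSE per row: is V1[i] contained in V2[i]?
hit <- mapply(function(x, y) grepl(x, y, fixed = TRUE), dat$V1, dat$V2)

matched <- dat[hit, ]   # keep only the rows where the name was found
n_hits  <- sum(hit)     # TRUE counts as 1, so this is the number of matching rows

print(matched)
print(n_hits)
```

This compares each name only with the text in the same row; searching every name against every row of V2 needs a loop over names instead, as in Answer 1.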

Compare strings and print rows in R
