
Question

I have a table like this:

ID  |   Word1   | Word2     | Word3     | Word4 | Word5 | Word6 | Word7
1   |   like    | grilled   | cheese    | except| omelet| and   | cheese
1   |   like    | grilled   | cheese    | except| omelet| and   | cheese
1   |   like    | grilled   | cheese    | except| omelet| and   | cheese
1   |   like    | grilled   | cheese    | except| omelet| and   | cheese
2   |   i       | have      | to        | write | it    | six   | times
2   |   i       | have      | to        | write | it    | six   | times

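I want to add a new column that counts how many times the word in column Word7 appears in the other WordX columns (Word1 to Word6). For the rows with ID = 1, the new column would be 1 (because cheese appears in column Word3). For the rows with ID = 2, it would be 0. There may also be rows with a value greater than 1, if the word in `Word7` appears more than once in columns Word1 to Word6.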
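To pin down the requirement, here is a minimal sketch in R of the count for a single row, typed in by hand from the table above. The names `row`, `target` and `row2` are illustrative only, and `row2` is a made-up row that is not part of the data:

```r
# One row from the table above (ID = 1), typed in by hand
row <- c(Word1 = "like", Word2 = "grilled", Word3 = "cheese",
         Word4 = "except", Word5 = "omelet", Word6 = "and")
target <- "cheese"      # this row's Word7 value
sum(row == target)      # 1: "cheese" occurs once, in Word3

# Hypothetical row (not in the data) where the Word7 word occurs twice
row2 <- c("cheese", "on", "toast", "and", "more", "cheese")
sum(row2 == "cheese")   # 2
```

Applying the same "compare, then count the TRUEs" step to every row gives the new column.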

I've tried a few approaches with dplyr's intersect() and select(), but I'm struggling even to conceptualize the approach (I'm a bit of a noob).

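FYI, rows with exactly the same content in these columns can appear more than once, but there are other columns with unique values (they are irrelevant to this question, which is why I left them out).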

Answer 1

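Here is one way to do it with mapply:

words <- df[, -c(1, 8)]  # drop ID (column 1) and Word7 (column 8)
mapply(function(i, y) sum(grepl(y, unlist(words[i, ]))), seq_len(nrow(df)), df[[8]])
#[1] 1 1 1 1 0 0

mapply calls the function once per row, passing the row index i and that row's Word7 value y in parallel. For each row, grepl() checks which of the other word columns (everything except the ID column) contain the Word7 word, and sum() counts the TRUE values.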
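Two properties of the functions used in this answer are worth knowing when adapting it. A data frame is a list of columns, so mapply walks over its columns, not its rows; and grepl does regular-expression (substring) matching, which is looser than an exact comparison with ==. A small sketch with a made-up data frame `small` (hypothetical, only for illustration):

```r
# Hypothetical two-row data frame, just to show how mapply() iterates
small <- data.frame(a = c("p", "q"), b = c("r", "s"), stringsAsFactors = FALSE)

# mapply() treats a data frame as a list of columns: FUN is called once per column
res <- mapply(function(col) paste(col, collapse = ""), small)
res   # named character vector: a = "pq", b = "rs"

# grepl() matches substrings, while == requires the whole string to be equal
grepl("to", c("to", "tomato", "it"))   # TRUE  TRUE FALSE
c("to", "tomato", "it") == "to"        # TRUE FALSE FALSE
```

If only whole-word matches should count (as in the question), the == comparison is the stricter choice.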

Answer 2

library(dplyr)
df %>% mutate(A=rowSums(.[2:7]==Word7))

Using base R:

rowSums(df[,-c(1,8)]==df$Word7)
[1] 1 1 1 1 0 0

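df[,-c(1,8)] == df$Word7 returns a logical matrix of TRUE and FALSE values (each word column is compared element by element with Word7), and rowSums then adds up the TRUE values in each row.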
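A minimal sketch of that intermediate step on a made-up two-row data frame `d` (hypothetical, with only two word columns), showing that the comparison yields a logical matrix whose row sums are the counts:

```r
# Small hypothetical version of the data: two word columns plus Word7
d <- data.frame(Word1 = c("like", "i"),
                Word2 = c("cheese", "have"),
                Word7 = c("cheese", "times"),
                stringsAsFactors = FALSE)

# Comparing a data frame with a vector recycles the vector down each column,
# so row i of every column is compared with Word7[i]
m <- d[, c("Word1", "Word2")] == d$Word7
m              # row 1: FALSE TRUE; row 2: FALSE FALSE

rowSums(m)     # counts per row: 1 and 0
```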

Data

df <- read.table(text = "
ID  Word1  Word2    Word3   Word4   Word5   Word6  Word7
1   like   grilled  cheese  except  omelet  and    cheese
1   like   grilled  cheese  except  omelet  and    cheese
1   like   grilled  cheese  except  omelet  and    cheese
1   like   grilled  cheese  except  omelet  and    cheese
2   i      have     to      write   it      six    times
2   i      have     to      write   it      six    times",
  header = TRUE, stringsAsFactors = FALSE)
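The question mentions other columns that are not shown, so positional indices such as -c(1, 8) or 2:7 could pick up the wrong columns in the full table. A sketch that selects the word columns by name and stores the count as a new column. It assumes the full table keeps the Word1 to Word6 naming; `Other` is a hypothetical stand-in for the unshown columns, and the column name `A` follows Answer 2:

```r
df <- data.frame(
  ID    = c(1, 2),
  Other = c("x", "y"),          # hypothetical extra column, as mentioned in the question
  Word1 = c("like", "i"),
  Word2 = c("grilled", "have"),
  Word3 = c("cheese", "to"),
  Word4 = c("except", "write"),
  Word5 = c("omelet", "it"),
  Word6 = c("and", "six"),
  Word7 = c("cheese", "times"),
  stringsAsFactors = FALSE
)

# Select the six search columns by name, so extra columns don't shift the indices
word_cols <- paste0("Word", 1:6)
df$A <- rowSums(df[, word_cols] == df$Word7)
df$A   # counts: 1 for ID 1, 0 for ID 2
```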

dplyr: count the matches between column A and several other columns and write the result to a new column

