How do I subtract a data frame column from another data frame's column if a condition is met?

Time: 2018-03-20 16:47:17

Tags: r dataframe conditional-statements multiple-columns subtraction

I have two simple data frames with the columns "word" and "n", where "n" is how often a given word occurs. Here is an example:

df1 <- data.frame(word=c("beautiful","nice","like","good"),n=c(400,378,29,10))
df2 <- data.frame(word=c("beautiful","nice","like","good","wonderful","awesome","sad","happy"),n=c(6000,20,5,150,300,26,17,195))

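Besides the words in df1, df2 contains more words, so df1 is only a subset of df2.

I have found the words of df1 that are contained in df2. Now, if a particular word is contained in df1, I want to subtract its df1 word count from its df2 word count, which means I want to do the following:

  • subtract the word counts: df2$n - df1$n
  • IF df1$word is contained in df2$word

I hope my question is clear.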

I have already found all the words in df1 that are contained in df2:

df1 %>% filter(df1$word %in% df2$word)

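However, I am struggling with the subtraction command under the condition that a word in df1 must also be in df2, and only then computing df2$n - df1$n.

Thanks for your help!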

4 Answers:

Answer 0: (score: 3)

Use merge:

> df.tmp <- merge(df1, df2, by="word", all=TRUE)
> df.tmp$result <- df.tmp$n.y - df.tmp$n.x
> df.tmp
       word n.x  n.y result
1 beautiful 400 6000   5600
2      good  10  150    140
3      like  29    5    -24
4      nice 378   20   -358
5   awesome  NA   26     NA
6     happy  NA  195     NA
7       sad  NA   17     NA
8 wonderful  NA  300     NA

If you only want the matching words:

> df.tmp <- merge(df1, df2, by="word")
> df.tmp$result <- df.tmp$n.y - df.tmp$n.x
> df.tmp
       word n.x  n.y result
1 beautiful 400 6000   5600
2      good  10  150    140
3      like  29    5    -24
4      nice 378   20   -358
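
A possible variant, offered only as a sketch: if every word of df2 should stay in the result and words absent from df1 should simply keep their df2 count, merge can be called with all.y = TRUE and the missing counts replaced by zero. Treating a missing word as a count of zero is an assumption about the desired result, and the column suffixes and the name merged are chosen here purely for illustration.

```r
df1 <- data.frame(word = c("beautiful", "nice", "like", "good"),
                  n = c(400, 378, 29, 10))
df2 <- data.frame(word = c("beautiful", "nice", "like", "good",
                           "wonderful", "awesome", "sad", "happy"),
                  n = c(6000, 20, 5, 150, 300, 26, 17, 195))

# all.y = TRUE keeps every row of df2; suffixes labels the two count columns
merged <- merge(df1, df2, by = "word", all.y = TRUE,
                suffixes = c(".df1", ".df2"))

# Assumption: a word that is missing from df1 counts as zero
merged$n.df1[is.na(merged$n.df1)] <- 0

merged$result <- merged$n.df2 - merged$n.df1
merged
```

With this approach the unmatched rows carry their original df2 count in result instead of NA.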

Answer 1: (score: 2)

require(dplyr)
 df1 %>% 
  inner_join(df2, by = 'word') %>% 
  mutate(diff = n.y - n.x) %>% 
  select(word, diff)

which gives

       word diff
1 beautiful 5600
2      nice -358
3      like  -24
4      good  140
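
If the unmatched words of df2 should be kept as well, one option is to start from df2 and use left_join instead of inner_join. The following is a minimal sketch, assuming that a word missing from df1 should be treated as a count of zero; the suffix values and the name result are illustrative choices, not part of the answer above.

```r
library(dplyr)

df1 <- data.frame(word = c("beautiful", "nice", "like", "good"),
                  n = c(400, 378, 29, 10))
df2 <- data.frame(word = c("beautiful", "nice", "like", "good",
                           "wonderful", "awesome", "sad", "happy"),
                  n = c(6000, 20, 5, 150, 300, 26, 17, 195))

result <- df2 %>%
  # keep all rows of df2; counts from df1 are NA where the word is absent
  left_join(df1, by = "word", suffix = c(".df2", ".df1")) %>%
  # coalesce() swaps the NA counts for 0 before subtracting (assumption)
  mutate(diff = n.df2 - coalesce(n.df1, 0)) %>%
  select(word, diff)

result
```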

Answer 2: (score: 2)

Here is a quick solution using a for loop and the %in% operator.

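df2$diff <- NA
for (i in seq_len(nrow(df2))) {
  # test membership against all of df1$word, not only its i-th element
  if (df2$word[i] %in% df1$word) {
    # look up the matching row of df1 rather than assuming the same row index
    df2$diff[i] <- df2$n[i] - df1$n[match(df2$word[i], df1$word)]
  }
}
df2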

Output:

> df2
       word    n diff
1 beautiful 6000 5600
2      nice   20 -358
3      like    5  -24
4      good  150  140
5 wonderful  300   NA
6   awesome   26   NA
7       sad   17   NA
8     happy  195   NA
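
The same lookup can also be written without a loop. The sketch below uses base R's match to find, for each word of df2, its row in df1 (NA when the word is absent), so the unmatched rows end up as NA just as in the output above. The reversed row order of df1 and the name idx are illustrative assumptions, used to show that the result does not depend on row order.

```r
# df1 is deliberately listed in reverse order here (illustrative assumption)
df1 <- data.frame(word = c("good", "like", "nice", "beautiful"),
                  n = c(10, 29, 378, 400))
df2 <- data.frame(word = c("beautiful", "nice", "like", "good",
                           "wonderful", "awesome", "sad", "happy"),
                  n = c(6000, 20, 5, 150, 300, 26, 17, 195))

# For every word in df2: its row number in df1, or NA if it is not there
idx <- match(df2$word, df1$word)

# Indexing with NA yields NA, so unmatched words get an NA difference
df2$diff <- df2$n - df1$n[idx]
df2
```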

Answer 3: (score: 2)

Here is a vectorized base R solution in which Boolean multiplication replaces the if-then construct used in @Rob's for-loop:

 df2$n.adjusted <- df2$n - (df2$word %in% df1$word)* # zero if no match
                                 df1$n[ match(df1$word, df2$word) ] # gets order correct
> df2
       word    n n.adjusted
1 beautiful 6000       5600
2      nice   20       -358
3      like    5        -24
4      good  150        140
5 wonderful  300        300
6   awesome   26         26
7       sad   17         17
8     happy  195        195

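Here is the example I used to test the case where the words in df1 are in a different order from those in df2 and the lengths are not an even multiple: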

> df1 <-data.frame(word=c("nice","beautiful","like","good"),n=c(378,400,29,10))
> df2 <- data.frame(word=c("beautiful","nice","like","good","wonderful","awesome","sad"),n=c(6000,20,5,150,300,26,17))
> 
>  df1
       word   n
1      nice 378
2 beautiful 400
3      like  29
4      good  10
>  df2
       word    n
1 beautiful 6000
2      nice   20
3      like    5
4      good  150
5 wonderful  300
6   awesome   26
7       sad   17
> df2$n.adjusted <- df2$n - (df2$word %in% df1$word)*df1$n[match(df1$word, df2$word)]
Warning message:
In (df2$word %in% df1$word) * df1$n[match(df1$word, df2$word)] :
  longer object length is not a multiple of shorter object length
> df2
       word    n n.adjusted
1 beautiful 6000       5600
2      nice   20       -358
3      like    5        -24
4      good  150        140
5 wonderful  300        300
6   awesome   26         26
7       sad   17         17
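
The warning in the transcript comes from recycling a vector of length 4 against one of length 7. The result still looks right in this test, but that seems to depend on the matching words sitting in the first rows of df2. As a hedged alternative, the sketch below indexes df1 by the words of df2 and replaces unmatched entries with zero before subtracting, which avoids recycling altogether. The scattered ordering of df2 and the names idx and subtrahend are hypothetical, chosen only to exercise the case where the matches are not at the top.

```r
df1 <- data.frame(word = c("nice", "beautiful", "like", "good"),
                  n = c(378, 400, 29, 10))
# Hypothetical ordering: the matching words are scattered through df2
df2 <- data.frame(word = c("wonderful", "good", "awesome", "beautiful",
                           "sad", "like", "nice"),
                  n = c(300, 150, 26, 6000, 17, 5, 20))

# Row of df1 for each word of df2 (NA where df1 lacks the word)
idx <- match(df2$word, df1$word)

# Amount to subtract per row of df2: the df1 count, or 0 when there is no match
subtrahend <- df1$n[idx]
subtrahend[is.na(subtrahend)] <- 0

df2$n.adjusted <- df2$n - subtrahend
df2
```

Because subtrahend always has the same length as df2$n, no recycling takes place and no warning is raised.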