Subtract column1 (dataframe1) from column2 (dataframe2) based on a matching column in both, in R

Asked: 2015-06-08 17:54:32

Tags: r dataframe matching

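Dataframe1 has two columns: num_movies and userId. Dataframe2 has two columns: No_movies and userId. However, Dataframe2 has 2106 rows and Dataframe1 has 1679 rows. I want to subtract the movie counts in Dataframe2 from those in Dataframe1, matching rows by userId. I wrote the following line: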

df1$num_movies = df1$num_movies - df2$No_movies[df1$userId %in% df2$userId]

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "num_movies", value = c(2, 9, 743,  : 
  replacement has 2106 rows, data has 1679
In addition: Warning message:
In df1$num_movies - df2$No_movies[df1$userId %in%  :
  longer object length is not a multiple of shorter object length

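Elsewhere, someone suggested that I upgrade from 3.0.2 to 3.1.2 to fix this, but after upgrading I still get the same error. What I wrote seems logical to me: I intended to pick only the 1679 matching userIds out of the 2106. Why are all of them selected? How can I get around this error?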

1 Answer:

Answer 0: (score: 0)

You can use the match function to find, for each row of Dataframe1, the corresponding row in Dataframe2.

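# For each userId in Dataframe1, look up its position in Dataframe2 (NA if absent)
matched.movies <- Dataframe2$No_movies[match(Dataframe1$userId, Dataframe2$userId)]
# Users missing from Dataframe2 have nothing to subtract
matched.movies[is.na(matched.movies)] <- 0
# Both vectors now have one entry per row of Dataframe1
Dataframe1$num_movies <- Dataframe1$num_movies - matched.movies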
Dataframe1
#   num_movies userId
# 1         10      1
# 2          7      2
# 3          6      3
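
As for why the original line selected everything: df1$userId %in% df2$userId is a logical vector with one entry per row of df1, not a set of positions in df2. When a logical index is shorter than the vector it subsets, R recycles it, so if every userId in df1 also occurs in df2, all 2106 values come back. A minimal sketch with made-up vectors (the names short_ids, long_ids and long_vals are hypothetical, not from the question):

```r
# Hypothetical toy data: 3 ids on the short side, 5 on the long side.
short_ids <- c(2, 3, 9)
long_ids  <- c(0, 2, 3, 9, 10)
long_vals <- c(20, 30, 40, 50, 60)

# %in% returns one TRUE/FALSE per element of short_ids (length 3).
idx <- short_ids %in% long_ids

# A logical index shorter than the vector is recycled, so here all 5
# values are returned, and they are not aligned with short_ids.
recycled <- long_vals[idx]

# match() returns positions in long_ids, one per element of short_ids,
# so the result has the same length and order as short_ids.
aligned <- long_vals[match(short_ids, long_ids)]
```

This is why the subtraction in the question ends up with 2106 values on the right-hand side, and why match gives a vector that lines up with Dataframe1 instead.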

Data:

(Dataframe1 <- data.frame(num_movies=rep(10, 3), userId=1:3))
#   num_movies userId
# 1         10      1
# 2         10      2
# 3         10      3
(Dataframe2 <- data.frame(No_movies=2:6, userId=c(0, 2, 3, 9, 10)))
#   No_movies userId
# 1         2      0
# 2         3      2
# 3         4      3
# 4         5      9
# 5         6     10
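
If a join reads more naturally to you than match, a left join with base R's merge should give the same numbers. This is an alternative sketch on the same sample data, not part of the approach above; the names merged and result are hypothetical.

```r
Dataframe1 <- data.frame(num_movies = rep(10, 3), userId = 1:3)
Dataframe2 <- data.frame(No_movies = 2:6, userId = c(0, 2, 3, 9, 10))

# Left join: keep every row of Dataframe1; userIds absent from
# Dataframe2 get NA in No_movies.
merged <- merge(Dataframe1, Dataframe2, by = "userId", all.x = TRUE)

# Treat unmatched users as having nothing to subtract.
merged$No_movies[is.na(merged$No_movies)] <- 0

merged$num_movies <- merged$num_movies - merged$No_movies
result <- merged[, c("num_movies", "userId")]
```

Note that merge may return the rows in a different order than Dataframe1, so the match version is the safer choice when the original row order matters.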