Update observations in a data frame based on another data frame?

Time: 2017-11-13 15:10:47

Tags: r join dataframe match

I have a data frame, say prod_score:

product score
a       1
d       2
ff      2
e       3
fvf     1

I have another data frame, prod_rank, with the same products and their ranks:

product   rank
a         11
d         4
ff        1
e         5
fvf       9

Just to clarify: I have many observations, which is why I am only showing sample data.

Filter all products with a score of 2:

library(dplyr)
prod_scr_2 <- prod_score %>% filter(score == 2)

Now I want to take the products in prod_scr_2 and update their scores based on the prod_rank data frame.

I used a join:

decision_tbl <- inner_join(prod_scr_2, prod_rank, by = "product") %>%
  top_n(2, desc(rank))

Now I am taking decision_tbl$product and want to update the score only for the products that got the top rank.

I used match to do this:

prods2update_idx <- match(decision_tbl$product, prod_score$product)

Given the matched indices, I am trying to update the prod_score data frame. How should I do that?

1 Answer:

Answer 0: (score: 1)

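Assume the score of interest is 2 (as in your example) and that the highest-ranked product gets an updated score of 100. Both values can be changed.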

Here is a dplyr solution, since I see you have started using that package:

library(dplyr)

prod_score = read.table(text = "
product score
a       1
d       2
ff      2
e       3
fvf     1
", header = TRUE, stringsAsFactors = FALSE)

prod_rank = read.table(text = "
product   rank
a         11
d         4
ff        1
e         5
fvf       9
", header = TRUE, stringsAsFactors = FALSE)


prod_score %>% 
  filter(score == 2) %>%                                 # select products with score = 2
  inner_join(prod_rank, by = "product") %>%              # join to get ranks
  filter(rank == max(rank)) %>%                          # keep product(s) with maximum ranks
  rename(given_score = score) %>%                        # change column name (for the next join)
  right_join(prod_score, by = "product") %>%             # join to get scores
  mutate(score = ifelse(!is.na(rank), 100, score)) %>%   # update score when there's a rank value
  select(-given_score, -rank)                            # remove unnecessary columns

#   product score
# 1       a     1
# 2       d   100
# 3      ff     2
# 4       e     3
# 5     fvf     1
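
The same update can also be sketched without the rename and right_join steps: first collect the names of the products to update, then modify prod_score directly. This is only a sketch under a few assumptions: target_score, new_score, to_update and updated are names introduced here for illustration, and pull() requires a dplyr version that provides it. Because prod_score is never joined back, its row order is left as it was.

```r
library(dplyr)

# Sample data from the question, rebuilt so the block runs on its own
prod_score <- data.frame(
  product = c("a", "d", "ff", "e", "fvf"),
  score   = c(1, 2, 2, 3, 1),
  stringsAsFactors = FALSE
)
prod_rank <- data.frame(
  product = c("a", "d", "ff", "e", "fvf"),
  rank    = c(11, 4, 1, 5, 9),
  stringsAsFactors = FALSE
)

target_score <- 2    # assumed score of interest
new_score    <- 100  # assumed replacement score

# Products with the target score that have the maximum rank among them
to_update <- prod_score %>%
  filter(score == target_score) %>%
  inner_join(prod_rank, by = "product") %>%
  filter(rank == max(rank)) %>%
  pull(product)

# Overwrite the score only for those products
updated <- prod_score %>%
  mutate(score = ifelse(product %in% to_update, new_score, score))

updated
```

If "highest rank" is meant as the smallest rank number (rank 1 being best), swapping max(rank) for min(rank) in the filter step would select ff instead of d.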

An alternative approach in base R. Remember to rebuild the initial example dataset first:

# get products with score = 2
prds_score_2 <- prod_score$product[prod_score$score == 2]

# get ranks for those products
prds_score_2_ranks <- prod_rank[prod_rank$product %in% prds_score_2, ]

# keep products with maximum rank to update
prds_to_update <- prds_score_2_ranks$product[prds_score_2_ranks$rank == max(prds_score_2_ranks$rank)]

# update values for those products in your initial table
prod_score$score[prod_score$product %in% prds_to_update] <- 100

# see the updates
prod_score

#   product score
# 1       a     1
# 2       d   100
# 3      ff     2
# 4       e     3
# 5     fvf     1
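
Since the question ends with indices obtained from match(), here is a minimal sketch of how such indices could be used for the update. It rebuilds the sample data so that it runs on its own; the replacement value 100 is the same assumption as above, and the NA guard is an addition for products that might be missing from prod_score.

```r
# Sample data from the question
prod_score <- data.frame(
  product = c("a", "d", "ff", "e", "fvf"),
  score   = c(1, 2, 2, 3, 1),
  stringsAsFactors = FALSE
)
prod_rank <- data.frame(
  product = c("a", "d", "ff", "e", "fvf"),
  rank    = c(11, 4, 1, 5, 9),
  stringsAsFactors = FALSE
)

# Products with score 2 and, among them, the one(s) with the maximum rank
prds_score_2       <- prod_score$product[prod_score$score == 2]
prds_score_2_ranks <- prod_rank[prod_rank$product %in% prds_score_2, ]
prds_to_update     <- prds_score_2_ranks$product[
  prds_score_2_ranks$rank == max(prds_score_2_ranks$rank)
]

# Row positions of those products in prod_score, as in the question
prods2update_idx <- match(prds_to_update, prod_score$product)

# Drop NAs in case a product does not occur in prod_score
prods2update_idx <- prods2update_idx[!is.na(prods2update_idx)]

# Assign the new score at the matched positions
prod_score$score[prods2update_idx] <- 100

prod_score
```

Note that match() returns only the position of the first match for each product, so if a product can appear more than once in prod_score, the %in% form used above updates every occurrence while the match() form updates only the first.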