Question

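First of all, I'm new to R and new to SO, so please bear with me if I ask silly questions or don't follow SO conventions.

I'm trying to find the best subscription type for multiple users based on their previous calling behaviour. So far I have managed to match 98'000 rows to calculate the effective costs for a (variable) number of subscription types: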

effective costs

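I also have a data frame with the predicted costs of each subscription type for each month:

predicted costs

Now I'm trying to find the best subscription type for each user and each month, i.e. one whose costs are lower than those of the current subscription. I did a merge to show the desired result: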

comparison

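So for month 2019-01, subscription_2 costs less than User1's current subscription, so subscription_2 should be recommended. For months 2019-02 and 2019-03 there is no recommendation, because no subscription type is cheaper.

For User2, subscription_3 should be recommended in every month, because its costs are always lower than those of the current subscription.

I'm currently following courses on DataCamp.com, and I'm fairly sure this is a very basic operation in R, but I need someone to point me in the right direction.

Here is what I have so far: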

library(dplyr)

effective.costs <- data.frame(
  user = c(rep("User1", 3), rep("User2", 3)),
  month = c(rep(c("2019-01", "2019-02", "2019-03"), 2)),
  current_subscription = c(rep("subscription_1", 3), rep("subscription_2", 3)),
  costs = c(70, 20, 50, 150, 130, 170)
)

predicted.costs <- data.frame(
  user = c(rep("User1", 9), rep("User2", 9)),
  month = c(rep("2019-01",3), rep("2019-02", 3), rep("2019-03", 3)),
  subscription = c(rep(c("subscription_1", "subscription_2", "subscription_3"), 6)),
  calculated_costs = c(
    c(70, 50, 110, 20, 50, 70, 50, 80, 120), 
    c(190, 150, 110, 210, 130, 110, 250, 170, 110)
    )
)

comparison <- merge(effective.costs, predicted.costs, by = c("user", "month"))

getRecommendation <- function(x) {
  subscription <- predicted.costs %>% 
    filter(
      calculated_costs < x['costs'] & 
      user == x['user'] & 
      month == x['month']
    ) %>%
    arrange(calculated_costs) %>%
    select(subscription) 
  subscription <- ifelse(
    length(subscription) > 0, 
    as.character(subscription[1, 1]), 
    NA
  )
  # I know return is not needed, but I'm used to it... :-)
  return(subscription)
}

effective.costs$recommendation <- apply(effective.costs, 1, getRecommendation)

View(effective.costs)

The most important part here is probably the function getRecommendation:

getRecommendation <- function(x) {
  subscription <- predicted.costs %>% 
    filter(
      calculated_costs < x['costs'] & 
      user == x['user'] & 
      month == x['month']
    ) %>%
    arrange(calculated_costs) %>%
    select(subscription) 
  subscription <- ifelse(
    length(subscription) > 0, 
    as.character(subscription[1, 1]), 
    NA
  )
  # I know return is not needed, but I'm used to it... :-)
  return(subscription)
}

which I'm trying to apply to each row of effective.costs:

effective.costs$recommendation <- apply(effective.costs, 1, getRecommendation)

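Although this gives me the correct output for User2, I currently think that is a coincidence, because nothing is recommended for User1, even though there should be a recommendation for month 2019-01: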

wrong result

Could someone nudge me in the right direction?

Thanks!

Answer 1

This eliminates both apply and the getRecommendation function. R is vectorized, so we should think column-wise as much as possible.

comparison <- merge(effective.costs, predicted.costs, by = c("user", "month"))

comparison %>%
  mutate(net_savings = calculated_costs - costs) %>%
  group_by(user, month) %>%
  filter(net_savings == min(net_savings)) %>%
  slice(1) # for ties

The problem with apply() is that it coerces the data.frame to a matrix. A matrix can hold only one type, so in this case you end up comparing a number with a string in calculated_costs < x['costs'].
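A minimal sketch of that coercion, using a small stand-in data frame (df, m and row_types are illustrative names, not from the question). It only inspects what as.matrix() produces, which is the conversion apply() performs on a data frame before calling the function:

```r
# Stand-in data frame with one character and one numeric column (illustrative)
df <- data.frame(
  user  = c("User1", "User1", "User2"),
  costs = c(70, 20, 150),
  stringsAsFactors = FALSE
)

# apply() works on the matrix form of its input
m <- as.matrix(df)

typeof(m)     # "character": the numeric column did not stay numeric
m[, "costs"]  # values are formatted to a common width, e.g. " 70"

# Each row handed to the function is therefore a character vector
row_types <- apply(df, 1, function(x) typeof(x))
```

Because the costs column is formatted to a common width, the shorter numbers pick up a leading space.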

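Specifically, the comparison becomes something like calculated_costs < ' 50', where 2-digit numbers carry an extra leading space. The number is coerced to a string as well, so the comparison follows string collation rather than numeric order, which is how 50 < ' 70' can evaluate to FALSE while 110 < '190' evaluates to TRUE.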
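A small sketch of that comparison, under the assumption that x['costs'] arrives as the padded string " 70" (padded, same_as_string and fixed are illustrative names). The result of the raw string comparison depends on the collation rules of the current locale, so no output is shown for it; the sketch only checks that the numeric and string forms behave identically and that converting back to numeric restores the intended test:

```r
padded <- " 70"  # assumed shape of x['costs'] after the matrix coercion

# The number 50 is converted to "50" and the two strings are compared
same_as_string <- identical(50 < padded, "50" < padded)

# Converting back to numeric restores the intended numeric comparison
fixed <- 50 < as.numeric(padded)
```

Wrapping x['costs'] in as.numeric() inside getRecommendation would be the smallest change that keeps the row-wise approach, although avoiding apply() on a mixed-type data frame sidesteps the issue entirely.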

The solution in this case is to approach the problem differently. There is no need for row-wise operations via apply.
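One possible column-wise variant is sketched below. It differs from the pipeline above in that it keeps only subscriptions that are strictly cheaper than the current one, so months without a cheaper option end up as NA, which is the result the question describes. The names cheaper and result are illustrative, and slice_min() assumes dplyr 1.0.0 or later:

```r
library(dplyr)

# Data from the question, with strings kept as character on any R version
effective.costs <- data.frame(
  user = rep(c("User1", "User2"), each = 3),
  month = rep(c("2019-01", "2019-02", "2019-03"), 2),
  current_subscription = rep(c("subscription_1", "subscription_2"), each = 3),
  costs = c(70, 20, 50, 150, 130, 170),
  stringsAsFactors = FALSE
)

predicted.costs <- data.frame(
  user = rep(c("User1", "User2"), each = 9),
  month = rep(rep(c("2019-01", "2019-02", "2019-03"), each = 3), 2),
  subscription = rep(c("subscription_1", "subscription_2", "subscription_3"), 6),
  calculated_costs = c(
    70, 50, 110, 20, 50, 70, 50, 80, 120,
    190, 150, 110, 210, 130, 110, 250, 170, 110
  ),
  stringsAsFactors = FALSE
)

# Cheapest strictly cheaper subscription per user and month
cheaper <- effective.costs %>%
  inner_join(predicted.costs, by = c("user", "month")) %>%
  filter(calculated_costs < costs) %>%
  group_by(user, month) %>%
  slice_min(calculated_costs, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(user, month, recommendation = subscription)

# Months with no cheaper option have no match and therefore get NA
result <- effective.costs %>%
  left_join(cheaper, by = c("user", "month"))
```

The left join back onto effective.costs keeps one row per user and month in the original order, so result can replace the data frame that the apply() call was meant to fill.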

How to find the minimum value in a data frame that matches certain criteria and return a specific column
